Discontinued Magazine Index

The index is gone, in case anyone here has used it. I have used this site quite a lot. It will be missed.

http://index.mrmag.com/tm.exe?tmpl=tm_faq

Rich

Web resource

Rod,

another option would be to use a synonym list from the Internet. It looks like dictionary.com offers an API. I haven't looked into that but this might be interesting...

Maybe somebody else knows a similar service?

Regards

Martin

Synonyms and Spelling Variations (2)

There are three table columns involved in the "translation" or "desynonymization" process: sequence or priority number; input phrases; and output words.

Translations are checked by ascending sequence.

The item text being processed is split into individual words, after extraneous characters (punctuation) are removed.

Each of these words is checked, in turn, against the input phrases for whole-word matches. Each of the input fields returned is parsed into its phrases, using the embedded delimiter (=).

Each of the words in the phrase is searched for in the remaining portion of the item text. If ALL are found within (a match), all instances of the words in the input phrase are removed from the item text being processed (this leaves the 'remaining text'), and the words in the output field are placed in the searchable text. If there is no match, the other phrases in the input are checked. If no match, processing continues with any other row returned for that word.

When all this is done, any words still in the item text are added to the searchable text.

NOTE: The searchable text is an internal field, not displayed to the user.

Clear as a bell, right? Okay, here are a couple of examples. Let's say that the item text contains the words "the pennsylvania" (all processing in lower case). The word "pennsylvania" would pick up the translation row:

priority: "1000"; input: "the prr = prr = pennsylvania railroad = pennsylvania rail road = pennsylvania rr = pennsy = the pennsylvania"; output: "prr pennsy pennsylvania railroad rr rail road"

Each of the phrases in the input is checked against the item text. A match is found in the last phrase, "the pennsylvania", the words in the input are removed from the item text to ensure there are no more matches on them, and all the words in the output are added to the searchable text.

Also picked up from the table would be the row:

priority: "20000"; input: "in pennsylvania = to pennsylvania = from pennsylvania = of pennsylvania = through pennsylvania = pennsylvania = pa = penna"; output: "pennsylvania pa penna"

By the time it is processed, the word "pennsylvania" is already gone from the item text, so there would be no match found.

For the second example, assume the item text contains "kitbash a caboose". "caboose" would first pick up the row:

priority: "15000"; input: "bobber caboose = bobber cabooses = bobber = bobbers"; output: "bobbers cabooses cabins waycars vans crummy crummies brake guards"

You will notice that all the input phrases contain the word "bobber", so none will match. Processing will continue with the row:

priority: "20000"; input: "cabin car = cabin cars = caboose = cabooses = van = vans = crummy = crummies = waycar = waycars = way car = way cars = brake van = brake vans = guards van = guards vans"; output: "cabooses cabins cabin cars waycars vans crummy crummies brake guards"

The third phrase in the input consists of the single word "caboose" - a match. All the words in the output of this row will be added to the searchable text. So a fan or a railroad that called their cabooses "vans" or "cabin cars" would find this item. The Brits, who call them "guards vans", would also be able to find the item.
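For anyone who would rather read code than prose, here is a minimal Python sketch of the process described above. It is one reading of the description, not the actual Railroad Index code: the function names are made up for illustration, the four rows are the ones quoted in this post, and details such as skipping output words already in the searchable text are assumptions.

```python
import re

# Rows quoted in this post: (priority, input phrases, output words).
TRANSLATIONS = [
    (1000,
     "the prr = prr = pennsylvania railroad = pennsylvania rail road = "
     "pennsylvania rr = pennsy = the pennsylvania",
     "prr pennsy pennsylvania railroad rr rail road"),
    (20000,
     "in pennsylvania = to pennsylvania = from pennsylvania = "
     "of pennsylvania = through pennsylvania = pennsylvania = pa = penna",
     "pennsylvania pa penna"),
    (15000,
     "bobber caboose = bobber cabooses = bobber = bobbers",
     "bobbers cabooses cabins waycars vans crummy crummies brake guards"),
    (20000,
     "cabin car = cabin cars = caboose = cabooses = van = vans = crummy = "
     "crummies = waycar = waycars = way car = way cars = brake van = "
     "brake vans = guards van = guards vans",
     "cabooses cabins cabin cars waycars vans crummy crummies brake guards"),
]


def split_words(text):
    """Lower-case the text, remove punctuation, and split into words."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def build_searchable_text(item_text, translations=TRANSLATIONS):
    """Hypothetical helper: return the searchable text for one item."""
    words = split_words(item_text)
    remaining = list(words)      # the 'remaining text'
    searchable = []              # internal field, never shown to the user

    def add(new_words):
        # Assumption: a word is stored only once in the searchable text.
        for w in new_words:
            if w not in searchable:
                searchable.append(w)

    # Translations are checked by ascending sequence (priority) number.
    rows = sorted(translations, key=lambda row: row[0])
    for word in words:
        for _priority, input_field, output_field in rows:
            # Parse the input field into phrases using the '=' delimiter.
            phrases = [p.split() for p in input_field.split("=") if p.strip()]
            # Whole-word check: does this row get picked up by the word?
            if not any(word in phrase for phrase in phrases):
                continue
            for phrase in phrases:
                # A match needs ALL words of the phrase in the remaining text.
                if all(w in remaining for w in phrase):
                    remaining = [w for w in remaining if w not in phrase]
                    add(output_field.split())
                    break  # other phrases are only tried when there is no match
    # Any words still in the item text are added to the searchable text.
    add(remaining)
    return " ".join(searchable)


print(build_searchable_text("the pennsylvania"))
print(build_searchable_text("kitbash a caboose"))
```

With these rows, "the pennsylvania" yields only the output words of the priority 1000 row, and "kitbash a caboose" yields the output words of the caboose row followed by the leftover words "kitbash" and "a".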

So there, in a nutshell, or possibly a coconut shell, is my description of how synonyms and spelling variations are handled.

If anyone would like to put in an article description as they would enter it into an index, I will pass it through the translator and post the result.

Rod

P.S. Just saw Martin's suggestion of dictionary.com. Synonyms for "caboose" are: berth, box, camp, chalet, compartment, cot, cottage, crib, deckhouse, home, hovel, hut, lodge, log house, quarters, room, shack, shanty, shed, shelter, box, bungalow, cabana, cabin, camp, carriage house, chalet, cot, home, hut, lean-to, lodge, ranch, shack, shanty, small house, cookroom, scullery. Their visual thesaurus wouldn't load for me, but it did reference "cabin car". They are not really railroad oriented, and I would not expect them to give us many railroad synonyms.
 

Rod Goodwin
IndexGuy
Skype: IndexGuy1

Developer and moderator of The Railroad Index,
the most effective model railroad index on the Internet!

 


Railroading is pretty specialized

I agree with Rod: railroading is so specialized that I don't expect general-purpose dictionary APIs to help us much.

Whatever we do, the data behind it we will probably have to build ourselves.

WE ARE the audience for this, and that means WE ARE the ones who care. If you're looking for another undiscovered group of model railroaders out there in the world who have done all this work already, you probably aren't going to find them. WE ARE THEM.

The model railroading hobby is one of the largest of the model making hobby groups, but compared to something like golf or boating, we're a real niche interest.

Joe Fugate
Publisher, Model Railroad Hobbyist magazine

Joe Fugate's HO Siskiyou Line

Read my blog

Maybe not for railroad specific terms

Joe,

you're probably right for railroad specific terms. But look at everyday expressions like 'barrel':

butt, cask, cylinder, drum, firkin, hogshead, keg, pipe, receptacle, tub, tun, vat, vessel

So it might be useful to combine both methods:

1. search the internal synonym table according to Rod's suggestion

2. if not found, search externally.
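The two steps could be sketched in Python roughly as follows. Everything here is hypothetical: `INTERNAL_SYNONYMS` stands in for Rod's table, and `fake_external_lookup` is a placeholder with canned data, since I have not looked at what any real thesaurus API actually provides.

```python
# Hypothetical stand-in for the internal (railroad-specific) synonym table.
INTERNAL_SYNONYMS = {
    "caboose": ["cabooses", "cabin cars", "waycars", "vans", "crummies"],
}


def fake_external_lookup(word):
    """Placeholder for an external thesaurus service (canned data only)."""
    canned = {
        "barrel": ["butt", "cask", "cylinder", "drum", "firkin",
                   "hogshead", "keg", "pipe", "receptacle", "tub",
                   "tun", "vat", "vessel"],
    }
    return canned.get(word, [])


def find_synonyms(word, external_lookup=fake_external_lookup):
    """Step 1: internal table. Step 2: external source if nothing is found."""
    key = word.lower()
    internal = INTERNAL_SYNONYMS.get(key)
    if internal:
        return internal
    try:
        return external_lookup(key)
    except Exception:
        # Assumption: an unreachable external service should not break a search.
        return []


print(find_synonyms("caboose"))  # answered from the internal table
print(find_synonyms("barrel"))   # falls through to the external lookup
```

Passing the external lookup in as a function keeps the choice of service open.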

Regards

Martin

 

Proposed data model uploaded

Hi,

just minutes ago I uploaded my proposed data model.

Check out the graphical version at: home.vrweb.de/~martin_fischer/overview.html

By clicking on an entity (the rectangular boxes) you'll be taken to a more detailed description.

I didn't add Rod's suggestion for synonyms because of possible copyright issues.

I'll play around with the database for my own use and education now.

Let me know what you think.

Regards

Martin

 

 

Copyright

Anything I put up here is considered to be in the public domain, and is there for the possible good of the hobby as a whole.

Rod


 

Data model

I looked at the data model just posted and clicked on the trackplan item. Can the "type" entry have multiple values? I noticed while browsing MR's plans that examples of the type I was looking for were classified under several different categories. In other words, I had to search in several categories for what I thought were similar types, like "around the wall", "walk-in" and "peninsula". I know a particular plan isn't likely to have all of those characteristics, but for what I was looking for there were suitable plans in each of those categories. It might be useful if a plan could be "type"ed in multiple categories.

Trackplan types

Ken,

there are two ways to handle the situation you describe:

First and easiest is to give an exact definition of what each type means. An 'Around the wall' layout may not have any peninsulas; otherwise it is 'Peninsular'.

Second is to allow multiple types as you suggest.

I suppose we could handle most cases with good definitions. But then there are special properties. Two I can think of are

  • doubledeckers
  • around the wall

So a shelf layout could have the additional property of being 'around the wall' and / or being a 'doubledecker'.

So to make it short, I think your suggestion is a good one and I'll modify the data model accordingly. My idea would be to classify layouts by shape:

  • O, U, L, E, I for Island, U- and L-shaped, peninsular or straight shelf

and by modification

  • Around the wall (only possible for U and E), multidecker (all shapes) and module (all shapes?)
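As a rough illustration, the shape-plus-modification idea could look like this in Python. The class and constant names are invented for this sketch and are not part of the uploaded data model. Allowing 'module' for every shape is an assumption, since that point is still open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Shape(Enum):
    ISLAND = "O"
    U_SHAPED = "U"
    L_SHAPED = "L"
    PENINSULAR = "E"
    STRAIGHT_SHELF = "I"


class Modification(Enum):
    AROUND_THE_WALL = "around the wall"
    MULTIDECKER = "multidecker"
    MODULE = "module"


# Which shapes each modification may be combined with.
ALLOWED_SHAPES = {
    Modification.AROUND_THE_WALL: {Shape.U_SHAPED, Shape.PENINSULAR},
    Modification.MULTIDECKER: set(Shape),
    Modification.MODULE: set(Shape),  # assumption: "all shapes?" is undecided
}


@dataclass(frozen=True)
class LayoutType:
    """One shape plus any number of modifications."""
    shape: Shape
    modifications: FrozenSet[Modification] = frozenset()

    def __post_init__(self):
        # Reject combinations the classification does not allow.
        for mod in self.modifications:
            if self.shape not in ALLOWED_SHAPES[mod]:
                raise ValueError(
                    f"{mod.value!r} is not allowed for shape {self.shape.value!r}"
                )


# A U-shaped, around-the-wall multidecker is accepted.
plan = LayoutType(
    Shape.U_SHAPED,
    frozenset({Modification.AROUND_THE_WALL, Modification.MULTIDECKER}),
)
print(plan.shape.value, sorted(m.value for m in plan.modifications))
```

Because the modifications are a set, a plan can carry several properties at once, which is what Ken asked for.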

What do you think?

Regards

Martin

 

 

Proposed data model: ARTICLE

Martin;

Some comment/questions on the data model.

AUTHOR: I called this CONTRIBUTOR, and on an article about a layout written by someone other than the owner, I always listed the layout owner as the first contributor. Without his contribution, there would be no article to index. Also, if there was credit given for photos, trackplan, etc., those people also got listed.

CATEGORY/SUBCATEGORY: Many articles do not fit neatly into a single category. Would it be possible to have a single article in multiple categories?

One of the problems with categorization is that each of us thinks a little differently, and may see different things within an article. If I input it, I may put it into one category, and if you input it, you may put it into a different category. When I started designing my index, a long time ago, I looked at the NMRA category hierarchy and quickly discarded it. Out of date (at that time) and much too rigid, and therefore prone to errors. I, like probably 99.99% of other people, searched only by keyword, completely ignoring the categories.

KEYWORD: "taken from a fixed list". The problem with a fixed list is that it is "fixed". No flexibility, not easily adaptable to changes in language or use of terms. Again, when there is a question of which keywords best fit, different people will probably apply different keywords. Also, if the person inputting (<- there's a variation) is not really familiar with the keyword list, there may be a scan of the list required to find the keyword, thereby slowing the process. Also, if a searcher gets nothing back on a search, he may have to go to the list to try to find the keyword(s) that will work for him, compounding the effort required for the search.

Who will be the keeper of the list? If it is a committee, it may be very hard to get keywords added, because each committee member has his/her own ideas. I think it should be one trusted person, because although not everyone will agree with this person's ideas, at least it will be consistent. When a user becomes used to some of the idiosyncrasies of the person (which are inevitable), it will work better.

In my index, all words in the title, and in whatever other text is entered, become keywords, along with other keywords as selected. Part of the storage process is displaying to the inputter everything stored as searchable, so that extra keywords that sneak in, keywords that have been left out, or misspellings (<- another variation) may become visible.

VOTE-: A good idea, but open to abuse. I would suggest an audit trail table (entity), keeping the IP of the voter, which can be analyzed to keep people honest.
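A minimal sketch of such an audit trail, using Python's built-in sqlite3 module. The table and column names are hypothetical and not taken from the proposed data model, and the query shows only one possible analysis: IPs that voted more than once on the same article.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE vote_audit (
        article_id INTEGER NOT NULL,
        voter_ip   TEXT    NOT NULL,
        vote       INTEGER NOT NULL,
        voted_at   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")


def record_vote(article_id, voter_ip, vote):
    """Store every vote with the voter's IP, even repeated ones."""
    conn.execute(
        "INSERT INTO vote_audit (article_id, voter_ip, vote) VALUES (?, ?, ?)",
        (article_id, voter_ip, vote),
    )


def repeat_voters(min_votes=2):
    """Return (article_id, voter_ip, votes) for IPs voting repeatedly on one article."""
    return conn.execute(
        """
        SELECT article_id, voter_ip, COUNT(*) AS votes
        FROM vote_audit
        GROUP BY article_id, voter_ip
        HAVING COUNT(*) >= ?
        ORDER BY votes DESC
        """,
        (min_votes,),
    ).fetchall()
```

Keeping every vote, rather than rejecting duplicates at insert time, preserves the evidence needed to analyze voting patterns later.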

I suggest an additional column - URL. If one of the articles from Model Railroad Hobbyist shows up in a search results list and the current user does not have the various issues downloaded, it would be nice to be led to the download page. Also, in my index I have some items which are just photos available on the web. Also indexed are some how-tos on various manufacturers' websites.

Rod.


 


Great observations

Great observations, Rod.

I especially endorse the URL column. We need to be thinking 21st century here, and electronic publications are coming like the proverbial freight train (no bias, of course!) ...

As Rod mentions, this enables indexing some of the better resources on the web as well.

Joe Fugate
Publisher, Model Railroad Hobbyist magazine


