Instructional Development, Enhancement & Support
November 2001

Track It Down
A guide to the different types of search engines on the Internet

Which search engine to use?
Narrowing the field
Start with a metasearch
Choose "advanced search" option
How to build a better query

Which search engine to use?
Keyword indexes (such as AltaVista, HotBot and Lycos) produce an index of all the text on the sites they examine. Typically, the engine reads at least the first few hundred words on a page, including the title, the HTML "alt text" coded into webpage images, and any keywords or descriptions that the author has built into the page structure. The engine tries to ignore raw HTML code, JavaScript commands and the like, and throws out garbage words such as "and", "the", "by" and "for". The engine assumes whatever words are left are valid page content; it then alphabetises these words (with their associated sites) and places them in an index where they can be searched and retrieved.
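The indexing steps described above can be sketched in a few lines of Python. This is only an illustration, not how any particular engine works: the stop-word list is limited to the four words named in the article, and the page addresses, page text and function names (tokenize, build_index) are made up for the example.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; a real engine would use a much longer one.
STOP_WORDS = {"and", "the", "by", "for"}

def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def build_index(pages):
    """Map each remaining word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    # Sort the keys so the index is alphabetised.
    return dict(sorted(index.items()))

# Hypothetical pages used only for this example.
pages = {
    "example.com/a": "The Pentium processor and the motherboard",
    "example.com/b": "Setup guide for the Pentium",
}
index = build_index(pages)
print(index["pentium"])
```

Looking up a word is then a single dictionary access, which is one reason this kind of index can answer queries quickly.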
This type of search engine usually does no content analysis per se, but will use word placement and frequency to determine how a page ranks among other pages containing the same or similar words. For example, when someone searches for the word 'Pentium', a page with 'Pentium' in its title will appear higher in the search results than a page that doesn't mention 'Pentium' in the title. Likewise, a page with 20 mentions of 'Pentium' in the body text will rank higher than a page with one instance of the word.
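A minimal sketch of ranking by placement and frequency might look like the following. The scoring rule is an assumption made for illustration: a fixed bonus (title_bonus, an arbitrary value of 100) for a title hit, plus one point per mention in the body. Real engines do not publish their formulas, and the page names and text are invented.

```python
import re

def count_word(word, text):
    """Count whole-word, case-insensitive occurrences of word in text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))

def score(word, title, body, title_bonus=100):
    """Hypothetical score: a bonus for a title hit plus one point per body mention."""
    in_title = count_word(word, title) > 0
    return (title_bonus if in_title else 0) + count_word(word, body)

def rank(word, pages):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda name: score(word, *pages[name]), reverse=True)

# Each made-up page is a (title, body) pair.
pages = {
    "one-mention": ("Chip news", "The Pentium is out."),
    "twenty-mentions": ("Chip news", "Pentium " * 20),
    "in-title": ("Pentium review", "A short review."),
}
print(rank("Pentium", pages))
```

With these assumed weights, the page with the word in its title comes first, followed by the page with 20 mentions and then the page with one.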
Keyword indexes tend to be fast and broad; you'll typically get search results in seconds (faster than other kinds of engines). But unless you're careful about how you construct your query, you're likely to be overwhelmed with data.

Subject directories (such as Galaxy and Yahoo) are the card catalogues of the Web: They assign sites to specific topic categories based on the site's content. Usually, human judgement is involved. Some employ a review staff to categorise sites; others allow site owners to categorise and describe their own pages; still others ask random site visitors to rate sites.
The advantage of this approach is that sites are pregrouped and easier to browse than those in a raw keyword index. A human-generated subject directory also allows more nuance and subtlety than machine-generated keyword indexes, and should be able to offer meaningful advice on not only where the content is but how good or bad it is.
However, humans aren't as efficient as machines, and human-generated directories can never be as comprehensive or up-to-date as machine-generated sites. If you happen to think the same way as the site reviewers, you'll find great value in these subject directories. But if you and the reviewer are on different wavelengths, the site's categorization might seem arbitrary and hard to understand - and you might find that their top picks aren't pertinent to your needs.

Metasearch engines (such as Dogpile and MetaCrawler) allow you to search a number of databases and engines simultaneously; some even deliver your search results in a single, integrated, rank-ordered list.
A metasearch's major strengths are convenience and breadth: It's easier to harness the power of multiple search engines simultaneously than to visit them one at a time. The multiple searches also let you sift through a wider range of pages than you could access on any single-engine search.
The downside is that metasearches often use the lowest common search denominators. Different engines parse queries differently, treat upper- and lower-case letters in queries differently, allow or disallow natural-language queries and so on. To work with the widest possible number of search engines, metasearches tend to use only simple, straightforward search strategies - making it hard to access each search engine's specialized features. If all you need is a general search, great. If you need a more refined search, a metasearch isn't a good choice.
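To show how several result lists might be folded into one rank-ordered list, here is a small sketch. The merging rule (each page earns 1/rank from every engine that returns it) is one simple choice made for this example and is not necessarily what Dogpile or MetaCrawler do; the engine replies are made-up lists rather than real network calls.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Combine several ranked URL lists into a single ranked list.

    Each URL earns 1/rank from every engine that returned it, so pages
    ranked highly by several engines rise to the top.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Made-up results standing in for the replies of two engines.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

Note that the sketch only sees an ordered list of addresses from each engine. Anything engine-specific, such as advanced query syntax, never reaches it, which mirrors the lowest-common-denominator limitation described above.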

Narrowing the field
Given the Web's vastness, if you already know of a specialty search engine or service that suits your query, it's a good idea to start there rather than with a general search engine. But often you're searching precisely because you don't know where to find the information you need. In that case, your best bet is to start with the general and move to the specific.

Start with a metasearch
Because metasearches instantly tap a number of standalone engines, they let you cast your net very widely. If your search target is an uncommon word or phrase, a metasearch might be all you need, delivering a manageable number of narrowly constrained results right from the start. But metasearches aren't very configurable, and if yours is a fairly common search term, you might find yourself on the receiving end of a uselessly large flood of search results. If that happens, you should try moving on to a standalone search engine.

Choose a major engine's "advanced search" option
If you've bookmarked your favourite engine's home page, change the bookmark to point to the site's advanced search page. The advanced option (which goes by different names on different sites) gives you better defaults, far more flexibility and improved precision. For example, if you query AltaVista's "Simple Search" for something like Windows 2000 Setup, AltaVista delivers a time-wasting list of over 2 million pages. But enter the same query in AltaVista's "Advanced Search" and, without any extra work from you, the engine returns just 26 documents, all relevant to the search topic.

How to build a better query

  • Use a site's advanced search option, if available
  • In general, search for many words rather than just one or a few
  • Avoid natural language, even if a site says it can handle it
  • Search not only for your target words, but also for synonyms
  • Include common variants of your search words in your queries
  • Avoid searching for standalone letters and numbers (such as NT or 3D); place them in quotes if you must search for them
  • Whenever possible, search for complete phrases; place the phrase in quotes
  • Avoid complex Boolean searches if possible
  • If you must use a complex Boolean query, use parentheses to organize the query and to ensure that the search engine parses the query the way you intend
  • Use plus and minus signs to indicate what must (or must not) be included in your results
  • Read the search engine's help files; they contain a gold mine of useful tips and hints
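The tips about quoted phrases and plus and minus signs can be illustrated with a small query parser. This is a sketch under stated assumptions: the syntax handled here (a leading + for required terms, a leading - for excluded terms, double quotes for phrases) follows the conventions the list describes, but each engine defines its own rules, so check its help files. The function names and sample text are hypothetical, and matching is done by simple substring test.

```python
import re

# An optional sign, then either a quoted phrase or a single word.
TERM = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Split a query into required, excluded and optional terms (lowercased)."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TERM.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def matches(query, text):
    """True if text contains every required term and no excluded term."""
    required, excluded, _ = parse_query(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    return all(term in text for term in required)

query = '+"Windows 2000" setup -beta'
print(parse_query(query))
print(matches(query, "Windows 2000 Setup guide"))
```

Because the phrase is quoted, "Windows 2000" is treated as one required term rather than two separate words, so a page about Windows 98 that happens to mention the number 2000 elsewhere would not match.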

Note: The author of "Track It Down" is Fred Langa. The article first appeared in WINDOWS Magazine, July 1998, published by CMPnet.
