Track It Down
A guide to different types of search engines on the Internet
Which search engine to use?
Narrowing the field
Start with a metasearch
Choose the "advanced search" option
How to build a better query
Which search engine to use?
Keyword indexes - such as AltaVista, HotBot and Lycos - produce
an index of all the text on the sites they examine. Typically, the
engine reads at least the first few hundred words on a page, including
the title, the HTML "alt text" coded into webpage images,
and any keywords or descriptions that the author has built into
the page structure. The engine tries to ignore raw HTML code, JavaScript
commands and the like, and throws out garbage words such as "and",
"the", "by" and "for". The engine
assumes whatever words are left are valid page content; it then
alphabetises these words (with their associated sites) and places
them in an index where they can be searched and retrieved.
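The indexing steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `STOP_WORDS` set, the `strip_markup` and `build_index` helpers, and the sample pages are all hypothetical. The markup stripping is deliberately crude: it discards alt text and author keywords, which the engines described here would keep.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list taken from the article's examples;
# a real engine would use a much longer one.
STOP_WORDS = {"and", "the", "by", "for"}

def strip_markup(html):
    """Crudely drop script blocks and tags, keeping only visible text."""
    html = re.sub(r"(?is)<script.*?</script>", " ", html)
    return re.sub(r"<[^>]+>", " ", html)

def build_index(pages):
    """Map each remaining word to the sorted list of pages containing it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in re.findall(r"[a-z0-9]+", strip_markup(html).lower()):
            if word not in STOP_WORDS:
                index[word].add(url)
    # Sort the words so the index is alphabetical, as described above.
    return {word: sorted(urls) for word, urls in sorted(index.items())}

# Made-up sample pages, for illustration only.
pages = {
    "a.example": "<title>Pentium guide</title><p>The Pentium and the bus</p>",
    "b.example": "<p>Setup for the modem</p><script>var x = 1;</script>",
}
index = build_index(pages)
```

Looking up a word is then a dictionary access, for example `index["pentium"]`, which is why this kind of engine can answer queries so quickly.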
This type of search engine usually does no content analysis per
se, but will use word placement and frequency to determine how a
page ranks among other pages containing the same or similar words.
For example, when someone searches for the word 'Pentium', a page
with 'Pentium' in its title will appear higher in the search results
than a page that doesn't mention 'Pentium' in the title. Likewise,
a page with 20 mentions of 'Pentium' in the body text will rank higher
than a page with one instance of the word.
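A minimal sketch of ranking by placement and frequency might look like the following. The `score` and `rank` functions, the page dictionaries and the `title_weight` value are all assumptions made for illustration; actual engines did not publish their weighting.

```python
def score(page, term, title_weight=25):
    """Score a page for one term: body mentions plus a bonus per title hit.

    title_weight is an arbitrary illustrative value, not a real engine's figure.
    """
    term = term.lower()
    title_hits = page["title"].lower().split().count(term)
    body_hits = page["body"].lower().split().count(term)
    return title_weight * title_hits + body_hits

def rank(pages, term):
    """Return the URLs of pages that mention the term, best score first."""
    scored = [(score(page, term), page["url"]) for page in pages]
    ordered = sorted(scored, key=lambda pair: -pair[0])
    return [url for points, url in ordered if points > 0]

# Made-up pages, for illustration only.
pages = [
    {"url": "one.example", "title": "Chip news", "body": "pentium " * 20},
    {"url": "two.example", "title": "Pentium review", "body": "a pentium test"},
    {"url": "three.example", "title": "Chip news", "body": "one pentium mention"},
    {"url": "four.example", "title": "Modems", "body": "no match here"},
]
ranking = rank(pages, "Pentium")
```

With these weights the page with 'Pentium' in its title comes first, the page with 20 body mentions second, and the page with a single mention last; the page with no match is dropped.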
Keyword indexes tend to be fast and broad; you'll typically get
search results in seconds (faster than other kinds of engines).
But unless you're careful about how you construct your query, you're
likely to be overwhelmed with data.
Subject directories - such as Galaxy and Yahoo - are the card catalogues
of the Web: They assign sites to specific topic categories based
on the site's content. Usually, human judgement is involved. Some
employ a review staff to categorise sites; others allow site owners
to categorise and describe their own pages; still others ask random
site visitors to rate sites.
The advantage of this approach is that sites are pregrouped and
easier to browse than those in a raw keyword index. A human-generated
subject directory also allows more nuance and subtlety than machine-generated
keyword indexes, and should be able to offer meaningful advice on
not only where the content is but how good or bad it is.
However, humans aren't as efficient as machines, and human-generated
directories can never be as comprehensive or up-to-date as machine-generated
indexes. If you happen to think the same way as the site reviewers,
you'll find great value in these subject directories. But if you
and the reviewer are on different wavelengths, the site's categorization
might seem arbitrary and hard to understand - and you might find
that their top picks aren't pertinent to your needs.
Metasearch engines - such as Dogpile and MetaCrawler - allow you to
search a number of databases and engines simultaneously; some even
deliver your search results in a single, integrated, rank-ordered
list.
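One way to combine several engines' results into a single rank-ordered list is to add up reciprocal ranks, as in the sketch below. This is an assumed merging scheme chosen for simplicity, not a description of how Dogpile or MetaCrawler work; `merge_results` and the sample result lists are hypothetical.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Merge ranked URL lists by summing reciprocal ranks (1/1, 1/2, ...)."""
    scores = defaultdict(float)
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] += 1.0 / position
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Stand-in result lists; a real metasearch would fetch these from each engine.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "x.example"]
engine_c = ["y.example", "w.example"]
merged = merge_results([engine_a, engine_b, engine_c])
```

A page that several engines rank highly rises to the top of the merged list, while a page that only one engine returns in a low position sinks.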
A metasearch's major strengths are convenience and breadth: It's
easier to harness the power of multiple search engines simultaneously
than to visit them one at a time. The multiple searches also let
you sift through a wider range of pages than you could access on
any single-engine search.
The downside is that metasearches often use the lowest common search
denominators. Different engines parse queries differently, treat
upper- and lower-case letters in queries differently, allow or disallow
natural-language queries and so on. To work with the widest possible
number of search engines, metasearches tend to use only simple,
straightforward search strategies - making it hard to access each
search engine's specialized features. If all you need is a general
search, great. If you need a more refined search, a metasearch isn't
a good choice.
Narrowing the field
Considering the Web's vastness, if there's a specialty search engine
or service that you already know about that's appropriate for your
query, it's a good idea to start there rather than with a general
search engine. But often the reason you're searching for things
is precisely because you don't know where to find the information
you need. In that case, your best bet is to start with the general
and move to the specific.
Start with a metasearch
Because metasearches instantly tap a number of standalone engines,
they let you cast your net very widely. If your search target is
an uncommon word or phrase, a metasearch might be all you need,
delivering a manageable number of narrowly constrained results right
from the start. But metasearches aren't very configurable, and if
yours is a fairly common search item, you might find yourself on
the receiving end of a uselessly large flood of search results.
If that happens, you should try moving on to a standalone search
engine.
Choose a major engine's "advanced search" option
If you've bookmarked your favourite engine's home page, change the
target to point to the site's advanced search. The advanced option
(which goes by different names on different standalone sites) gives
you better defaults, far more flexibility and improved precision.
For example, if you query AltaVista's "Simple Search"
for something like Windows 2000 Setup, AltaVista delivers a time-wasting
list of over 2 million pages. But try entering the same query in
AltaVista's "Advanced Search". Without any extra work
from you, the engine returns just 26 documents, all relevant to
the search topic.
How to build a better query
- Use a site's advanced search option, if available
- In general, search for many words rather than just one or a few
- Avoid natural language, even if a site says it can handle it
- Search not only for your target words, but also for synonyms
- Include common variants of your search words in your queries
- Avoid searching for standalone letters and numbers (such as NT or 3D);
place them in quotes if you must search for them
- Whenever possible, search for complete phrases; place the phrase in quotes
- Avoid complex Boolean searches if possible
- If you must use a complex Boolean query, use parentheses to organize
the query and to ensure that the search engine parses the query
the way you intend
- Use plus and minus signs to indicate what must (or must not) be included
in your results
- Read the search engine's help files; they contain a gold mine of useful
tips and hints
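Several of these tips (quoted phrases, plus and minus signs, quoting short tokens, adding variants) can be combined mechanically. The sketch below assumes a generic quote-and-plus/minus syntax; `build_query` and `quote_if_needed` are hypothetical helpers, and any given engine may treat the resulting string differently, so check its help files.

```python
def quote_if_needed(term):
    """Quote multi-word terms and short letter/number tokens such as NT or 3D."""
    if " " in term or len(term) <= 2:
        return f'"{term}"'
    return term

def build_query(phrases=(), required=(), excluded=(), optional=()):
    """Assemble a query string from phrases, +required, -excluded and variants.

    The syntax is an assumed generic one; individual engines may differ.
    """
    parts = [f'"{phrase}"' for phrase in phrases]
    parts += [f"+{quote_if_needed(term)}" for term in required]
    parts += [f"-{quote_if_needed(term)}" for term in excluded]
    parts += [quote_if_needed(term) for term in optional]
    return " ".join(parts)

query = build_query(
    phrases=["Windows 2000 setup"],
    required=["install", "NT"],
    excluded=["beta"],
    optional=["installation", "installing"],
)
```

Here `query` holds a complete phrase in quotes, two required terms (with the short token NT quoted), one excluded term, and two variants of the main search word.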
Note: The author of "Track It Down" is Fred Langa. The article first
appeared in WINDOWS Magazine, July 1998, published by CMPnet.