Electric Type

Adding Search to Your Site

Page 3 — Getting Ready

Before you install a search engine (or submit your pages to a Web-wide search engine like HotBot or AltaVista), you must make sure it can find all of the pages on your site. An indexing robot will begin at a given page and then follow the links there to subsequent pages, so you should make sure your main page has good text links or create a site map that the robot can use.
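To see why good links from the main page matter, here is a minimal Python sketch of the link-following behavior described above. The page names and the `SITE_LINKS` table are hypothetical stand-ins for a real site, and an actual indexing robot does far more than this; the point is only that a page nobody links to is never reached.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it reaches through plain text links.
SITE_LINKS = {
    "/index.html": ["/products.html", "/about.html"],
    "/products.html": ["/index.html", "/widgets.html"],
    "/about.html": ["/index.html"],
    "/widgets.html": [],
    "/press.html": ["/index.html"],  # no page links here, so a robot never arrives
}


def reachable_pages(links, start):
    """Return the set of pages a link-following robot reaches from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_pages(links, start):
    """Return the pages that exist on the site but cannot be reached from start."""
    return sorted(set(links) - reachable_pages(links, start))


if __name__ == "__main__":
    print(orphaned_pages(SITE_LINKS, "/index.html"))
```

In this made-up site, `/press.html` is the only orphan. Adding it to a site map that the main page links to would make it reachable.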

You'll also want to check your robots.txt file. This is a standard file that sits at the root of your site and either excludes unwelcome indexing robots entirely or restricts them from crawling certain directories. If you run your own server, you control this file; otherwise, your hosting provider's server administrator controls it. If the file exists, make sure that it allows at least your search indexer's robot to access your directories. (If there is no robots.txt file at all, robots treat the whole site as open to crawling.)
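One way to check what a robots.txt file permits is to run its rules through a parser. The sketch below uses Python's standard `urllib.robotparser` module; the sample rules and the robot names `MySiteIndexer` and `BadBot` are made up for illustration, so substitute your own file and the user-agent name your indexer actually sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one unwelcome robot entirely,
# and keep every other robot out of two directories.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def can_crawl(robots_txt, user_agent, path):
    """Return True if the rules in robots_txt let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("MySiteIndexer", "/index.html"),
        ("MySiteIndexer", "/private/notes.html"),
        ("BadBot", "/index.html"),
    ]:
        print(agent, path, can_crawl(SAMPLE_ROBOTS_TXT, agent, path))
```

With these sample rules, the indexer may fetch `/index.html` but not anything under `/private/`, and `BadBot` is refused everywhere.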

For more information on this topic, see the Robots Exclusion page and Appendix B of the W3C's specification for HTML 4.01.

The other way that page designers can control robots is with the robots META tag. It is particularly useful if you have a hosted site and don't want to bother your server administrator, since you can control it with plain old HTML. To specify that a page should not be indexed by search engines but that it's OK for them to follow links on that page, use this code in the <HEAD> section:

<meta name="robots" content="noindex,follow">

To tell robots that you don't want them to follow links on a page (for pages that consist of email addresses, for example), use this tag:

<meta name="robots" content="index,nofollow">

And, as you might guess, to keep robots from either indexing a page or following links on that page, combine both directives:

<meta name="robots" content="noindex,nofollow">

For more information, see the HTML Author's Guide to the Robots META Tag.
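To double-check which directives a page actually carries, a short script can read the tag back out of the HTML. This is a rough sketch using Python's standard `html.parser` module; the class and function names are invented for this example, and it only understands the four directives shown above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from a page's <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_policy(html):
    """Return (may_index, may_follow); both are True when no tag is present."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex,follow"></head>'
    print(robots_policy(page))
```

For the `noindex,follow` example, the function reports that the page may not be indexed but its links may be followed.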

Finally, remember that indexing robots are not very clever about nontext links. They tend to choke on server-side image maps, JavaScript, redirects, letter case mismatches, and so on. If you use Flash or Shockwave, be sure to use the AfterShock options to generate HTML text and links or the robots won't see them at all.

You can test your page by viewing it with a text-only browser like Lynx or with a graphical browser with images and JavaScript and plug-ins turned off. These tests will give you a good idea of what the robots will encounter.
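If you'd rather script this check, the following Python sketch lists only the links that appear as ordinary `href` attributes, which roughly approximates what a simple robot can follow. The sample page is invented, and treating `<a>` and `<area>` tags as the only followable links is an assumption made for this example; real robots vary.

```python
from html.parser import HTMLParser


class TextLinkParser(HTMLParser):
    """Collect href values from <a> and <area> tags, skipping javascript: links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "area"):
            return
        href = dict(attrs).get("href")
        if href and not href.lower().startswith("javascript:"):
            self.links.append(href)


def visible_links(html):
    """Return the links a simple robot could follow, in document order."""
    parser = TextLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page mixing plain links with script-driven navigation.
SAMPLE_PAGE = """
<a href="/products.html">Products</a>
<a href="javascript:openMenu()">Menu</a>
<img src="nav.gif" onclick="location='/about.html'">
<a href="/contact.html">Contact</a>
"""

if __name__ == "__main__":
    print(visible_links(SAMPLE_PAGE))
```

Only `/products.html` and `/contact.html` come back here. The script-driven menu and the clickable image are invisible to this kind of reader, just as they would be in Lynx.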

If you have frames, you should also consider adding some navigational links to the <NOFRAMES> section of your <FRAMESET> and to the pages meant to be framed. There's nothing sadder than clicking on a search result and ending up on a bare page with no context and no way to get around the site because the navigational frame is missing. This also happens when a searcher locates your page through one of the Web-wide search engines. It doesn't matter how good your ranking is if you hand searchers a dead-end page when they find you, so make sure each framed page still offers navigation when it turns up on its own in search results. All the remote search services will follow links to frames pages, so that's one less thing for you to worry about.

To make sure all your local links work, use the link-checking tools in your site management software such as Dreamweaver or GoLive, run a link-checking program such as Big Brother, or use a Web-based service such as NetMechanic.
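The tools above do this job properly, but the underlying idea is simple enough to sketch in Python: resolve each link against the page it appears on and see whether a page with exactly that path exists. The page inventory below is hypothetical, and the comparison is deliberately case-sensitive so that letter case mismatches show up as broken links.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical inventory: the pages that exist, and the links found on each page.
EXISTING_PAGES = {"/index.html", "/docs/Guide.html", "/docs/faq.html"}
PAGE_LINKS = {
    "/index.html": ["docs/guide.html", "docs/faq.html#install"],
    "/docs/faq.html": ["../index.html", "missing.html"],
}


def broken_links(existing, page_links):
    """Return (source_page, href, resolved_path) for each link with no matching page."""
    problems = []
    for source, hrefs in sorted(page_links.items()):
        for href in hrefs:
            # Resolve the relative link, then drop any #fragment before comparing.
            resolved, _fragment = urldefrag(urljoin(source, href))
            if resolved not in existing:
                problems.append((source, href, resolved))
    return problems


if __name__ == "__main__":
    for source, href, resolved in broken_links(EXISTING_PAGES, PAGE_LINKS):
        print(f"{source}: {href} -> {resolved} not found")
```

In this sample, `missing.html` is flagged because no such page exists, and `docs/guide.html` is flagged because the real file is named `Guide.html`.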

The good news is that any system you set up to provide your site search-indexing robot with text links to your pages will also work for Web-wide search engines and make your site more accessible to disabled Web surfers. You get three benefits for the price of one; it's hard to beat that!

© ElectricType
Maintained by My-Hosts.com