Tuesday, March 28, 2006

Using Robots.txt and Info.txt

Robots.txt
SEO is a stealthy marketing strategy driven by SEO tools. Every tool has its own function, but all are oriented towards the same goal: high listings, quality traffic and sales. But have you ever wondered why some web pages rank well on one engine for a particular keyphrase, while the same keyphrase may actually drag your ranking down in another search engine?
Some webmasters optimize their web pages separately for each search engine. Though the differences are not apparent, these small changes make a considerable difference when it comes to ranking. However, you run the risk of being penalized or banned, because search engines don't like being duped, that is, being shown pages written purely to appeal to them. Now you might be wondering how you can stop Google from indexing pages that are exclusively optimized for AltaVista or some other search engine.
The solution lies in using a robots.txt file to stop major search engines from indexing the pages intended for other search engines. A robots.txt file is a plain-text file, placed in the root directory of your site, that gives search engine robots a specific set of instructions about content they are not permitted to index. Compliance is voluntary, but the major search engines honor it. For example:

This example allows all robots to visit all files: "*" matches every robot, and an empty Disallow line blocks nothing.
User-agent: *
Disallow:
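One way to see how a compliant crawler reads these two lines is to feed them to Python's standard urllib.robotparser module. This is an illustrative sketch only: the helper name is_allowed and the www.example.com URLs are hypothetical, and individual search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The allow-all rules: "*" matches every robot, the empty Disallow blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the robots.txt text `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse text directly instead of downloading it
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical site and paths; every one should be reported as allowed.
    for path in ["/", "/index.html", "/private/report.html"]:
        url = "http://www.example.com" + path
        print(path, is_allowed(ALLOW_ALL, "Googlebot", url))
```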

To keep all robots out of your entire site, use:
User-agent: *
Disallow: /
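Keeping one particular engine away from pages tuned for another uses the same syntax with a named User-agent instead of "*". A minimal sketch, assuming Google's crawler identifies itself as Googlebot and that the AltaVista-optimized pages sit in a hypothetical /altavista/ directory; "OtherBot" is likewise a made-up robot name. The rules are checked with Python's urllib.robotparser, alongside the block-everything rules for comparison.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot may not enter /altavista/,
# every other robot may crawl the whole site.
PER_ENGINE = """\
User-agent: Googlebot
Disallow: /altavista/

User-agent: *
Disallow:
"""

# Block every robot from every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(rules: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the robots.txt text `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(allowed(PER_ENGINE, "Googlebot", "/altavista/page.html"))  # False
    print(allowed(PER_ENGINE, "OtherBot", "/altavista/page.html"))   # True
    print(allowed(BLOCK_ALL, "OtherBot", "/index.html"))             # False
```

A named User-agent group applies only to robots whose name matches it; all other robots fall back to the "*" group.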

The following rules tell all crawlers not to enter four directories of a website:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
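To confirm which URLs these four rules shut out, they can be run through Python's urllib.robotparser. A small sketch; the file names and the robot name "AnyBot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The four-directory rules.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths: the first four sit inside blocked directories,
# the last one does not.
for path in ["/cgi-bin/search.cgi", "/images/logo.gif", "/tmp/cache.txt",
             "/private/notes.html", "/index.html"]:
    verdict = "allowed" if parser.can_fetch("AnyBot", path) else "blocked"
    print(f"{path}: {verdict}")
```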

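Robots.txt patterns are matched by simple prefix comparison: any URL path that begins with the pattern is blocked. So make sure directory patterns end with the final '/' character. For example, "Disallow: /tmp" also blocks /tmp.html, while "Disallow: /tmp/" blocks only what is inside the /tmp/ directory.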
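The effect of the trailing slash can be checked with Python's urllib.robotparser, which matches Disallow values as plain path prefixes. In this sketch the helper blocked and the robot name "AnyBot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked(pattern: str, path: str) -> bool:
    """Return True if a single 'Disallow: <pattern>' rule blocks `path`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {pattern}"])
    return not parser.can_fetch("AnyBot", path)


# Without the trailing slash the rule is a bare prefix and also catches /tmp.html.
print(blocked("/tmp", "/tmp.html"))        # True
print(blocked("/tmp", "/tmp/cache.txt"))   # True
# With the trailing slash only the directory's contents are blocked.
print(blocked("/tmp/", "/tmp.html"))       # False
print(blocked("/tmp/", "/tmp/cache.txt"))  # True
```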


Info.txt

Currently, some major search engines encourage webmasters to create info.txt files, so that they can confirm a site's details against a file published by the site itself, a source they can trust. If you want your site listed with these details, just create the file and submit it.
Now you might be wondering where to put the text of info.txt.
Do not insert the text of info.txt into your index.html page. Upload it the same way you upload any of your other files, such as index.html or robots.txt: as a completely separate file in the same folder as your main index.html page.
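Because the file sits next to index.html in the site root, its address is always the root URL plus the file name, whichever page of the site you start from. The same holds for robots.txt. A small sketch using urllib.parse.urljoin from Python's standard library; the helper name root_file_url and the /products/catalog.html page are hypothetical.

```python
from urllib.parse import urljoin


def root_file_url(page_url: str, filename: str) -> str:
    """Return the URL of `filename` in the site root that serves `page_url`."""
    # A leading "/" makes urljoin replace the whole path with the root path.
    return urljoin(page_url, "/" + filename)


if __name__ == "__main__":
    page = "http://www.segnant.com/products/catalog.html"  # hypothetical page
    print(root_file_url(page, "info.txt"))    # http://www.segnant.com/info.txt
    print(root_file_url(page, "robots.txt"))  # http://www.segnant.com/robots.txt
```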
To submit it to Alexa (alexa.com), go to this URL:
http://www.alexa.com/data/details/contact_info?url=www.mbsautoparts.com/
Place the info.txt file that you can generate on that page in your site's root, and after every change you make to your site, go back and ask Alexa to re-cache your site.


For example, this is the info.txt file at http://www.segnant.com/info.txt:
# Contact info submission

url: www.segnant.com/
site_owner: Segnant Inc.
address1: 1431 Greenway Drive
address2: Suite 230
city: Irving
state: TX
country: USA
postal_code: 75038
phone_number: 214-441-1309
display_email: info@segnant.com
site_name: www.segnant.com
site_description: Segnant Technologies provides innovative e-commerce web-sites to companies to sell products directly to consumers and accept payments online. Get custom website specifically built to your business
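Since info.txt is just "key: value" lines with "#" comments, it is easy to check a file before uploading it. A minimal sketch, assuming only the layout shown in the example above; the REQUIRED field list is a hypothetical choice for illustration, not a documented Alexa requirement.

```python
def parse_info_txt(text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict, skipping blanks and # comments."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            fields[key.strip()] = value.strip()
    return fields


# A shortened copy of the example file.
SAMPLE = """\
# Contact info submission

url: www.segnant.com/
site_owner: Segnant Inc.
city: Irving
state: TX
"""

# Hypothetical: fields you want to be sure are filled in before uploading.
REQUIRED = ["url", "site_owner", "display_email"]

if __name__ == "__main__":
    info = parse_info_txt(SAMPLE)
    print(info["site_owner"])   # Segnant Inc.
    missing = [k for k in REQUIRED if k not in info]
    print("missing:", missing)  # missing: ['display_email']
```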