Friday, June 6, 2008

Wild Card Support ($ and *) - Google, Yahoo and MSN Robots.txt Exclusion Protocol

Hi All SEOs,

I am sure all webmasters and SEOs reading this blog know about robots.txt and how to use it. With robots.txt you can block any URL, path or directory that you don’t want search engines to crawl. You can even block search crawlers from crawling your entire site. A few weeks ago the major search engines, Google, Yahoo and MSN, announced that they all now support wild cards in robots.txt. Here I want to discuss what a wild card is, how it is useful and how to use it.

$ Wild Card Support: The $ sign tells the crawler that the pattern must match at the end of a URL. With $ support a webmaster can block certain types of URLs, so you no longer need to list every file you want to block in robots.txt. You can block files by pattern: for example, you can specify a file extension such as .pdf in your robots.txt file, and search engines that support the wild card will not crawl those files, so their content will not be added to the search engine's database.

The $ sign is used to block certain file types. For example, if you want to block files with the .pdf extension, write the following in your robots.txt file:

User-agent: Googlebot
Disallow: /*.pdf$
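
To see how a rule like this behaves, here is a minimal Python sketch of the matching logic. It is only a simplified model of what a crawler might do, not the actual code of any search engine, and the function names `rule_to_regex` and `is_blocked` are made up for illustration.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    match at the end of the URL. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(pattern: str, path: str) -> bool:
    # Rules are matched from the start of the path, so use match().
    return rule_to_regex(pattern).match(path) is not None

print(is_blocked("/*.pdf$", "/docs/report.pdf"))             # URL ends in .pdf
print(is_blocked("/*.pdf$", "/docs/report.pdf?download=1"))  # .pdf is not at the end
print(is_blocked("/*.pdf", "/docs/report.pdf?download=1"))   # same rule without $
```

In this model the first and third checks report a match and the second does not: with the $ sign, the rule only applies when .pdf is the very end of the URL.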

* Wild Card Support: The * sign tells the crawler to match any sequence of characters. It lets you block URL patterns, for example when you don’t want search engines to crawl URLs with session IDs or other extraneous parameters. From now on, just specify the parameters that you don’t want search engines to crawl using the wild card and you are done; there is no need to create a long list of URLs.

You can use the * sign to block URLs with session IDs. For example, if your session IDs are passed in the query string, write the following in your robots.txt file:

User-agent: *
Disallow: /*?

This will block all URLs that contain a question mark, that is, every URL with a query string, including those that carry session IDs.
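
The rule above is quite broad, so it can help to test it against a few URLs before publishing it. Below is a rough Python sketch that compares it with a narrower rule. The `matches` helper, the sample paths and the `sessionid` parameter name are all assumptions for illustration; use whatever parameter name your own site puts in its URLs.

```python
import re

def matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt Disallow rule matches the URL path."""
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    # '*' becomes '.*'; every other character is matched literally.
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

paths = [
    "/products/list.html",
    "/products/list.html?sessionid=abc123",  # hypothetical session parameter
    "/search?q=robots",
]

for path in paths:
    broad = matches("/*?", path)             # any URL containing '?'
    narrow = matches("/*sessionid=", path)   # only URLs carrying the parameter
    print(path, "| /*? ->", broad, "| /*sessionid= ->", narrow)
```

In this sketch `/*?` matches both URLs that have a query string, while `/*sessionid=` matches only the one carrying the session parameter, so the narrower rule may be the safer choice if other query-string pages should stay crawlable.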

Bhavesh Goswami.


