# http://www.robotstxt.org/wc/robots.html
# http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
# robots.txt file for http://www.intobikes.co.uk/

User-agent: Microsoft URL Control - 6.00.8862
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

# I guess someone, somewhere is linking to us at intobikes.co/intobikes.co.uk!!
User-agent: Googlebot
Disallow: /intobikes.co.uk
# This next line allows Froogle to show the thumbnail images of products
Allow: /images/products/*/thumbs

User-agent: seekbot
Disallow: /

User-agent: UbiCrawler
Disallow: /

User-agent: TwengaBot
Disallow: /

User-agent: ShopWiki
Disallow: /

User-agent: Fatbot
Disallow: /

User-agent: VoilaBot
Disallow: /

User-agent: Vagabondo
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: LexxeBot
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Abrave Spider
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: *
Disallow: /~
Disallow: /css
Disallow: /images
Disallow: /admin
Disallow: /banners
Disallow: /Connections
Disallow: /includes
Disallow: /js
Disallow: /P
Disallow: /php
Disallow: /squ
Disallow: /usage
Disallow: /settings
Crawl-delay: 5

Sitemap: http://www.intobikes.co.uk/google_sitemap_index.php

# General notes on robots.txt syntax. They refer to a sample file, not to the rules above.
#
# The first line, starting with '#', is a comment.
#
# The next two lines specify that the Mirago robot has nothing disallowed.
# This means permission is granted to go anywhere on that site. This is
# optional, as a robot will assume it has permission to access your site if
# it is not excluded by any 'Disallow' directives.
#
# The next two lines indicate that the robot called 'naughtyrobot' has all
# relative URLs starting with '/' disallowed. As all relative URLs on a
# server start with '/', this means the entire site should not be accessed
# by the robot. N.B. Don't put more than one path on a Disallow line.
#
# The third paragraph indicates that all other robots should not visit URLs
# starting with /stay_out or /devproject.
# Note that '*' in a User-agent line is a special token meaning 'all robots';
# it is not a regular expression. Instead of 'Disallow: /myproject/*', just
# put 'Disallow: /myproject'.
#
# http://www.conman.org/people/spc/robots2.html has a proposal for extended syntax.
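#
# The sample file that these notes describe is not included here. A minimal
# sketch of what it might look like, reconstructed only from the notes (the
# comment text and exact layout are assumptions), follows. It is commented
# out so that it has no effect on this site's rules.
#
#   # Sample robots.txt
#   User-agent: Mirago
#   Disallow:
#
#   User-agent: naughtyrobot
#   Disallow: /
#
#   User-agent: *
#   Disallow: /stay_out
#   Disallow: /devproject
#
# In the sketch, each path sits on its own Disallow line, and the final
# group uses '*' as the User-agent token rather than as a path wildcard.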