I am curious to see some example uses and implementations of robots.txt, specifically implementations aimed at improving SEO and the reasoning behind them. The robots.txt I am using is based on this example:
SEO with robots.txt
WordPress 2.1 robots.txt
Code:
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /stats*
Disallow: /tag
Disallow: /category/uncategorized*
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow all files with ? in url
Disallow: /*?
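To see what a rule set like this actually blocks, here is a rough Python sketch of the wildcard matching as the major search engines document it: * matches any run of characters and a trailing $ anchors the end of the URL. The function names and the sample paths are my own illustrations, not part of any library, and the sketch ignores Allow lines and rule precedence.

```python
import re

# A few Disallow rules copied from the WordPress list above.
RULES = ["/wp-*/", "/*.php$", "/*.css$", "/*?", "/tag", "/contact/"]

def rule_to_regex(rule):
    """Translate a Disallow path using * and $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each *.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, rules):
    """True if any rule matches from the start of the path (path + query string)."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise the rules.
    samples = [
        "/2007/01/my-article/",
        "/files/guide.pdf",
        "/wp-content/uploads/photo.jpg",
        "/index.php",
        "/?p=12",
    ]
    for path in samples:
        print(path, "blocked" if is_blocked(path, RULES) else "allowed")
```

Under this matching the article and the PDF stay crawlable, but anything under /wp-content/uploads/ is caught by the /wp-*/ rule, so images stored there would need an explicit Allow line if they are meant to be found.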
I also use a custom robots.txt for phpBB
phpBB robots.txt
Code:
User-agent: *
# disallow all files with a ? in url
Disallow: /*?*
# disallow all files ending in specific extension
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow these dirs
Disallow: /js/
Disallow: /css/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /includes/
Disallow: /templates/
# disallow these files and dirs
Disallow: /V
Disallow: /stats*
Disallow: /post
Disallow: /member
Disallow: /mx_
# disallow these urls
Disallow: /rss.php
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /common.php
Disallow: /index.php
Disallow: /modcp.php
Disallow: /privmsg.php
Disallow: /viewonline.php
# disallow urls starting with quote
Disallow: /"
This phpBB forum differs from a default install, though, because it already has special optimizations applied.
Basically, these rules keep duplicate content, low-quality pages, and CSS, JavaScript, and PHP files out of the crawl, while still letting search engines read the articles and find images, PDFs, and so on.
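One easy cleanup for long lists like these is pruning rules that another rule already covers, since a plain Disallow path matches every URL that starts with it. Below is a minimal sketch of such a check. The helper names are made up for illustration, the sample rules are only in the style of the phpBB list, and it only detects coverage by plain prefix rules (it does not reason about overlapping wildcard patterns).

```python
def normalize(rule):
    """A trailing * adds nothing to a prefix rule, so drop it."""
    return rule.rstrip("*")

def literal_prefix(rule):
    """The fixed text every URL must start with in order to match the rule."""
    head, star, _ = rule.partition("*")
    # A trailing $ is an end anchor only when it really ends the rule.
    return head if star else head.removesuffix("$")

def find_redundant(rules):
    """Return (rule, covering_rule) pairs where a plain prefix rule covers rule."""
    found = []
    for i, rule in enumerate(rules):
        for j, other in enumerate(rules):
            cover = normalize(other)
            is_plain = "*" not in cover and "$" not in cover
            if i == j or not is_plain:
                continue
            if normalize(rule) == cover and j > i:
                continue  # report an exact duplicate once, on the later copy
            if literal_prefix(rule).startswith(cover):
                found.append((rule, other))
                break
    return found

if __name__ == "__main__":
    # Sample rules in the style of the phpBB list (assumed for the demo).
    sample = ["/index.php?", "/memberlist.php", "/faq.php",
              "/index.php", "/memberlist.php", "/stats*", "/*.php$"]
    for rule, cover in find_redundant(sample):
        print(f"{rule!r} is already covered by {cover!r}")
```

On the sample list this flags /index.php? (covered by /index.php) and the second /memberlist.php. Redundant rules are harmless to crawlers, so this is purely about keeping the file readable.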
Does anyone else have improvements or other robots.txt examples?