While my server frequently sees a decent number of hits from the Googlebot crawlers, they have always been very good about spreading out the hits so that they don’t put a large load on the server or network.
So imagine my surprise to find that for the last three hours straight, a single Googlebot crawler at crawl-66-249-65-166.googlebot.com has had multiple connections open to my web server, downloading file after file with no pauses and using up all my bandwidth. I had to do something I never thought I would: put Google IPs into the firewall.
Anyone else getting pounded by Googlebot today?
I have noticed that in recent months Googlebot has been increasing its visits to my site (which is very small and not extremely useful), from 1-2 times a month (as of January) to 4-9 times a month (as of June). It seems to keep fetching robots.txt, hoping it has changed, and then starts indexing parts of my site that Googlebot had not previously indexed but that other webbots such as looksmart.com and Yahoo (inktomisearch.com) had.
I changed my .htaccess to block certain hosts by netblock or domain (I didn’t need Apache mod_access as I thought I would):
order allow,deny
deny from 216.174.201.131
# misc crawlers
deny from sv-crawl.looksmart.com
deny from .siteprotect.com
deny from .alexa.com
deny from .inktomisearch.com
deny from 66.194.6.
deny from .thunderstone.com
deny from .envisional.com
deny from .dedicated.abac.net
# whois.sc
deny from 64.246.
deny from 216.145.
deny from 66.228.
# private IPs possibly spoofed
deny from 10.
deny from 172.16.0.0/12
deny from 192.168.
deny from 63.148.
allow from all
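For anyone unsure how the partial entries above match, here is a rough Python sketch of the idea: a partial IP such as 66.194.6. covers every address that starts with those octets, and a leading-dot name such as .alexa.com covers every host under that domain. The function names ip_rule_matches and host_rule_matches are made up for illustration. This only approximates Apache's behavior, is not Apache's actual code, and does not handle CIDR or netmask forms.

```python
def ip_rule_matches(rule: str, ip: str) -> bool:
    """Approximate a partial-IP deny rule by comparing whole leading octets."""
    rule_octets = rule.rstrip(".").split(".")
    ip_octets = ip.split(".")
    # "66.194.6." -> ["66", "194", "6"], compared with the first 3 octets of ip
    return ip_octets[: len(rule_octets)] == rule_octets


def host_rule_matches(rule: str, hostname: str) -> bool:
    """Approximate a domain deny rule by comparing whole trailing name components."""
    rule_labels = rule.strip(".").lower().split(".")
    host_labels = hostname.rstrip(".").lower().split(".")
    # ".alexa.com" -> ["alexa", "com"], compared with the last 2 labels of hostname
    return host_labels[-len(rule_labels):] == rule_labels
```

Because whole components are compared, 66.194.6. does not catch 66.194.60.1, and .alexa.com does not catch a host named notalexa.com.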
(http://livejournal.com/users/dotforward)
http://www.IPLists.com – IP Addresses of Search Engine Spiders
http://www.iplists.com/google.txt
There are supposedly two types of crawls:
(a) fresh crawl (usually 64.x.y.z)
(b) deep crawl (usually 216.x.y.z)
http://www.internet-advertising-marketing-manual.com/google-dance.htm
http://www.webmasterworld.com/forum3/9213.htm
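If you want to sort your log entries along those lines, a tiny Python sketch of the rule of thumb could look like this. The function name guess_crawl_type is hypothetical, and the first-octet rule is only the unverified claim described above, not anything Google documents.

```python
def guess_crawl_type(ip: str) -> str:
    """Label an IPv4 address by its first octet, per the rule of thumb above."""
    first_octet = ip.split(".")[0]
    if first_octet == "64":
        return "fresh"    # supposedly the fresh crawl
    if first_octet == "216":
        return "deep"     # supposedly the deep crawl
    return "unknown"      # anything else is outside the rule of thumb
```

Note that the crawler from the original post, 66.249.65.166, comes back as "unknown" here, so the rule clearly does not cover every Googlebot address.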
(http://livejournal.com/users/dotforward)