Googlebot broken?

While my server regularly sees a decent number of hits from the Googlebot crawlers, they have always been very good about spreading out their requests so that they don’t put a large load on the server or network.

So imagine my surprise to find that for the last three hours straight, a single Googlebot crawler at crawl-66-249-65-166.googlebot.com has kept multiple connections open to my web server, downloading file after file with no pauses and using up all my bandwidth. I had to do something I never thought I would: put Google IPs into the firewall.

Anyone else getting pounded by Googlebot today?
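
Before firewalling an address like the one above, it can be worth confirming that it really belongs to Google, since anyone can send a Googlebot user agent. A minimal sketch of the usual forward-confirmed reverse DNS check follows. The function name `is_verified_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are my own assumptions for illustration, not part of any Google tool.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for an address claiming to be Googlebot.

    Hypothetical helper: the resolvers are injectable so the logic can be
    exercised without network access.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or lookup failure: cannot confirm.
        return False

    host = host.rstrip(".").lower()
    # Step 1: the reverse name must sit under a Google crawler domain.
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False

    # Step 2: the name must resolve back to the same address.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_googlebot("66.249.65.166")` performs live DNS lookups, so the result depends on what the resolver returns at the time.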

2 thoughts on “Googlebot broken?”

  1. I have noticed that in recent months Googlebot has been increasing its visits to my site (which is very small and not extremely useful) from 1-2 times a month (as of January) to 4-9 times a month (as of June). It seems to keep fetching robots.txt, hoping it has changed, and then starts indexing parts of my site that it had not previously indexed but that other webbots such as looksmart.com and Yahoo (inktomisearch.com) already had.

    I changed my .htaccess to block certain hosts by netblock or by domain (it turned out I didn’t need Apache mod_access, as I had thought I would):

    order allow,deny
    deny from 216.174.201.131

    # misc crawlers
    deny from sv-crawl.looksmart.com
    deny from .siteprotect.com
    deny from .alexa.com
    deny from .inktomisearch.com
    deny from 66.194.6.
    deny from .thunderstone.com
    deny from .envisional.com
    deny from .dedicated.abac.net

    # whois.sc
    deny from 64.246.
    deny from 216.145.
    deny from 66.228.

    # private IPs possibly spoofed
    deny from 10.
    deny from 172.16.
    deny from 192.168.

    deny from 63.148.

    allow from all

    (http://livejournal.com/users/dotforward)
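
For readers unsure how the rules in the comment above are evaluated, here is a rough Python model of them, not Apache's actual code. It assumes the documented behaviour that a partial IP such as `66.194.6.` matches whole leading octets, that a domain entry matches the host itself or any name ending in it on a component boundary, and that with `order allow,deny` plus `allow from all` a request is refused only when some `deny` rule matches. The function names are hypothetical.

```python
def ip_matches(pattern, ip):
    """True if the dotted prefix in `pattern` equals the leading octets of `ip`."""
    want = [part for part in pattern.split(".") if part]
    have = ip.split(".")
    # Compare whole octets, so "10." matches 10.1.2.3 but not 100.1.2.3.
    return have[:len(want)] == want


def host_matches(pattern, hostname):
    """True if `hostname` is the named domain or ends in it on a dot boundary."""
    name = pattern.lstrip(".").lower()
    hostname = hostname.lower()
    return hostname == name or hostname.endswith("." + name)


def is_denied(ip, hostname, deny_rules):
    """Model of 'order allow,deny' with 'allow from all': any deny match wins."""
    for rule in deny_rules:
        if rule.replace(".", "").isdigit():
            # Rule made only of digits and dots: treat as a (partial) IP.
            if ip_matches(rule, ip):
                return True
        elif hostname and host_matches(rule, hostname):
            # Domain rules need the client's reverse DNS name to be known.
            return True
    return False


rules = ["216.174.201.131", ".inktomisearch.com", "66.194.6.", "10."]
print(is_denied("66.194.6.77", None, rules))
print(is_denied("100.1.2.3", None, rules))
```

Note that domain-based rules only work when the server can resolve the client's address to a name, which adds DNS lookups per request; the IP-prefix rules do not have that cost.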
