My Bot Trap in Action

Previously I wrote a post about how to list-poison email harvesters. Today I discovered that an unknown harvester/scraper bot has stumbled into one of my traps. Here is the description of the bot:

User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)

From the log snapshot (image), you can see that the bot recursively crawled through 14,464 pages, harvested anywhere between 5 and 20 fake email addresses per page (that’s about 12 * 14,464 = 173,568 emails harvested), and wasted nearly 10 minutes on my site before deciding that it was done. You can see that the last link the bot visited was something that looks like this:

This is an infinite loop at its finest :). Not only did the bot waste its time spidering my site, it has probably also added these 170,000 fake email addresses to a master spam list somewhere on the net. Now the spam bots working from that master list at the other end will waste even more time and resources spamming these fake addresses.
If you like the results, please join me in sticking it to these harvesters/spammers by installing my bot trap. Please also let me know when you’ve caught a live one like I did. It’s entertaining to hear these stories.
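
For readers curious how a trap can keep a crawler busy like this, here is a minimal sketch in Python. It is not the code behind my trap; it only illustrates the idea of a page that lists 5 to 20 fake addresses and links to deeper pages that do the same, forever. The function names, the fan-out of 3 links per page, and the `example.invalid` domain are all assumptions made for illustration.

```python
import hashlib
import random
import string

# Assumption: ".invalid" is a reserved TLD, so these addresses can never be delivered to.
FAKE_DOMAIN = "example.invalid"


def _rng_for(key: str) -> random.Random:
    """Seed a generator from the key so the same URL always yields the same page."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def _token(rng: random.Random, length: int) -> str:
    """Return a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def fake_emails(path: str) -> list:
    """Generate between 5 and 20 fake addresses for the page at this path."""
    rng = _rng_for(path)
    count = rng.randint(5, 20)
    return [f"{_token(rng, 8)}@{_token(rng, 6)}.{FAKE_DOMAIN}" for _ in range(count)]


def child_links(path: str, fanout: int = 3) -> list:
    """Generate links one level deeper; every one of them is another trap page."""
    rng = _rng_for(path + "#links")
    base = path.rstrip("/")
    return [f"{base}/{_token(rng, 6)}/" for _ in range(fanout)]


def render_trap_page(path: str) -> str:
    """Build the HTML a harvester would see when it requests this path."""
    emails = "\n".join(
        f'<a href="mailto:{e}">{e}</a><br>' for e in fake_emails(path)
    )
    links = "\n".join(f'<a href="{url}">more</a><br>' for url in child_links(path))
    return f"<html><body>\n{emails}\n{links}\n</body></html>"


if __name__ == "__main__":
    print(render_trap_page("/trap/"))
```

Because every page is derived from its own URL, nothing has to be stored on the server: each link the bot follows produces a fresh page with more addresses and more links. A real deployment would presumably also hide the entry link from human visitors and keep well-behaved crawlers out of the trap.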

  1. Yayin Akisi Sinema

    Is there a way to stop these stupid bots? I don’t want my emails to get indexed by harvesters.

  2. Ty Bone

    @Yayin Akisi Sinema
    Thank you for visiting. To keep these bots away, I’ve integrated the Project Honey Pot Http:BL look-up into my site. These guys maintain a list of bad bots on the internet, and you can use it as a look-up service to block them.

    Currently, if a known bad bot visits my site, it will automatically be redirected to (which is a dead-end page). If it is an undiscovered bad bot, it will stumble into my dead-end page (via an invisible link), and then its activities will get sent to the main Project Honey Pot database to be blacklisted.

    The harvester bot I talked about in this post was a previously undiscovered harvester. Its activities were probably sent to Project Honey Pot’s main database after it wasted all this time on my site.

    Please stay tuned to my posts. In my next post, I will release a WordPress plugin that will let you easily enable Project Honey Pot on your WordPress blog.

  3. Ty Bone

    @Yayin Akisi Sinema
    As promised, I have delivered the Project Honey Pot Http:BL plugin for easily integrating Project Honey Pot’s Http:BL API with your WordPress blog. Check it out!

  4. Mack

    I hate email harvesters. Why would someone do something like this if not for spamming purposes? I even reported some websites found in organic searches on Google that sell email harvester software, but Google doesn't seem interested. I mean, OMG, are they helping propagate spam or what?

  5. 85 weeks late.

    WOW a never ending list……………… Great…… Seems like you pay $.99 a month for a shared host.

    Just what I want: a bot on my server for a day. How about 3 or 4? That would be good.

    Who in the f*ck would want to trap a bot on their site? You freaking idiot.

  6. Ty Bone

    First… please refrain from cussing on this blog. If you read the post carefully and have just a bit of intelligence, you would understand why you would want to trap a bot on your site. Piss off fly.

