
Previously I wrote a post about how to list-poison email harvesters. Today I discovered that an unknown harvester/scraper bot had stumbled into one of my traps. Here is the description of the bot:
IP: 82.230.123.141
Host: bne75-7-82-230-123-141.fbx.proxad.net
User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
From the log snapshot (image), you can see that the bot recursively crawled through 14,464 pages, harvested between 5 and 20 fake email addresses per page (at an average of about 12 per page, that’s roughly 12 * 14,464 = 173,568 emails harvested), and wasted nearly 10 minutes on my site before deciding that it was done. The last link the bot visited looked like this:
http://omninoggin.com/suspicious/8864/1530/7374/527/3510/9061/8198/9981/3367/1751/5075/1765/7282/4842/1710/3655/614/9951/3183/3609/3731/9430/7682/6298/2287/683/3370/5633/4187/8842/1852/5984/7767/6037/7675/3984/4646/7823/8462/1793/6556/3054/1362/3111/3407/8182/7374/169/7738/158/2802/5438/7230/9552/1384/7538/index.php
This is an infinite loop at its finest :). Not only did the bot waste its time spidering my site, it has probably added these 170,000 or so fake email addresses to a master spam list somewhere on the net. Spam bots working from that master list will now waste even more time and resources spamming these fake addresses.
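For readers curious how a trap like this can loop forever, here is a minimal sketch in Python. It is not the code running on this site (which is PHP); the function name `trap_page`, the `seed` parameter, and the use of the reserved `.example` domain are all assumptions made for illustration. The idea is that every page is generated from its own URL, lists 5 to 20 made-up addresses, and links to a page one level deeper, so a crawler never runs out of pages.

```python
import random
import string


def trap_page(path, seed=0):
    """Return (fake_emails, next_link) for one trap page (hypothetical helper)."""
    # Seed the generator with the path so the same URL always shows the same page.
    rng = random.Random("{}:{}".format(seed, path))

    def word(length):
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

    # 5 to 20 fake addresses per page; ".example" is a reserved domain,
    # so none of these can belong to a real mailbox.
    emails = ["{}@{}.example".format(word(8), word(6))
              for _ in range(rng.randint(5, 20))]

    # Link one directory deeper, e.g. /suspicious/8864/index.php.
    if path.endswith("index.php"):
        base = path.rsplit("/", 1)[0]
    else:
        base = path.rstrip("/")
    next_link = "{}/{}/index.php".format(base, rng.randint(1, 9999))
    return emails, next_link
```

Calling `trap_page("/suspicious/index.php")` and then feeding the returned link back into the function adds one more numeric path segment each time, which is the pattern visible in the long URL above.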
If you like the results, please join me in sticking it to these harvesters/spammers by installing my bot trap. Please also let me know when you’ve caught a live one like I did. It’s entertaining to hear these stories.



Is there a way to stop these stupid bots? I don’t want my emails to get indexed by harvesters.
@Yayin Akisi Sinema
Thank you for visiting. To keep these bots away, I’ve integrated the Project Honey Pot Http:BL look-up into my site. These guys maintain a list of bad bots on the internet, and you can use it as a look-up service to block them.
Currently, if a known bad bot visits my site, it is automatically redirected to http://omninoggin.com/incognito/incognito.php (which is a dead-end page). If it is an undiscovered bad bot, it will stumble into my dead-end page (via an invisible link), and its activities will then be sent to the main Project Honey Pot database so it can be blacklisted.
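To give a rough idea of what the look-up involves, here is a small Python sketch. It reflects my understanding of the Http:BL service: you make a DNS query for your access key, followed by the visitor's IP with its octets reversed, under dnsbl.httpbl.org, and the answer comes back as 127.<days>.<threat>.<type>. Please check the official API documentation before relying on this. The function names and the access key in the usage note are made up for illustration.

```python
import ipaddress

# Visitor type is a bitmask in the last octet of the answer (assumed layout).
VISITOR_TYPES = {1: "suspicious", 2: "harvester", 4: "comment spammer"}


def httpbl_query_name(access_key, visitor_ip):
    """Build the DNS name to look up for a visitor (hypothetical helper)."""
    # IPv4Address validates the input; the octets are then reversed.
    octets = str(ipaddress.IPv4Address(visitor_ip)).split(".")
    return "{}.{}.dnsbl.httpbl.org".format(access_key, ".".join(reversed(octets)))


def decode_httpbl_answer(answer):
    """Split an answer such as '127.3.5.2' into its parts (hypothetical helper)."""
    first, days, threat, type_bits = (int(part) for part in answer.split("."))
    if first != 127:
        raise ValueError("not a valid Http:BL answer: " + answer)
    if type_bits == 0:
        # Type 0 marks a search engine; the other octets then mean
        # something different, which this sketch does not handle.
        labels = ["search engine"]
    else:
        labels = [name for bit, name in VISITOR_TYPES.items() if type_bits & bit]
    return {"days_since_last_activity": days,
            "threat_score": threat,
            "types": labels}
```

With a placeholder key, `httpbl_query_name("abcdefghijkl", "82.230.123.141")` gives `abcdefghijkl.141.123.230.82.dnsbl.httpbl.org`. In practice you would resolve that name with something like `socket.gethostbyname`; a failed look-up means the IP is not listed, and a listed harvester is the cue to redirect the visitor to the dead-end page.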
The harvester bot I talked about in this post was a previously undiscovered harvester. Its activities were probably sent to Project Honey Pot’s main database after it wasted all this time on my site.
Please stay tuned to my posts. In my next post, I will release a WordPress plugin that will let you easily enable Project Honey Pot on your WordPress blog.
@Yayin Akisi Sinema
As promised, I have released the Project Honey Pot Http:BL plugin, which makes it easy to integrate Project Honey Pot’s Http:BL API with your WordPress blog. Check it out!