Fight Spambot
never list your email
- never print your email address as plain text in your HTML
- use JavaScript (or mywidget) to hide your email address from spambots
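Here is a minimal sketch of the JavaScript approach. It assumes you split the address into pieces in the page so the raw HTML never contains a complete `user@domain` string. The function names, the `contact` element id and the example address are hypothetical. Harvesters that execute JavaScript will still see the assembled address, so this only stops the simpler ones.

```javascript
// Assemble the address at runtime so "user@domain" never appears in the HTML source.
// A harvester that only scans raw HTML sees separate strings, not an address.
function buildAddress(user, domain, tld) {
  return user + "@" + domain + "." + tld;
}

// Hypothetical helper: builds a mailto link from the pieces.
function buildMailtoLink(user, domain, tld) {
  const address = buildAddress(user, domain, tld);
  return '<a href="mailto:' + address + '">' + address + "</a>";
}

// In a browser, fill a placeholder element (hypothetical id "contact").
// The guard lets the same file load outside a browser without errors.
if (typeof document !== "undefined") {
  const el = document.getElementById("contact");
  if (el) {
    el.innerHTML = buildMailtoLink("webmaster", "example", "com");
  }
}
```

In the page you would put an empty `<span id="contact"></span>` where the address should appear and load the script after it.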
/robots.txt
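don't use /robots.txt
Don't list your form scripts in robots.txt, because bad bots read it and go straight for the paths listed there.
If you don't want bots to crawl certain parts of your site, protect them with HTTP password authentication (.htpasswd) instead.
Or put one of these in your HTML meta tags (only well-behaved crawlers such as search engines honor them):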
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET, NOARCHIVE">
# Googlebot also understands the equivalent HTTP header:
X-Robots-Tag: noindex
# http://www.google.com/support/webmasters/bin/topic.py?topic=8459
# http://help.yahoo.com/l/us/yahoo/search/webcrawler/index.html
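If your pages come from a script, you can send the header yourself. Below is a minimal Node.js sketch that assumes a hypothetical `/private/` section which should stay out of search results. Like the meta tags, the header is only honored by well-behaved crawlers and does nothing against spambots.

```javascript
// Hypothetical path prefixes that search engines should not index.
const NOINDEX_PREFIXES = ["/private/"];

// Returns the X-Robots-Tag value for a request path, or null if the page may be indexed.
function robotsTagFor(path) {
  return NOINDEX_PREFIXES.some((prefix) => path.startsWith(prefix))
    ? "noindex, nofollow"
    : null;
}

// Request handler in the shape Node's http.createServer() expects: (req, res).
function handler(req, res) {
  const tag = robotsTagFor(req.url || "/");
  if (tag !== null) {
    res.setHeader("X-Robots-Tag", tag);
  }
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end("<p>hello</p>");
}
```

To serve it, pass the handler to Node's built-in `http` module, for example `require("http").createServer(handler).listen(8080)`.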
use /robots.txt
Trap spambots.
Bad bots often deliberately request the very paths that robots.txt disallows.
Point those disallowed paths at a script that adds the bot's IP address to the block list in your .htaccess.
http://www.kloth.net/internet/badbots.php
http://www.kloth.net/internet/bottrap.php
http://devin.com/sugarplum/
http://www.leekillough.com/robots.html
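The trap has to recognize requests for the disallowed paths. Here is a minimal sketch that assumes robots.txt uses plain prefix rules like the example /robots.txt in this section (no `*` or `$` wildcards). It treats Disallow lines from every User-agent group alike. `parseDisallows` and `isTrapRequest` are hypothetical names. Well-behaved crawlers never request these paths, so a hit is a good hint that you are dealing with a bad bot.

```javascript
// Collect the Disallow paths from robots.txt text, ignoring comments and empty values.
function parseDisallows(robotsTxt) {
  const paths = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop "# comment" to end of line
    const match = /^disallow\s*:\s*(\S*)/i.exec(line);
    if (match && match[1] !== "") {
      paths.push(match[1]);
    }
  }
  return paths;
}

// robots.txt rules are prefix matches, so a request is a trap hit
// if its path (without the query string) starts with any disallowed path.
function isTrapRequest(requestPath, disallows) {
  const path = requestPath.split("?")[0];
  return disallows.some((prefix) => path.startsWith(prefix));
}
```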
# example /robots.txt
User-agent: *
Disallow: /guestbook # one of the most probed paths; spambots request it even if your site never had or linked one
Disallow: /cgi-bin/guestbook.cgi # replace your guestbook.cgi with a spambot trap
Disallow: /emailaccounts # serve a script here that poisons spambots with thousands or millions of fake email addresses
Disallow: /anothertrap
Disallow: /moretrap
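The `/emailaccounts` trap serves harvesters a page full of fake addresses. Here is a minimal sketch. It assumes you generate addresses under the reserved `.invalid` top-level domain (RFC 2606) so that no real mailbox is ever targeted. Smarter harvesters may filter such domains. The seeded generator is an illustrative choice: a small linear congruential generator with the common Numerical Recipes constants, so the same seed produces the same page. All function names are hypothetical.

```javascript
// Small deterministic PRNG (32-bit LCG) so output is reproducible for a given seed.
function makeRng(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // float in [0, 1)
  };
}

const LETTERS = "abcdefghijklmnopqrstuvwxyz";

// Random lowercase word with a length between minLen and maxLen (inclusive).
function randomWord(rng, minLen, maxLen) {
  const len = minLen + Math.floor(rng() * (maxLen - minLen + 1));
  let word = "";
  for (let i = 0; i < len; i++) {
    word += LETTERS[Math.floor(rng() * LETTERS.length)];
  }
  return word;
}

// Generate `count` fake addresses under the reserved .invalid TLD.
function fakeAddresses(count, seed) {
  const rng = makeRng(seed);
  const out = [];
  for (let i = 0; i < count; i++) {
    out.push(randomWord(rng, 4, 10) + "@" + randomWord(rng, 5, 12) + ".invalid");
  }
  return out;
}

// Render the addresses as mailto links, the format harvesters look for.
function fakeAddressPage(count, seed) {
  const items = fakeAddresses(count, seed).map(
    (a) => '<li><a href="mailto:' + a + '">' + a + "</a></li>"
  );
  return "<html><body><ul>\n" + items.join("\n") + "\n</ul></body></html>";
}
```

Seeding from the request path, for example, would give each trap URL its own stable list.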
.htaccess
- combine with robots.txt to trap bad bots and block their IP addresses.
- check the referer: redirect POST requests to your form that don't come from your own site.
- try redirecting them to your spambot trap script so the bot's IP gets blocked.
- block known bad bots by user agent.
- http://it.dennyhalim.com/search/label/htaccess
- http://evolt.org/article/Using_Apache_to_stop_bad_robots/18/15126/index.html
RewriteEngine On
# Forbid requests for exploits & annoyances
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Email (this also blocks your own form script if it uses one of these names)
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC,OR]
# MSOffice
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
# Nimda
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC,OR]
# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
RewriteCond %{REQUEST_URI} /sensepost\.exe [NC]
RewriteRule .* - [F]
# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
# Ban bots below
# Address harvesters
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(autoemailspider|ExtractorPro) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^E?Mail.?(Collect|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DTS.?Agent|Email.?Extrac) [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC,OR]
# Download managers
RewriteCond %{HTTP_USER_AGENT} ^(Alligator|DA.?[0-9]|DC\-Sakura|Download.?(Demon|Express|Master|Wonder)|FileHound) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Flash|Leech)Get [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Fresh|Lightning|Mass|Real|Smart|Speed|Star).?Download(er)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Gamespy|Go!Zilla|iGetter|JetCar|Net(Ants|Pumper)|SiteSnagger|Teleport.?Pro|WebReaper) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(My)?GetRight [NC,OR]
# Image-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(AcoiRobot|FlickBot|webcollage) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Express|Mister|Web).?(Web|Pix|Image).?(Pictures|Collector)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image.?(fetch|Stripper|Sucker) [NC,OR]
# "Gray-hats"
RewriteCond %{HTTP_USER_AGENT} ^(Atomz|BlackWidow|BlogBot|EasyDL|Marketwave|Sqworm|SurveyBot|Webclipping\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (girafa\.com|gossamer\-threads\.com|grub\-client|Netcraft|Nutch) [NC,OR]
# Site-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(eCatch|(Get|Super)Bot|Kapere|HTTrack|JOC|Offline|UtilMind|Xaldon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(Auto|Cop|dup|Fetch|Filter|Gather|Go|Leach|Mine|Mirror|Pix|QL|RACE|Sauger) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(site.?(eXtractor|Quester)|Snake|ster|Strip|Suck|vac|walk|Whacker|ZIP) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCapture [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
# Tools
RewriteCond %{HTTP_USER_AGENT} ^(curl|Dart.?Communications|Enfish|htdig|Java|larbin) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FrontPage|Indy.?Library|RPT\-HTTPClient) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww|lwp|PHP|Python|www\.thatrobotsite\.com|webbandit|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Microsoft|MFC).(Data|Internet|URL|WebDAV|Foundation).(Access|Explorer|Control|MiniRedir|Class) [NC,OR]
# Unknown
RewriteCond %{HTTP_USER_AGENT} ^(Crawl_Application|Lachesis|Nutscrape) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[CDEFPRS](Browse|Eval|Surf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Demo|Full.?Web|Lite|Production|Franklin|Missauga|Missigua).?(Bot|Locat) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (efp@gmx\.net|hhjhj@yahoo\.com|lerly\.net|mapfeatures\.net|metacarta\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Industry|Internet|IUFW|Lincoln|Missouri|Program).?(Program|Explore|Web|State|College|Shareware) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Mac|Ram|Educate|WEP).?(Finder|Search) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC]
RewriteRule .* - [F]
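Before deploying patterns like these, it helps to check which user-agent strings they actually catch. The sketch below ports a few of the conditions above to JavaScript regular expressions, with `[NC]` becoming the `i` flag. Apache uses PCRE, and these particular patterns behave the same in JavaScript. The list is only a sample, not the full rule set, and `wouldForbid` is a hypothetical name.

```javascript
// A sample of the user-agent conditions from the .htaccess rules above.
// [NC] (case-insensitive) maps to the JavaScript "i" flag.
const BAD_UA_PATTERNS = [
  /^E?Mail.?(Collect|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf)/i,
  /(DTS.?Agent|Email.?Extrac)/i,
  /^(eCatch|(Get|Super)Bot|Kapere|HTTrack|JOC|Offline|UtilMind|Xaldon)/i,
  /^(libwww|lwp|PHP|Python|www\.thatrobotsite\.com|webbandit|Wget|Zeus)/i,
];

// Mirrors ^-?$ : an empty header (Apache expands a missing one to "") or a lone "-".
function isBlankOrDash(value) {
  return /^-?$/.test(value || "");
}

// True if the request would get a 403 under the sampled rules.
function wouldForbid(userAgent, referer) {
  // "Forbid if blank (or "-") Referer *and* UA"
  if (isBlankOrDash(referer) && isBlankOrDash(userAgent)) {
    return true;
  }
  const ua = userAgent || "";
  return BAD_UA_PATTERNS.some((re) => re.test(ua));
}
```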
#!/usr/bin/perl -w
# Bot-trap notifier: emails you the details of each client that requests the trap URL.
# Note: this excerpt only sends the notification; the mail text assumes a separate
# step has already added the IP to .htaccess.
use strict;

my $remreq  = $ENV{REQUEST_URI};
my $remaddr = $ENV{REMOTE_ADDR};
my $usragnt = $ENV{HTTP_USER_AGENT} || "The UA is blank";
my $referer = $ENV{HTTP_REFERER}    || "there is no referer";
my $date    = scalar localtime(time);
my $remmeth = $ENV{REQUEST_METHOD};
my $remhost = $ENV{HTTP_HOST};

# Pipe the message to sendmail; -t takes the recipients from the To: header.
open(MAIL, "|/usr/sbin/sendmail -t") || die "Can't open /usr/sbin/sendmail: $!";
print MAIL "To: ****\@yyy.zzz\n";
print MAIL "From: xxx\@yyy.zzz\n";
print MAIL "Subject: You caught another one!\n\n";
print MAIL "The following 'intruder' was caught by the \"Bot Trap\" and has been added to the ban env in .htaccess:\n\n";
print MAIL "The ip address: $remaddr was listed on $date \n";
print MAIL "The file requested was: $remreq\n";
print MAIL "The method used was: $remmeth\n";
print MAIL "The intruder's user agent was: $usragnt\n";
print MAIL "The document was referred by: $referer\n";
print MAIL "The host server was: $remhost\n";
close(MAIL);

# A CGI script must send a response header, otherwise Apache answers with a 500 error.
print "Content-type: text/plain\n\n";
exit;
#http://www.webmasterworld.com/forum92/413.htm
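The notification script above only sends mail, so something still has to write the IP into .htaccess. Here is a minimal sketch of that step. It assumes the ban is a `SetEnvIf Remote_Addr` line that sets a hypothetical `getout` variable, used together with access rules such as `Order Allow,Deny`, `Allow from all` and `Deny from env=getout` (Apache 2.2 syntax; Apache 2.4 uses `Require` instead). The function works on the file contents as a string and handles IPv4 only. `addBan` and `ipToRegex` are hypothetical names.

```javascript
// Escape the dots so the IP is matched literally inside the SetEnvIf regex.
function ipToRegex(ip) {
  return "^" + ip.replace(/\./g, "\\.") + "$";
}

// Returns the .htaccess text with a ban line for `ip` prepended,
// or the unchanged text if that IP is already banned.
function addBan(htaccess, ip, note) {
  // Rough IPv4 format check: never write unchecked input into server config.
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) {
    throw new Error("not an IPv4 address: " + ip);
  }
  const banLine = "SetEnvIf Remote_Addr " + ipToRegex(ip) + " getout";
  const alreadyBanned = htaccess
    .split(/\r?\n/)
    .some((line) => line.trim() === banLine);
  if (alreadyBanned) {
    return htaccess;
  }
  // The note (e.g. date and user agent) comes from the request, so flatten
  // newlines to keep it on a single comment line.
  const comment = "# trapped: " + String(note).replace(/[\r\n]+/g, " ");
  return comment + "\n" + banLine + "\n" + htaccess;
}
```

In a real trap you would read .htaccess, call `addBan` with `REMOTE_ADDR`, and write the file back while holding a lock, so two simultaneous hits don't overwrite each other.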
secure form
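Most spambots only parse HTML: they don't execute JavaScript or Flash, and they don't run OCR on images.
Use any combination of these:
- use a CAPTCHA
- write the form with JavaScript instead of plain HTML (for the same reason people use JavaScript to write out their email address)
- use a Flash form
- your form script should only process POST requests and exit on GET
- check the referer: stop if the request is not a POST or the referer is not your own website (browsers and privacy tools sometimes strip the Referer header, and bots can fake it, so treat this as a filter, not a guarantee)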
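The last two checks can be sketched as below. This assumes your form handler can see the request method and the Referer header. `acceptFormRequest` and the host names are hypothetical. The sketch compares the parsed hostname exactly instead of searching for your domain inside the string, so a referer like `www.example.com.evil.test` doesn't slip through.

```javascript
// Returns true if a form submission should be processed:
// it must be a POST, and its Referer must point at one of our own hosts.
function acceptFormRequest(method, referer, ownHosts) {
  if (String(method).toUpperCase() !== "POST") {
    return false; // only process POST, refuse GET and everything else
  }
  let host;
  try {
    host = new URL(referer).hostname.toLowerCase();
  } catch (e) {
    return false; // missing or malformed Referer
  }
  return ownHosts.map((h) => h.toLowerCase()).includes(host);
}
```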
http://www.javaworld.com/jw-06-1996/jw-06-javascript.html
<form method="POST" action="" onsubmit="return proccessbyjs(this);">
<!-- users will need JavaScript enabled to be able to submit this form -->
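Here is one way to write `proccessbyjs`. It assumes the function is wired as `onsubmit="return proccessbyjs(this);"` so it receives the form element. The real handler URL only appears when the script runs. A bot that posts straight from the HTML (`action=""`) therefore posts to the page itself instead. The `/cgi-bin/contact.cgi` URL and the hidden `js_ok` field are hypothetical.

```javascript
// Hypothetical real handler URL, split so it never appears whole in the page source.
const FORM_PARTS = ["/cgi-bin/", "contact", ".cgi"];

// Fills in the real action URL and a marker field, then lets the submit proceed.
function proccessbyjs(form) {
  form.action = FORM_PARTS.join("");
  if (form.elements && form.elements.js_ok) {
    form.elements.js_ok.value = "1"; // the server can reject posts where js_ok is not "1"
  }
  return true; // returning false would cancel the submission
}
```

The form would also need a hidden field such as `<input type="hidden" name="js_ok" value="0">` for the marker to work.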