Stopping referrer spam (or die trying)

Referrer spam: unknown to some webmasters, a severe pain in the butt to others. People visiting your website with a faked referrer aren’t that bad, but people who use web spiders to massively desecrate your website statistics are very annoying, and they need to be targeted.
Since some people seem to think I appreciate referrer spam in my website’s statistics, I’ve decided to take some countermeasures using mod_rewrite.
Having access to your website’s log files is a must: they help you identify the IP addresses of the people generating this referrer spam.

Here’s a sample from the statistics of a weblog I run to track the stuff I pulled on my webserver. As you can see, it’s supposedly been visited via referrals from sites that have nothing to do with mine, but that could use some extra referrers to push their Google PageRank up a few points.

I used to think that simply blocking the referrers with rewrite rules would suffice;
http://www.spywareinfo.com/articles/referer_spam/ and
http://www.joemaller.com/htaccess.txt are a great help in doing so. But this doesn’t seem to be enough. Most of these people appear to own a very large number of domains to spam with, and referrer rules simply can’t catch them all.
This is where we can intervene.
I have noticed that these spammers keep coming back to your site as soon as they see their evildoing works, and most of them keep coming from the same IP address. I retrieved the top 10 referrer spammers from my log files with the following command:

grep -E -e casino -e horny -e bargain -e craps -e hentai \
 -e office -e pill -e viagra -e poker -e credit -e cialis \
 access_log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 10

Each '-e' adds another keyword to search for in your website’s log file; you can lift these keywords from the referrers on your statistics page. The pipeline prints the top 10 IP addresses along with the number of abusive hits from each, which we can use to block them:

249 195.175.37.8  
 70 82.142.161.86  
 66 148.244.150.58  
 65 196.28.48.100  
 60 80.227.56.46  
 51 80.227.56.42  
 20 200.212.114.3  
 12 82.92.45.10  
 10 211.185.38.61  
 8 82.161.162.56  
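Note that the grep above matches a keyword anywhere on the log line, so a legitimate request for, say, /office.html gets counted too. If your server writes Apache’s combined log format (an assumption; check your LogFormat), a rough sketch that only looks at the Referer field could look like this. The function name top_referrer_spammers is a hypothetical helper, and unlike the grep it ignores case.

```shell
# Hypothetical helper: count hits per client IP, but only when a spam keyword
# appears in the Referer field instead of anywhere on the log line.
# Assumes Apache's "combined" log format, where splitting a line on double
# quotes puts the request in field 2 and the referrer in field 4.
top_referrer_spammers() {
    logfile=$1   # path to the access log
    keywords=$2  # lowercase extended regex, e.g. 'casino|poker|viagra'
    awk -F'"' -v pat="$keywords" '
        # Case-insensitive match against the referrer only.
        tolower($4) ~ pat {
            split($1, f, " ")   # f[1] is the client IP
            print f[1]
        }
    ' "$logfile" | sort | uniq -c | sort -rn | head -n 10
}
```

Call it as top_referrer_spammers access_log 'casino|poker|viagra' and it prints the same count/IP list as the grep pipeline. Lines whose request or referrer contains an escaped quote will be split wrongly, so treat it as a rough filter rather than an exact log parser.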

Since I already use the rewrite engine to block a big load of unwanted shite, let’s make some rules to block these baddies:

# cat .htaccess

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^195\.175\.37\.8$ [OR]
RewriteCond %{REMOTE_ADDR} ^82\.142\.161\.86$ [OR]
RewriteCond %{REMOTE_ADDR} ^148\.244\.150\.58$ [OR]
RewriteCond %{REMOTE_ADDR} ^196\.28\.48\.100$ [OR]
RewriteCond %{REMOTE_ADDR} ^80\.227\.56\.46$ [OR]
RewriteCond %{REMOTE_ADDR} ^80\.227\.56\.42$ [OR]
RewriteCond %{REMOTE_ADDR} ^200\.212\.114\.3$ [OR]
RewriteCond %{REMOTE_ADDR} ^82\.92\.45\.10$ [OR]
RewriteCond %{REMOTE_ADDR} ^211\.185\.38\.61$ [OR]
RewriteCond %{REMOTE_ADDR} ^82\.161\.162\.56$
RewriteRule .* - [F,L]

This will give visitors coming from one of the listed IPs a very nice Forbidden (403) notice :)
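Typing those lines by hand gets old quickly once the list grows. A minimal sketch, assuming the input is the count/IP output of the pipeline above: the hypothetical make_ip_rules function below turns it into the same kind of block, escaping the dots (an unescaped '.' matches any character) and putting [OR] on every condition except the last.

```shell
# Hypothetical helper: read "count ip" lines (the uniq -c output) on stdin
# and print an .htaccess block that forbids each listed address.
make_ip_rules() {
    awk '
        { ip[NR] = $2 }                      # keep only the IP column
        END {
            if (NR == 0) exit                # no input, no rules
            print "RewriteEngine On"
            for (i = 1; i <= NR; i++) {
                # Rebuild the address with escaped dots: a bare "." in the
                # pattern would match any character.
                n = split(ip[i], o, ".")
                addr = o[1]
                for (j = 2; j <= n; j++) addr = addr "\\." o[j]
                # Anchor with ^ and $, chain all but the last condition with [OR].
                printf "RewriteCond %%{REMOTE_ADDR} ^%s$%s\n", addr, (i < NR ? " [OR]" : "")
            }
            print "RewriteRule .* - [F,L]"
        }
    '
}
```

For example, append | make_ip_rules to the grep pipeline and redirect the result to a scratch file. Review it before pasting it into .htaccess, since a broad keyword like office or credit can easily catch an innocent visitor.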

Referrer spammers don’t always come from the same IP address. Sometimes they work from 2 or 3, but almost always from the same subnet. If you want to block an entire subnet, drop the last part of the IP address, including the ‘$’, and keep the trailing escaped dot, so the pattern only has to match the first three octets (something like [0-255] won’t work here: in a regular expression that is a character class matching a single 0, 1, 2 or 5, not a number range). You’ll get something like this:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^195\.175\.37\.8$ [OR]
RewriteCond %{REMOTE_ADDR} ^82\.142\.161\. [OR]
RewriteCond %{REMOTE_ADDR} ^148\.244\.150\. [OR]
RewriteCond %{REMOTE_ADDR} ^196\.28\.48\. [OR]
RewriteCond %{REMOTE_ADDR} ^80\.227\.56\. [OR]
RewriteCond %{REMOTE_ADDR} ^200\.212\.114\. [OR]
RewriteCond %{REMOTE_ADDR} ^82\.92\.45\. [OR]
RewriteCond %{REMOTE_ADDR} ^211\.185\.38\. [OR]
RewriteCond %{REMOTE_ADDR} ^82\.161\.162\.
RewriteRule .* - [F,L]
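The same idea works for subnets. This sketch (again a hypothetical helper, assuming the count/IP input from the grep pipeline) keeps only the first three octets of each address, removes duplicates such as the two 80.227.56.x addresses, and emits one prefix condition per subnet. It blocks every listed address by subnet, so an address you’d rather match exactly, like 195.175.37.8 above, still needs a hand edit.

```shell
# Hypothetical helper: read "count ip" lines on stdin and print an .htaccess
# block that forbids each distinct x.y.z.* subnet among them.
make_subnet_rules() {
    # First awk: keep the first three octets with escaped dots plus a
    # trailing "\." so that, without a "$" anchor, the pattern covers
    # x.y.z.0 to x.y.z.255. sort -u then collapses duplicate subnets.
    awk '{ split($2, o, "."); print o[1] "\\." o[2] "\\." o[3] "\\." }' |
        sort -u |
        awk '
            { p[NR] = $0 }
            END {
                if (NR == 0) exit
                print "RewriteEngine On"
                for (i = 1; i <= NR; i++)
                    printf "RewriteCond %%{REMOTE_ADDR} ^%s%s\n", p[i], (i < NR ? " [OR]" : "")
                print "RewriteRule .* - [F,L]"
            }'
}
```

Because of sort -u the conditions come out in sorted order rather than by hit count, which makes no difference to the rewrite engine.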

Be careful when blocking entire subnets: this could lock out perfectly innocent visitors as well.