Creating Blog Spam Alerts
Many of the sites I maintain have comment sections. Some run forum software (typically Kunena), some allow comments on articles, some on images. What all of these sites have in common is that if the configuration is not exactly right, you end up with tons of spam. And by “tons,” I mean several gigabytes of uploads a month.
I am slowly getting the configuration down, but every software update, every new installation is another potential trap. Then I will look at the stats and realize there is a sudden surge of uploads from a particular location (so far, Russia and Ukraine have been particularly prolific). I will look at the comments, and they will typically be in English and try to sell (for whatever reason) fashion articles.
I assume, from referrer traffic, that this is a ploy to increase the search engine ranking of sites that sell counterfeit wares. If I have a link to a Guggi handbag (popular with spammers!), and I am a “good” site, then Google et al. will rank the link higher.
So far, I’ve been going for a very low-level approach: I look for IP addresses that generate “too much” traffic, block them, and look at the database tables that hold comment spam. Then I remove the offending comments manually.
As with all problems that grow fast, though, the manual approach stopped being acceptable. I ended up doing something a little more organized. Now I have a warning system that checks the database at a set interval of minutes and immediately reports when something has been added to the tables that could contain spam.
The script is very simple: it builds a SQL query and runs it against the database. It then stores the results in a text file. The next time it runs, it compares the stored file with the current result. If the two don’t match (meaning something has been added to the site), the script warns me.
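Here is a minimal sketch of that compare-with-last-run idea in Python. It is not my actual script: it assumes SQLite as a stand-in for the real database, and the function names, the query, and the snapshot path are all made up for illustration.

```python
import sqlite3
from pathlib import Path


def snapshot(conn, query):
    """Run the query and render the result as text, one row per line."""
    rows = conn.execute(query).fetchall()
    return "\n".join(repr(row) for row in rows)


def has_changed(conn, query, snapshot_path):
    """Compare the current result with the stored one, then store the current one.

    Returns True only if a previous snapshot exists and differs from the
    current result. The very first run has nothing to compare against,
    so it returns False.
    """
    path = Path(snapshot_path)
    current = snapshot(conn, query)
    previous = path.read_text(encoding="utf-8") if path.exists() else None
    path.write_text(current, encoding="utf-8")
    return previous is not None and previous != current


# Hypothetical usage; the database file and table name are assumptions:
#   conn = sqlite3.connect("site.db")
#   if has_changed(conn, "SELECT id, body FROM comments ORDER BY id", "last_run.txt"):
#       print("Something was added to the comments table")
```

Something like cron can run this at whatever interval suits you, and the print could be swapped for an email or any other kind of alert. The ORDER BY in the query matters: without a stable row order, two identical result sets could be stored in a different order and trigger a false alarm.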
This could be refined by checking the content of the message, sending out a warning only if the content contains suspicious words, like Guggi, Viagra, enlargement, and http:.
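A rough sketch of such a content check in Python might look like the following. The function name and the exact word list are only illustrative, and a plain substring match like this is an assumption on my part; it will produce false positives (any comment with a plain http: link, for example).

```python
# Words that tend to show up in the spam described above (illustrative list).
SUSPICIOUS_WORDS = ("guggi", "viagra", "enlargement", "http:")


def looks_like_spam(text, words=SUSPICIOUS_WORDS):
    """Return True if the text contains any suspicious word, ignoring case."""
    lowered = text.lower()
    return any(word in lowered for word in words)
```

The warning would then go out only for new rows whose text passes this check, at the cost of missing spam that avoids the listed words.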
More than giving me a warning, though, this system functions as a pacifier. I now know that nothing bad is happening on my servers while I am not actively watching.