Building a DIY Spam Filter

If you read my first post, you may remember that I'm writing this blog platform as I go. You may have also noticed that I never got around to implementing a spam filter. After all, who would want to spam little old me? Turns out, quite a few folks! Yes, the spam train has arrived. I've had to manually weed out thousands of spam comments to stop them appearing here. What to do?

Well, the smart money would have been to use a third-party system like Akismet, but as I say, where's the fun in that? I am actually genuinely interested in the patterns of spam and how we can beat it, so I wanted to see if I could build a simple and effective filter myself. But I want to lay some ground rules:

  • No CAPTCHAs; users hate them
  • No mandatory account registration or Facebook/Twitter integration
  • Must be unsupervised; I don't want to have to train a machine learning algorithm
  • Must be contextual to my blog; I'm not trying to write a new Akismet, so it doesn't need to cater for generic content.

So I've built a very simple filter that tries to identify obviously valid or obviously spammy comments by looking at both the content and the user who's posting it. I think spam filters focus too much on the content and not enough on analysing the behaviour of the user posting the comment, to see if they act as a normal user would. So, some quick observations on the spam I get:

  • Valid commenters spend a reasonable amount of time on your site before commenting, maybe a few minutes. Spammers automate their comments.
  • Valid commenters don't post too many links (maybe one). Spammers post loads of links.
  • Valid commenters might put their blog or Twitter URL in the "website" field without repeating it in their comment. Spammers often repeat it.
  • Spammers usually come from IPs found to be spammy before
  • Spammers usually include multiple spammy words.
  • Spammers always put their links in anchor tags.
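
A minimal sketch of how a few of these observations could be turned into concrete checks. The word list, function names, and the exact heuristics are illustrative assumptions, not the filter's actual code:

```python
import re

# Illustrative word list; a real filter would use a much larger, curated one.
SPAMMY_WORDS = {"viagra", "casino", "cheap", "pills"}

def count_links(comment_html):
    """Count anchor tags -- spammers always wrap their links in <a> tags."""
    return len(re.findall(r'<a\s+[^>]*href=', comment_html, re.IGNORECASE))

def repeats_website_url(comment_html, website_field):
    """Valid commenters rarely repeat their 'website' URL in the comment body."""
    return bool(website_field) and website_field.lower() in comment_html.lower()

def spammy_word_count(comment_text):
    """How many known spammy words appear in the comment?"""
    words = re.findall(r"[a-z']+", comment_text.lower())
    return sum(1 for w in words if w in SPAMMY_WORDS)
```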

So the spam filter I've implemented basically takes the observations above and rewards obviously valid comments while punishing obviously spammy ones. It's based on a points system, with an additional weighting on each rule to give more importance where applicable. The result is a spam probability percentage, which it uses to decide whether the comment is spam: 0% means we're confident it's not spam, 100% means we're confident it is spam, and somewhere in the middle sits a threshold.
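
As a rough sketch of that points system: each rule returns a score between -1.0 (looks valid) and 1.0 (looks spammy), weights give the stronger signals more importance, and the weighted sum maps onto a 0-100% probability. The rule names, weights, and threshold here are illustrative assumptions, not the exact values I use:

```python
# (rule name, weight) -- heavier weights for stronger spam signals.
RULES = [
    ("time_on_site", 2.0),
    ("link_count", 3.0),
    ("spammy_words", 2.5),
    ("known_spam_ip", 3.0),
    ("url_repeated", 1.5),
]

def spam_probability(rule_scores, rules=RULES):
    """Combine per-rule scores in [-1.0, 1.0] into a 0-100% spam probability."""
    total_weight = sum(w for _, w in rules)
    weighted = sum(rule_scores.get(name, 0.0) * w for name, w in rules)
    # Map the weighted sum from [-total_weight, total_weight] onto [0, 100];
    # a comment with no evidence either way lands at 50%.
    return 50.0 + 50.0 * (weighted / total_weight)

def is_spam(rule_scores, threshold=60.0):
    return spam_probability(rule_scores) >= threshold
```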

Does it work?

So far, the signs point to...YES! After a bit of tweaking of the scores and weights for each rule, it's stopped the vast majority of spam. What still gets through?

  • Manually entered spam - yup, some people actually sit down and manually post spam on sites. How fun. This defeats my time-on-site metric, as they sometimes spend a few minutes on the site. But luckily they're stupid enough to break most of the other rules, so the filter often flags them as spam anyway.
  • Spam with no links - every so often you get spam that actually has no links in the content (and no spammy words). It could be an attempt to train machine learning algorithms to accept similar comments in the future (which will have links in no doubt) and maybe white-list their IP.
  • Spam in other (non-English) languages

How could I improve it?

Lots of ways! Some simple rules I still need to add:

  • Analysis of user interaction on the page (using JavaScript) - did the user move their mouse, press keys, etc.? Try to reward commenters who behave in a normal way, e.g. spend a few minutes reading the post, typing the comment and then submitting it.
  • Detect non-English comments

But my big idea is about challenging the logic that a spam check is black or white, i.e. that our system decides a comment either is spam or is not spam. That's fine for the obvious cases, but there are plenty of times where the actual answer is "we're not sure". How about having a tiered system based on our scoring, so that if we're not sure, we take additional steps to try to determine whether the content is spam. Say we decide that a score between 25% and 75% represents the "we're not sure" state. Some options on what additional steps we could take:
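
The tiered decision itself could be as simple as the sketch below, using the 25%/75% band from above (the action names are placeholders for whatever follow-up steps we pick):

```python
def decide(probability, lower=25.0, upper=75.0):
    """Tiered verdict: confident scores pass straight through,
    the uncertain middle band triggers additional checks."""
    if probability < lower:
        return "accept"
    if probability > upper:
        return "reject"
    return "challenge"  # e.g. CAPTCHA, link analysis, email confirmation
```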

  • Go back to the user and ask them to submit a CAPTCHA. I know, I know, I said I don't want them, but the vast majority of users wouldn't see it - only those who are pushing their luck with potentially spammy comments.
  • Follow the links that they include and analyse that content to see if it's spam. Black-listing of URLs/domains doesn't work because spammers use profile pages on popular sites like Flickr and Fotolog as a gateway page to their actual spam. But these pages are often filled with the normal spammy key words and might tell us what we need to know.
  • Email the user (using the address given in the comment form) asking them to confirm their comment by email.
  • Analyse known formats of comments. E.g. I often see spam in this form:
    {question}? <a href="{url-very-similar-to-one-supplied-as-commenter-website}">{spammy words}</a>
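
A sketch of what matching that template might look like. Both the regex and the "very similar URL" check are deliberately naive illustrations, not a robust implementation:

```python
import re

# A short question (no HTML), then exactly one anchor tag, and nothing else.
TEMPLATE = re.compile(
    r'^[^<>]{0,100}\?\s*<a\s+href="(?P<url>[^"]+)"[^>]*>[^<]+</a>\s*$',
    re.IGNORECASE,
)

def matches_known_format(comment_html, website_field):
    """Does the comment fit the {question}? <a href=...>{spammy words}</a> template,
    with the link pointing at (roughly) the commenter's 'website' field?"""
    m = TEMPLATE.match(comment_html.strip())
    if not m:
        return False
    # "Very similar" kept naive here: same host part of the URL.
    return m.group("url").split("/")[2:3] == website_field.split("/")[2:3]
```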

If any spammers are reading this and hatching a plan to thwart my efforts, I direct you to this obligatory XKCD comic.
