Spam comments left by humans

In the past few months, Mollom has done a very good job of blocking spam on my site. Its accuracy has increased a lot since I first installed it: almost no spam gets through, even though there are hundreds of attempts a day.

However, I've been seeing a different kind of spam that does get through: comments left by humans, who can solve the CAPTCHA shown by Mollom. These comments are usually on-topic and at first glance appear to be legit, except that the author's homepage URL links to a spam site. I enabled nofollow a long time ago, but that doesn't appear to make much of a difference anymore.

I've even seen comments that had no link in them at all. I'm not sure what the purpose of those messages is. Maybe they are just checking whether someone removes these comments, or maybe they are trying to confuse Bayesian filters. Either way they are pretty annoying.

I could enable comment moderation on this site, but that would make me the bottleneck as I don't check the queue that often. I noticed that nearly all of these comments show up on a limited number of pages. So I wrote a simple little module that allows me to enable comment moderation for individual nodes and enabled this on the pages that are targeted by these spammers.
This has been running for about a week now and has blocked all such messages. There have even been fewer attempts, possibly because of the warning message that appears in the comment form. Let's hope it stays this way so I can remove the nofollow tags on links.


Mollom

After months of testing, Mollom has finally gone into public beta. Mollom is a web service that checks comments and other content for spam, similar to what Akismet does. The nice thing is that when it's not sure about a message, it doesn't place it in a moderation queue but instead shows the user a CAPTCHA. For more information, see "How mollom works".

I've been testing the module for the past few months and it works pretty well. Very few spam comments got past the filters, while at the same time most of the regular visitors never had to fill out an ugly CAPTCHA. They also have some interesting goals, not limited to just blocking spam, but checking the quality of content as well.

And most important, it has fancy graphs ;)

It will be interesting to see how this evolves and whether their infrastructure can handle all the new users. Earlier today the website itself was unreachable because of a major power outage in the LCL datacenter where the website is hosted. Luckily the web service itself is distributed across multiple locations, and as far as I can tell it just kept working. Leave it to Murphy to test your fail-over on the very first day.


PDF Spam

After reading about it on several websites, I finally got my first PDF spam message. The message itself was empty, with only a simple PDF document attached. The PDF contains a single image, the same type you often see in image spam.

It has misaligned fonts and different colors to fool OCR software, and the PDF itself was damaged. PDF readers had no problem opening the document but some other tools complained about broken references and refused to work. Obviously those tools were not written with spammers in mind.

The message came through my ISP's spam filters undetected.

I checked our filters at work, and a few of these messages had been blocked there by the antispam engine. It didn't seem to have a problem detecting them, but with only a few samples, maybe it just got lucky ;)


The end of greylisting?

Most already know that I'm not a big fan of greylisting, mainly because I believe that simply delaying mails is not acceptable in larger environments and is very easy to bypass. If everyone were using greylisting, we'd have to throw more hardware at outgoing mail as well. Good for business though ;)

For a while now I have been using selective greylisting on my personal server, only greylisting senders without a valid reverse DNS or those that are listed in blacklists. This has worked pretty well. Recently, however, the guys from openminds reported that spammers are getting smarter and retrying messages that were refused by greylisting. For a long time I suspected that spammers would adapt real soon, but it took them far longer than I thought.
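To make the selective policy concrete, here is a minimal sketch in Python. It is not the configuration running on my server: the class name, the 300-second delay and the "accept"/"defer" return values are all assumptions made for illustration, and the reverse DNS and blacklist results are passed in as plain booleans instead of being looked up.

```python
from dataclasses import dataclass, field


@dataclass
class SelectiveGreylist:
    """Greylist only suspicious senders; accept everyone else right away."""

    min_delay: float = 300.0  # assumed minimum retry delay, in seconds
    first_seen: dict = field(default_factory=dict)

    def check(self, ip, sender, recipient, has_valid_rdns, is_blacklisted, now):
        # Senders with valid reverse DNS that are not blacklisted skip greylisting.
        if has_valid_rdns and not is_blacklisted:
            return "accept"
        # Remember when this (ip, sender, recipient) triplet was first seen.
        triplet = (ip, sender, recipient)
        seen = self.first_seen.setdefault(triplet, now)
        # Temporarily reject until the sender retries after the minimum delay.
        if now - seen < self.min_delay:
            return "defer"
        return "accept"


greylist = SelectiveGreylist()
clean = ("192.0.2.1", "a@example.com", "me@example.org")
shady = ("192.0.2.2", "x@example.net", "me@example.org")
print(greylist.check(*clean, has_valid_rdns=True, is_blacklisted=False, now=0))
print(greylist.check(*shady, has_valid_rdns=False, is_blacklisted=False, now=0))
print(greylist.check(*shady, has_valid_rdns=False, is_blacklisted=False, now=400))
```

The last call shows the weakness: a sender that simply retries after the delay gets accepted, whether it is a real mail server or a spammer.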

Does anyone have more accurate numbers about this? I don't have large systems running greylisting to pull such statistics from, but I do think it's strange that only a couple of people reported this. Was this only a local glitch, or are spammers really starting to adapt?

 

On a side note, have a look at the interesting reverse DNS behavior of networks like 65.111.26.0/24, 64.191.43.0/24, 216.74.115.0/24 and many more. They have PTR records with a very short TTL (120 seconds) and are regularly switching from one domain to another.
For example in the past few days the address 65.111.26.16 has resolved to:

  • crowflies16.forexpose.com
  • crowflies16.hiccupeast.com
  • crowflies16.againwhite.com
  • crowflies16.shortgypsy.com

And probably many more because I only checked a few times. An attempt at circumventing reputation filters based on domain name?
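If you log PTR results over time, spotting this pattern is easy. The sketch below is only an illustration: the function name is made up, and it naively treats everything after the first label as the domain, which happens to be good enough for the names listed above.

```python
from collections import defaultdict


def rotating_labels(ptr_names):
    """Return host labels that were seen under more than one domain."""
    domains_by_label = defaultdict(set)
    for name in ptr_names:
        # Split "crowflies16.forexpose.com" into the label and the rest.
        label, _, domain = name.rstrip(".").lower().partition(".")
        if domain:
            domains_by_label[label].add(domain)
    return {
        label: sorted(domains)
        for label, domains in domains_by_label.items()
        if len(domains) > 1
    }


# PTR names observed for 65.111.26.16 (taken from the list above)
observed = [
    "crowflies16.forexpose.com",
    "crowflies16.hiccupeast.com",
    "crowflies16.againwhite.com",
    "crowflies16.shortgypsy.com",
]
print(rotating_labels(observed))
```

A host label that stays the same while the domain underneath it keeps changing is a decent hint that the domain names are disposable.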
