Spam comments left by humans

In the past few months, Mollom has done a very good job blocking spam on my site. Its accuracy has increased a lot since I first installed it. Almost no spam gets through, even though there are hundreds of attempts a day.

However, I've been seeing a different kind of spam that does get through: comments left by humans, who can solve the CAPTCHA shown by Mollom. These comments are usually on-topic and at first glance appear to be legit, except that the homepage URL of the author links to a spam site. I enabled nofollow a long time ago, but that doesn't appear to make much of a difference anymore.

I've even seen comments that had no link in them at all. I'm not sure what the purpose of those messages is. Maybe they are just checking whether someone removes these comments, or maybe they are trying to confuse Bayesian filters. Either way they are pretty annoying.

I could enable comment moderation on this site, but that would make me the bottleneck as I don't check the queue that often. I noticed that nearly all of these comments show up on a limited number of pages. So I wrote a simple little module that allows me to enable comment moderation for individual nodes and enabled this on the pages that are targeted by these spammers.
This has been running for about a week now and has blocked all such messages. There have even been fewer attempts, possibly because of the warning message that appears in the comment form. Let's hope it stays this way so I can remove the nofollow tags on links.


Using Amazon S3 to serve static files

A few months ago I wrote about using lighttpd with secdownload to serve files. At that time I already mentioned running out of disk space on my server, so I had to look for alternatives. I had to choose between renting a server with more disk space and using something different altogether.

I decided to stop using my own server and use Amazon S3 instead, for a few simple reasons:

  • It is a lot cheaper than renting a server which is just sitting idle 99.9% of the time.
  • Unlimited storage. I can just keep uploading files to S3 without having to worry about how much space I am using. Upgrading my server is just a temporary solution because in a year or so I would be faced with the same problem all over again.
  • It should be far more reliable than just a single static file server.

Note that I'm just using an external file server because the primary server has a rather slow connection. It's enough to serve web pages, but not fast enough to serve large files. I don't need fast replication between the servers. In fact, I want it to be scheduled to use the spare bandwidth at night.

Switching from my old setup to S3 turned out to be very simple. In the past, I used rsync to transfer files to the file server at night and used a simple PHP function in my theme to replace the links to local files with links to the file server.
Similar tools exist for S3. Instead of rsync, there is a tool called S3Sync that can be used to transfer files to Amazon S3. It took some time to get it working though. Initially it didn't support EU buckets (it does now), and it is less resilient than rsync. Whenever the connection got interrupted, the script would display a Broken Pipe message and get stuck in retries. Thanks to a post on the forum I was able to modify the script so that it would recover from such errors. Note that a new version has been released since, which probably fixes this problem, but I haven't upgraded yet. Currently the only drawback is that there is no support for bandwidth limiting, so I'll have to use another tool for that.
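The link replacement mentioned above is not shown in this post, so the following is only a minimal sketch of how such a theme helper might look. The function name, the /files/ prefix and the files.example.com host are all hypothetical, and it assumes links are written as plain href="..." attributes.

```php
<?php
/**
 * Sketch: rewrite links to local files so they point at a file server.
 * The function name, prefix and host used here are hypothetical.
 */
function sketch_rewrite_file_links($html, $local_prefix, $remote_base) {
  // Match href="<local_prefix>..." and capture the rest of the path.
  $pattern = '#href="' . preg_quote($local_prefix, '#') . '([^"]+)"#';
  return preg_replace_callback($pattern, function ($m) use ($remote_base) {
    // Rebuild the attribute with the remote base in front of the captured path.
    return 'href="' . $remote_base . $m[1] . '"';
  }, $html);
}

$html = '<a href="/files/video.avi">video</a> <a href="/about">about</a>';
echo sketch_rewrite_file_links($html, '/files/', 'http://files.example.com/'), "\n";
// prints: <a href="http://files.example.com/video.avi">video</a> <a href="/about">about</a>
```

For S3 with signed links, the callback would presumably build the URL with a signing function instead of a fixed base.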

On the PHP side I had some difficulties getting the signing to behave correctly. Most of the examples I found online had issues with filenames containing spaces or slashes. I eventually used the function below. I can't guarantee that it is correct, but I haven't seen any problems with it so far.

  <?php
  // grab this with "pear install --onlyreqdeps Crypt_HMAC"
  require_once('Crypt/HMAC.php');

  // Amazon S3 credentials
  define('S3_ACCESS_KEY_ID', 'your-S3-access-key');
  define('S3_SECRET_ACCESS_KEY', 'your-S3-secret-access-key');

  /**
   * Generate a link to download a file from Amazon S3 using query string
   * authentication. This link is only valid for a limited amount of time.
   *
   * @param $bucket The name of the bucket in which the file is stored.
   * @param $filekey The key of the file, excluding the leading slash.
   * @param $expires The amount of time the link is valid (in seconds).
   * @param $operation The type of HTTP operation. Either GET or HEAD.
   */
  function mymodule_get_s3_auth_link($bucket, $filekey, $expires = 300, $operation = 'GET') {
    $expire_time = time() + $expires;

    // URL-encode the key, but keep slashes as path separators
    $filekey = rawurlencode($filekey);
    $filekey = str_replace('%2F', '/', $filekey);

    $path = $bucket .'/'. $filekey;

    /**
     * StringToSign = HTTP-VERB + "\n" +
     * Content-MD5 + "\n" +
     * Content-Type + "\n" +
     * Expires + "\n" +
     * CanonicalizedAmzHeaders +
     * CanonicalizedResource;
     */
    $stringtosign =
      $operation ."\n". // type of HTTP request (GET/HEAD)
      "\n". // Content-MD5 is meaningless for GET
      "\n". // Content-Type is meaningless for GET
      $expire_time ."\n". // set the expire date of this link
      "/$path"; // full path (incl bucket), starting with a /

    // the base64 signature may contain +, / and =, so it must be URL-encoded
    $signature = urlencode(mymodule_constructSig($stringtosign));

    $url = sprintf('http://%s.s3.amazonaws.com/%s?AWSAccessKeyId=%s&Expires=%u&Signature=%s',
      $bucket, $filekey, S3_ACCESS_KEY_ID, $expire_time, $signature);

    return $url;
  }

  // Convert a hex-encoded string to its base64 representation
  function mymodule_hex2b64($str) {
    $raw = '';
    for ($i = 0; $i < strlen($str); $i += 2) {
      $raw .= chr(hexdec(substr($str, $i, 2)));
    }
    return base64_encode($raw);
  }

  // Compute the base64-encoded HMAC-SHA1 signature of a string
  function mymodule_constructSig($str) {
    $hasher = new Crypt_HMAC(S3_SECRET_ACCESS_KEY, 'sha1');
    $signature = mymodule_hex2b64($hasher->hash($str));
    return $signature;
  }
  ?>

Of course, as these URLs are only valid for a limited amount of time, make sure the clock on your server is accurate, otherwise the download links won't work.
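On PHP versions that ship the hash extension, the same signature can probably be computed without the PEAR dependency. The sketch below is a variant of the function above that uses the built-in hash_hmac(). The function name and the $now parameter are hypothetical and only there for illustration, the credentials are placeholders, and I haven't tested it against S3 itself.

```php
<?php
/**
 * Sketch: build a query-string-authenticated S3 URL for a GET request
 * using PHP's built-in hash_hmac() instead of Crypt_HMAC.
 * The function name and the $now parameter are hypothetical.
 */
function sketch_s3_signed_url($bucket, $filekey, $access_key, $secret_key, $expires = 300, $now = null) {
  // Absolute expiry timestamp; $now can be injected to make the result repeatable.
  $expire_time = ($now === null ? time() : $now) + $expires;

  // URL-encode the key, but keep slashes as path separators.
  $encoded_key = str_replace('%2F', '/', rawurlencode($filekey));

  // Same string-to-sign layout as in the function above.
  $string_to_sign = "GET\n\n\n" . $expire_time . "\n/" . $bucket . '/' . $encoded_key;

  // The fourth argument (true) makes hash_hmac() return raw binary,
  // so no hex-to-base64 conversion is needed.
  $signature = base64_encode(hash_hmac('sha1', $string_to_sign, $secret_key, true));

  return sprintf('http://%s.s3.amazonaws.com/%s?AWSAccessKeyId=%s&Expires=%u&Signature=%s',
    $bucket, $encoded_key, rawurlencode($access_key), $expire_time, urlencode($signature));
}

// Placeholder credentials; prints a link that expires five minutes from now.
echo sketch_s3_signed_url('my-bucket', 'files/my file.zip',
  'your-S3-access-key', 'your-S3-secret-access-key'), "\n";
```

Because the expiry is computed from time() on the web server, a clock that runs behind produces links that are already expired by the time S3 checks them.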

Mollom

After months of testing, Mollom has finally gone into public beta. Mollom is a web service that checks comments and other content for spam, similar to what Akismet does. The nice thing is that when it's not sure about a message, it doesn't place it in a moderation queue but instead shows the user a CAPTCHA. For more information, see "How mollom works".

I've been testing the module for the past few months and it works pretty well. Very few spam comments got past the filters, while at the same time most of the regular visitors never had to fill out an ugly CAPTCHA. They also have some interesting goals, not limited to just blocking spam, but checking the quality of content as well.

And most important, it has fancy graphs ;)

It will be interesting to see how this evolves and whether their infrastructure can handle all the new users. Earlier today the website itself was unreachable because of a major power outage in the LCL datacenter where the website is hosted. Luckily the web service itself is distributed across multiple locations, and as far as I can tell it just kept working. Leave it to Murphy to test your fail-over on the very first day.


Drupal 5 Themes Book

A while ago I received a new addition to my Drupal books collection, Drupal 5 Themes. For personal reasons I never got around to writing down my thoughts about it.

Being a developer myself, I thought it would be interesting to read about Drupal from a designer's point of view. I even learned a thing or two that could be useful next time I need to modify a theme. I think the book is a good introduction for designers. It covers the basics without exposing the reader to large blocks of PHP code (although some PHP knowledge might be useful).

The first chapters explain the basics of how themes work and how pages are rendered in Drupal, including theming engines and how overrides work. The book then goes on to show how to modify an existing theme and eventually how to build a new theme from scratch. The final chapter has a short introduction on altering forms.

I was a bit disappointed that the book didn't mention some of the important contrib modules like CCK or Views. Also, some of the screenshots in the book were a bit difficult to read, probably because of the way they were converted from color to grayscale.

That aside, if you are new to Drupal theming, this book is a good introduction. Of course, don't expect to know everything there is to know about Drupal theming just by reading one book. The book was written for Drupal 5, and although Drupal 6 has been released by now, many of the underlying principles are still the same, so the book is still useful. If anything, theming has gotten easier in Drupal 6. See the theme update documentation and the theme guide for more information.
