Experimenting with Varnish

As a little experiment, my blog is now running behind a Varnish HTTP accelerator. While Squid can be used for this purpose as well, its primary function is that of a forward proxy. Varnish, on the other hand, is designed to be a very fast caching reverse proxy.

I initially tried the Debian Etch package of Varnish, but it is a bit outdated, so I ended up building my own package from source. It took a bit of research to configure everything correctly, but all in all the configuration language is pretty simple and easy to read; even after a few weeks, I still know what it all means.

There are a few gotchas:

  • There isn't much documentation.
  • It will refuse to start if /tmp is mounted "noexec" (it compiles VCL to a shared object there).
  • Turn off KeepAlive on the backend Apache servers (KeepAlive Off in the Apache configuration), otherwise you might experience delays when requesting pages.
  • The VCL "pass" action does not support POST requests. You might come across older configuration examples on the internet that use "pass" for POSTs, but that simply doesn't work; you need to use the "pipe" action.
  • By default, Varnish only respects the max-age parameter of the Cache-Control header. Other values, such as those indicating whether an object may be cached at all, are ignored unless you include some code in your VCL configuration file to handle these headers correctly (see example below).
  • If you need a large cache (multiple GBs), you need a 64-bit system.
  • There is no persistent cache. Every time you restart Varnish, the cache gets wiped, so don't restart unless you really need to.
  • During load testing, new requests would sometimes hang and "ip_conntrack: table full: packet dropped" was logged. This wasn't a problem with Varnish but with the firewall, which couldn't handle that many connections. Increasing the value of net/ipv4/netfilter/ip_conntrack_max fixed that (see the snippet after this list).

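For reference, raising the conntrack limit looks something like this (a sketch for a 2.6-era kernel; the right value depends on available memory):

    # check the current limit
    sysctl net.ipv4.netfilter.ip_conntrack_max
    # raise it for the running system
    sysctl -w net.ipv4.netfilter.ip_conntrack_max=131072
    # make it permanent by adding the same key to /etc/sysctl.conf
    echo "net.ipv4.netfilter.ip_conntrack_max = 131072" >> /etc/sysctl.conf
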
All in all, Varnish seems to work pretty well. It can still be improved in many areas, but that's to be expected for a v1 release.
Time for me to start testing this on larger websites.

For future reference, below is the configuration file I've been using.

    # configuration file for andromeda

    backend default {
        set backend.host = "208.68.209.225";
        set backend.port = "80";
    }

    #
    # Block unwanted clients
    #
    acl blacklisted {
        "192.168.100.100";
    }

    #
    # Handling of requests that are received from clients.
    # Decide whether or not to look up data in the cache first.
    #
    sub vcl_recv {
        # reject malicious requests
        call vcl_recv_sentry;

        if (req.request != "GET" && req.request != "HEAD" && req.request != "PUT" && req.request != "POST" && req.request != "TRACE" && req.request != "OPTIONS" && req.request != "DELETE") {
            # Non-RFC2616 or CONNECT which is weird.
            pipe;
        }
        if (req.http.Expect) {
            # Expect is just too hard at present.
            pipe;
        }
        if (req.request != "GET" && req.request != "HEAD") {
            # We only deal with GET and HEAD. Note that we need to use "pipe"
            # instead of "pass" here: pass isn't supported for POST requests.
            pipe;
        }
        if (req.http.Authorization) {
            # don't cache pages that are protected by basic authentication
            pass;
        }
        if (req.http.Accept-Encoding) {
            # Handle compression correctly. Varnish treats headers literally,
            # not semantically, so it is very well possible that there are
            # cache misses because the headers sent by different browsers
            # aren't the same.
            # For more info: http://varnish.projects.linpro.no/wiki/FAQ/Compression
            if (req.http.Accept-Encoding ~ "gzip") {
                # if the browser supports it, we'll use gzip
                set req.http.Accept-Encoding = "gzip";
            } elsif (req.http.Accept-Encoding ~ "deflate") {
                # next, try deflate if it is supported
                set req.http.Accept-Encoding = "deflate";
            } else {
                # unknown algorithm, probably junk; remove it
                remove req.http.Accept-Encoding;
            }
        }
        if (req.url ~ "\.(jpg|jpeg|gif|png|css|js)$") {
            # allow caching of all images and css/javascript files
            lookup;
        }
        if (req.url ~ "^/files") {
            # anything in Drupal's files directory is static and may be cached
            lookup;
        }
        if (req.http.Cookie) {
            # Not cacheable by default.
            # TODO: do we even need this? Can't we simply make sure dynamic
            # content never exists in the cache?
            pass;
        }

        # everything else we try to look up in the cache first
        lookup;
    }

    #
    # Called when entering pipe mode
    #
    #sub vcl_pipe {
    #    pipe;
    #}

    #
    # Called when entering pass mode
    #
    #sub vcl_pass {
    #    pass;
    #}

    #
    # Called when entering an object into the cache
    #
    #sub vcl_hash {
    #    set req.hash += req.url;
    #    if (req.http.host) {
    #        set req.hash += req.http.host;
    #    } else {
    #        set req.hash += server.ip;
    #    }
    #    hash;
    #}

    #
    # Called when the requested object was found in the cache
    #
    sub vcl_hit {
        if (!obj.cacheable) {
            # A response is considered cacheable if all of the following are true:
            # - it is valid
            # - the HTTP status code is 200, 203, 300, 301, 302, 404 or 410
            # - it has a non-zero time-to-live when the Expires and
            #   Cache-Control headers are taken into account
            pass;
        }
        deliver;
    }

    #
    # Called when the requested object was not found in the cache
    #
    #sub vcl_miss {
    #    fetch;
    #}

    #
    # Called when the requested object has been retrieved from the
    # backend, or the request to the backend has failed
    #
    sub vcl_fetch {
        if (!obj.valid) {
            # don't cache invalid responses
            error;
        }
        if (!obj.cacheable) {
            # A response is considered cacheable if all of the following are true:
            # - it is valid
            # - the HTTP status code is 200, 203, 300, 301, 302, 404 or 410
            # - it has a non-zero time-to-live when the Expires and
            #   Cache-Control headers are taken into account
            #
            # If a response is not cacheable, simply pass it along to the client.
            pass;
        }
        if (obj.http.Set-Cookie) {
            # don't cache content that sets cookies (eg dynamic PHP pages)
            pass;
        }
        if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
            # Varnish by default ignores the Pragma and Cache-Control headers;
            # it only looks at the "max-age=" value in the Cache-Control header
            # to determine the TTL. So we need this rule so that the cache
            # respects the wishes of the backend application.
            pass;
        }
        if (obj.ttl < 180s) {
            # force a minimum TTL of 180 seconds for all cached objects
            set obj.ttl = 180s;
        }
        insert;
    }

    #
    # Called before a cached object is delivered to the client
    #
    #sub vcl_deliver {
    #    deliver;
    #}

    #
    # Called when an object nears its expiry time
    #
    #sub vcl_timeout {
    #    discard;
    #}

    #
    # Called when an object is about to be discarded
    #
    #sub vcl_discard {
    #    discard;
    #}

    #
    # Custom routine to detect malicious requests and reject them (called by vcl_recv)
    #
    sub vcl_recv_sentry {
        if (client.ip ~ blacklisted) {
            error 503 "Your IP has been blocked.";
        }
    }
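
To actually serve traffic with this file, point varnishd at it on startup. A minimal invocation (a sketch; the listen address, file paths and cache size are assumptions, adjust to taste):

    # listen on port 80, load the VCL above, use a 1 GB file-backed cache
    varnishd -a :80 -f /etc/varnish/andromeda.vcl -s file,/var/lib/varnish/cache,1G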

Configuring an IPv6 tunnel on a NetScreen SSG firewall

I have a NetScreen SSG firewall from which I wanted to establish a tunnel to SixXS. I tried this in the past, when IPv6 support had just been added, but didn't have much luck. Nowadays, however, it works pretty well: I've had a working tunnel for about a year now.

The first thing to do is to request a tunnel from an IPv6 tunnel broker, along with a subnet for your internal systems. Personally I use SixXS, but you can probably use any broker you like. As an example, I'll be using these settings:

  • Tunnel broker IPv4 address: 127.34.8.97
  • Tunnel IPv6 address: 2001:DB8:202:123::2/64
  • IPv6 subnet used internally: 2001:DB8:8AA:1::/64

Second, upgrade your device to the latest available firmware. IPv6 support is constantly improving as support for more ALGs is added. If you still have a 5GT you're stuck on 5.4, but if you have an SSG you can upgrade to 6.0 or even 6.1.

Next, enable IPv6 support on the device. It's not enabled by default, so you have to change a boot parameter and restart the device. You can only do this from the CLI:

    set envar ipv6=yes

After the device has rebooted, you'll see a few new commands available to configure IPv6 on interfaces and to use IPv6 addresses in policies. Next, we'll configure the tunnel to our tunnel broker. I'd recommend doing this from the CLI if you are using version 5.4, because the WebUI had some bugs back then.

    set interface tunnel.1 zone "Untrust"
    set interface tunnel.1 ipv6 mode "host"
    set interface tunnel.1 ipv6 ip 2001:DB8:202:123::2/64
    set interface tunnel.1 ipv6 enable
    set interface tunnel.1 tunnel encap ip6in4 manual
    set interface tunnel.1 tunnel local-if adsl2/0 dst-ip 127.34.8.97
    set interface tunnel.1 mtu 1280
    set route ::/0 interface tunnel.1 gateway :: preference 20

This should give you a working tunnel; you should now be able to ping IPv6 addresses from the NetScreen itself.
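
For example, pinging the broker's side of the tunnel from the CLI (its address is ::1 in the tunnel subnet; exact ping syntax may vary a bit between ScreenOS versions):

    ping 2001:DB8:202:123::1

Next we'll assign an IPv6 address to our internal interface and configure it so that the attached systems will get an autoconfigured address: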

    set interface ethernet0/1 ipv6 mode "router"
    set interface ethernet0/1 ipv6 ip 2001:DB8:8AA:1::1/64
    set interface ethernet0/1 ipv6 enable
    unset interface ethernet0/1 ipv6 ra link-address
    set interface ethernet0/1 ipv6 ra transmit
    set interface ethernet0/1 ipv6 nd nud

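Attached hosts should now pick up an address via stateless autoconfiguration. A quick way to verify from a Linux client (the interface name eth0 is an assumption):

    # the host should have an autoconfigured address in 2001:DB8:8AA:1::/64
    ip -6 addr show dev eth0
    # ping the NetScreen's internal interface
    ping6 2001:DB8:8AA:1::1
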
All that is left now is to configure a policy to allow the IPv6 traffic through. Configuring this is just like IPv4, except that there is no longer a single address object named "Any"; instead there are two, "Any-IPv4" and "Any-IPv6".
A simple policy to allow outbound connections would be:

    set policy from "Trust" to "Untrust" "Any-IPv6" "Any-IPv6" "ANY" permit log

Of course, I'd recommend making your policies a bit more restrictive. Be very careful with rules from Untrust to Trust: all your internal systems will have a public IP address, so don't just allow everything, or you might wake up one morning to an unpleasant surprise.
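
For example, to allow only outbound web and DNS traffic you could replace the catch-all service with predefined ones (a sketch following the same syntax as above):

    set policy from "Trust" to "Untrust" "Any-IPv6" "Any-IPv6" "HTTP" permit log
    set policy from "Trust" to "Untrust" "Any-IPv6" "Any-IPv6" "DNS" permit log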

One thing to keep in mind when updating your policies is that you cannot mix IPv4 and IPv6 addresses in one address group. This effectively means that you will have separate IPv4 and IPv6 policies in your rulebase, so try to keep them ordered logically.

Update, September 13, 2009
I recently set up another IPv6 tunnel, this time on a NetScreen 5GT running ScreenOS 6.2r3. I decided to incorporate some of the feedback from the comments below so that the NetScreen can be pinged by SixXS for monitoring. The NetScreen should respond to pings on its tunnel interface, but doesn't do so for IPv6 traffic; this bug has been reported to Juniper but has not yet been fixed.

So instead of putting my IPv6 endpoint address on the tunnel interface, I put it on a loopback interface with a /128 subnet mask. Then I created the tunnel interface as before, but made it unnumbered, inheriting the address from the loopback interface:

    set interface "loopback.1" zone "Untrust"
    set interface "loopback.1" ipv6 mode "host"
    set interface "loopback.1" ipv6 ip 2001:DB8:202:123::2/128
    set interface "loopback.1" ipv6 enable
    set interface loopback.1 route
    set interface loopback.1 manage ping
    unset interface loopback.1 ipv6 nd nud

    set interface tunnel.6 ip unnumbered interface loopback.1
    set interface "tunnel.6" ipv6 mode "host"
    set interface "tunnel.6" ipv6 enable
    set interface tunnel.6 tunnel encap ip6in4 manual
    set interface tunnel.6 tunnel local-if adsl2/0 dst-ip 127.34.8.97
    set interface tunnel.6 mtu 1280
    ! had to disable NUD, otherwise the tunnel interface state changed to down
    unset interface tunnel.6 ipv6 nd nud

    set route ::/0 interface tunnel.6 gateway :: preference 20

    ! note: an explicit policy is only needed when intra-zone blocking is enabled on the Untrust zone
    set policy name "SixXS monitoring" from "Untrust" to "Untrust" "Any-IPv6" "2001:DB8:202:123::2/128" "ICMP6 Echo Request" permit

This works perfectly. Because my external interface IP is dynamic on this device, I set the SixXS tunnel type to Heartbeat and installed AICCU on an internal machine to keep the connection alive.
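
Setting up AICCU for this is straightforward; a minimal /etc/aiccu.conf sketch (the username, password and tunnel id are placeholders, and option names may differ between AICCU versions):

    username YOURNAME-SIXXS
    password yourpassword
    protocol tic
    server tic.sixxs.net
    tunnel_id T12345
    automatic true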

Using Amazon S3 to serve static files

A few months ago I wrote about using lighttpd with secdownload to serve files. At that time I already mentioned that I was running out of disk space on my server, so I had to look for alternatives: either rent a server with more disk space or use something different altogether.

I decided to stop using my own server and use Amazon S3 instead, for a few simple reasons:

  • It is a lot cheaper than renting a server that sits idle 99.9% of the time.
  • Unlimited storage: I can keep uploading files to S3 without having to worry about how much space I am using. Upgrading my server would only be a temporary solution, because in a year or so I would face the same problem all over again.
  • It should be far more reliable than a single static file server.

Note that I'm only using an external file server because the primary server has a rather slow connection: enough to serve web pages, but not fast enough to serve large files. I don't need fast replication between the servers; in fact, I want it scheduled at night to use the spare bandwidth.

Switching from my old setup to S3 turned out to be very simple. In the past, I used rsync to transfer files to the file server at night, and a simple PHP function in my theme replaced the links to local files with links to the file server.
Similar tools exist for S3. Instead of rsync, there is a tool called S3Sync that transfers files to Amazon S3. It took some time to get it working, though. Initially it didn't support EU buckets (it does now), and it is less resilient than rsync: whenever the connection got interrupted, the script would display a "Broken Pipe" message and get stuck in retries. Thanks to a post on the forum I was able to modify the script so that it recovers from such errors. Note that a new version has since been released which probably fixes this problem, but I haven't upgraded yet. Currently the only drawback is that there is no support for bandwidth limiting, so I'll have to use another tool for that (see below).
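
One option is to wrap the upload in trickle, a userspace bandwidth shaper. A sketch, not something from my actual setup; the s3sync arguments and paths are placeholders:

    # limit uploads to roughly 100 KB/s in trickle's standalone mode
    trickle -s -u 100 ruby s3sync.rb -r /var/files/ mybucket:files/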

On the PHP side I had some difficulties getting the signing to behave correctly; most of the examples I found online had issues with filenames containing spaces or slashes. I eventually settled on the function below. I can't guarantee that it is correct, but I haven't seen any problems with it so far.

    <?php
    // grab this with "pear install --onlyreqdeps Crypt_HMAC"
    require_once('Crypt/HMAC.php');

    // Amazon S3 credentials
    define('S3_ACCESS_KEY_ID', 'your-S3-access-key');
    define('S3_SECRET_ACCESS_KEY', 'your-S3-secret-access-key');

    /**
     * Generate a link to download a file from Amazon S3 using query string
     * authentication. This link is only valid for a limited amount of time.
     *
     * @param $bucket The name of the bucket in which the file is stored.
     * @param $filekey The key of the file, excluding the leading slash.
     * @param $expires The amount of time the link is valid (in seconds).
     * @param $operation The type of HTTP operation. Either GET or HEAD.
     */
    function mymodule_get_s3_auth_link($bucket, $filekey, $expires = 300, $operation = 'GET') {
      $expire_time = time() + $expires;

      // URL-encode the file key, but keep the slashes as path separators
      $filekey = rawurlencode($filekey);
      $filekey = str_replace('%2F', '/', $filekey);

      $path = $bucket .'/'. $filekey;

      /**
       * StringToSign = HTTP-VERB + "\n" +
       *                Content-MD5 + "\n" +
       *                Content-Type + "\n" +
       *                Expires + "\n" +
       *                CanonicalizedAmzHeaders +
       *                CanonicalizedResource;
       */
      $stringtosign =
        $operation ."\n".   // type of HTTP request (GET/HEAD)
        "\n".               // Content-MD5 is meaningless for GET
        "\n".               // Content-Type is meaningless for GET
        $expire_time ."\n". // the expiry date of this link
        "/$path";           // full path (incl bucket), starting with a /

      $signature = urlencode(mymodule_constructSig($stringtosign));

      $url = sprintf('http://%s.s3.amazonaws.com/%s?AWSAccessKeyId=%s&Expires=%u&Signature=%s',
        $bucket, $filekey, S3_ACCESS_KEY_ID, $expire_time, $signature);

      return $url;
    }

    // convert a hex-encoded digest to base64, as required by S3 signatures
    function mymodule_hex2b64($str) {
      $raw = '';
      for ($i = 0; $i < strlen($str); $i += 2) {
        $raw .= chr(hexdec(substr($str, $i, 2)));
      }
      return base64_encode($raw);
    }

    // compute the HMAC-SHA1 signature of a string using the S3 secret key
    function mymodule_constructSig($str) {
      $hasher =& new Crypt_HMAC(S3_SECRET_ACCESS_KEY, 'sha1');
      $signature = mymodule_hex2b64($hasher->hash($str));
      return $signature;
    }
    ?>

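Typical usage from a theme or module would look something like this (the bucket and file key are hypothetical):

    <?php
    // generate a download link for "my files/report.pdf" that is valid for five minutes
    $url = mymodule_get_s3_auth_link('example-bucket', 'my files/report.pdf', 300);
    print $url;
    ?>
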
Of course, as these URLs are only valid for a limited amount of time, make sure the clock on your server is accurate; otherwise the download links won't work.

Mollom

After months of testing, Mollom has finally gone into public beta. Mollom is a web service that checks comments and other content for spam, similar to what Akismet does. The nice thing is that when it's not sure about a message, it doesn't place it in a moderation queue but instead shows the user a CAPTCHA. For more information, see "How Mollom works".

I've been testing the module for the past few months and it works pretty well. Very few spam comments got past the filters, while at the same time most regular visitors never had to fill out an ugly CAPTCHA. They also have some interesting goals beyond just blocking spam, such as checking the quality of content.

And most important, it has fancy graphs ;)

It will be interesting to see how this evolves and whether their infrastructure can handle all the new users. Earlier today the website itself was unreachable because of a major power outage in the LCL datacenter where it is hosted. Luckily the web service itself is distributed across multiple locations, and as far as I can tell it just kept working. Leave it to Murphy to test your failover on the very first day...
