Using the Varnish HTTP accelerator - Experiences so far

A couple of weeks ago, I started experimenting with Varnish as a reverse proxy server. My setup has changed a lot since then, and I like to think it has improved. So here are my experiences so far.

Use -trunk (or version 2.0), not the 1.1.2 release

Initially I was using the 1.1.2 release, but I ran into a couple of problems. The worst one was the white screen of death for users behind a proxy server such as Squid: HTTP/1.0 requests were handled incorrectly, resulting in blank pages being sent to the client.
The solution was to upgrade to -trunk. It contains several bug fixes and interesting new features, and it is pretty stable (at least at the moment). I usually don't like using the development version, but right now it seems to be the most reliable option, and several people recommended it as well.
Version 2.0 will be released soon; use that one when it's available.

One thing to keep in mind, though, is that there is a bug affecting subroutines. It has been reported, but for now I'd recommend not using them.

Don't duplicate the default VCL functions

In VCL you can define multiple functions with the same name; they will simply be concatenated. This means you don't have to duplicate an entire default function just to add a single line. This can simplify your VCL code a lot, and you still benefit from improvements made to the default functions.
For an example, see my new VCL below. I'm using this technique in every function except vcl_recv.
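As a rough illustration of the concatenation trick, here is a sketch of a vcl_fetch that adds a single rule and nothing else. It assumes the 2.0-era syntax (obj.* variables in vcl_fetch, bare actions such as pass;); VCL changed considerably in later releases. The image rule itself is hypothetical. Because the sub does not end with an action, execution continues into the built-in vcl_fetch that Varnish appends after it.

```vcl
# Only the extra rule is written here. Because this sub does not end
# with an action (pass, deliver, ...), Varnish continues with the
# built-in vcl_fetch that is concatenated after it.
sub vcl_fetch {
    # Hypothetical rule: keep static images in the cache for one hour
    if (req.url ~ "\.(png|gif|jpg)$") {
        set obj.ttl = 1h;
    }
}
```

The default checks (for example, refusing to cache responses that set a cookie) still run afterwards, so improvements to the default function are picked up automatically.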

Experimenting with Varnish

As a little experiment, my blog is now running behind a Varnish HTTP accelerator. While Squid can be used for this purpose as well, it is primarily a forward proxy. Varnish, on the other hand, is designed to be a very fast caching reverse proxy server.

I initially tried the Debian Etch package of Varnish, but it is a bit outdated, so I ended up building my own package from source. It took a bit of research to configure everything correctly, but all in all the configuration was pretty simple and easy to read: after a few weeks, I still know what it all means.
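To give an idea of how compact the configuration can be, here is a minimal sketch of a backend definition. It assumes the 2.0-style backend syntax and a hypothetical setup where Apache has been moved to port 8080 on the same machine; the address and port are placeholders.

```vcl
# Hypothetical setup: Apache listens on 127.0.0.1:8080 and Varnish
# answers client requests in front of it.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
```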

There are a few gotchas:

  • There isn't much documentation.
  • It will refuse to start if /tmp is mounted "noexec".
  • Turn off KeepAlive on the backend Apache servers; otherwise you might experience delays when requesting pages.
  • The VCL "pass" action does not support POST requests. You might come across older configuration examples on the internet that use "pass" for POST requests, but that simply doesn't work. You need to use the "pipe" action instead.
  • By default, Varnish only respects the max-age parameter of the Cache-Control header. Other directives, such as those indicating whether an object may be cached at all (private, no-cache, no-store), are ignored. You need to add some code to your VCL configuration file to handle these directives correctly (see example below).
  • If you need a large cache (multiple GBs), you need a 64-bit system.
  • There is no persistent cache. Every time you restart Varnish, the cache is wiped, so don't restart unless you really need to.
  • During load testing, new requests would sometimes hang and ip_conntrack: table full: packet dropped was logged. This wasn't a problem with Varnish but with the firewall, which couldn't handle that many connections. Increasing the value in net/ipv4/netfilter/ip_conntrack_max fixed that.
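For the pass/pipe point in the list above, a minimal vcl_recv sketch follows. It assumes the 1.1/2.0-era VCL, where the request method is available as req.request, and it only handles POST; everything else falls through to the default vcl_recv.

```vcl
sub vcl_recv {
    # "pass" could not forward a POST body in this version, so hand
    # the connection straight to the backend with "pipe" instead
    if (req.request == "POST") {
        pipe;
    }
}
```

One side effect worth knowing: in pipe mode Varnish simply shuffles bytes between client and backend, so any further requests on that same client connection bypass VCL and the cache as well.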
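Regarding the Cache-Control bullet, one common approach is to inspect the header in vcl_fetch and refuse to cache objects the backend marks as uncacheable. This is a sketch assuming the obj.* variables of 1.1/2.0-era vcl_fetch; it leaves the final action to the default vcl_fetch, and the list of directives is an illustrative choice.

```vcl
sub vcl_fetch {
    # Varnish honors max-age on its own; treat these directives as
    # "do not cache" and pass the response through uncached
    if (obj.http.Cache-Control ~ "(private|no-cache|no-store)") {
        pass;
    }
}
```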

All in all, Varnish seems to work pretty well. It can still be improved in many areas, but that's to be expected for a v1 release.
Time for me to start testing this on larger websites.

For future reference, below is the configuration file I've been using.