As a little experiment, my blog is now running behind a Varnish HTTP accelerator. While Squid can be used for this purpose as well, its primary function is that of a forward proxy. Varnish, on the other hand, is designed to be a very fast caching reverse proxy server.
I initially tried using the Debian etch package of Varnish, but it is a bit outdated, so I ended up building my own package from source. It took a bit of research to configure everything correctly, but all in all the configuration was pretty simple and it is easy to read. After a few weeks, I still know what it all means.
There are a few gotchas:
- There isn't much documentation.
- It will refuse to start if /tmp is mounted "noexec".
- Turn off KeepAlives on the backend Apache servers, otherwise you might experience delays when requesting pages.
- The VCL "pass" action does not support POST requests. You might come across older configuration examples on the internet that use the "pass" action but that simply doesn't work. You need to use the "pipe" action.
- By default, Varnish only respects the max-age parameter of the Cache-Control header. Any other directives, such as those indicating whether or not an object may be cached, are ignored. You need to include some code in your VCL configuration file to handle these headers correctly (see the vcl_fetch routine in the configuration below).
- If you need a large cache (multiple GBs), you need a 64bit system.
- There is no persistent cache. Every time you restart varnish, the cache gets wiped. So don't restart unless you really need to.
- During load testing, new requests would sometimes hang and "ip_conntrack: table full: packet dropped" was logged. This wasn't a problem with Varnish but with the firewall, which couldn't track that many connections. Increasing the value in net/ipv4/netfilter/ip_conntrack_max fixed that.
All in all, Varnish seems to work pretty well. It can still be improved in many areas, but that's to be expected for a v1 release.
Time for me to start testing this on larger websites.
For future reference, below is the configuration file I've been using.
# configuration file for andromeda

backend default {
    set backend.host = "208.68.209.225";
    set backend.port = "80";
}

#
# Block unwanted clients
#
acl blacklisted {
    "192.168.100.100";
}

#
# handling of request that are received from clients.
# decide whether or not to lookup data in the cache first.
#
sub vcl_recv {
    # reject malicious requests
    call vcl_recv_sentry;

    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
        # Non-RFC2616 or CONNECT which is weird.
        pipe;
    }

    if (req.http.Expect) {
        # Expect is just too hard at present.
        pipe;
    }

    if (req.request != "GET" && req.request != "HEAD") {
        # we only deal with GET and HEAD
        # note that we need to use "pipe" instead of "pass" here. Pass isn't supported for
        # POST requests
        pipe;
    }

    if (req.http.Authorization) {
        # don't cache pages that are protected by basic authentication
        pass;
    }

    if (req.http.Accept-Encoding) {
        # Handle compression correctly. Varnish treats headers literally, not
        # semantically. So it is very well possible that there are cache misses
        # because the headers sent by different browsers aren't the same.
        # For more info: http://varnish.projects.linpro.no/wiki/FAQ/Compression
        if (req.http.Accept-Encoding ~ "gzip") {
            # if the browser supports it, we'll use gzip
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            # next, try deflate if it is supported
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm. Probably junk, remove it
            remove req.http.Accept-Encoding;
        }
    }

    if (req.url ~ "\.(jpg|jpeg|gif|png|css|js)$") {
        # allow caching of all images and css/javascript files
        lookup;
    }

    if (req.url ~ "^/files") {
        # anything in drupals files directory is static and may be cached
        lookup;
    }

    if (req.http.Cookie) {
        # Not cacheable by default
        # TODO: do we even need this? Can't we simply make sure dynamic
        # content never exists in the cache?
        pass;
    }

    # every thing else we try to look up in the cache first
    lookup;
}

#
# Called when entering pipe mode
#
#sub vcl_pipe {
#    pipe;
#}

#
# Called when entering pass mode
#
#sub vcl_pass {
#    pass;
#}

#
# Called when entering an object into the cache
#
#sub vcl_hash {
#    set req.hash += req.url;
#    if (req.http.host) {
#        set req.hash += req.http.host;
#    } else {
#        set req.hash += req.http.host;
#    }
#    hash;
#}

#
# Called when the requested object was found in the cache
#
sub vcl_hit {
    if (!obj.cacheable) {
        # A response is considered cacheable if all of the following are true:
        # - it is valid
        # - HTTP status code is 200, 203, 300, 301, 302, 404 or 410
        # - it has a non-zero time-to-live when Expires and Cache-Control headers are taken into account.
        pass;
    }
    deliver;
}

#
# Called when the requested object was not found in the cache
#
#sub vcl_miss {
#    fetch;
#}

#
# Called when the requested object has been retrieved from the
# backend, or the request to the backend has failed
#
sub vcl_fetch {
    if (!obj.valid) {
        # don't cache invalid responses.
        error;
    }

    if (!obj.cacheable) {
        # A response is considered cacheable if all of the following are true:
        # - it is valid
        # - HTTP status code is 200, 203, 300, 301, 302, 404 or 410
        # - it has a non-zero time-to-live when Expires and Cache-Control headers are taken into account.
        #
        # If a response is not cachable, simply pass it along to the client.
        pass;
    }

    if (obj.http.Set-Cookie) {
        # don't cache content that sets cookies (eg dynamic PHP pages).
        pass;
    }

    if (obj.http.Pragma ~ "no-cache" ||
        obj.http.Cache-Control ~ "no-cache" ||
        obj.http.Cache-Control ~ "private") {
        # varnish by default ignores Pragma and Cache-Control headers. It
        # only looks at the "max-age=" value in the Cache-Control header to
        # determine the TTL. So we need this rule so that the cache respects
        # the wishes of the backend application.
        pass;
    }

    if (obj.ttl < 180s) {
        # force minimum ttl of 180 seconds for all cached objects.
        set obj.ttl = 180s;
    }

    insert;
}

#
# Called before a cached object is delivered to the client
#
#sub vcl_deliver {
#    deliver;
#}

#
# Called when an object nears its expiry time
#
#sub vcl_timeout {
#    discard;
#}

#
# Called when an object is about to be discarded
#
#sub vcl_discard {
#    discard;
#}

#
# Custom routine to detect malicious requests and reject them (called by vcl_recv).
#
sub vcl_recv_sentry {
    if (client.ip ~ blacklisted) {
        error 503 "Your IP has been blocked.";
    }
}





I had to disable varnish again. The front page showed up blank for some users. Data was getting from the backend to varnish, which passed along all the headers but left out the entire content. Oops.
OK, we're back in business :)
I switched to the latest svn "trunk" version and it works now. It also introduced a couple of interesting new features I'm going to try tomorrow.
At the end of your vcl_recv you have a lookup, so you don't need any of those if statements that cache images, they will automatically be cached by the last lookup.
I'm not so sure about that. The line before it passes all requests that contain a Cookie header. The problem I had is that Drupal always sends a cookie to the client (also for anonymous users). The client sends that cookie in all further requests, effectively bypassing the cache. That's why I forced the lookup for images and files.
I don't think the cookie check even makes sense. In fact, I removed it and so far I haven't seen any problems. Dynamic content is never inserted into the cache anyway.
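For readers following along, here is a rough sketch of what vcl_recv might look like with the cookie check removed. It reuses the Varnish 1.x syntax from the configuration in the post and is not the exact code now in use. The earlier checks (the sentry call, the method whitelist, Expect and Accept-Encoding handling) are assumed to stay unchanged and are left out to keep it short.

```vcl
# Sketch only: vcl_recv without the "if (req.http.Cookie)" branch.
# Assumes vcl_fetch still passes responses that set cookies or that
# carry no-cache/private headers, as in the configuration above.
sub vcl_recv {
    if (req.request != "GET" && req.request != "HEAD") {
        # POST and friends still go through pipe, not pass
        pipe;
    }

    if (req.http.Authorization) {
        # never serve pages behind basic authentication from the cache
        pass;
    }

    # Requests carrying Drupal's session cookie are no longer passed;
    # like everything else, they are looked up in the cache first.
    lookup;
}
```

With the cookie branch gone, the explicit lookups for images and /files no longer change the outcome, since every remaining request ends at the final lookup anyway. Whether this is safe depends on the backend reliably marking dynamic responses with Set-Cookie or no-cache/private headers, so that vcl_fetch keeps them out of the cache.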
Bart,
Just wanted to thank you for documenting your experiments with Varnish. You are completely right that there isn't much documentation available, so it's very difficult to get started.
--Joao
Varnish has been running stable for a couple months now, but there have been significant changes to my VCL code. I wrote a new post about it and included my new code here: Varnish: experiences so far.
Very useful, thanks.
Thanks for sharing - the KeepAlive Off info helped me.
Why do you say: "Turn off KeepAlives on the backend Apache servers, otherwise you might experience delays when requesting pages."?