Experimenting with Varnish

As a little experiment, my blog is now running behind a Varnish HTTP accelerator. Squid can be used for this purpose as well, but it is primarily a forward proxy. Varnish, on the other hand, is designed to be a very fast caching reverse proxy server.

I initially tried the Debian etch package of Varnish, but it is a bit outdated, so I ended up building my own package from source. It took a bit of research to configure everything correctly, but all in all the configuration was pretty simple and it is easy to read. After a few weeks, I still know what it all means.

There are a few gotchas:

  • There isn't much documentation.
  • It will refuse to start if /tmp is mounted "noexec".
  • Turn off KeepAlives on the backend Apache servers, otherwise you might experience delays when requesting pages.
  • The VCL "pass" action does not support POST requests. You might come across older configuration examples on the internet that use the "pass" action but that simply doesn't work. You need to use the "pipe" action.
  • By default, Varnish only respects the max-age parameter of the Cache-Control header. Any other values, such as those indicating whether or not an object may be cached, are ignored. You need to include some code in your VCL configuration file to handle these headers correctly (see the configuration file below).
  • If you need a large cache (multiple GBs), you need a 64-bit system.
  • There is no persistent cache. Every time you restart Varnish, the cache gets wiped. So don't restart unless you really need to.
  • During load testing, new requests would sometimes hang and "ip_conntrack: table full: packet dropped" was logged. This wasn't a problem with Varnish but with the firewall, which couldn't handle that many connections. Increasing the value in net/ipv4/netfilter/ip_conntrack_max fixed that.
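
The two VCL-related gotchas above can be sketched in isolation as follows. This is a minimal fragment in the same Varnish 1.x syntax as the configuration file further down, not a complete file. The "no-store" check is an assumption added for illustration and does not appear in that configuration.

```vcl
sub vcl_recv {
    if (req.request != "GET" && req.request != "HEAD") {
        # "pass" does not work for POST requests here, so hand the
        # connection over to the backend with "pipe" instead.
        pipe;
    }
}

sub vcl_fetch {
    if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private" || obj.http.Cache-Control ~ "no-store") {
        # Only max-age is honoured by default, so responses that the
        # backend marks as uncacheable are passed through explicitly.
        # Assumption: "no-store" is added here for illustration only.
        pass;
    }
}
```

Neither subroutine ends in an unconditional action, so requests that match none of the checks should fall through to whatever rules follow.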

All in all, Varnish seems to work pretty well. It can still be improved in many areas, but that's to be expected for a v1 release.
Time for me to start testing this on larger websites.

For future reference, below is the configuration file I've been using.

    # configuration file for andromeda

    backend default {
        set backend.host = "208.68.209.225";
        set backend.port = "80";
    }

    #
    # Block unwanted clients
    #
    acl blacklisted {
        "192.168.100.100";
    }

    #
    # Handling of requests that are received from clients.
    # Decide whether or not to look up data in the cache first.
    #
    sub vcl_recv {
        # reject malicious requests
        call vcl_recv_sentry;

        if (req.request != "GET" && req.request != "HEAD" && req.request != "PUT" && req.request != "POST" && req.request != "TRACE" && req.request != "OPTIONS" && req.request != "DELETE") {
            # Non-RFC2616 or CONNECT which is weird.
            pipe;
        }
        if (req.http.Expect) {
            # Expect is just too hard at present.
            pipe;
        }
        if (req.request != "GET" && req.request != "HEAD") {
            # We only deal with GET and HEAD.
            # Note that we need to use "pipe" instead of "pass" here.
            # Pass isn't supported for POST requests.
            pipe;
        }
        if (req.http.Authorization) {
            # don't cache pages that are protected by basic authentication
            pass;
        }
        if (req.http.Accept-Encoding) {
            # Handle compression correctly. Varnish treats headers literally, not
            # semantically. So it is quite possible that there are cache misses
            # because the headers sent by different browsers aren't the same.
            # For more info: http://varnish.projects.linpro.no/wiki/FAQ/Compression
            if (req.http.Accept-Encoding ~ "gzip") {
                # if the browser supports it, we'll use gzip
                set req.http.Accept-Encoding = "gzip";
            } elsif (req.http.Accept-Encoding ~ "deflate") {
                # next, try deflate if it is supported
                set req.http.Accept-Encoding = "deflate";
            } else {
                # unknown algorithm. Probably junk, remove it
                remove req.http.Accept-Encoding;
            }
        }
        if (req.url ~ "\.(jpg|jpeg|gif|png|css|js)$") {
            # allow caching of all images and css/javascript files
            lookup;
        }
        if (req.url ~ "^/files") {
            # anything in Drupal's files directory is static and may be cached
            lookup;
        }
        if (req.http.Cookie) {
            # Not cacheable by default
            # TODO: do we even need this? Can't we simply make sure dynamic
            # content never exists in the cache?
            pass;
        }

        # everything else we try to look up in the cache first
        lookup;
    }

    #
    # Called when entering pipe mode
    #
    #sub vcl_pipe {
    #    pipe;
    #}

    #
    # Called when entering pass mode
    #
    #sub vcl_pass {
    #    pass;
    #}

    #
    # Called to build the hash key under which an object is stored
    # in and looked up from the cache
    #
    #sub vcl_hash {
    #    set req.hash += req.url;
    #    if (req.http.host) {
    #        set req.hash += req.http.host;
    #    } else {
    #        set req.hash += req.http.host;
    #    }
    #    hash;
    #}

    #
    # Called when the requested object was found in the cache
    #
    sub vcl_hit {
        if (!obj.cacheable) {
            # A response is considered cacheable if all of the following are true:
            # - it is valid
            # - HTTP status code is 200, 203, 300, 301, 302, 404 or 410
            # - it has a non-zero time-to-live when Expires and Cache-Control
            #   headers are taken into account.
            pass;
        }
        deliver;
    }

    #
    # Called when the requested object was not found in the cache
    #
    #sub vcl_miss {
    #    fetch;
    #}

    #
    # Called when the requested object has been retrieved from the
    # backend, or the request to the backend has failed
    #
    sub vcl_fetch {
        if (!obj.valid) {
            # don't cache invalid responses.
            error;
        }
        if (!obj.cacheable) {
            # See vcl_hit for the conditions under which a response is
            # considered cacheable.
            #
            # If a response is not cacheable, simply pass it along to the client.
            pass;
        }
        if (obj.http.Set-Cookie) {
            # don't cache content that sets cookies (e.g. dynamic PHP pages).
            pass;
        }
        if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
            # Varnish by default ignores Pragma and Cache-Control headers. It
            # only looks at the "max-age=" value in the Cache-Control header to
            # determine the TTL. So we need this rule so that the cache respects
            # the wishes of the backend application.
            pass;
        }
        if (obj.ttl < 180s) {
            # force minimum ttl of 180 seconds for all cached objects.
            set obj.ttl = 180s;
        }
        insert;
    }

    #
    # Called before a cached object is delivered to the client
    #
    #sub vcl_deliver {
    #    deliver;
    #}

    #
    # Called when an object nears its expiry time
    #
    #sub vcl_timeout {
    #    discard;
    #}

    #
    # Called when an object is about to be discarded
    #
    #sub vcl_discard {
    #    discard;
    #}

    #
    # Custom routine to detect malicious requests and reject them (called by vcl_recv).
    #
    sub vcl_recv_sentry {
        if (client.ip ~ blacklisted) {
            error 503 "Your IP has been blocked.";
        }
    }

I had to disable Varnish again. The front page showed up blank for some users. Data was getting from the backend to Varnish, which passed along all the headers but left out the entire content. Oops.

OK, we're back in business :)
I switched to the latest svn "trunk" version and it works now. It also introduced a couple of interesting new features I'm going to try tomorrow.

At the end of your vcl_recv you have a lookup, so you don't need any of those if statements that cache images. They will automatically be cached by the last lookup.

I'm not so sure about that. The check just before the final lookup passes all requests that contain a Cookie header. The problem I had is that Drupal always sends a cookie to the client (also for anonymous users). The client sends that cookie in all further requests, effectively bypassing the cache. That's why I forced the lookup for images and files.

I don't think the cookie check even makes sense. In fact, I removed it and so far I haven't seen any problems. Dynamic content is never inserted into the cache anyway.
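
A sketch of what the relevant parts might look like with the cookie check removed, in the same Varnish 1.x syntax as the configuration in the post. This is an assumption about the resulting file, not the exact configuration in use, and it is a fragment rather than a complete file: the backend, acl and vcl_recv_sentry definitions and the other checks are taken to be unchanged.

```vcl
sub vcl_recv {
    # Assumed unchanged from the post: the sentry call and the method,
    # Expect, Authorization and Accept-Encoding handling go here.

    # Without the "if (req.http.Cookie) { pass; }" block, the forced
    # lookups for images and /files are no longer needed either: every
    # request that reaches this point is looked up in the cache.
    lookup;
}

sub vcl_fetch {
    # Assumed unchanged from the post: the obj.valid, obj.cacheable and
    # minimum-TTL handling go here.

    # These two checks from the post are what keep dynamic pages out of
    # the cache once the request-side cookie check is gone.
    if (obj.http.Set-Cookie) {
        pass;
    }
    if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
        pass;
    }
    insert;
}
```

The trade-off is that correctness now depends entirely on the backend marking dynamic responses as uncacheable, either with Set-Cookie or with the Pragma and Cache-Control headers.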

Bart,

Just wanted to thank you for documenting your experiments with Varnish. You are completely right that there isn't much documentation available, so it's very difficult to get started.

--Joao

Varnish has been running stably for a couple of months now, but there have been significant changes to my VCL code. I wrote a new post about it and included my new code here: Varnish: experiences so far.

Very useful, thanks.

Thanks for sharing - the KeepAlive Off info helped me.

Why do you say: "Turn off KeepAlives on the backend Apache servers, otherwise you might experience delays when requesting pages."?
