HTTP Accelerator

Using the Varnish HTTP accelerator - Experiences so far

A couple of weeks ago, I started experimenting with Varnish as a reverse proxy server. My setup has changed a lot since then, and I like to think it has improved. Here are my experiences so far.

Use -trunk (or version 2.0), not the 1.1.2 release

Initially I was using the 1.1.2 release, but I ran into a couple of problems. The worst one was the white screen of death for users behind a proxy server such as squid. There was a problem with the way HTTP/1.0 requests were handled, resulting in blank pages being sent to the client.
The solution was to upgrade to -trunk, which contains several bug fixes and interesting new features and is pretty stable (at least at the moment). I usually don't like using a development version, but right now it seems to be the most reliable option, and it was also recommended by several people.
Version 2.0 will be released soon; use that one when it's available.

One thing to keep in mind, though, is that there is a problem with subroutines. This bug has been reported, so for now I'd recommend not using them.

Don't duplicate the default VCL functions

In VCL you can have multiple functions with the same name; they will simply be concatenated. This means that you don't have to duplicate an entire default function if you just want to add a single line. This can simplify your VCL code a lot, and you still benefit from improvements made to the default functions.
For an example, see my new VCL below. I'm using this in all functions except for vcl_recv.
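
As a minimal sketch of the idea, here is how it looks for vcl_pipe, in the same syntax as the full listing. The built-in default shown in the trailing comment is an assumption (a bare "pipe;" action, as in older default configurations) and may differ in your Varnish version.

```vcl
# Your own vcl_pipe: only the extra line, with no terminating action.
sub vcl_pipe {
    set req.http.connection = "close";
    # No "pipe;" here, so execution falls through to whatever comes next.
}

# Varnish appends its built-in vcl_pipe after the code above. Assuming the
# default is still a bare "pipe;", the effective function is roughly:
#
# sub vcl_pipe {
#     set req.http.connection = "close";
#     pipe;
# }
```

The flip side is that a function which ends with its own terminating action never reaches the default code, so leave the final action out whenever you want the default behaviour to apply.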

Again, for future reference, a copy of my new VCL code can be found below. As you can see, it has some new "features" but is simpler than the previous one. Note that the backend syntax has changed a little in trunk.

    # This is the vcl.conf file for andromeda.motd.be

    backend default {
        .host = "208.68.209.225";
        .port = "80";
    }

    #
    # Handling of requests that are received from clients.
    # Decide whether or not to look up data in the cache first.
    # If we have multiple backends, we could specify them here, but for now there is just the default.
    #
    sub vcl_recv {
        # If the client sent an X-Forwarded-For header, remove it. It cannot be trusted.
        unset req.http.X-Forwarded-For;
        # Note that we don't need to add the client IP to the X-Forwarded-For header; Varnish will do that for us.

        if (req.http.Accept-Encoding) {
            # Handle compression correctly. Varnish treats headers literally, not
            # semantically. So it is very well possible that there are cache misses
            # because the headers sent by different browsers aren't the same.
            # @see: http://varnish.projects.linpro.no/wiki/FAQ/Compression
            if (req.http.Accept-Encoding ~ "gzip") {
                # if the browser supports it, we'll use gzip
                set req.http.Accept-Encoding = "gzip";
            } elsif (req.http.Accept-Encoding ~ "deflate") {
                # next, try deflate if it is supported
                set req.http.Accept-Encoding = "deflate";
            } else {
                # unknown algorithm. Probably junk, remove it
                unset req.http.Accept-Encoding;
            }
        }

        # If we get a request for a page that has just been requested by another thread and
        # is still being fetched from the backend, allow serving a cached page so long as it hasn't
        # expired more than 30 seconds ago (prevents thread pileup on cache refreshes).
        # Note that this only works if obj.grace is also set:
        # "If no in-ttl object was found AND we have a graced object AND it is also
        # graced by req.grace AND it is being fetched: serve the graced object."
        set req.grace = 30s;

        # The default vcl_recv will only look up objects in the cache if there is no cookie
        # present in the request. Therefore, we remove the cookie header from all requests for
        # files which we know to be static.
        #
        # These are Drupal-specific:
        # - everything in most of the standard Drupal directories.
        # - txt and ico files (most important: robots.txt and favicon.ico)
        if (req.url ~ "^/(files|misc|sites|themes|modules)/" || req.url ~ "\.(txt|ico)$") {
            unset req.http.Cookie;
        }

        # No final action. Continues in the default vcl_recv which will only look up
        # items from the cache for GET and HEAD requests without cookies.
    }

    #
    # Called when entering pipe mode
    #
    sub vcl_pipe {
        # If we don't set the Connection: close header, any following
        # requests from the client will also be piped through and
        # left untouched by Varnish. We don't want that.
        set req.http.connection = "close";

        # Note: no "pipe" action here - we'll fall back to the default
        # pipe method so that when any changes are made there, we
        # still inherit them.
    }

    #
    # Called when the requested object has been retrieved from the
    # backend, or the request to the backend has failed
    #
    sub vcl_fetch {
        if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "(no-cache|no-store|private)") {
            # Varnish by default ignores Pragma and Cache-Control headers. It
            # only looks at the "max-age=" value in the Cache-Control header to
            # determine the TTL. So we need this rule so that the cache respects
            # the wishes of the backend application.
            pass;
        }

        if (obj.ttl < 180s) {
            # force a minimum TTL of 3 minutes for all cached objects.
            set obj.ttl = 180s;
        }

        # Set the grace period for this object to a maximum of 30s. This is how
        # long after its expiry time it is still allowed to be served from cache if
        # directed to do so by vcl_recv.
        set obj.grace = 30s;

        if (req.http.Authorization && !obj.http.Cache-Control ~ "public") {
            # Don't allow caching pages that are protected by basic authentication
            # unless they explicitly set Cache-Control to public.
            pass;
        }

        # Note: no final action - continue in the default vcl_fetch
    }

Experimenting with Varnish

As a little experiment, my blog is now running behind a Varnish HTTP accelerator. While squid can be used for this purpose as well, its primary function is that of a forward proxy. Varnish, on the other hand, is designed to be a very fast caching reverse proxy server.

I initially tried using the Debian etch package of Varnish, but it is a bit outdated, so I ended up building my own package from source. It took a bit of research to configure everything correctly, but all in all the configuration was pretty simple and it is easy to read. After a few weeks, I still know what it all means.

There are a few gotchas:

  • There isn't much documentation.
  • It will refuse to start if /tmp is mounted "noexec".
  • Turn off KeepAlives on the backend Apache servers, otherwise you might experience delays when requesting pages.
  • The VCL "pass" action does not support POST requests. You might come across older configuration examples on the internet that use the "pass" action for them, but that simply doesn't work. You need to use the "pipe" action.
  • By default, Varnish only respects the max-age parameter of the Cache-Control header. Any other values, such as those indicating whether or not an object may be cached, are ignored. You need to include some code in your VCL configuration file to handle these headers correctly (see example below).
  • If you need a large cache (multiple GBs), you need a 64-bit system.
  • There is no persistent cache. Every time you restart varnish, the cache gets wiped. So don't restart unless you really need to.
  • During load testing, new requests would sometimes hang and "ip_conntrack: table full: packet dropped" was logged. This wasn't a problem with Varnish but with the firewall, which couldn't handle that many connections. Increasing the value in net/ipv4/netfilter/ip_conntrack_max fixed that.

All in all, Varnish seems to work pretty well. It can still be improved in many areas, but that's to be expected for a v1 release.
Time for me to start testing this on larger websites.

For future reference, below is the configuration file I've been using.

    # configuration file for andromeda

    backend default {
        set backend.host = "208.68.209.225";
        set backend.port = "80";
    }

    #
    # Block unwanted clients
    #
    acl blacklisted {
        "192.168.100.100";
    }

    #
    # Handling of requests that are received from clients.
    # Decide whether or not to look up data in the cache first.
    #
    sub vcl_recv {
        # reject malicious requests
        call vcl_recv_sentry;

        if (req.request != "GET" && req.request != "HEAD" && req.request != "PUT" && req.request != "POST" && req.request != "TRACE" && req.request != "OPTIONS" && req.request != "DELETE") {
            # Non-RFC2616 or CONNECT which is weird.
            pipe;
        }
        if (req.http.Expect) {
            # Expect is just too hard at present.
            pipe;
        }
        if (req.request != "GET" && req.request != "HEAD") {
            # We only deal with GET and HEAD.
            # Note that we need to use "pipe" instead of "pass" here. Pass isn't supported for
            # POST requests.
            pipe;
        }
        if (req.http.Authorization) {
            # don't cache pages that are protected by basic authentication
            pass;
        }
        if (req.http.Accept-Encoding) {
            # Handle compression correctly. Varnish treats headers literally, not
            # semantically. So it is very well possible that there are cache misses
            # because the headers sent by different browsers aren't the same.
            # For more info: http://varnish.projects.linpro.no/wiki/FAQ/Compression
            if (req.http.Accept-Encoding ~ "gzip") {
                # if the browser supports it, we'll use gzip
                set req.http.Accept-Encoding = "gzip";
            } elsif (req.http.Accept-Encoding ~ "deflate") {
                # next, try deflate if it is supported
                set req.http.Accept-Encoding = "deflate";
            } else {
                # unknown algorithm. Probably junk, remove it
                remove req.http.Accept-Encoding;
            }
        }
        if (req.url ~ "\.(jpg|jpeg|gif|png|css|js)$") {
            # allow caching of all images and css/javascript files
            lookup;
        }
        if (req.url ~ "^/files") {
            # anything in Drupal's files directory is static and may be cached
            lookup;
        }
        if (req.http.Cookie) {
            # Not cacheable by default
            # TODO: do we even need this? Can't we simply make sure dynamic
            # content never exists in the cache?
            pass;
        }

        # everything else we try to look up in the cache first
        lookup;
    }

    #
    # Called when entering pipe mode
    #
    #sub vcl_pipe {
    #    pipe;
    #}

    #
    # Called when entering pass mode
    #
    #sub vcl_pass {
    #    pass;
    #}

    #
    # Called when entering an object into the cache
    #
    #sub vcl_hash {
    #    set req.hash += req.url;
    #    if (req.http.host) {
    #        set req.hash += req.http.host;
    #    } else {
    #        set req.hash += server.ip;
    #    }
    #    hash;
    #}

    #
    # Called when the requested object was found in the cache
    #
    sub vcl_hit {
        if (!obj.cacheable) {
            # A response is considered cacheable if all of the following are true:
            # - it is valid
            # - HTTP status code is 200, 203, 300, 301, 302, 404 or 410
            # - it has a non-zero time-to-live when Expires and Cache-Control headers are taken into account.
            pass;
        }
        deliver;
    }

    #
    # Called when the requested object was not found in the cache
    #
    #sub vcl_miss {
    #    fetch;
    #}

    #
    # Called when the requested object has been retrieved from the
    # backend, or the request to the backend has failed
    #
    sub vcl_fetch {
        if (!obj.valid) {
            # don't cache invalid responses.
            error;
        }
        if (!obj.cacheable) {
            # A response is considered cacheable if all of the following are true:
            # - it is valid
            # - HTTP status code is 200, 203, 300, 301, 302, 404 or 410
            # - it has a non-zero time-to-live when Expires and Cache-Control headers are taken into account.
            #
            # If a response is not cacheable, simply pass it along to the client.
            pass;
        }
        if (obj.http.Set-Cookie) {
            # don't cache content that sets cookies (e.g. dynamic PHP pages).
            pass;
        }
        if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
            # Varnish by default ignores Pragma and Cache-Control headers. It
            # only looks at the "max-age=" value in the Cache-Control header to
            # determine the TTL. So we need this rule so that the cache respects
            # the wishes of the backend application.
            pass;
        }
        if (obj.ttl < 180s) {
            # force a minimum TTL of 180 seconds for all cached objects.
            set obj.ttl = 180s;
        }
        insert;
    }

    #
    # Called before a cached object is delivered to the client
    #
    #sub vcl_deliver {
    #    deliver;
    #}

    #
    # Called when an object nears its expiry time
    #
    #sub vcl_timeout {
    #    discard;
    #}

    #
    # Called when an object is about to be discarded
    #
    #sub vcl_discard {
    #    discard;
    #}

    #
    # Custom routine to detect malicious requests and reject them (called by vcl_recv).
    #
    sub vcl_recv_sentry {
        if (client.ip ~ blacklisted) {
            error 503 "Your IP has been blocked.";
        }
    }