Caching in HTTP/1.1

Throughout my work on web applications, HTTP caching has been a must-have element of the request-response systems I was building. HTTP caching is a huge topic (see, for example, the caching section of the HTTP/1.1 RFC: https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html), and in this post I want to discuss my experience with the cache directives and, specifically, with the conditional request headers If-Match and If-None-Match.

Cache directives are unidirectional in HTTP/1.1. Let’s suppose we have a request-response cycle: a simple web frontend is rendered in a browser and the user has clicked a button. The first request goes to a server, which serves an answer. Unidirectional means that the presence of a directive in the request does not guarantee that the same directive appears in the response. It may be echoed as is, or it may be absent. Common request directives include no-cache, no-store, max-stale, min-fresh, no-transform and only-if-cached. Response directives include public, must-revalidate, proxy-revalidate and others.
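To make the unidirectional point concrete, here is a minimal Ruby sketch that parses a Cache-Control header value into its directives. The helper name parse_cache_control and the sample header values are assumptions made for illustration, and the parser is simplified: it does not handle quoted arguments that contain commas.

```ruby
# Hypothetical helper: parse a Cache-Control header value into a Hash.
# Directives without an argument (e.g. no-cache) map to true.
# Simplification: quoted arguments containing commas are not supported.
def parse_cache_control(header)
  header.to_s.split(",").each_with_object({}) do |part, directives|
    name, value = part.strip.split("=", 2)
    next if name.nil? || name.empty?
    directives[name.downcase] = value.nil? ? true : value.delete('"')
  end
end

# Sample header values, invented for this example.
request_directives  = parse_cache_control("no-cache, max-stale=30")
response_directives = parse_cache_control("public, must-revalidate, max-age=60")

# The request carried no-cache, but the response is free not to echo it.
puts request_directives.key?("no-cache")   # true
puts response_directives.key?("no-cache")  # false
```

Each side is parsed independently, which mirrors how the request and the response each carry their own set of directives.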

If we zoom into the If-Match and If-None-Match headers, we notice their close relation to entity tags (or ETags). An ETag identifies a specific version of a resource and is typically a digest of its content. Comparing such digests lets the client and the server decide whether the copy the client already holds is stale or fresh.
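As a rough illustration of an ETag as a digest, the sketch below derives a tag from a response body using Ruby's standard library. The helper name etag_for and the choice of SHA-256 are assumptions for this example; real servers and frameworks pick their own inputs and hash functions.

```ruby
require "digest/sha2"

# Hypothetical helper: derive an ETag from the response body.
# ETag values are quoted strings, so the digest is wrapped in double quotes.
def etag_for(body)
  %("#{Digest::SHA256.hexdigest(body)}")
end

v1 = etag_for("Hello, world")
v2 = etag_for("Hello, world!")

puts v1 == etag_for("Hello, world")  # true: same content, same tag
puts v1 == v2                        # false: changed content, different tag
```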

HTTP is a stateless protocol, yet ETags represent state. If a response tells us that a resource is stale or fresh, there is a background story behind it. That background story is the state of the resource, which we chose to carry through the request-response cycle by using entity tags.

One way to use ETags is for the server to include one in the response headers. Subsequent HTTP requests for the same resource send that value back in the If-None-Match header. The server then computes the current digest (the ETag) of the requested resource and compares it with the one it received. If they match, the resource is fresh and the server responds with 304 Not Modified and no body. Otherwise, it responds with the full resource and a new tag.
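The exchange described above can be sketched in plain Ruby. This is a simplified model, not a real server: the store hash, the handle_get function and the SHA-256 based tag are all assumptions for illustration, and the comparison only handles a single exact tag (real If-None-Match values can also hold a list of tags or "*").

```ruby
require "digest/sha2"

# Hypothetical helper: derive a quoted ETag from the body.
def etag_for(body)
  %("#{Digest::SHA256.hexdigest(body)}")
end

# Returns [status, headers, body] for a GET, honouring If-None-Match.
# `store` is a plain Hash standing in for a database.
def handle_get(store, path, request_headers = {})
  body = store.fetch(path)
  etag = etag_for(body)

  if request_headers["If-None-Match"] == etag
    [304, { "ETag" => etag }, ""]    # client copy is fresh: no body sent
  else
    [200, { "ETag" => etag }, body]  # first request or stale copy: full body
  end
end

posts = { "/posts/1" => "First post body" }

# First request: no validator yet, so the full resource comes back.
first = handle_get(posts, "/posts/1")
puts first[0]   # 200

# Second request sends the tag back; nothing changed, so 304.
second = handle_get(posts, "/posts/1", { "If-None-Match" => first[1]["ETag"] })
puts second[0]  # 304

# After an edit the old tag no longer matches, so a new tag and body are sent.
posts["/posts/1"] = "Edited post body"
third = handle_get(posts, "/posts/1", { "If-None-Match" => first[1]["ETag"] })
puts third[0]   # 200
```

Note that the server still has to load the resource and compute its tag on every request; the saving comes from skipping the body, not from skipping the lookup.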

Some web frameworks, such as Ruby on Rails, abstract ETags away from the programmer. What matters is whether the resource is fresh or not, so Rails has a method called

stale?

which uses ETags behind the scenes. That method is usually used alongside the

fresh_when

method, as described in the docs (https://guides.rubyonrails.org/caching_with_rails.html).

Example usage:

curl -i http://localhost:3000/posts/1 

Content-Length: 667 
Etag: "123123122132" 
Last-Modified: Wed, 12 Nov 2014 15:44:46 GMT 

And then:

curl -i -H 'If-None-Match: "123123122132"' http://localhost:3000/posts/1 

HTTP/1.1 304 Not Modified 
Etag: "123123122132" 
Last-Modified: Wed, 12 Nov 2014 15:44:46 GMT 

The advantage of the above is that the server saves the time it would normally spend rendering the response and, because the response body is empty, less data travels over the network.

Finally, there are some interesting challenges that come with using such a cache mechanism. First, the content is not cached between users; sharing cached content across users would need a more custom solution. Second, user-specific content (so, most of the content in big applications) is not handled correctly. Every user sees a different timeline and has different followers and a different connection graph. A tag computed from the shared resource alone does not reflect those per-user differences, so the simple mechanism above would return stale data in those cases and we avoid it there. To solve the problem of caching a timeline we could work outside the application layer. For instance, the first x items of each user's timeline could be cached in an edge Redis server.