Web Caching Explained: Client and Server Side

The concept of web caching appeared the moment clients started to experience slow pages and developers found no better solution than loading the whole page, or parts of its logic, from a cache the next time it was requested. There are two types of caching solutions available for a web application:

  1. client side caching, done by saving requested resources (pages, files) to your browser cache and retrieving them directly from there on the next identical request, unless they have changed on the server.
  2. server side caching, done by saving (usually in RAM) the results of logic that makes up part of a resource (eg: costly query results) and retrieving them from there on the next identical call, unless the source data has changed on the server.

When should we use client side caching?

Client side caching simply means storing an entire resource (HTML, image, js/css file) in the browser cache so that it can be returned on the next request, unless the resource has changed. This behavior is governed by a set of rules standardized in the HTTP specifications published by the IETF (currently RFC 9110 and RFC 9111), which both servers (senders of resources) and clients (requesters of resources) must abide by. The beautiful part is that every standards-compliant web server and browser supports it, regardless of platform.

Despite its immense advantages, browser caching is seldom employed by PHP applications, the notable exception being a site's static resources (images, js/css files) when they are served by the web server directly. A web server supports this cache negotiation with the client browser by design, but your code doesn't unless you build support for it.

How does client side caching actually work?

As the specifications above show, the way it works in full detail is beyond the scope of this article, but the main idea is this:

  1. CLIENT makes a first request to SERVER for a resource (eg: page)
  2. SERVER answers with HTTP status 200 OK along with the body of that resource and an ETag or Last-Modified response header. These headers identify the version of the requested resource.
  3. CLIENT caches SERVER's response, associating the resource URL with the response body received and any of the headers above.
  4. CLIENT makes a second request to SERVER for the same resource. In it, CLIENT includes the header value it previously received, as If-None-Match (if the previous response came with an ETag header) or If-Modified-Since (if it came with a Last-Modified header). If-Match and If-Unmodified-Since also exist, but they guard state-changing requests rather than cache revalidation.
  5. SERVER checks whether the ETag or Last-Modified value calculated for that resource matches its equivalent sent by CLIENT: if it does, SERVER answers with HTTP status 304 Not Modified and an empty body, and CLIENT displays its cached version; otherwise, SERVER answers with 200 OK, the new body and new ETag/Last-Modified headers, which CLIENT caches in place of the old version.

Of course, the finer tunings HTTP caching offers are a lot more complicated, but the steps above are enough to understand the fundamentals: HTTP caching is a two-way negotiation between client and server about whether the client should display its cached version of the requested resource or fetch it again from the server.
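To make step 5 concrete, here is a minimal Python sketch of the server side of the exchange. It assumes the whole response body is available as bytes, that request headers arrive in a plain dict with canonical capitalization, and that `last_modified` is a UTC-aware datetime; `make_etag` and `conditional_get` are hypothetical names, not part of any framework.

```python
import hashlib
import re
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    # Strong validator: a quoted digest of the exact response bytes
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(body: bytes, last_modified: datetime,
                    request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer 304 with an empty body if the client's copy is current, else 200 with the body."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Last-Modified": format_datetime(last_modified, usegmt=True)}

    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # When If-None-Match is present, If-Modified-Since is ignored
        tags = re.findall(r'(?:W/)?"[^"]*"', inm)
        if inm.strip() == "*" or any(_opaque(t) == etag for t in tags):
            return 304, headers, b""
        return 200, headers, body

    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds first
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""
    return 200, headers, body
```

Hashing the body ties the ETag to the exact bytes sent, at the cost of rendering the page before deciding; an application could instead derive the ETag from a cheaper version marker.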

Where do proxies come into play?

As far as HTTP caching is concerned, proxies function as a shared cache, useful when two or more applications share one or more resources, or for static resources (eg: images) that need to be served from the CDN node closest to the requester. Not only do they increase page speed, they also provide other benefits, such as shielding your application from the outside world and protecting it against DoS attacks.

This means every exchange described in the previous section gains an extra step: communication becomes CLIENT-PROXY-SERVER instead of CLIENT-SERVER. Unless your site is served from a CDN node close to the user, response times will be slower (because requests first need to reach the PROXY, which then forwards them to the SERVER) and, unlike browsers, PROXIES come with no transparency guarantees. Whatever headers CLIENT or SERVER send may not be forwarded by the PROXY, including the ETag/Last-Modified headers essential for HTTP caching. CloudFlare, for example, one of the most popular proxies, may weaken or strip ETag headers depending on its configuration (eg: when features that rewrite the response are enabled), in which case one is forced to fall back on the much weaker Last-Modified (which has only one-second resolution and is hard to make unique).

For this reason, our recommendation is to use proxies to serve static resources only, while your SERVER should be directly reachable by CLIENT, without an extra step, so that HTTP caching abilities are guaranteed.
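When a proxy or CDN does sit in front of the application, one way to follow this recommendation is to mark only static resources as storable by shared caches. The Python sketch below picks a `Cache-Control` value per URL path; the extension list and the max-age values are illustrative assumptions, and `cache_control_for` is a hypothetical helper that expects a path without a query string.

```python
import posixpath

# Hypothetical list of extensions treated as static; adjust to your own assets
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".woff2"}


def cache_control_for(path: str) -> str:
    """Choose a Cache-Control value so that only static resources are stored by shared caches."""
    _, ext = posixpath.splitext(path)
    if ext.lower() in STATIC_EXTENSIONS:
        # "public" lets proxies/CDNs store the response; s-maxage overrides max-age for shared caches only
        return "public, max-age=86400, s-maxage=604800"
    # Dynamic pages: "private" forbids shared caches from storing them, and "no-cache"
    # makes the browser revalidate with the SERVER (ETag/If-None-Match) on every request
    return "private, no-cache"
```

Keeping pages `private` means that, even if a proxy forwards them, it should not answer on the SERVER's behalf with a copy it stored earlier.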

When NOT to use it?

Generally, HTTP caching is a performance requirement for most applications and for most resources inside them (be they static or dynamic). There are some cases, however, when it shouldn't be used: resources expected to change unpredictably on every load (eg: pages where users can submit and view comments) or RESTful API responses that must always reflect the server's current state.

When should we use server side caching?

Server side caching simply means storing data that makes up part of a resource in RAM, using a dedicated server (eg: Redis), so that it can be returned on the next query unless it has changed. Unlike client side caching, it is governed by no official rules (so it is not platform independent), but some functional rules can be put forward, assuming your solution uses a NoSQL key-value or document database.

Despite its freshness/staleness and document/relational database inconveniences described below, server side caching is extremely common among PHP applications, especially those built on monolithic, slow frameworks (eg: Laravel, Symfony, Zend).

How does server side caching actually work?

This type of caching requires a dedicated NoSQL server (eg: Redis), or at least a module (eg: APC), installed on the back-end. Information inside is typically stored as key-value pairs, so that everything is retrievable via a KEY whose value is the DATA itself (typically aggregates extracted from an SQL database). Thanks to this design, unlike SQL databases, NoSQL ones need to maintain no relationships between entries, so every entry is self-contained and kept in RAM for high performance. To keep staleness as low as possible, all KEYs must be set to EXPIRE.
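The KEY/DATA design and the mandatory expiration can be sketched in Python as a cache-aside read: look the KEY up first, and only on a miss run the SQL aggregate and store it with a TTL. `TtlStore` is a hypothetical in-process stand-in for a server such as Redis (single process, no persistence, no size limit), and the `orders` table and key name are invented for illustration.

```python
import json
import sqlite3
import time
from typing import Callable, Dict, Optional, Tuple


class TtlStore:
    """Hypothetical stand-in for a key-value server: every key must carry an expiration."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def set(self, key: str, value: str, ttl: float) -> None:
        if ttl <= 0:
            raise ValueError("every key must expire: ttl must be positive")
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return None
        return value


def orders_per_customer(db: sqlite3.Connection, cache: TtlStore, ttl: float = 600) -> dict:
    key = "report:orders_per_customer"  # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no SQL executed
    rows = db.execute("SELECT customer, COUNT(*) FROM orders GROUP BY customer").fetchall()
    result = {customer: count for customer, count in rows}
    cache.set(key, json.dumps(result), ttl)  # whole aggregate stored as one value under one KEY
    return result
```

Running it against a database that changes after the first call shows the trade-off discussed later: the new row stays invisible until the key expires.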

How do document databases compete against relational databases?

There is no point going into depth on the differences between document and relational databases; this is already covered in great detail by specialized articles across the web. In my opinion, server side caching is overused today, mostly to compensate for bad decisions such as slow underlying frameworks or programmers' inability to produce optimal code. For high-traffic or resource-intensive applications, however, it comes as a natural requirement that brings plenty of advantages.

The disadvantages above explain why responsible developers should avoid adding areas of complexity and constant maintenance, and should employ NoSQL caching only for operations where caching remains the only solution even after code and hardware have been made as optimal as they can be.

When NOT to use it?

Often, the caching decision isn't based on a fundamental limitation of the project (too much data needs to be read/written, too many requests hammer the application), but on other aspects that are fully controllable: using a slow web framework, having low-skilled developers or simply lacking vision. There are, however, cases when, no matter the effort to keep code and queries optimized, there is no objective solution to the problem other than NoSQL caching.

What are the main problems of caching?

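Regardless of the solution chosen, any item that exists in cache has two fundamental problems: freshness (how long the cached copy can be trusted to match its source) and staleness (how long it may keep being served after its source has changed).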

A good caching policy requires keeping freshness at a maximum and staleness at a minimum. This requires a case study of each resource's probability of change, then molding the policy to fit that individual case. Even with the best of efforts, some cached resources are bound to remain stale for a while, so your site will display outdated information during that period. To prevent a resource from becoming stale forever, every resource in cache must be set to expire, especially when server side caching is used! Once expired, a cached resource will either be cleared automatically or no longer be used on the next request, which forces the caller to refresh its cached version.
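One way to turn a resource's probability of change into a number is to assume, purely for illustration, that its changes arrive as a Poisson process with rate λ. The probability that the source changes while a copy cached for T seconds is still being served is then 1 - e^(-λT), so capping that risk at p gives T = -ln(1 - p)/λ. The helper `ttl_for` below is a hypothetical heuristic built on that assumption, not a rule from this article.

```python
import math


def ttl_for(changes_per_hour: float, max_stale_probability: float) -> float:
    """Hypothetical heuristic: longest TTL (seconds) keeping P(source changes before expiry)
    at or below max_stale_probability, assuming Poisson-distributed changes."""
    if changes_per_hour <= 0:
        raise ValueError("pick a finite TTL explicitly for resources that (almost) never change")
    if not 0 < max_stale_probability < 1:
        raise ValueError("max_stale_probability must be strictly between 0 and 1")
    rate_per_second = changes_per_hour / 3600.0
    # P(at least one change within T) = 1 - exp(-rate * T)  =>  T = -ln(1 - p) / rate
    return -math.log1p(-max_stale_probability) / rate_per_second
```

For a resource that changes about once an hour, tolerating a 10% chance of serving a stale copy yields a TTL of roughly 380 seconds (about six minutes); a resource changing twice as often gets half that.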

How to choose expiration time for server caching?

The expiration value, expressed in seconds, must be chosen with this consideration in mind: an expiration of 10 minutes means that a resource is considered fresh for at most 10 minutes (though it may have become stale meanwhile) and can actually be stale for at most 10 minutes (if the resource changed immediately after it was cached). If no expiration times are given, another problem appears: since almost all server side caching solutions operate in RAM only (in order to be very fast) and RAM size is limited, caches are typically configured to perform EVICTION once full: depending on the solution used, the least recently used or oldest elements are evicted to make room for newer ones. This adds overhead to writes, since the server has to perform two operations on a key set (evict an old value, insert the new one). To prevent that, expiration must always be used and the size of data saved to cache must be tightly controlled.
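The eviction behavior can be sketched with Python's `OrderedDict`: when the cache is full, expired keys are purged first, and only if that frees nothing is the least recently used key evicted, which is the extra work on a write described above. `BoundedCache` is a hypothetical illustration; real servers such as Redis use sampled approximations rather than a full scan, and their eviction policy depends on configuration.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class BoundedCache:
    """Hypothetical RAM-bounded cache: purge expired keys first, then evict least recently used."""

    def __init__(self, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.items = OrderedDict()  # key -> (value, expires_at), least recently used first
        self.evictions = 0

    def _purge_expired(self) -> None:
        now = self.clock()
        for key in [k for k, (_, exp) in self.items.items() if exp <= now]:
            del self.items[key]

    def get(self, key: str) -> Optional[str]:
        item = self.items.get(key)
        if item is None:
            return None
        if item[1] <= self.clock():
            del self.items[key]
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return item[0]

    def set(self, key: str, value: str, ttl: float) -> None:
        if key in self.items:
            del self.items[key]
        elif len(self.items) >= self.capacity:
            self._purge_expired()  # expirations free room without evicting live data
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)  # evict the least recently used key
                self.evictions += 1
        self.items[key] = (value, self.clock() + ttl)
```

With short expirations, the expired keys make room on their own and live data is evicted less often.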

Should we use server or client side caching?

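Although server side caching is extremely common in PHP applications today, it is actually the more problematic of the two, due to the fundamental problems described above.

Client side caching is to be preferred because it is standardized by the IETF and supported by every compliant browser and web server, whereas server side caching follows no official rules and adds a dedicated server to maintain.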

The two solutions are not mutually exclusive! All applications require conservative HTTP caching in the cost-free mode described above (the client revalidates on every request and the server answers with a 304 or 200 status code depending on whether the target page's ETag has changed) and liberal HTTP caching (where the response is retrieved from the local cache until it expires), but some, precisely those working with big data or with large numbers of requests per second that hammer the database, have NoSQL caching as a natural requirement.
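On the client side, the conservative and liberal modes boil down to one decision per request: serve the stored copy, or contact the server. The Python sketch below makes that decision from the stored `Cache-Control` value, assuming only `max-age`, `no-cache` and `no-store` matter; it deliberately ignores `Expires`, the `Age` header and heuristic freshness, and `CachedResponse` and `plan_request` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedResponse:
    body: bytes
    etag: Optional[str]
    stored_at: float    # seconds since the epoch when the response was stored
    cache_control: str  # Cache-Control header that came with the response


def directives(cache_control: str) -> Dict[str, Optional[str]]:
    # "public, max-age=60" -> {"public": None, "max-age": "60"}
    result: Dict[str, Optional[str]] = {}
    for part in cache_control.split(","):
        name, sep, value = part.strip().partition("=")
        if name:
            result[name.lower()] = value.strip('"') if sep else None
    return result


def plan_request(cached: Optional[CachedResponse], now: float) -> Tuple[str, Dict[str, str]]:
    """Return ("use-cache", {}) or ("fetch", extra request headers)."""
    if cached is None:
        return "fetch", {}
    d = directives(cached.cache_control)
    if "no-store" in d:
        return "fetch", {}  # should never have been stored: plain request
    max_age = d.get("max-age")
    if "no-cache" not in d and max_age is not None and max_age.isdigit():
        if now - cached.stored_at < int(max_age):
            return "use-cache", {}  # liberal mode: still fresh, no request at all
    # Conservative mode, or expired: revalidate so the server can answer 304 or 200
    return "fetch", ({"If-None-Match": cached.etag} if cached.etag else {})
```

Sending `Cache-Control: private, no-cache` with an ETag gives the conservative mode (every request is revalidated, usually ending in a cheap 304), `max-age=N` gives the liberal mode for N seconds, and `no-store` fits the cases in which caching shouldn't be used at all.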

What does Lucinda think about it?

HTTP Caching API, part of Lucinda Framework, implements IETF standards for client caching and makes it automatic for all resources in your application (pages), requiring developers only to choose if they want to cache by ETag or Last-Modified.

NoSQL Data Access API, part of Lucinda Framework, abstracts server caching logic, assuming you are using a NoSQL database solution. Since basic operations (get, set, delete, expire) are largely the same regardless of vendor, this allows you to change your NoSQL solution later (eg: memcached to redis) while keeping the rest of the code untouched.
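The idea of coding against vendor-neutral operations can be illustrated with a small Python interface. This is not Lucinda's API (which is PHP); `CacheDriver`, `DictDriver`, `SqliteDriver` and `greeting` are hypothetical, with SQLite standing in for a second vendor so that the example runs without a Redis or memcached server.

```python
import sqlite3
import time
from abc import ABC, abstractmethod
from typing import Optional


class CacheDriver(ABC):
    """Hypothetical vendor-neutral interface: application code only sees these operations."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def set(self, key: str, value: str, ttl: int) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class DictDriver(CacheDriver):
    def __init__(self) -> None:
        self._data = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or item[1] <= time.time():
            self._data.pop(key, None)  # drop expired entries on read
            return None
        return item[0]

    def set(self, key: str, value: str, ttl: int) -> None:
        self._data[key] = (value, time.time() + ttl)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class SqliteDriver(CacheDriver):
    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS cache (k TEXT PRIMARY KEY, v TEXT, exp REAL)")

    def get(self, key: str) -> Optional[str]:
        row = self._db.execute("SELECT v FROM cache WHERE k = ? AND exp > ?", (key, time.time())).fetchone()
        return row[0] if row else None

    def set(self, key: str, value: str, ttl: int) -> None:
        self._db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (key, value, time.time() + ttl))

    def delete(self, key: str) -> None:
        self._db.execute("DELETE FROM cache WHERE k = ?", (key,))


def greeting(cache: CacheDriver, user: str) -> str:
    # Depends only on CacheDriver, so the backend can be swapped without touching this code
    key = f"greeting:{user}"
    value = cache.get(key)
    if value is None:
        value = f"Hello, {user}!"  # stands in for an expensive computation
        cache.set(key, value, ttl=60)
    return value
```

Switching vendors then means constructing a different driver at startup; `greeting` itself stays unchanged.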
