Michal Zalecki Senior Engineer at @Tooploox 💎, smart contracts, fan of hackathons, React Wroclaw meetup organiser

Caching headers: A practical guide for frontend developers



There are multiple headers available that developers and ops people can use to manipulate cache behavior.

Old specs mix with new ones: there are numerous settings to configure, and many users report inconsistent behavior.

In this post, I’ll focus on explaining how different headers influence the browser cache and how they relate to proxy servers.

You’ll find an example configuration for Nginx and the code for Node.js running Express. At the end, we’ll look at how popular services built with React serve their web applications.

For a single page application, I’m interested in caching JavaScript, CSS, fonts, and image files indefinitely and preventing the caching of HTML files and service workers (if you have any).

This strategy is viable because my asset files have unique identifiers in their file names.

You can achieve the same setup by configuring webpack to include a [hash] or, even better, a [chunkhash] in the file names of your assets. This technique is called long-term caching.

But when you prevent re-downloading, how can you then make updates to your website? Maintaining the ability to update the website is why it’s so important to never cache HTML files.

Every time you visit my site, the browser fetches a fresh copy of the HTML file from the server, and only when there are new script srcs or link hrefs does the browser download a new asset from the server.
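The strategy above boils down to a decision per URL. Here is a minimal sketch of that decision; the function name, the policy labels, and the fingerprint pattern (a run of 8 or more hex characters before the extension, as in main.3f9a1c2b.js) are my assumptions for illustration, so adjust them to whatever your bundler emits.

```typescript
type CachePolicy = "always-revalidate" | "long-term";

// Assumption: fingerprinted assets look like "main.3f9a1c2b.js",
// i.e. a dot-delimited run of 8+ hex characters before the extension.
const FINGERPRINT = /\.[0-9a-f]{8,}\.[a-z0-9]+$/i;

// HTML entry points and the service worker script.
const ENTRY_POINT = /(\.html|\/sw\.js)$/;

function cachePolicyFor(urlPath: string): CachePolicy {
  // Entry points must always be fresh, otherwise updates never reach users.
  if (ENTRY_POINT.test(urlPath)) {
    return "always-revalidate";
  }
  // A hash in the file name means new content gets a new URL,
  // so the old URL can safely be cached indefinitely.
  if (FINGERPRINT.test(urlPath)) {
    return "long-term";
  }
  // Anything without a fingerprint is treated conservatively.
  return "always-revalidate";
}
```

This sketch is stricter than matching by file extension alone: a file such as /logo.png without a hash in its name is revalidated rather than cached indefinitely.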

Cache-Control

Cache-Control: no-store

The browser should not store anything about the request when it’s told no-store. You can use it for HTML and Service Worker script.


Cache-Control: public, no-cache

or

Cache-Control: public, max-age=0, must-revalidate

These two are equivalent and, despite the no-cache name, allow for serving cached responses with the exception that the browser has to validate if the cache is fresh.

If you correctly set ETag or Last-Modified headers so that the browser can verify that it already has the recent version cached, you and your users are going to save on bandwidth. You can use it for HTML and service worker script.

Cache-Control: private, no-cache

or

Cache-Control: private, max-age=0, must-revalidate

By analogy, these two are also equivalent. The difference between public and private is that a shared cache (e.g., CDN) can cache public responses but not private responses.

The local cache (e.g., browser) can still cache private responses. You use private when you render your HTML on the server, and the rendered HTML contains user-specific or sensitive information.

In framework terms, you don’t need to set private for a typical Gatsby blog, but you should consider it with Next.js for pages that require authorized access.
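To make the public/private distinction concrete, here is a simplified sketch of the storage decision a cache makes. The function names are hypothetical, and real caches weigh more than these two directives (the request's Authorization header, for example), so treat it as an illustration rather than a complete model.

```typescript
// Split a Cache-Control value into lowercase directive names,
// dropping any "=value" part (e.g. "max-age=0" becomes "max-age").
function directiveNames(cacheControl: string): string[] {
  return cacheControl
    .split(",")
    .map((part) => part.trim().split("=")[0].toLowerCase())
    .filter((name) => name.length > 0);
}

// Simplified rule: nobody stores a no-store response, and a shared
// cache (CDN, proxy) additionally refuses anything marked private.
function mayStore(cacheControl: string, cache: "shared" | "browser"): boolean {
  const names = directiveNames(cacheControl);
  if (names.indexOf("no-store") !== -1) return false;
  if (cache === "shared" && names.indexOf("private") !== -1) return false;
  return true;
}
```

With this model, mayStore("private, no-cache", "browser") is true while mayStore("private, no-cache", "shared") is false, which is exactly the behavior you want for server-rendered pages containing user-specific data.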

Cache-Control: public, max-age=31536000, immutable

In this example, the browser is going to cache the response for a year according to the max-age directive (60 * 60 * 24 * 365 = 31536000 seconds).

The immutable directive tells the browser that the content of this response (file) is not going to change, and the browser should not validate its cache by sending If-None-Match (ETag validation) or If-Modified-Since (Last-Modified validation).

Use it for your static assets to support long-term caching strategies.
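If you would rather not hard-code the number, the one-year max-age can be derived from its factors. This is a small sketch; the constant and function names are mine, not part of any library.

```typescript
// Seconds in a (non-leap) year: 60 s * 60 min * 24 h * 365 days.
const ONE_YEAR_SECONDS = 60 * 60 * 24 * 365;

// Build the Cache-Control value for fingerprinted static assets.
function longTermCacheControl(maxAgeSeconds: number = ONE_YEAR_SECONDS): string {
  return "public, max-age=" + maxAgeSeconds + ", immutable";
}
```

Calling longTermCacheControl() with no arguments produces the same header value as the example above.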

Pragma and Expires

Pragma: no-cache
Expires: <http-date>

Pragma is an old header defined in the HTTP/1.0 spec as a request header.

Later, the HTTP/1.1 spec states that the Pragma: no-cache response should be handled as Cache-Control: no-cache, but it’s not a reliable replacement due to the fact that it’s still a request header.

I also keep sending Pragma: no-cache because it’s an OWASP security recommendation.

Including the Pragma: no-cache header is a precaution that protects legacy servers that don’t support newer cache control mechanisms and could cache what you don’t intend to be cached.

Some would argue that unless you have to support Internet Explorer 5 or Netscape, you don’t need Pragma or Expires. It comes down to supporting legacy software.

Proxies universally understand the Expires header, which gives it a slight edge.

For HTML files, I keep the Expires header disabled or set it to a past date. For static assets, I manage it together with Cache-Control’s max-age via the Nginx expires directive.

ETags

ETag: W/"5e15153d-120f"

or

ETag: "5e15153d-120f"

ETags are one of several methods of cache validation. ETag must uniquely identify the resource, and most often, the web server generates a fingerprint from the resource content.

When the resource changes, it’s going to have a different ETag value.

There are two types of ETags. Weak ETag equality indicates that resources are semantically equivalent. Strong ETag equality indicates that resources are byte-for-byte identical.

You can distinguish between the two by the “W/” prefix set for weak ETags.

Weak ETags are not suitable for byte-range requests, but they’re easy to generate on the fly.
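The two comparison rules are easy to express in code. The sketch below follows the weak and strong comparison functions described in the HTTP specification; the helper names are my own.

```typescript
interface ParsedETag {
  weak: boolean;
  opaqueTag: string; // the quoted part, e.g. "5e15153d-120f" including the quotes
}

function parseETag(value: string): ParsedETag {
  const trimmed = value.trim();
  const weak = trimmed.indexOf("W/") === 0;
  return { weak, opaqueTag: weak ? trimmed.slice(2) : trimmed };
}

// Strong comparison: both validators are strong and the tags are identical.
function strongMatch(a: string, b: string): boolean {
  const x = parseETag(a);
  const y = parseETag(b);
  return !x.weak && !y.weak && x.opaqueTag === y.opaqueTag;
}

// Weak comparison: the tags are identical, ignoring any W/ prefix.
function weakMatch(a: string, b: string): boolean {
  return parseETag(a).opaqueTag === parseETag(b).opaqueTag;
}
```

Byte-range requests need strongMatch, which is why a weak ETag can't be used for them, while If-None-Match validation only needs weakMatch.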

In practice, you won’t set ETags yourself; you’ll let your web server handle them.

curl -I <http-address>
curl -I -H "Accept-Encoding: gzip" <http-address>

You may see that when you request a static file from Nginx, it sets a strong ETag. When gzip compression is enabled, but you don’t upload compressed files, the on-the-fly compression results in weak ETags.

By sending the “If-None-Match” request header with the ETag of a cached resource, the browser expects either a 200 OK response with a new resource, or an empty 304 Not Modified response, which indicates that you should use a cached resource instead of downloading a new one.

The same optimization can apply to API GET responses, and it’s not limited to static files.

If your application receives large JSON payloads, you can configure your backend to calculate and set ETag from the content of the payload (e.g., using md5).

Before sending it to the client, compare it with the “If-None-Match” request header.

If there’s a match, instead of sending the payload, send 304 Not Modified to save on bandwidth and improve web app performance.
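Here is a framework-agnostic sketch of that flow, assuming Node.js and its built-in crypto module. The names respondWithETag and etagFor are hypothetical, and splitting If-None-Match on commas is a simplification that is safe only because the hex tags generated here never contain a comma.

```typescript
import { createHash } from "node:crypto";

interface CachedResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string;
}

// Derive a strong ETag from the serialized payload.
// md5 serves only as a content fingerprint here, not as a security measure.
function etagFor(body: string): string {
  return '"' + createHash("md5").update(body).digest("hex") + '"';
}

function respondWithETag(payload: unknown, ifNoneMatch?: string): CachedResponse {
  const body = JSON.stringify(payload);
  const etag = etagFor(body);

  // If-None-Match may carry "*" or a comma-separated list of tags.
  // It uses weak comparison, so any W/ prefix is stripped first.
  const candidates = (ifNoneMatch || "")
    .split(",")
    .map((tag) => tag.trim().replace(/^W\//, ""));
  const matches = candidates.indexOf(etag) !== -1 || candidates.indexOf("*") !== -1;

  if (matches) {
    // The client already has this exact payload: send headers only.
    return { status: 304, headers: { ETag: etag }, body: "" };
  }
  return {
    status: 200,
    headers: { ETag: etag, "Content-Type": "application/json" },
    body,
  };
}
```

In an Express handler, you would pass req.headers["if-none-match"] as the second argument and copy the returned status, headers, and body onto the response. Note that this saves bandwidth but not server work, because the payload still has to be produced to compute its hash.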

Last-Modified

Last-Modified: Tue, 07 Jan 2020 23:33:17 GMT

The Last-Modified response header is another cache validation mechanism; it uses the resource’s last modification date. It serves as a fallback for the more accurate ETag.

By sending the “If-Modified-Since” request header with the last modification date of a cached resource, the browser expects either a 200 OK response with a newer resource or an empty 304 Not Modified response, which indicates that the cached resource should be used instead of downloading a new one.
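The server-side check behind that exchange can be sketched as follows. The function name is hypothetical; the one detail worth noting is that HTTP dates have one-second resolution, so milliseconds have to be dropped before comparing.

```typescript
// Returns true when the server may answer 304 Not Modified.
function isNotModified(lastModified: Date, ifModifiedSince?: string): boolean {
  if (!ifModifiedSince) return false;

  const since = Date.parse(ifModifiedSince);
  if (isNaN(since)) return false; // ignore dates that can't be parsed

  // Compare whole seconds: a file modified at 23:33:17.500 was sent
  // to the client as "23:33:17 GMT" and must still count as unchanged.
  const modifiedSeconds = Math.floor(lastModified.getTime() / 1000);
  const sinceSeconds = Math.floor(since / 1000);
  return modifiedSeconds <= sinceSeconds;
}
```

If the function returns false, the server responds with 200 OK and the full resource, along with a fresh Last-Modified header.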

Debugging

When you set headers and then test the configuration, make sure you’re close to your server with regards to the network. What I mean by that is, if you have your server Dockerized, then run the container and test it locally.

If you configure a VM, then ssh to that VM and test headers there. If you have a Kubernetes cluster, spin up a pod and call your service from within the cluster.

In a production setup, you’re going to work with load balancers, proxies, and CDNs. At each of these steps, your headers can get modified, so it’s much easier to debug knowing your server sent correct headers in the first place.

An example of unexpected behavior is Cloudflare removing the ETag header if you have Email Address Obfuscation or Automatic HTTPS Rewrites enabled.

Good luck trying to debug it by changing your server configuration! In Cloudflare’s defense, this behavior is very well documented and makes perfect sense, so it’s on you to know your tools.

Cache-Control: max-age=31536000
Cache-Control: public, immutable

Earlier in this post, I put “or” between headers in code snippets to indicate two separate examples. Sometimes, though, you may notice the same header more than once in an HTTP response.

It means that both headers apply. Some proxy servers can merge headers along the way. The above example is equal to:

Cache-Control: max-age=31536000, public, immutable
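The merge itself is a simple join. Below is a small sketch with a hypothetical helper name. It applies to comma-separated list headers such as Cache-Control; headers like Set-Cookie must not be combined this way.

```typescript
// Combine repeated header lines into a single value, preserving order.
function mergeHeaderValues(values: string[]): string {
  return values
    .map((value) => value.trim())
    .filter((value) => value.length > 0)
    .join(", ");
}

const merged = mergeHeaderValues(["max-age=31536000", "public, immutable"]);
// merged is "max-age=31536000, public, immutable"
```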

Using curl is going to give you the most consistent results and the ease of running in multiple environments.

If you decide to use a web browser regardless, make sure to look at the service worker while debugging caching problems. Service worker debugging is a complex topic for another post.

To troubleshoot caching problems, make sure you enable bypassing service workers in the DevTools Application tab.

Nginx Configuration

Now that you understand what different types of caching headers do, it’s time to focus on putting your knowledge into practice.

The following Nginx configuration is going to serve a Single Page Application that was built to support long-term caching.

gzip on;
gzip_disable "msie6";
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript;

First of all, I enabled gzip compression for content types that benefit a Single Page Application the most. For more details on each of the available gzip settings, head to the nginx gzip module documentation.

location ~* (\.html|\/sw\.js)$ {
  expires -1y;
  add_header Pragma "no-cache";
  add_header Cache-Control "public";
}

I want to match all HTML files together with /sw.js, which is a service worker script.

Neither should be cached. Setting the Nginx expires directive to a negative value sets the Expires header to a past date and adds an additional Cache-Control: no-cache header.

location ~* \.(js|css|png|jpg|jpeg|gif|ico|json)$ {
  expires 1y;
  add_header Cache-Control "public, immutable";
}

I want to maximize caching for all my static assets, which are JavaScript files, CSS files, images, and static JSON files. If you host your font files, you can add them as well.

location / {
  try_files $uri $uri/ /index.html;
}


if ($host ~* ^www\.(.*)) {
  set $host_without_www $1;
  rewrite ^(.*) https://$host_without_www$1 permanent;
}

Those two are not related to caching, but they’re an essential part of the Nginx configuration.

Modern Single Page Applications support routing for pretty URLs, and my static server is not aware of those routes, so I need to serve a default index.html for every route that doesn’t match a static file.

I’m also interested in redirects from URLs with www. to URLs without www. You might not need this last one in case you host your application where your service provider already does that for you.

Express Configuration

Sometimes we are unable to serve static files using a reverse proxy server like Nginx.

It might be the case that your serverless setup/service provider limits you to using one of the popular programming languages, and performance is not your primary concern.

In such a case, you might want to use a server like Express to serve your static files.

import express, { Response } from "express";
import compression from "compression";
import path from "path";

const PORT = process.env.PORT || 3000;
const BUILD_PATH = "public";

const app = express();

function setNoCache(res: Response) {
  const date = new Date();
  date.setFullYear(date.getFullYear() - 1);
  res.setHeader("Expires", date.toUTCString());
  res.setHeader("Pragma", "no-cache");
  res.setHeader("Cache-Control", "public, no-cache");
}

function setLongTermCache(res: Response) {
  const date = new Date();
  date.setFullYear(date.getFullYear() + 1);
  res.setHeader("Expires", date.toUTCString());
  res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
}

app.use(compression());
app.use(
  express.static(BUILD_PATH, {
    extensions: ["html"],
    setHeaders(res, filePath) {
      // Named filePath so it doesn't shadow the imported path module.
      if (filePath.match(/(\.html|\/sw\.js)$/)) {
        setNoCache(res);
        return;
      }

      if (filePath.match(/\.(js|css|png|jpg|jpeg|gif|ico|json)$/)) {
        setLongTermCache(res);
      }
    },
  }),
);

app.get("*", (req, res) => {
  setNoCache(res);
  res.sendFile(path.resolve(BUILD_PATH, "index.html"));
});

app.listen(PORT, () => {
  console.log(`Server is running http://localhost:${PORT}`);
});

This script is mimicking what our Nginx configuration is doing. Enable gzip using the compression middleware.

Express Static middleware sets ETag and Last-Modified headers for you. We have to handle sending index.html on our own in case the request doesn’t match any known static file.

Examples

Finally, I wanted to explore how popular services utilize caching headers.

I checked headers separately for HTML and CSS or JavaScript files. I also looked at the Server header (if any) as it might give us an exciting insight into the underlying infrastructure.

Twitter

Twitter tries very hard for their HTML files not to end up in your browser cache. It looks like Twitter is using Express to serve us a <div id="react-root"> entry point for the React app.

For whatever reason, Twitter uses the Expiry header, and the Expires header is missing.

I’ve looked it up, but I haven’t found anything interesting.

Might it be a typo? If you know, please leave a comment.

cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
expiry: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Wed, 08 Jan 2020 22:16:19 GMT (current date)
pragma: no-cache
server: tsa_o
x-powered-by: Express

Twitter doesn’t have CSS files and is probably using some CSS-in-JS solution. It looks like a containerized application running on Amazon ECS is serving static files.

etag: "fXSAIt9bnXh6KGXnV0ABwQ=="
expires: Thu, 07 Jan 2021 22:19:54 GMT
last-modified: Sat, 07 Dec 2019 22:27:21 GMT
server: ECS (via/F339)

Instagram

Instagram doesn’t want your browser to cache HTML either, and uses a valid Expires header set to the beginning of the year 2000; any date earlier than the current date works.

last-modified: Wed, 08 Jan 2020 21:45:45 GMT
cache-control: private, no-cache, no-store, must-revalidate
pragma: no-cache
expires: Sat, 01 Jan 2000 00:00:00 GMT

Both CSS and JavaScript files served by Instagram support long-term caching and also have an ETag.

etag: "3d0c27ff077a"
cache-control: public,max-age=31536000,immutable

New York Times

The New York Times is also using React and serves its articles as server-side rendered pages. The last modification date seems to be a real date that doesn’t change with every request.

cache-control: no-cache
last-modified: Wed, 08 Jan 2020 21:54:09 GMT
server: nginx

New York Times assets are also cached for a long time, with both ETag and Last-Modified headers provided.

cache-control: public,max-age=31536000
etag: "42db6c8821fec0e2b3837b2ea2ece8fe"
expires: Wed, 24 Jun 2020 23:27:22 GMT
last-modified: Tue, 25 Jun 2019 22:51:52 GMT
server: UploadServer

Conclusion

I’ve created this partially to organize my knowledge, but also I intend to use it as a cheat sheet for configuring current and future projects. I hope you enjoyed reading and also found it useful!

If you have any questions or would like to suggest an improvement, please leave a comment below, and I’ll be happy to answer it!

