There are multiple headers available that developers and ops people can use to manipulate cache behavior.
The old spec mixes with the new: there are numerous settings to configure, and you can find many users reporting inconsistent behavior.
In this post, I’ll focus on explaining how different headers influence the browser cache and how they relate to proxy servers.
You’ll find an example Nginx configuration and the code for a Node.js server running Express. Finally, we’ll look at how popular services built with React serve their web applications.
For a single page application, I’m interested in caching JavaScript, CSS, fonts, and image files indefinitely and preventing caching HTML files and service workers (if you have any).
This strategy is viable because my asset files have unique identifiers in their file names.
You can achieve the same setup by configuring webpack to include a [hash], or even better a [chunkhash], in the file names of your assets. This technique is called long-term caching.
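The idea behind putting a content hash in a file name can be sketched in a few lines. This is a minimal illustration, assuming Node's built-in crypto module with MD5; the function name hashedFileName is hypothetical, and this is not how webpack actually computes [hash] or [chunkhash].

```typescript
import { createHash } from "crypto";

// Hypothetical helper: embed a short hash of the file content in its name,
// e.g. "main.js" -> "main.1a2b3c4d.js". Any change to the content yields a
// new name, so the old URL can be cached "forever" without going stale.
// MD5 is used here only for illustration, not because webpack uses it.
function hashedFileName(fileName: string, content: string, length = 8): string {
  const hash = createHash("md5").update(content).digest("hex").slice(0, length);
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) {
    // No extension (or a dotfile): append the hash at the end.
    return `${fileName}.${hash}`;
  }
  return `${fileName.slice(0, dot)}.${hash}${fileName.slice(dot)}`;
}
```

Because the HTML file is never cached, a new build that references main.<new-hash>.js is picked up on the next visit, while unchanged assets keep their old names and stay in the cache.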
But when you prevent re-downloading, how can you then make updates to your website? Maintaining the ability to update the website is why it’s so important to never cache HTML files.
Every time you visit my site, the browser fetches a fresh copy of the HTML file from the server, and only when there are new script srcs or link hrefs does the browser download a new asset from the server.
Cache-Control: no-store
The browser should not store anything about the request when it’s told no-store. You can use it for HTML and the service worker script.
Cache-Control: public, no-cache or Cache-Control: public, max-age=0, must-revalidate
These two are equivalent and, despite the no-cache name, allow the browser to serve cached responses, provided that it first validates that the cached copy is still fresh.
If you correctly set ETag or Last-Modified headers so that the browser can verify that it already has the most recent version cached, you and your users will save bandwidth. You can use it for HTML and the service worker script.
Cache-Control: private, no-cache or Cache-Control: private, max-age=0, must-revalidate
By analogy, these two are also equivalent. The difference between public and private is that a shared cache (e.g., CDN) can cache public responses but not private responses.
The local cache (e.g., browser) can still cache private responses. You use private when you render your HTML on the server, and the rendered HTML contains user-specific or sensitive information.
In framework terms, you don’t need to set private for a typical Gatsby blog, but you should consider it with Next.js for pages that require authorized access.
Cache-Control: public, max-age=31536000, immutable
In this example, the browser is going to cache the response for a year according to the max-age directive (60 * 60 * 24 * 365 = 31536000 seconds).
The immutable directive tells the browser that the content of this response (file) is not going to change, and the browser should not validate its cache by sending If-None-Match (ETag validation) or If-Modified-Since (Last-Modified validation).
Use it for your static assets to support long-term caching strategies.
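To keep the options above side by side, here is a small sketch that collects them as constants and picks one per response. The type, constant, and function names are hypothetical, and choosePolicy assumes that every non-HTML file is fingerprinted, which is only true with the long-term caching setup described earlier.

```typescript
// Hypothetical names for the Cache-Control strategies discussed above.
type CachePolicy = "no-store" | "revalidate-public" | "revalidate-private" | "immutable";

// One year in seconds: 60 s * 60 min * 24 h * 365 days.
const ONE_YEAR_SECONDS = 60 * 60 * 24 * 365;

const CACHE_CONTROL: Record<CachePolicy, string> = {
  // Never store the response (HTML, service worker script).
  "no-store": "no-store",
  // Store it, but validate with ETag/Last-Modified before every reuse;
  // shared caches such as CDNs may store it too.
  "revalidate-public": "public, no-cache",
  // Same, but only the browser (a private cache) may store it.
  "revalidate-private": "private, no-cache",
  // Fingerprinted static assets: cache for a year and never revalidate.
  immutable: `public, max-age=${ONE_YEAR_SECONDS}, immutable`,
};

// Assumption: anything that is not HTML or the service worker has a
// content hash in its file name.
function choosePolicy(isHtmlOrServiceWorker: boolean, isUserSpecific: boolean): CachePolicy {
  if (!isHtmlOrServiceWorker) return "immutable";
  return isUserSpecific ? "revalidate-private" : "revalidate-public";
}
```

A server-rendered page with user data would map to "private, no-cache", while a static blog page would map to "public, no-cache".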
Pragma: no-cache
Expires: <http-date>
Pragma is an old header defined in the HTTP/1.0 spec as a request header.
Later, the HTTP/1.1 spec states that Pragma: no-cache should be handled as Cache-Control: no-cache, but it’s not a reliable replacement in responses because it’s still defined as a request header.
I still include Pragma: no-cache, following an OWASP security recommendation.
Including the Pragma: no-cache header is a precaution against legacy servers that don’t support newer cache control mechanisms and could cache what you don’t intend to be cached.
Some would argue that unless you have to support Internet Explorer 5 or Netscape, you don’t need Pragma or Expires. It comes down to supporting legacy software.
Proxies universally understand the Expires header, which gives it a slight edge.
For HTML files, I keep Expires header disabled or set it to a past date. For static assets, I manage it together with Cache-Control’s max-age via the Nginx expires directive.
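Keeping Expires and max-age in sync means deriving both from one duration. Below is a rough sketch of that idea, loosely modeled on how the Nginx expires directive is documented to behave (a negative time produces Cache-Control: no-cache, zero or a positive time produces max-age); the function name is hypothetical and it is not Nginx's implementation.

```typescript
// Hypothetical helper: derive matching Expires and Cache-Control values
// from a single duration in seconds. A negative duration produces an
// Expires date in the past, i.e. the response is already stale.
function expiresHeaders(seconds: number, now: Date = new Date()): Record<string, string> {
  const expires = new Date(now.getTime() + seconds * 1000).toUTCString();
  if (seconds < 0) {
    return { Expires: expires, "Cache-Control": "no-cache" };
  }
  return { Expires: expires, "Cache-Control": `max-age=${seconds}` };
}
```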
ETag: W/"5e15153d-120f" or ETag: "5e15153d-120f"
ETags are one of several methods of cache validation. ETag must uniquely identify the resource, and most often, the web server generates a fingerprint from the resource content.
When the resource changes, it’s going to have a different ETag value.
There are two types of ETags. Equality of weak ETags indicates that the resources are semantically equivalent. Equality of strong ETags indicates that the resources are byte-for-byte identical.
You can distinguish between the two by the “W/” prefix set for weak ETags.
Weak ETags are not suitable for byte-range requests, but they’re easy to generate on the fly.
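The difference between the two kinds of comparison is easy to express in code. This sketch follows the strong and weak comparison rules from the HTTP specification as I understand them; the function names are hypothetical.

```typescript
interface EntityTag {
  weak: boolean;
  opaque: string; // the quoted part, e.g. "\"5e15153d-120f\""
}

// Parse a single ETag such as W/"5e15153d-120f" or "5e15153d-120f".
function parseETag(value: string): EntityTag | null {
  const match = /^(W\/)?("[^"]*")$/.exec(value.trim());
  if (!match) return null;
  return { weak: match[1] !== undefined, opaque: match[2] };
}

// Strong comparison: both tags must be strong and identical.
function strongMatch(a: string, b: string): boolean {
  const x = parseETag(a);
  const y = parseETag(b);
  return x !== null && y !== null && !x.weak && !y.weak && x.opaque === y.opaque;
}

// Weak comparison: the W/ prefix is ignored; only the quoted part counts.
function weakMatch(a: string, b: string): boolean {
  const x = parseETag(a);
  const y = parseETag(b);
  return x !== null && y !== null && x.opaque === y.opaque;
}
```

So W/"5e15153d-120f" and "5e15153d-120f" match weakly but not strongly, which is why weak ETags are fine for cache validation but not for byte-range requests.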
In practice, you won’t set ETags yourself; you’ll let your web server handle them.
curl -I <http-address>
curl -I -H "Accept-Encoding: gzip" <http-address>
You may see that when you request a static file from Nginx, it sets a strong ETag. When gzip compression is enabled but you don’t upload precompressed files, the on-the-fly compression results in weak ETags.
By sending the “If-None-Match” request header with the ETag of a cached resource, the browser expects either a 200 OK response with a new resource, or an empty 304 Not Modified response, which indicates that it should use the cached resource instead of downloading a new one.
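On the server side, the decision between 200 and 304 boils down to comparing the ETags listed in If-None-Match with the current one. A minimal sketch, with a hypothetical function name; it uses weak comparison, as If-None-Match does, and simplifies parsing by assuming no ETag contains a comma.

```typescript
// Hypothetical helper: can this GET be answered with 304 Not Modified?
// `ifNoneMatch` is the raw request header; `currentETag` is the ETag the
// server would send now.
function isNotModified(ifNoneMatch: string | undefined, currentETag: string): boolean {
  if (!ifNoneMatch) return false;
  // "*" matches any existing representation.
  if (ifNoneMatch.trim() === "*") return true;
  // Weak comparison: drop the W/ prefix before comparing.
  const strip = (tag: string) => tag.trim().replace(/^W\//, "");
  const current = strip(currentETag);
  // The header may list several ETags separated by commas
  // (simplification: assumes no commas inside ETag values).
  return ifNoneMatch.split(",").some((tag) => strip(tag) === current);
}
```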
The same optimization can apply to API GET responses; it’s not limited to static files.
If your application receives large JSON payloads, you can configure your backend to calculate an ETag from the content of the payload (e.g., using md5) and set it on the response.
Before sending the payload to the client, compare that ETag with the “If-None-Match” request header.
If there’s a match, send 304 Not Modified instead of the payload to save bandwidth and improve web app performance.
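The steps above for API responses can be sketched as a framework-agnostic function. It assumes Node's crypto module and uses MD5 as the paragraph suggests; the function and interface names are hypothetical.

```typescript
import { createHash } from "crypto";

interface ApiResult {
  status: 200 | 304;
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical helper: serialize the payload, derive a strong ETag from its
// MD5 digest, and skip the body when the client already has that version.
function conditionalJson(payload: unknown, ifNoneMatch: string | undefined): ApiResult {
  const body = JSON.stringify(payload);
  const etag = `"${createHash("md5").update(body).digest("hex")}"`;
  // Send the ETag on the 304 too, so the client keeps a current validator.
  const headers: Record<string, string> = { ETag: etag };
  // If-None-Match may hold a list; compare weakly by dropping any W/ prefix.
  const tags = (ifNoneMatch ?? "").split(",").map((t) => t.trim().replace(/^W\//, ""));
  if (tags.includes(etag)) {
    return { status: 304, headers };
  }
  return { status: 200, headers: { ...headers, "Content-Type": "application/json" }, body };
}
```

In an Express route you could apply the result with res.status(result.status).set(result.headers).send(result.body). Keep in mind that Express's res.send and res.json already attach a weak ETag by default and answer 304 when the request is fresh, so a hand-rolled version like this is mostly useful when you want a strong, content-derived ETag or aren’t using Express.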
Last-Modified: Tue, 07 Jan 2020 23:33:17 GMT
The Last-Modified response header is another cache validation mechanism and uses the last modification date. It serves as a fallback for the more accurate ETags.
By sending the “If-Modified-Since” request header with the last modification date of a cached resource, the browser expects either a 200 OK response with a newer resource or an empty 304 Not Modified response, which indicates that the cached resource should be used instead of downloading a new one.
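A date-based check needs one detail that is easy to miss: HTTP dates only have one-second resolution, so the file's modification time should be truncated before comparing. A minimal sketch with a hypothetical function name; note that the HTTP spec says a server should ignore If-Modified-Since when the request also carries If-None-Match, which this function leaves to the caller.

```typescript
// Hypothetical helper: does If-Modified-Since allow a 304 response?
function notModifiedSince(ifModifiedSince: string | undefined, lastModified: Date): boolean {
  if (!ifModifiedSince) return false;
  const since = Date.parse(ifModifiedSince);
  // An unparsable date is ignored; fall back to a full 200 response.
  if (Number.isNaN(since)) return false;
  // Truncate to whole seconds, because HTTP dates carry no milliseconds.
  const modified = Math.floor(lastModified.getTime() / 1000) * 1000;
  return modified <= since;
}
```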
When you set headers and then test the configuration, make sure you’re as close to your server as possible in network terms. In other words, if your server is Dockerized, run the container and test it locally.
If you configure a VM, ssh into that VM and test the headers there. If you have a Kubernetes cluster, spin up a pod and call your service from within the cluster.
In a production setup, you’re going to work with load balancers, proxies, and CDNs. At each of these steps, your headers can get modified, so it’s much easier to debug knowing your server sent correct headers in the first place.
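Once you have the headers from each hop, a small check makes it obvious where something changed. This is a sketch of an audit against the strategy from this post; the function name and the exact rules are my own simplification, and it expects header names in lower case.

```typescript
type ResponseKind = "html" | "asset";

// Hypothetical helper: list deviations from the caching strategy in this post.
// `headers` maps lower-cased header names to their values.
function auditCacheHeaders(kind: ResponseKind, headers: Record<string, string>): string[] {
  const problems: string[] = [];
  const directives = (headers["cache-control"] ?? "")
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());
  const noStore = directives.includes("no-store");
  if (kind === "html") {
    const revalidates =
      noStore ||
      directives.includes("no-cache") ||
      (directives.includes("max-age=0") && directives.includes("must-revalidate"));
    if (!revalidates) problems.push("HTML can be reused without revalidation");
    // Revalidation only saves bandwidth when a validator is present
    // (e.g. a CDN may have stripped the ETag).
    if (!noStore && !headers["etag"] && !headers["last-modified"]) {
      problems.push("HTML revalidates but has no ETag or Last-Modified");
    }
  } else {
    if (!directives.includes("immutable")) problems.push("asset is missing immutable");
    if (!directives.some((d) => d.startsWith("max-age="))) problems.push("asset is missing max-age");
  }
  return problems;
}
```

Running it on headers from the server itself and then from the load balancer or CDN shows which hop introduced a change. In Node.js 18 or later, Object.fromEntries(response.headers) on a fetch() response gives such a map with lower-cased names.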
One example of unexpected behavior is Cloudflare removing the ETag header if you have Email Address Obfuscation or Automatic HTTPS Rewrites enabled.
Good luck trying to debug that by changing your server configuration! In Cloudflare’s defense, this behavior is well documented and makes perfect sense, so it’s on you to know your tools.
Cache-Control: max-age=31536000
Cache-Control: public, immutable
Earlier in this post, I’ve put “or” between headers in code snippets to indicate that they are two different examples. Sometimes you may notice more than one instance of the same header in an HTTP response.
It means that both headers apply. Some proxy servers merge such headers along the way. The above example is equivalent to:
Cache-Control: max-age=31536000, public, immutable
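Merging repeated Cache-Control lines is just joining their directives with commas. Here is a small sketch of that merge plus a simple directive parser, with hypothetical function names; the parser does not handle quoted arguments that contain commas, such as no-cache="Set-Cookie, Foo".

```typescript
// Hypothetical helper: combine repeated header lines into one value, the way
// a proxy may merge list-based headers such as Cache-Control.
function mergeHeaderValues(values: string[]): string {
  return values
    .flatMap((v) => v.split(","))
    .map((d) => d.trim())
    .filter((d) => d.length > 0)
    .join(", ");
}

// Hypothetical helper: parse Cache-Control into directive -> argument,
// using `true` for directives without an argument (e.g. public, immutable).
function parseCacheControl(value: string): Map<string, string | true> {
  const directives = new Map<string, string | true>();
  for (const part of value.split(",")) {
    const [name, arg]: (string | undefined)[] = part.trim().split("=", 2);
    if (!name) continue;
    directives.set(name.toLowerCase(), arg === undefined ? true : arg.replace(/^"|"$/g, ""));
  }
  return directives;
}
```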
Using curl gives you the most consistent results and is easy to run in multiple environments.
If you decide to use a web browser regardless, make sure to look at the service worker while debugging caching problems. Service worker debugging is a complex topic for another post.
To troubleshoot caching problems, make sure you enable bypassing service workers in the DevTools Application tab.
Now that you understand what different types of caching headers do, it’s time to focus on putting your knowledge into practice.
The following Nginx configuration is going to serve a Single Page Application that was built to support long-term caching.
gzip on;
gzip_disable "msie6";
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript;
First of all, I enabled gzip compression for content types that benefit a Single Page Application the most. For more details on each of the available gzip settings, head to the nginx gzip module documentation.
location ~* (\.html|\/sw\.js)$ {
  expires -1y;
  add_header Pragma "no-cache";
  add_header Cache-Control "public";
}
I want to match all HTML files together with /sw.js, which is the service worker script.
Neither should be cached. Setting the Nginx expires directive to a negative value sets the Expires header to a date in the past and adds a Cache-Control: no-cache header.
location ~* \.(js|css|png|jpg|jpeg|gif|ico|json)$ {
  expires 1y;
  add_header Cache-Control "public, immutable";
}
I want to maximize caching for all my static assets, which are JavaScript files, CSS files, images, and static JSON files. If you host your font files, you can add them as well.
location / {
  # Fall back to index.html so the client-side router can handle the route.
  try_files $uri $uri/ /index.html;
}

if ($host ~* ^www\.(.*)) {
  set $host_without_www $1;
  rewrite ^(.*) https://$host_without_www$1 permanent;
}
Those two are not related to caching, but they’re an essential part of the Nginx configuration.
Modern Single Page Applications support routing with pretty URLs, but my static server is not aware of those routes, so I need to serve a default index.html for every route that doesn’t match a static file.
I also want to redirect URLs with www. to URLs without www. You might not need this last one if your hosting provider already does that for you.
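The two routing rules described above, serving index.html when no static file matches and redirecting www to the bare domain, can be sketched as plain functions. Both names are hypothetical, and the `files` set stands in for the contents of the build directory.

```typescript
// Hypothetical helper: serve the file if it exists, then a directory's
// index.html, and otherwise fall back to /index.html so the client-side
// router can handle the URL.
function resolveSpaPath(pathname: string, files: Set<string>): string {
  if (files.has(pathname)) return pathname;
  const dirIndex = pathname.replace(/\/?$/, "/") + "index.html";
  if (files.has(dirIndex)) return dirIndex;
  return "/index.html";
}

// Hypothetical helper: the target URL for a www host, or null if none.
function wwwRedirect(host: string, path: string): string | null {
  const match = /^www\.(.+)$/i.exec(host);
  return match ? `https://${match[1]}${path}` : null;
}
```

Because the fallback returns index.html, the HTML caching rules still apply to every client-side route, which is exactly what keeps deployments visible to returning users.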
Sometimes we are unable to serve static files using a reverse proxy server like Nginx.
It might be the case that your serverless setup/service provider limits you to using one of the popular programming languages, and performance is not your primary concern.
In such a case, you might want to use a server like Express to serve your static files.
import express, { Response } from "express";
import compression from "compression";
import path from "path";

const PORT = process.env.PORT || 3000;
const BUILD_PATH = "public";

const app = express();

// HTML and the service worker: Expires in the past, always revalidate.
function setNoCache(res: Response) {
  const date = new Date();
  date.setFullYear(date.getFullYear() - 1);
  res.setHeader("Expires", date.toUTCString());
  res.setHeader("Pragma", "no-cache");
  res.setHeader("Cache-Control", "public, no-cache");
}

// Fingerprinted static assets: cache for one year without revalidation.
function setLongTermCache(res: Response) {
  const date = new Date();
  date.setFullYear(date.getFullYear() + 1);
  res.setHeader("Expires", date.toUTCString());
  res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
}

// gzip responses, like the Nginx gzip settings.
app.use(compression());

app.use(
  express.static(BUILD_PATH, {
    extensions: ["html"],
    // filePath is the path of the file being served (renamed so it doesn't shadow the path module).
    setHeaders(res, filePath) {
      if (filePath.match(/(\.html|\/sw\.js)$/)) {
        setNoCache(res);
        return;
      }
      if (filePath.match(/\.(js|css|png|jpg|jpeg|gif|ico|json)$/)) {
        setLongTermCache(res);
      }
    },
  }),
);

// SPA fallback: any route without a matching static file gets index.html.
// Express 4 route syntax; Express 5 requires a named wildcard such as "/{*splat}".
app.get("*", (req, res) => {
  setNoCache(res);
  res.sendFile(path.resolve(BUILD_PATH, "index.html"));
});

app.listen(PORT, () => {
  console.log(`Server is running http://localhost:${PORT}`);
});
This script mimics what our Nginx configuration does. The compression middleware enables gzip.
The Express static middleware sets the ETag and Last-Modified headers for you. We have to handle sending index.html ourselves in case the request doesn’t match any known static file.
Finally, I wanted to explore how popular services utilize caching headers.
I checked headers separately for HTML files and for CSS or JavaScript files. I also looked at the Server header (if any), as it might give us an interesting insight into the underlying infrastructure.
Twitter tries very hard to keep its HTML files out of your browser cache. It looks like Twitter is using Express to serve us a <div id="react-root"> entry point for the React app.
For whatever reason, Twitter uses the Expiry header, and the Expires header is missing.
I’ve looked it up, but I haven’t found anything interesting.
Might it be a typo? If you know, please leave a comment.
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
expiry: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Wed, 08 Jan 2020 22:16:19 GMT (current date)
pragma: no-cache
server: tsa_o
x-powered-by: Express
Twitter doesn’t have CSS files and is probably using some CSS-in-JS solution. The ECS value in the Server header most likely points to the Edgecast CDN serving the static files.
etag: "fXSAIt9bnXh6KGXnV0ABwQ=="
expires: Thu, 07 Jan 2021 22:19:54 GMT
last-modified: Sat, 07 Dec 2019 22:27:21 GMT
server: ECS (via/F339)
Instagram doesn’t want your browser to cache HTML either, and uses a valid Expires header set to the beginning of the year 2000; any date earlier than the current date works.
last-modified: Wed, 08 Jan 2020 21:45:45 GMT
cache-control: private, no-cache, no-store, must-revalidate
pragma: no-cache
expires: Sat, 01 Jan 2000 00:00:00 GMT
Both CSS and JavaScript files served by Instagram support long-term caching and also have an ETag.
etag: "3d0c27ff077a"
cache-control: public,max-age=31536000,immutable
The New York Times is also using React and serves its articles as server-side rendered pages. The last modification date seems to be a real date that doesn’t change with every request.
cache-control: no-cache
last-modified: Wed, 08 Jan 2020 21:54:09 GMT
server: nginx
New York Times assets are also cached for a long time, with both an ETag and a Last-Modified date provided.
cache-control: public,max-age=31536000
etag: "42db6c8821fec0e2b3837b2ea2ece8fe"
expires: Wed, 24 Jun 2020 23:27:22 GMT
last-modified: Tue, 25 Jun 2019 22:51:52 GMT
server: UploadServer
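To compare these services at a glance, the Cache-Control values above can be sorted into the strategies covered in this post. The categories and the function name are my own simplification, and the test cases reuse the header values observed above.

```typescript
type Strategy = "no-store" | "revalidate" | "cacheable" | "unspecified";

// Hypothetical helper: classify a Cache-Control value. no-store is checked
// first because it overrides every other directive.
function classify(cacheControl: string | undefined): Strategy {
  const directives = (cacheControl ?? "")
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());
  if (directives.includes("no-store")) return "no-store";
  if (directives.includes("no-cache") || directives.includes("max-age=0")) return "revalidate";
  const maxAge = directives.find((d) => d.startsWith("max-age="));
  if (maxAge && Number(maxAge.slice("max-age=".length)) > 0) return "cacheable";
  return "unspecified";
}
```

Twitter and Instagram HTML land in "no-store", The New York Times HTML in "revalidate", and the Instagram and New York Times assets in "cacheable". Twitter's static files come back as "unspecified" because they rely on Expires rather than Cache-Control.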
I’ve created this partially to organize my knowledge, but also I intend to use it as a cheat sheet for configuring current and future projects. I hope you enjoyed reading and also found it useful!
If you have any questions or would like to suggest an improvement, please leave a comment below, and I’ll be happy to answer it!