Leonardo Losoviz Freelance developer and writer, with an ongoing quest to integrate innovative paradigms into existing PHP frameworks, and unify all of them into a single mental model.

HTTP caching in GraphQL

GraphQL and caching: two words that don’t go very well together.

The reason is that GraphQL operates via POST by executing all queries against a single endpoint and passing parameters through the body of the request. That single endpoint’s URL will produce different responses, which means it cannot be cached, at least not using the URL as the identifier.

“But wait a second,” you say. “GraphQL surely has caching, right?”

Yes, by doing it in the client through Apollo Client and similar libraries, which cache the returned objects independently of each other, identifying them by their unique global ID.

But this is a hack. This solution exists only because GraphQL cannot handle caching in the server, for which we normally use the URL as the identifier and cache the data for all entities in the response all together.

Caching in the client has a few disadvantages:

  • The application has more JavaScript to run on the client side, so accessing the website from a low-end mobile phone will take a performance hit
  • The application becomes more complex, with more moving parts, since we now also need to implement and maintain the caching layer
  • Not everybody works in JavaScript (e.g., the website may be coded in PHP), but dealing with JS now becomes an additional responsibility

So what’s the solution then?

It is, simply put, to use the standards. In this case, the standard is HTTP caching.

“Yeah, but that’s the whole point: we can’t use HTTP caching! Or what are we talking about?”

Right. But knowing that we want to use HTTP caching, we can then approach the problem from a different angle. Instead of asking, “How can we cache GraphQL?” we can ask, “In order to use HTTP caching, how should we use GraphQL?”

In this article, we’ll answer this question.

Accessing GraphQL via GET

Using HTTP caching means that we will cache the GraphQL response using the URL as the identifier. This has two implications:

  1. We must access GraphQL’s single endpoint via GET
  2. We must pass the query and variables as URL params

Then, if the single endpoint is /graphql, the GET operation can be executed against URL /graphql?query=...&variables=....
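As a minimal sketch of how such a URL could be assembled in PHP: the function name buildGraphQLGetUrl is hypothetical, and serializing the variables as JSON is an assumption of this example rather than something prescribed by the article.

```php
<?php
declare(strict_types=1);

/**
 * Build the URL for a GraphQL GET request (hypothetical helper).
 * Assumption: variables are sent as a JSON-encoded string.
 */
function buildGraphQLGetUrl(string $endpoint, string $query, array $variables = []): string
{
    $params = ['query' => $query];
    if ($variables !== []) {
        $params['variables'] = json_encode($variables, JSON_THROW_ON_ERROR);
    }
    // PHP_QUERY_RFC3986 percent-encodes spaces as %20 instead of "+"
    return $endpoint . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo buildGraphQLGetUrl('/graphql', '{ posts { id title } }'), PHP_EOL;
```

Because the resulting URL is a deterministic function of the query and its variables, the same query always maps to the same cache key.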

This applies to retrieving data from the server (via the query operation). For mutating data (via the mutation operation), we must still use POST. There is no problem here since mutations are always executed fresh; we can’t cache the results of a mutation, so we wouldn’t use HTTP caching with it anyway.

This approach works (and it’s even suggested on the official GraphQL site), but there are certain considerations we must keep in mind.

Coding GraphQL queries via URL param

A GraphQL query will normally span multiple lines. For instance:

{
  posts {
    id
    title
  }
}

However, we can’t input this multi-line string directly in the URL param.

The solution is to encode it. For instance, the GraphiQL client will encode the query above like this:

%7B%0A%20%20posts%20%7B%0A%20%20%20%20id%0A%20%20%20%20title%0A%20%20%7D%0A%7D
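In PHP, the same percent-encoding can be reproduced with the built-in rawurlencode function. This is only a sketch; the wrapper name encodeQueryForUrl is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Percent-encode a GraphQL query so it can travel in a URL param.
 * (Hypothetical wrapper around PHP's built-in rawurlencode.)
 */
function encodeQueryForUrl(string $query): string
{
    return rawurlencode($query);
}

// The multi-line query from above, with its newlines and indentation
$query = "{\n  posts {\n    id\n    title\n  }\n}";

echo encodeQueryForUrl($query), PHP_EOL;
// rawurldecode() restores the original multi-line query on the server side
```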

Alright, this works. But it doesn’t look very good, right? Who can make sense of that query?

One of the virtues of GraphQL is that its queries are so easy to grasp. With some practice, once we see the query, we understand it immediately. But once it’s been encoded, all that is gone, and only machines can comprehend it; the human is out of the equation.

Another solution could be to replace all the newlines in the query with a space, which works because newlines add no semantic meaning to the query. Then, the query above can be represented as:

?query={ posts { id title } }
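A rough sketch of that transformation follows; the function name toSingleLine is hypothetical. Note that this naive version also collapses whitespace inside string arguments, so it assumes the query contains no string values where runs of whitespace matter.

```php
<?php
declare(strict_types=1);

/**
 * Collapse every run of whitespace (newlines, tabs, spaces) into one space.
 * Caveat: this also touches whitespace inside quoted string arguments.
 */
function toSingleLine(string $query): string
{
    $collapsed = preg_replace('/\s+/', ' ', $query);
    if ($collapsed === null) {
        throw new RuntimeException('Could not process the query');
    }
    return trim($collapsed);
}

echo toSingleLine("{\n  posts {\n    id\n    title\n  }\n}"), PHP_EOL;
```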

This works well for simple queries. But if you have a really long query, opening and closing many curly brackets and adding field arguments and directives, then it becomes increasingly difficult to understand.

For instance, this query:

{
  posts(limit:5) {
    id
    title @titleCase
    excerpt @default(
      value:"No title",
      condition:IS_EMPTY
    )
    author {
      name
    }
    tags {
      id
      name
    }
    comments(
      limit:3,
      order:"date|DESC"
    ) {
      id
      date(format:"d/m/Y")
      author {
        name
      }
      content
    }
  }
}

Would become this single-line query:

{ posts(limit:5) { id title @titleCase excerpt @default(value:"No title", condition:IS_EMPTY) author { name } tags { id name } comments(limit:3, order:"date|DESC") { id date(format:"d/m/Y") author { name } content } } }

Once again, it works, but we won’t know what it is we are executing. And if the query also contains fragments, then absolutely forget it: there’s no way we can make sense of it.

So, what can we do about it?

GraphQL over HTTP

First, the good news: stakeholders from the GraphQL community have identified this problem and have begun work on the GraphQL over HTTP specification, which will standardize how everyone (GraphQL servers, clients, libraries, etc.) will communicate their GraphQL queries via URL param.

Second, the not-so-good news: progress on this endeavor seems to be slow, and the specification so far is not comprehensive enough to be usable. Thus, we either wait for an uncertain amount of time, or we look for another solution.

Persisted queries to the rescue

If passing the query in the URL is not satisfactory, what other option do we have? Well, to not pass the query in the URL!

This approach is called a “persisted query.” We store the query in the server and use an identifier (such as a numeric ID or a unique string produced by applying a hashing algorithm with the query as input) to retrieve it. Finally, we pass this identifier as the URL parameter instead of the query.

For instance, the query could be identified with ID 2908 (or a hash such as "50ac3e81"), and then we execute the GET operation against URL /graphql?id=2908. The GraphQL server will then retrieve the query corresponding to this ID, execute it, and return the results.
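To illustrate the idea, here is a minimal in-memory sketch. The class PersistedQueryStore and its methods are hypothetical and not taken from any particular GraphQL server; a real server would persist the queries in a database or in files, and truncating the hash to 8 characters is an assumption made only to mirror the short hash shown above (a full-length hash lowers the risk of collisions).

```php
<?php
declare(strict_types=1);

/** Hypothetical in-memory store mapping an ID to its GraphQL query. */
final class PersistedQueryStore
{
    /** @var array<string, string> */
    private array $queries = [];

    /** Store the query and return the ID derived from its hash. */
    public function persist(string $query): string
    {
        $id = substr(hash('sha256', $query), 0, 8);
        $this->queries[$id] = $query;
        return $id;
    }

    /** Return the query stored under the ID, or null if unknown. */
    public function find(string $id): ?string
    {
        return $this->queries[$id] ?? null;
    }
}

$store = new PersistedQueryStore();
$id = $store->persist('{ posts { id title } }');

// The client now requests this short, stable URL via GET
echo '/graphql?id=' . $id, PHP_EOL;
```

When the request arrives, the server would call find() with the ID from the URL and execute the returned query, or respond with an error if the result is null.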

Using persisted queries, implementing HTTP caching becomes a nonissue.

Problem solved! If you want to use HTTP caching in your GraphQL server, find a GraphQL server that supports persisted queries, either natively or through some library.

Calculating the max-age value

On to the next challenge!

HTTP caching works by sending the Cache-Control header in the response, with either a max-age value indicating for how long (in seconds) the response can be served from the cache, or no-store indicating that it must not be cached at all.

How will the GraphQL server calculate the max-age value for the query, considering that different fields can have different max-age values?

The answer is to get the max-age value for all fields requested in the query and find out which is the lowest one. That will be the response’s max-age.

For instance, let’s say we have an entity of type User. Following the behavior assigned to this entity, we can assign how long the corresponding field should be cached:

  • Its ID will never change ⇒ We give field id a max-age of 1 year
  • Its URL will be updated very rarely (if ever) ⇒ We give field url a max-age of 1 day
  • The person’s name may change every now and then (e.g., to add a status, or to say “Milton (wears a mask)”) ⇒ We give field name a max-age of 1 hour
  • The user’s karma on the site can change at any time (e.g., after somebody upvotes their comment) ⇒ We give field karma a max-age of 1 minute
  • If querying the data for the logged-in user, then the response can’t be cached at all (independently of whichever field we’re fetching) ⇒ The max-age must be no-store

As a result, the response to the following GraphQL queries will have the following max-age values (for this example, we ignore the max-age for field Root.users, but in practice, it will also be taken into account):

Query                              | max-age value
{ users { id } }                   | 1 year
{ users { id url } }               | 1 day
{ users { id url name } }          | 1 hour
{ users { id url name karma } }    | 1 minute
{ me { id url name karma } }       | no-store (don’t cache)
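The calculation behind these results can be sketched in a few lines of PHP. The constant FIELD_MAX_AGES and the function responseMaxAge are hypothetical names; the values are the durations above expressed in seconds, and using 0 to stand for no-store is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

// max-age per field, in seconds: 1 year, 1 day, 1 hour, 1 minute.
// 0 stands for no-store (data about the logged-in user).
const FIELD_MAX_AGES = [
    'User.id'    => 31557600,
    'User.url'   => 86400,
    'User.name'  => 3600,
    'User.karma' => 60,
    'Root.me'    => 0,
];

/** Return the lowest max-age among all the fields requested in the query. */
function responseMaxAge(array $requestedFields, array $fieldMaxAges = FIELD_MAX_AGES): int
{
    if ($requestedFields === []) {
        throw new InvalidArgumentException('The query must request at least one field');
    }
    $maxAges = [];
    foreach ($requestedFields as $field) {
        if (!isset($fieldMaxAges[$field])) {
            throw new InvalidArgumentException("No max-age configured for field {$field}");
        }
        $maxAges[] = $fieldMaxAges[$field];
    }
    return min($maxAges);
}

// { users { id url name } } is limited by its shortest-lived field, name
echo responseMaxAge(['User.id', 'User.url', 'User.name']), PHP_EOL; // 3600
```

Adding a single short-lived field (or the me field) lowers the max-age of the whole response, which is why the values shrink from one query to the next.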

Adding directives to calculate the max-age value

How can the GraphQL server calculate the response’s max-age value? Because this value will depend on all the fields present in the query, there’s an obvious candidate to do it: directives.

A schema-type directive can be assigned to a field, and we can customize its configuration via directive arguments.

Hence, we can create a directive @cacheControl with argument maxAge of type Int (measuring seconds). Specifying maxAge with value 0 is equivalent to no-store. If not provided (the argument has been defined as non-mandatory), a predefined default max-age is used.

We can now configure our schema to satisfy the max-age defined for all fields earlier on. Using the Schema Definition Language (SDL), it will look like this:

directive @cacheControl(maxAge: Int) on FIELD_DEFINITION

type User {
  id: ID @cacheControl(maxAge: 31557600)
  url: URL @cacheControl(maxAge: 86400)
  name: String @cacheControl(maxAge: 3600)
  karma: Int @cacheControl(maxAge: 60)
}

type Root {
  me: User @cacheControl(maxAge: 0)
}

Coding the @cacheControl directive

I will demonstrate my implementation of the @cacheControl directive for the GraphQL API for WordPress server, which is coded in PHP. (This server supports both persisted queries and HTTP caching natively.)

The resolution for the directive is very simple: it just takes the maxAge value from the directive argument and injects it into a service called CacheControlEngine:

public function resolveDirective(): void
{
  $maxAge = $this->directiveArgsForSchema['maxAge'];
  if (!is_null($maxAge)) {
    $this->cacheControlEngine->addMaxAge($maxAge);
  }
}

Whenever a new max-age value is injected, the CacheControlEngine service compares it against the lowest value seen so far and keeps the lower of the two in its state:

class CacheControlEngine
{
  protected ?int $minimumMaxAge = null;

  public function addMaxAge(int $maxAge): void
  {
    if (is_null($this->minimumMaxAge) || $maxAge < $this->minimumMaxAge) {
      $this->minimumMaxAge = $maxAge;
    }
  }
}

The service can then generate the Cache-Control header, with the max-age value for the response:

class CacheControlEngine
{
  public function getCacheControlHeader(): ?string
  {
    if (!is_null($this->minimumMaxAge)) {
      // Minimum max-age = 0 => `no-store`
      if ($this->minimumMaxAge === 0) {
        return 'Cache-Control: no-store';
      }
      return sprintf(
        'Cache-Control: max-age=%s',
        $this->minimumMaxAge
      );
    }
    return null;
  }
}

Finally, the GraphQL server will get the Cache-Control header from the service and add it to the response.
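This last step could look roughly like the sketch below. The function withCacheControl is hypothetical and is not part of the server’s actual code; it only assumes that the service returns either a complete header line or null, as in the method above.

```php
<?php
declare(strict_types=1);

/**
 * Append the Cache-Control header line to the response headers.
 * A null value means no max-age was collected, so nothing is added.
 */
function withCacheControl(array $headers, ?string $cacheControlHeader): array
{
    if ($cacheControlHeader !== null) {
        $headers[] = $cacheControlHeader;
    }
    return $headers;
}

// In the server, the second argument would come from
// the CacheControlEngine service's getCacheControlHeader()
$headers = withCacheControl(
    ['Content-Type: application/json'],
    'Cache-Control: max-age=3600'
);

// Send the headers, provided no output has been sent yet
if (!headers_sent()) {
    foreach ($headers as $line) {
        header($line);
    }
}
```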

Conclusion

In the never-ending argument of whether GraphQL is better than REST (and vice versa), REST always had an ace up its sleeve: server-side caching.

But we can also have GraphQL support HTTP caching. All it takes is storing the query in the server and then accessing this “persisted query” via GET, providing the ID for the query as a URL parameter. It is a trade-off that is more than justified and more than worth it.

GraphQL and caching: two words that go very well together.
