Alexander Nnakwue Software engineer. React, Node.js, Python, and other developer tools and libraries.

Caching in Node.js: Optimizing app performance with Redis

12 min read 3441

Caching Nodejs Optimizing Performance Redis

Introduction

The importance of optimizing application performance cannot be overstated. As web developers, we want our apps to provide a great user experience, so we need to employ a variety of techniques to optimize their speed and performance.

The first step in the process is to identify how and where performance bottlenecks arise in the application. The usual suspect here is how we handle and manage our application data. When it comes to Node.js applications — and backend systems/applications in general — there is almost always a need to store user data or information in a dedicated database for subsequent retrieval.

To a large extent, the way we handle the data in our application determines how well it will perform at scale, especially considering databases themselves can amplify the overall performance bottleneck of our web apps. Any technique we can apply to reduce the amount of time needed to perform read/write operations on these databases will be beneficial.

Caching is one such technique we can employ to solve these performance challenges. In this tutorial, we’ll explore ways to improve an application’s overall performance by employing data caching strategies in Node.js. Let’s begin!

Why cache data in the first place?

Caching is generally recommended for cases in which the state of the data at a particular point in the app rarely changes. Think lists of products, country calling codes, or store locations. By way of example, for a recent feature, I needed to fetch a list of banks from an external API. The most efficient method was to make that API call once and store the response in a cache.

This means that, subsequently, we won’t need to make that same API call over the internet; rather, we can just retrieve the data from our cache since we wouldn’t expect it to suddenly change. Caching the data or the API response at that layer improves our app performance by a significant degree.

With caching systems, data retrieval processes are already optimized since data is stored in-memory. This is in contrast to the on-disk storage mechanisms found in most traditional databases, wherein reads and writes (in terms of queries) are not as fast when compared to in-memory storage systems.

In-memory databases are generally used by applications that manage a large amount of data and depend on rapid response time. With that in mind, we can sum up the two major benefits of in-memory databases:

  • Fewer CPU round trips/processes, leading to faster transactions
  • Faster reads/writes, ensuring multi-user concurrency

On the flip side, in-memory databases are more volatile than traditional databases since we can easily lose data if, for example, the RAM crashes. Traditional databases perform great in this regard because data can still be restored from the disks in the event of any issues.

Node.js performance at scale: Employing caching techniques

The speed at which our application can process data is a major performance consideration when designing our application architecture. As engineers, we need to explicitly determine which portions of our data processing/handling cycle should be cached.

We made a custom demo for .
No really. Click here to check it out.

Caching systems are not used in isolation. Caching is basically a layer of abstraction that involves an intermediary storage mechanism in conjunction with a backend system (in our case Node.js) and, usually, a traditional database. The point is to enable an efficient data retrieval process, and caching systems are optimized for that particular purpose.

For Node.js apps in particular, caching strategies should address the following concerns:

  • Update or invalidate stale data from a cache: A very common problem in web development is handling cache expiration logic for maintaining an up-to-date cache
  • Cache data that is frequently accessed: Caching makes a lot of sense here as we can process the data once, store it in a cache, and then retrieve it directly later, without doing any expensive or time-consuming operations. We would then need to periodically update the cache so users can see the latest information

Issues that come with scalability

Large applications have to process, transform, and store large datasets. Since data is absolutely fundamental to these apps’ functionality, the manner in which we handle the data determines how our application will perform in the wild.

Web applications eventually grow to cater for a large amount of data. Usually, we need to store this data in storage systems, e.g., databases. When it comes to data retrieval, at times it would be faster for us to fetch these data from an intermediate storage mechanism instead of performing several database queries.

When we have lots of users, there is a need to find alternative solutions to retrieve data instead of always reading from disk. Application storage systems have limits when it comes to I/O and would therefore make sense for us to carefully understand the best mechanism to store and retrieve our data in the fastest possible way. Welcome to the world of caching with Redis.

Caching with Redis: How Redis influences app performance

Redis is a preferred caching solution because its entire database is stored in-memory, and it uses a disk database for data persistence. Since Redis is an in-memory database, its data access operations are faster than any other disk-based database, which makes Redis the perfect choice for caching.

Its key-value data storage system is another plus because it makes storage and retrieval much simpler. Using Redis, we can store and retrieve data in the cache using the SET and GET methods, respectively. Besides that, Redis also works with complex data types like lists, sets, strings, hashes, bitmaps, and so on.

The process of caching with Redis is quite simple. When we receive a user request to a route that has caching enabled, we first check whether the requested data is already stored in the cache. If it is, we can quickly retrieve the data from the Redis cache and send the response back.

If the data is not stored in the cache, however — which we call a “cache miss” — we have to first retrieve the data from the database or from an external API call and send it to the client. We also make sure to store the retrieved data in the cache so that the next time the same request is made, we can simply send the cached data back to the user.

Now that we have a clear idea of what we are going to do, let’s get started with the implementation.

Creating a custom cache service in Node.js

Queries sometimes require several operations, like retrieving data from a database, performing calculations, retrieving additional data from third-party services, and so on. All these may impact overall application performance. The goal of caching is to improve the efficiency of these data access operations.

In this section, we’ll look at how to use Redis to create a simple cache for a Node.js application, then inspect how it impacts its performance. The steps are highlighted below:

  1. Spin up a simple Node.js application
  2. Write a reusable custom Redis implementation/cache service
  3. Show how using Redis to cache data from an external API call can help improve the performance of our app

Creating a simple custom cache service will allow us to:

  • Create a reusable service that we could employ in multiple parts of our app
  • Normalize the cache API and add more methods our app will need as it grows
  • Easily replace the cache module we chose with another one (if needed)

Now, let’s go! You can use the GitHub repo to follow along as we proceed.

First, we’ll quickly bootstrap a simple Node.js application. We can do so by running npm init, which creates a package.json file for us:

{
  "name": "performance_at_scale_node.js",
  "version": "1.0.0",
  "description": "A sample app to showcase Redis caching and how it affects an app overall performance benchmark",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  },
  "keywords": [
    "Node.js",
    "Redis",
    "Performance",
    "Cache",
    "Caching",
    "JavaScript",
    "Backend",
    "LogRocket",
    "Frontend_Monitoring"
  ],
  "author": "Alexander Nnakwue",
  "license": "MIT",
  "dependencies": {
    "@hapi/joi": "^15.0.1",
    "axios": "^0.21.1",
    "body-parser": "^1.19.0",
    "dotenv": "^8.2.0",
    "express": "^4.17.1",
    "global": "^4.4.0",
    "redis": "^3.0.2",
    "util": "^0.12.3"
  }
}

In the file above, we have installed some dependencies we need for our application. We have installed Redis for caching, Axios for HTTP requests, @hapi/joi for schema/(req.body) validation, global to require global variables, Express for our basic Express server, and a few others.

Next, let’s set up a simple Express server in the root of our project directory; we can name this anything we want. See the Express server in the app.js file below:

require('dotenv').config()
const express = require('express')
const bodyParser = require('body-parser')
const config = require('./config')
const routes = require('./app/routes')

const app = express()
require("./cacheManager");

app.use(bodyParser.json())
// parse application/x-www-form-urlencoded
app.use(bodyParser.urlencoded({ extended: true }))

// parse application/json
app.use(bodyParser.json())
app.get('/', (req, res) => {
    // eslint-disable-next-line no-tabs
    res.status(200).send('Welcome to the Node.js Cache and Performance App')
})
// add routes here
routes(app)

// catch 404 and forward to error handler
app.use((req, res, next) => {
    const err = new Error('Not Found')
    err.status = 404
    res.send('Route not found')
    next(err)
})
app.listen(process.env.PORT || config.port, () => {
    console.log(`${config.name} listening on port ${config.port}!`)
})

module.exports = app

In the server file above, we can see that we are importing a cacheManager file, which basically handles everything that has to do with asynchronous caching in Redis using the util module. Let’s see the contents of that file:

"use strict";

const redis = require("redis");
const {promisify} = require("util");
const config = require('./config')
const redisClient = redis.createClient(
    {
        host: config.redis_host,
        port: config.redis_port
    }
);
const password = config.redis_password || null;
if(password && password != "null"){
    redisClient.auth(password, (err,res) => {
        console.log("res",res);
        console.log("err",err);
    });
}
try{
    redisClient.getAsync = promisify(redisClient.get).bind(redisClient);
    redisClient.setAsync = promisify(redisClient.set).bind(redisClient);
    redisClient.lpushAsync = promisify(redisClient.lpush).bind(redisClient);
    redisClient.lrangeAsync = promisify(redisClient.lrange).bind(redisClient);
    redisClient.llenAsync = promisify(redisClient.llen).bind(redisClient);
    redisClient.lremAsync = promisify(redisClient.lrem).bind(redisClient);
    redisClient.lsetAsync = promisify(redisClient.lset).bind(redisClient);
    redisClient.hmsetAsync = promisify(redisClient.hmset).bind(redisClient);
    redisClient.hmgetAsync = promisify(redisClient.hmget).bind(redisClient);
    redisClient.clear = promisify(redisClient.del).bind(redisClient);
}catch (e) {
    console.log("redis error", e);
}

redisClient.on("connected", function () {
    console.log("Redis is connected");
});
redisClient.on("error", function (err) {
    console.log("Redis error.", err);
});
setInterval(function() {
    console.log("Keeping alive - Node.js Performance Test with Redis");
    redisClient.set('ping', 'pong');
}, 1000 * 60 * 4);

global.cache = redisClient;
module.exports = redisClient;

We can see from the file above that we have created a Redis client and connected to a Redis cluster with the config available in our env variable. Also, we are setting a keep-alive script that runs every four minutes:

09:02:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:06:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:10:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:14:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:18:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:22:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:26:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:30:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:34:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:38:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:42:42 web.1   |  Keeping alive - Node.js Performance Test with Redis
09:46:42 web.1   |  Keeping alive - Node.js Performance Test with Redis

The config file that holds all our env variables is shown below:

require('dotenv').config()
const { env } = process
module.exports = {
    name: env.APP_NAME,
    baseUrl: env.APP_BASE_URL,
    port: env.PORT,
    redis_host: env.REDIS_HOST,
    redis_port: env.REDIS_PORT,
    redis_password: env.REDIS_PASSWORD,
    paystack_secret_key: env.PAYSTACK_SECRET_KEY
}

Now that we are done with all the setup, we can proceed to the most important part of this exercise: the actual business logic that allows us to apply Redis caching strategies. See the folder structure of our app below:

Redis Folder Structure App

Inside the app folder, we can see a file called paystackRepository.js inside the paystack folder. This file calls the Paystack API, which fetches a lists of banks and resolves bank account numbers. A link to the documentation for this Paystack API feature can be found here.

The contents of the paystackRepository.js file are below:

const axios = require("axios");
const config = require('../../config')
const {handleAxiosError} = require("../../helpers");
const _axios = axios.create({
    baseURL: "https://api.paystack.co",
    headers: {
        Authorization: `Bearer ${config.paystack_secret_key}`
    }
});

exports.banks = async () => {
    try {
        return {
            data: (await _axios
                .get(`bank`)).data.data
        };
    } catch (error) {
        console.log('An Error Occurred', error, handleAxiosError(error));
        return {error: error.message};
    }
};

exports.resolveAccountNumber = async (bankCode, accountNumber) => {
    try {
        return {
            data: (await _axios
                .get(`bank/resolve`, {
                    params: {
                        bank_code: bankCode,
                        account_number: accountNumber
                    }
                })).data.data
        };
    } catch (error) {
        console.log('An Error Occurred', error, handleAxiosError(error));
        return {error: error.message};
    }
};

From the file above, we can see that we are using Axios with the PAYSTACK_SECRET_KEY header stored in our env variable to make HTTP requests to Paystack’s API.

Inside the bankAccount folder, let’s have a look at the BankService.js file. It calls the paystackRepository.js file to fetch a list of banks and resolve bank account numbers, as we reviewed above. We are also caching the results after making the first API call to eliminate the need for subsequent calls.

"use strict";

const paystackRepository = require("../paystack/PaystackRepository");

exports.fetchAllBanks = async () => {
    // incase we are calling this endpoint a second time, we do not need to make a new API request 
    let banks = await cache.getAsync("bank-list");
    console.log("Data from cache", banks);
    if(banks)
        return {data: JSON.parse(banks)};
  const {error, data} = await paystackRepository.banks();
  if(error) return {error};
  // Store the bank list in a cache, since it rarely changes
  let cacheResponse = await cache.setAsync("bank-list", JSON.stringify(data));
  console.log("Cache", cacheResponse);
  return {
      data
  }
};

exports.resolveAccount = async (bankName, accountNumber) => {
    // Relying on the cached data is faster, as it rarely changes
let banks = JSON.parse(await cache.getAsync("bank-list"));

console.log(banks, 'banks')
    // Incase the data is not stored in the cache yet (but we expect it to be), make an API call to get bank lists 
    if(!banks){
        const {error, data} = await paystackRepository.banks();
        if(error)return {error};
        banks = data;
    }
    const bank = banks.find(bank => {
        return bank.name == bankName
    })
    if(!bank)
        return {error: "Bank Not Found"};
    console.log(bank.code)
    const {error, data} =  await paystackRepository.resolveAccountNumber(bank.code, accountNumber);
    if(error)return {error };
    return {
        data: {
            accountNumber,
            bankName,
            accountName: data.account_name
        }
    }
};

exports.resolveAccountPerfTest = async (bankName, accountNumber) => {
    // if there were no cache mechanism in place we needed to go fetch the bank lists compulsorily at every API call
    let banks;
    if(bankName && accountNumber) {
        const {error, data} = await paystackRepository.banks();
        if(error)return {error};
        banks = data;
    }

    const bank = banks.find(bank => {
        return bank.name == bankName
    })
    if(!bank)
        return {error: "Bank Not Found"};
    const {error, data} =  await paystackRepository.resolveAccountNumber(bank.code, accountNumber);
    if(error)return {error };
    return {
        data: {
            accountNumber,
            bankName,
            accountName: data.account_name
        }
    }
};

For the fetchAllBanks service, we are getting the bank list from Redis, but not without first saving the results to the Redis cache upon making the initial API request. Additionally, we can see that we have duplicated the resolveAccount method and removed the Redis implementation on the second method named resolveAccountPerfTest.

Next, we can navigate to the controller file, wherein we actually use these methods:

"use strict";
const bankService = require("./BankService");
const {
    sendErrorResponse,
    sendResponse
  } = require("../../helpers");

exports.fetchAllBanks = async (req, res, next) => {
    const {error, data} = await bankService.fetchAllBanks();
    if(error)
        return sendErrorResponse({res, message: error});
    return sendResponse({res, responseBody: data});
};

exports.resolveAccountNumber = async (req, res) => {
    const {bankName, accountNumber} = req.body;
    const {error, data} = await bankService.resolveAccount(bankName, accountNumber);
    if(error)
        return sendErrorResponse({res, message: error});
    return sendResponse({res, responseBody: data});
};

exports.resolveAccountPerfTest = async (req, res) => {
    const {bankName, accountNumber} = req.body;
    const {error, data} = await bankService.resolveAccountPerfTest(bankName, accountNumber);
    if(error)
        return sendErrorResponse({res, message: error});
    return sendResponse({res, responseBody: data});
};

Finally, before we test our implementation, we can take a peek at the routes file, located in the routes folder:

"use strict";

const router = require("express").Router();
const accountValidator = require("../bankAccount/BankAccountValidator");
const accountController = require("../bankAccount/BankAccountController");

router.get("/banks", accountController.fetchAllBanks);
router.post("/resolve", accountValidator.resolveAccount, accountController.resolveAccountNumber);
// TEST ROUTE TO SIMULATE PERFORMANCE METRICS BASED ON CACHING OUR ROUTES
router.post("/resolve/perf/test", accountValidator.resolveAccount, accountController.resolveAccountPerfTest);

module.exports = router;

To understand the implementation details of other imported files, refer to the source code on GitHub. Also, note that we have deployed our app on Heroku so as to test our implementation in a simulated live environment. The base URL is https://node-perf-with-caching-app.herokuapp.com. While it’s not strictly required, feel free to read up on deploying your own Node.js app to Heroku as well.

Now it’s time to test our API endpoints to witness the performance benefits of caching firsthand. Before we begin, when I first tested out the API, it took a whooping 9.22s to return a response. I’m assuming it was due to the usual cold starts on Heroku. See image below:

API Testing Result No Caching
Fig. 1: Result of testing the API without caching.

Since this was our first request, and due to the fact that we only ran a keep-alive script on the Redis connection/cluster, our Dyno might have been inactive at that time I made the request. After the first request, I deleted the bank lists from the cache and made another fresh request. This is the result:

API Testing Result Initial Cold Start
Fig. 2: Result after the initial cold start.

As we can see from the screenshot above, we got a faster response cycle on the second request, which came in at about 896ms. Way faster, right? Note that after this initial request, we now have our bank lists in our Redis cache.

Making the request a second time does not go all the way to query the external Paystack API; it simply fetches the data from our cache, which is way more efficient. See below:

API Result Caching Banks List
Fig. 3: Result from our API after caching the banks list.

As we can see, after caching the banks list, we got a much faster response from our API: just about 271ms.

In summary, the idea is to compare how making an external API call to a third-party service performs against caching the results rather than making that same API call again. We’ve now seen the results for ourselves.

N.B., The POSTMAN collection used in testing out other routes and evaluating their speed/performance metric can be found here.

As another example, I used Redis to store a one-time password (OTP) token generated on a backend app. This method offers a faster lookup than saving that token to a database and querying the record when a user inputs a value on the client app. Here’s a sample snippet:

exports.verifyOTP = async (req,res) => {
const {otp} = req.body
// get the OTP stored in the cache earlier
const getOTPDetails = await cache.getAsync('otpKey');
const otpCode = JSON.parse(getDetails).code;
if (Number(otp) == Number(otpCode) ) {
    return sendResponse(res, "OTP Verified Successfully", 201)
}
return sendErrorResponse(res, "Unable to verify OTP. Please check the code sent to your email gain", 400)
}

Also, don’t forget that the OTP must have been set in the cache at the point of generating or requesting for the OTP.

Conclusion

Caching is a near-mandatory operation for any data-intensive application. It improves app response time and even reduces costs associated with bandwidth and data volumes. It helps to minimize expensive database operations, third-party API calls, and server-to-server requests by saving a copy of previous request results locally on the server.

In some cases, we might need to delegate caching to another application or key-value storage system to allow us to store and use data as we need it. Redis is one such option that we can also use for caching. It supports some nice features, including data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, and others.

In this tutorial, we reviewed a quick introduction to caching strategies to improve our Node.js application performance. While there are other tools for this purpose, we chose Redis since it is open source and very popular in the industry. Now we can go forth and use Redis to cache frequently queried data in our applications and gain a considerable performance improvement. Until next time! 🙂

200’s only Monitor failed and slow network requests in production

Deploying a Node-based web app or website is the easy part. Making sure your Node instance continues to serve resources to your app is where things get tougher. If you’re interested in ensuring requests to the backend or third party services are successful, try LogRocket. https://logrocket.com/signup/

LogRocket is like a DVR for web apps, recording literally everything that happens on your site. Instead of guessing why problems happen, you can aggregate and report on problematic network requests to quickly understand the root cause.

LogRocket instruments your app to record baseline performance timings such as page load time, time to first byte, slow network requests, and also logs Redux, NgRx, and Vuex actions/state. .
Alexander Nnakwue Software engineer. React, Node.js, Python, and other developer tools and libraries.

Leave a Reply