2023-01-09

#chrome#docker#node

Tigran Bayburtsyan

12406

Jan 9, 2023 ⋅ 5 min read

Setting up a Headless Chrome Node.js server in Docker

Tigran Bayburtsyan Cofounder and CTO at Hexact, Inc. Software scalability expert!

Editor’s note: This guide to setting up a Headless Chrome Node.js server in Docker was last updated on 9 January 2023 to update any outdated code, further explain the breakdown of the Dockerfile steps, and include more interactive code examples. To learn more about Docker, visit our archives here.

Headless browsers have become very popular with the rise of automated UI tests in the application development process. There are also countless use cases for website crawlers and HTML-based content analysis.

For 99 percent of these cases, you don’t need a browser GUI because it is fully automated. Running a GUI is more expensive than spinning up a Linux-based server or scaling a simple Docker container across a microservices cluster, such as Kubernetes.

But I digress. It has become increasingly critical to have a Docker container-based headless browser to maximize flexibility and scalability. In this tutorial, we’ll demonstrate how to create a Dockerfile to set up a Headless Chrome browser in Node.js.

Jump ahead:

Headless Chrome with Node.js
Headless Chrome inside a Docker container
Common problems with Headless Chrome

Headless Chrome with Node.js

Node.js is the main language interface used by the Google Chrome development team, and it has an almost native integrated library for communicating with Chrome called Puppeteer. This library uses WebSocket or a system pipe-based protocol over a Chrome DevTools interface, which can take screenshots, and measure page load metrics, connection speeds, downloaded content size, and more.

You can use the Puppeteer library to use Headless Chrome with Node.js. Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium over the Chrome DevTools Protocol.

You can test your UI on different device simulations and take screenshots. Most importantly, Puppeteer doesn’t require a running GUI. In fact, it can all be done in a headless mode.

Here’s an implementation to use Puppeteer to control Headless Chrome and navigate to a website:

// Filename: server.js

const express = require('express');
const puppeteer = require('puppeteer');

const app = express();

app.get('/screenshot', async (req, res) => {
    console.log('Taking screenshot');
    const browser = await puppeteer.launch({
        headless: true,
        executablePath: '/usr/bin/google-chrome',
        args: [
            "--no-sandbox",
            "--disable-gpu",
        ]
    });
    const page = await browser.newPage();
    await page.goto('https://www.google.com');
    const imageBuffer = await page.screenshot();
    await browser.close();

    res.set('Content-Type', 'image/png');
    res.send(imageBuffer);
    console.log('Screenshot taken');
});

app.listen(3000, () => {
    console.log('Listening on port 3000');
});

The simple actionable code for taking a screenshot over Headless Chrome is shown above. This script will launch a Headless Chrome instance, navigate to Google, and take a screenshot of the page. The screenshot will be saved to the current directory.

The code also creates an Express.js app with a single route, /screenshot, which uses Headless Chrome to take a screenshot of this, then sends the image data back to the client in the HTTP response.

To use this route, you can make an HTTP request to http://localhost:3000/screenshot, and the server will respond with an image. You can also customize the route to accept query parameters or use other HTTP methods to control the behavior of the screenshot.

Headless Chrome inside a Docker container

Running a browser inside a container seems simple based on the code above, but it’s important not to overlook security. By default, everything inside a container runs under the root user, and the browser executes JavaScript files locally.

Dockerfile for the Google Chrome setup

To continue with this tutorial, make sure you have Docker v17.03 or later and a working browser (preferably Google Chrome).

Of course, Google Chrome is secure and doesn’t allow users to access local files from the browser-based script, but there are still potential security risks. You can minimize many of these risks by creating a new user for executing the browser itself. Google also has sandbox mode enabled by default, which restricts external scripts from accessing the local environment.

Below is the Dockerfile sample responsible for the Google Chrome setup:

# Filename: Dockerfile

FROM node:slim

# We don't need the standalone Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true

# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
RUN apt-get update && apt-get install gnupg wget -y && \
    wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
    sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
    apt-get update && \
    apt-get install google-chrome-stable -y --no-install-recommends && \
    rm -rf /var/lib/apt/lists/*

# FROM public.ecr.aws/lambda/nodejs:14.2022.09.09.11
# Create working directory
WORKDIR /usr/src/app

# Copy package.json
COPY package.json ./

# Install NPM dependencies for function
RUN npm install

# Copy handler function and tsconfig
COPY server.js ./

# Expose app
EXPOSE 3000

# Run app
CMD ["node", "server.js"]

Breakdown of the Dockerfile steps

This Dockerfile creates a Docker image that runs a Node.js server using Headless Chrome. Here’s a breakdown of the different steps in the Dockerfile:

FROM node:slim: This specifies the base image for the Docker image. The slim variant of the Node.js image is a smaller version of the official Node.js image that includes only the essential packages
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true: Sets an environment variable that tells Puppeteer to skip downloading Chromium
The next block of commands installs Google Chrome Stable and the necessary fonts to make it work with Puppeteer
WORKDIR /usr/src/app: Sets the working directory for the Docker image. This is where the rest of the files and commands in the Dockerfile will be executed
COPY package.json ./: This copies the package.json file to the working directory
RUN npm install: Installs the dependencies listed in the package.json file
COPY server.js ./: Copies the server.js file to the working directory
EXPOSE 3000: Exposes port 3000 on the Docker container. This allows you to access the server from the host machine on port 3000
CMD ["node", "server.js"]: Starts the server by running the server.js script with Node.js

Building the Docker image

To build and run the Docker image, you can use the following command:

docker build -t headless-chrome .

Building a Docker Image in Node.js

If you encounter platform-related issues, for example, running it on a macOS, you can use the following command:

docker build --platform linux/amd64 -t headless-chrome .

To run the built image, use the following:

docker run --rm -p 3000:3000 headless-chrome

This will build the Docker image and run a new container based on the image. The server will start and listen for requests on port 3000. You can access the server from the host machine by visiting http://localhost:3000 in a web browser. Here’s what it will look like:

Example of a Docker Image Running Headless Chrome

Docker Container Log Showing Where the Screenshot was Taken

As seen in the Docker container logs, you can verify that the screenshot was taken. Below is a video that shows more interactively how the screenshots are taken:

Headless Chrome Demo

No Description

Taking screenshots is fun, but there are countless other use cases. Fortunately, the process described above applies to almost all of them. For the most part, only minor changes to the Node.js code would be required. The rest is pretty standard environmental setup.

Common problems with Headless Chrome

Google Chrome eats a lot of memory during execution, so it’s no surprise that Headless Chrome does the same on the server side. If you keep a browser open and reuse the same browser instance many times, your service will eventually crash.

The best solution is to follow the principle of one connection, one browser instance. While this is more expensive than managing multiple pages per browser, sticking to just one page and one browser will make your system more stable. Of course, this depends on personal preference and your particular use case. Depending on your unique needs and goals, you may be able to find a middle ground.

Take, for example, the official website for the performance monitoring tool, Hexometer. The environment includes a remote browser service that contains hundreds of idle browser pools. These are designed to pick up new connections over WebSocket when there is a need for execution, but it strictly follows the principle of one page, one browser. This makes it a stable and efficient way to not only keep running browsers idle but keep them alive.

Puppeteer connection over WebSocket is pretty stable, and you can do something similar by making a custom service like browserless.io (there is an open source version as well).

This will connect to the Headless Chrome DevTools socket using the same browser management protocol:

// Filename: server.js
// ...
// ...

const browser = await puppeteer.launch({
    browserWSEndpoint: `ws://repo.treescale.com:6799`,
});

// ...
// ...

Conclusion

Having a browser running inside a container provides a lot of flexibility and scalability. It’s also a lot cheaper than traditional virtual machine-based instances. Now, we can simply use a container service such as AWS Fargate or Google Cloud Run to trigger container execution only when we need it and scale to thousands of instances within seconds.

The most common use case is still making UI automated tests with Jest and Mocha. But if you consider that you can actually manipulate a full webpage with Node.js inside a container, the use cases are only limited by your imagination.

You can find source codes used in this GitHub repository.

200s only Monitor failed and slow network requests in production

Deploying a Node-based web app or website is the easy part. Making sure your Node instance continues to serve resources to your app is where things get tougher. If you’re interested in ensuring requests to the backend or third-party services are successful, try LogRocket.

LogRocket is like a DVR for web and mobile apps, recording literally everything that happens while a user interacts with your app. Instead of guessing why problems happen, you can aggregate and report on problematic network requests to quickly understand the root cause.

LogRocket instruments your app to record baseline performance timings such as page load time, time to first byte, slow network requests, and also logs Redux, NgRx, and Vuex actions/state. Start monitoring for free.

Drizzle vs. Prisma: Which ORM is best for your project?

Compare Prisma and Drizzle ORMs to learn their differences, strengths, and weaknesses for data access and migrations.

Temitope Oyedele

Nov 21, 2024 ⋅ 10 min read

Practical implementation of the Rule of Least Power for developers

It’s easy for devs to default to JavaScript to fix every problem. Let’s use the RoLP to find simpler alternatives with HTML and CSS.

Timonwa Akintokun

Nov 21, 2024 ⋅ 8 min read

Handling memory leaks in Rust

Learn how to manage memory leaks in Rust, avoid unsafe behavior, and use tools like weak references to ensure efficient programs.

Ukeje Goodness

Nov 20, 2024 ⋅ 4 min read

Using curl-impersonate in Node.js to avoid blocks

Bypass anti-bot measures in Node.js with curl-impersonate. Learn how it mimics browsers to overcome bot detection for web scraping.

Antonello Zanini

Nov 20, 2024 ⋅ 13 min read

View all posts

One Reply to "Setting up a Headless Chrome Node.js server in Docker"

Mark Aquino says:

May 8, 2020 at 3:15 pm

You didn’t provide commands in the Dockerfile to COPY the server.js file to the container or run it or expose its ports

Reply

Advisory boards aren’t only for executives. Join the LogRocket Content Advisory Board today →

Setting up a Headless Chrome Node.js server in Docker

Headless Chrome with Node.js

Headless Chrome inside a Docker container

Dockerfile for the Google Chrome setup

Breakdown of the Dockerfile steps

Building the Docker image

Headless Chrome Demo

Common problems with Headless Chrome

Conclusion

200s only Monitor failed and slow network requests in production

Stop guessing about your digital experience with LogRocket

Recent posts:

Drizzle vs. Prisma: Which ORM is best for your project?

Practical implementation of the Rule of Least Power for developers

Handling memory leaks in Rust

Using curl-impersonate in Node.js to avoid blocks

One Reply to "Setting up a Headless Chrome Node.js server in Docker"

Leave a ReplyCancel reply

Advisory boards aren’t only for executives. Join the LogRocket Content Advisory Board today →

Headless Chrome with Node.js

Headless Chrome inside a Docker container

Dockerfile for the Google Chrome setup

Breakdown of the Dockerfile steps

Building the Docker image

Headless Chrome Demo

Common problems with Headless Chrome

Conclusion

200s only Monitor failed and slow network requests in production

Share this:

Stop guessing about your digital experience with LogRocket

Recent posts:

Drizzle vs. Prisma: Which ORM is best for your project?

Practical implementation of the Rule of Least Power for developers

Handling memory leaks in Rust

Using curl-impersonate in Node.js to avoid blocks

One Reply to "Setting up a Headless Chrome Node.js server in Docker"

Leave a ReplyCancel reply