Ankit Jain DevOps Engineer @razorpay | GSoC18 student @drupal | #GitHubCampusExpert | GCI mentor @drupal @jbossorg | GSoC19 mentor @drupal @jenkinsci | Speaker | Writer

How to reduce Docker Image sizes using multi-stage builds


Introduction

Docker is one of the most important technologies in enterprises nowadays. Most tech companies use Docker to improve the deployment strategy of their products and services, making them robust and scalable. In this article, we will look at one of the most promising features for writing Dockerfiles efficiently and reducing the final image size: multi-stage builds. But first, let’s understand a bit about Docker.

What is Docker?

Docker is a platform for containerizing applications. Containers resemble VMs but are much more lightweight (read the complete article on Docker vs Virtual Machines). Docker makes it easy to create, deploy, and run applications in containers that behave the same way regardless of the host they run on.

A container packages the application’s services or functions with all of the libraries, configuration files, dependencies, and other parts needed to operate. All containers on a host share the kernel of the one underlying operating system.

What are these Docker Images?

Docker images are built from a set of instructions written in a file called a Dockerfile. Each of these instructions contributes to a multi-layered filesystem in Docker. When a Docker user runs an image, it produces one or more containers.

We can also say that Docker images are immutable files, basically a snapshot of a container. We can create any number of containers from a single Docker image, similar to the OOP concept of creating n object instances (which share common characteristics and behavior) from a single class.

As mentioned earlier, a Dockerfile contains the set of instructions that produce this multi-layered filesystem. Instructions such as RUN, COPY, and ADD each add a layer, so the more of them we have and the more data each one writes, the greater the final size of the image. Many other things also increase the size of the image, such as the build context, the base image, and unnecessary dependencies and packages.

Why reduce the size of Docker Images?

Why do we need to reduce the size of the Docker image in this modern era of tech, where memory and storage are relatively cheap?

By reducing the Docker image size, we keep only the required artifacts in the final image and remove all the unnecessary data. It is also necessary because:

  • First and foremost, it’s a best practice
  • Installing and keeping unnecessary dependencies in your image increases complexity and the chance of vulnerabilities in your application
  • Large images take a long time to download and to spawn as containers
  • Large images also take a long time to build and push to the registry, which ends up blocking our CI/CD pipelines
  • Sometimes, keys and secrets end up in the image because they were part of the build context
  • A minimal image helps make the container immutable (yeah, you read that right): we can’t even edit a file in the final container. That’s why we use CoreOS instances

How to reduce the size of Docker Images

Knowing how to reduce the size of Docker images helps us keep our applications secure and stick with proper industry standards and guidelines.

There are a lot of ways to do this, including:

  • Use a .dockerignore file to remove unnecessary content from the build context
  • Try to avoid installing unnecessary packages and dependencies
  • Keep the layers in the image to a minimum
  • Use Alpine-based images wherever possible
  • Use Multi-Stage Builds, which I am going to talk about in this article.
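Several of these practices can be combined in one small Dockerfile. The following is only a minimal sketch: the script app.sh and its install path are hypothetical placeholders for a real application, and curl stands in for whatever single dependency the application actually needs.

```dockerfile
# Start from a small base image instead of a full distribution
FROM alpine:3.10

# Install only what the application needs, in a single RUN instruction
# (one layer); --no-cache keeps the apk package index out of the image
RUN apk add --no-cache curl

# app.sh is a hypothetical script standing in for the real application;
# it must be executable in the build context
COPY app.sh /usr/local/bin/app.sh

ENTRYPOINT ["/usr/local/bin/app.sh"]
```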

Let’s move to Multi-Stage Builds 🤘

Multi-stage builds in Docker

Multi-stage builds were introduced in Docker 17.05. They are a way to reduce the image size, organize Docker commands better, and improve performance while keeping the Dockerfile easy to read and understand.

A multi-stage build divides the Dockerfile into multiple stages, passes the required artifact from one stage to the next, and delivers the final artifact in the last stage. This way, our final image contains nothing except the required artifact.

Previously, when we didn’t have the multi-stage builds feature, it was very difficult to minimize the image size. Because every instruction in a Dockerfile adds a layer to the image, we had to clean up every artifact that was no longer required before moving on to the next instruction. We also used to write bash/shell scripts and apply hacks to remove the unnecessary artifacts.

Let’s look at an example:

Consider a single Dockerfile instruction that needs to download the abc.tar.gz file from http://xyz.com, extract its contents, and run make install.

In the same instruction, we store the output of the make install command in the /tmp directory and remove the remaining data, such as the downloaded tar file and the extracted contents, so that only the output of make install, which is required for further processing, is left.
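The instruction described above could look roughly like the following. This is a sketch under assumptions rather than an exact listing: it assumes the archive unpacks into a directory named abc, that its Makefile honors DESTDIR, and that make and curl are installed by an earlier instruction. The curl flags are one reasonable choice, not a requirement.

```dockerfile
FROM ubuntu:16.04

# Tools needed for the download and the build
RUN apt-get update && apt-get -y install make curl

# Download, extract, install, and clean up in ONE instruction, so the
# archive and the extracted sources never persist in an image layer.
# Assumes abc.tar.gz unpacks into ./abc and its Makefile supports DESTDIR.
RUN curl -fsSLO http://xyz.com/abc.tar.gz \
 && tar -xzf abc.tar.gz \
 && cd abc \
 && make DESTDIR=/tmp install \
 && cd .. \
 && rm -rf abc abc.tar.gz
```

Splitting the cleanup into its own RUN instruction would not help, because the files would already be stored in the layer created by the earlier instruction.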

That’s all the stuff we have to do in one instruction to reduce the size of the final image. Now imagine the complexity of a Dockerfile with n such instructions.

Ohh wait..wait..wait..!!! Now we have the power of multi-stage builds with which we can reduce the size of the image without compromising the readability of the Dockerfile.

Let’s look at the same example using a multi-stage build.

In the multi-stage version of the Dockerfile, we use ubuntu:16.04 as the base image of the first stage, name that stage stage1, and proceed as follows:

  1. Run apt-get update to update the package lists
  2. Run apt-get -y install make curl to install the make and curl packages
  3. Download the abc.tar.gz file from http://xyz.com using curl
  4. Untar the abc.tar.gz file and change the directory to abc
  5. Run the make DESTDIR=/tmp install command to store the output in the /tmp directory
  6. Rather than removing the unnecessary artifacts, create another stage, stage2, with alpine:3.10 as the base image because it is lighter
  7. Copy the content of the /tmp directory in stage1 to the /abc directory in stage2 by simply running the COPY --from=stage1 /tmp /abc command
  8. Finally, add the path of the binary to the ENTRYPOINT to run it
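The steps above can be sketched as the following Dockerfile. The stage names, base images, and the COPY command come from the steps themselves; the binary path in ENTRYPOINT is a hypothetical placeholder, since the real path depends on where make install puts the files under DESTDIR. The sketch also assumes the archive unpacks into a directory named abc.

```dockerfile
# Stage 1: full Ubuntu image with the build tools
FROM ubuntu:16.04 AS stage1
RUN apt-get update
RUN apt-get -y install make curl
RUN curl -fsSLO http://xyz.com/abc.tar.gz
RUN tar -xzf abc.tar.gz
# A cd inside RUN does not carry over to the next instruction,
# so WORKDIR is used to change the directory (assumes /abc exists)
WORKDIR /abc
RUN make DESTDIR=/tmp install

# Stage 2: small final image that receives only the build output
FROM alpine:3.10 AS stage2
COPY --from=stage1 /tmp /abc
# Hypothetical path: adjust to where the binary actually lands
ENTRYPOINT ["/abc/usr/local/bin/abc"]
```

No cleanup is needed in stage1, because none of its layers end up in the final image. One thing to check in practice: Ubuntu uses glibc while Alpine uses musl, so a dynamically linked binary built in stage1 may not run in stage2 unless it is statically linked or the build stage is also Alpine-based.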

This way, we copy the required artifact from stage1 to stage2 without compromising the readability of the Dockerfile, and we end up with a much smaller, optimized image. Similarly, we can use a multi-stage build to create a static build of the frontend files in one stage and pass the static files to a second stage, where an nginx base image hosts them. The large, bulky node_modules directory, which is of no use after the static build, never reaches the final image.
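A frontend build of this kind might be sketched as follows. Everything project-specific here is an assumption: the node:12 and nginx:alpine tags, an npm script named build, and an output directory of /app/build all depend on the actual project.

```dockerfile
# Stage 1: install dependencies and produce the static build
FROM node:12 AS build
WORKDIR /app
# Copy the manifests first so the dependency layer is cached
# until package.json or the lock file changes
COPY package*.json ./
RUN npm install
COPY . .
# Assumes package.json defines a "build" script that writes to /app/build
RUN npm run build

# Stage 2: serve only the static files; node_modules stays behind
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
```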

Conclusion

We can also use an external Docker image as a stage, and we can stop the build at a specific stage. Multi-stage builds are not always the right choice, though: we lose the intermediate containers of the previous stages, so we may not be able to leverage Docker’s build cache. Read more about multi-stage builds in the official Docker docs.

In this article, we looked at what Docker is, why we need to reduce the size of images, and how we can do this effectively using multi-stage builds. I hope this article helped you understand Docker and its multi-stage builds feature.
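Both options can be illustrated briefly. In this sketch, the image tag abc:stage1 and the destination paths are hypothetical; the COPY --from form with an image name and the --target build flag are the documented mechanisms.

```dockerfile
# To stop at this stage, build with:
#   docker build --target stage1 -t abc:stage1 .
# (abc:stage1 is a hypothetical tag)
FROM alpine:3.10 AS stage1
# COPY --from also accepts an external image instead of a stage name
COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf

# Not part of the result when the build is run with --target stage1
FROM alpine:3.10 AS stage2
COPY --from=stage1 /nginx.conf /etc/abc/nginx.conf
```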

Feel free to comment and ask me anything. You can follow me on Twitter and Medium. Thanks for reading! 👍

 

 
