Docker is one of the most important technologies in enterprises nowadays. Many tech companies use Docker to improve the deployment strategy of their products and services, making them robust and scalable. In this article, we will look at one of the most promising features for writing Dockerfiles efficiently and reducing the final image size. But first, let’s understand a bit about Docker.
What is Docker?
Docker is a containerization platform: it isolates applications much as VMs do, but is far more lightweight (read the complete article on Docker vs Virtual Machines). It is a tool to easily create, deploy, and run applications by using containers that carry their own dependencies rather than relying on what is installed on the host.
A container packages the application services or functions with all of the libraries, configuration files, dependencies, and other necessary parts to operate. Each container shares the services of one underlying operating system.
What are these Docker Images?
A Docker image is built from a set of instructions written in a file called a Dockerfile. These instructions produce a multi-layered filesystem in Docker. When a Docker user runs an image, it produces one or more containers.
We can also say that Docker images are immutable files, basically a snapshot of a container. We can create any number of containers from a single Docker image, similar to the OOP concept of creating n object instances (which share common characteristics and behavior) from a single class.
As I said earlier, a Dockerfile contains the set of instructions that act as a multi-layer filesystem. The more instructions we have (for example, ADD) in our Dockerfile, the greater the final size of the image. Many other things also increase the size of the image, such as the build context, the base image, unnecessary dependencies and packages, and the number of instructions.
Why reduce the size of Docker Images?
Why do we need to reduce the size of the Docker image in this modern era of tech, where memory and storage are relatively cheap?
By reducing the Docker image size, we keep only the required artifacts in the final image and remove all the unnecessary data. It is also necessary because:
- First and foremost, it’s a best practice
- Installing and keeping unnecessary dependencies in your image increases complexity and the chance of vulnerabilities in your application
- Large images take a lot of time to download and to spawn containers from
- They also take a lot of time to build and push to the registry, which ends up blocking our CI/CD pipelines
- Sometimes, we end up leaving keys and secrets in the image because they were part of the build context
- It helps make the container immutable (yeah, you read that right): we can’t even edit a file in the final container. That’s why we use CoreOS instances
How to reduce the size of Docker Images
Reducing the size of Docker images is something we should know how to do, both to keep our application secure and to stick with proper industry standards and guidelines.
There are a lot of ways to do this, including:
- Use a .dockerignore file to remove unnecessary content from the build context
- Try to avoid installing unnecessary packages and dependencies
- Keep the layers in the image to a minimum
- Use Alpine-based images wherever possible
- Use Multi-Stage Builds, which I am going to talk about in this article.
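Before moving on, here is a minimal sketch of how a few of these points look in a Dockerfile. The package installed (curl) is only a placeholder for whatever the application really needs; it is an assumption made for illustration.

```dockerfile
# Small base image instead of a full distribution image.
FROM alpine:3.10

# A single RUN instruction adds a single layer; --no-cache keeps the
# apk package index out of the image. "curl" is a placeholder package.
RUN apk add --no-cache curl
```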
Let’s move to Multi-Stage Builds 🤘
Multi-stage builds in Docker
Multi-stage builds were introduced in Docker 17.05. They are a way to reduce the image size, organize Docker commands better, and improve performance while keeping the Dockerfile easy to read and understand.
A multi-stage build divides the Dockerfile into multiple stages, passes the required artifact from one stage to the next, and delivers the final artifact in the last stage. This way, our final image won’t contain anything except our required artifact.
Previously, when we didn’t have the multi-stage builds feature, it was very difficult to minimize the image size. We used to clean up every artifact that wasn’t required before moving to the next instruction, because every instruction in a Dockerfile adds a layer to the image. We also used to write bash/shell scripts and apply hacks to remove the unnecessary artifacts.
Let’s look at an example:
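A single instruction of that kind might look like the following sketch. The download path http://xyz.com/abc.tar.gz, the extracted directory name abc, and the use of DESTDIR=/tmp are assumptions made for illustration.

```dockerfile
FROM ubuntu:16.04

# Tools needed for the download and the build.
RUN apt-get update && apt-get -y install make curl

# Download, extract, install into /tmp, and clean up, all in ONE RUN so the
# tarball and the source tree never end up in a committed layer.
# The URL path and the directory name "abc" are assumed.
RUN curl -fsSL http://xyz.com/abc.tar.gz -o abc.tar.gz \
    && tar -xzf abc.tar.gz \
    && cd abc \
    && make DESTDIR=/tmp install \
    && cd .. \
    && rm -rf abc abc.tar.gz
```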
This example is just one instruction of a Dockerfile, in which we need to download the abc.tar.gz file from the http://xyz.com website, extract its content, and run make install.
In the same instruction, we stored the output of the make install command in the /tmp dir and removed the remaining data, such as the downloaded tar file and the extracted tar contents, so that we keep only the output of the make install command, which is required for our further processing.
That’s all the stuff we have to do in one instruction to reduce the size of the final image. Now imagine the complexity of a Dockerfile with n such instructions.
Ohh wait..wait..wait..!!! Now we have the power of multi-stage builds with which we can reduce the size of the image without compromising the readability of the Dockerfile.
Let’s look at the same example using multi-stage build:
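A sketch of such a Dockerfile could look like this. The URL path, the extracted directory name abc, and the binary path in the ENTRYPOINT are assumptions; the actual path depends on where make install places files under DESTDIR.

```dockerfile
# Stage 1: build on Ubuntu, where make and curl are easy to install.
FROM ubuntu:16.04 AS stage1

RUN apt-get update
RUN apt-get -y install make curl

# Assumed URL path: only the host and the file name are known.
RUN curl -fsSL http://xyz.com/abc.tar.gz -o abc.tar.gz

# Assumed: the archive extracts into a directory named "abc".
RUN tar -xzf abc.tar.gz && cd abc && make DESTDIR=/tmp install

# Stage 2: start from a small base image and copy only the build output.
FROM alpine:3.10 AS stage2
COPY --from=stage1 /tmp /abc

# Hypothetical binary path, for illustration only.
ENTRYPOINT ["/abc/usr/local/bin/abc"]
```

Keep in mind that a binary dynamically linked against glibc on Ubuntu will generally not run on Alpine, which uses musl libc, so this pattern works best when the build produces a statically linked binary.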
In this Dockerfile, we use ubuntu:16.04 as the base image, call this stage stage1, and execute the following instructions:
- Run apt-get update to update the packages
- Run apt-get -y install make curl to install the make and curl packages
- Download the abc.tar.gz file
- Untar the abc.tar.gz file and change to the extracted directory
- Run the make DESTDIR=/tmp install command to store the output in /tmp
- Rather than removing the unnecessary artifacts, we create another stage, stage2, with alpine:3.10 as the base image because it is lighter
- We copy the content from the /tmp directory of stage1 to the /abc directory of stage2 by simply running the COPY --from=stage1 /tmp /abc command
- Finally, we add the path of the binary in the Entrypoint to run it
This way, we copied the required artifact from stage 1 to stage 2 without complicating the Dockerfile, and we ended up with a smaller, optimized image. Similarly, we can use multi-stage builds to create a static build of the frontend files and pass the static files to stage 2, where we can use an nginx base image to host them without keeping the large, bulky node_modules in our app, which is of no use after the static build.
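The frontend case can be sketched as follows. This assumes a project with a package-lock.json and a hypothetical "build" npm script that writes the static files to /app/build; adjust both to the actual project.

```dockerfile
# Stage 1: install dependencies and produce the static build.
FROM node:12-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Assumed: "npm run build" writes its output to /app/build.
RUN npm run build

# Stage 2: serve only the static files; node_modules stays behind in stage 1.
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
```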
We can also use external Docker images as a stage, and we can stop at a specific build stage. Multi-stage builds are not always ideal, though: the intermediate layers from earlier stages are not kept in the final image, so a later build may not be able to reuse them from the Docker build cache. Read more about multi-stage builds in the official Docker docs.
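Both options can be sketched in one small Dockerfile. The stage name builder, the image tag in the comment, and the artifact file are hypothetical; the nginx COPY line follows the pattern shown in the official documentation.

```dockerfile
# To stop after this stage, build with:
#   docker build --target builder -t example/app:builder .
FROM alpine:3.10 AS builder
RUN echo "built artifact" > /artifact.txt

FROM alpine:3.10
# Copy from an earlier stage by name.
COPY --from=builder /artifact.txt /artifact.txt
# Copy from an external image instead of a stage defined in this file.
COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf
```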
In this article, we looked at what Docker is, why we need to reduce the size of images, and how we can do this effectively using multi-stage builds. I hope this article helped you understand Docker and its multi-stage builds feature.