Keeping Docker image size down is one of the most challenging aspects of working with Docker, because as an image grows it becomes harder to maintain the image and, ultimately, the containers that run it. Each instruction executed in a Dockerfile adds a new layer to the image. To optimize Dockerfiles and keep images small, we use multi-stage builds, which also make Dockerfiles easier to read and maintain. Docker introduced multi-stage builds in version 17.05. In this article, we will discuss multi-stage builds in Docker in detail, covering the following topics.
- What are the issues with a single-stage build scenario in Docker?
- What are multi-stage builds in Docker?
- How to name build stages?
- How do we stop a specific build stage?
- Using an external image as a “stage”.
- How to use a previous stage as a new stage in multi-stage Docker builds?
- What are the benefits of multi-stage builds in Docker?
- When not to use Multi-Stage builds?
Issues with a single-stage build scenario in Docker
Earlier, applications commonly used the "builder pattern" with Dockerfiles. We would have one Dockerfile for development, containing everything we needed to build the application. We then kept a slimmed-down Dockerfile for production that contained only the application and the things required to run it. This design is known as the "builder pattern".
Using the builder pattern, the development Dockerfile looks like this:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
COPY app.go .
RUN go get -d -v golang.org/x/net/html \
  && CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
Notice that in the above Dockerfile we have combined two commands into a single RUN instruction using '&&' (a shell operator). This avoids creating an additional layer in the image and therefore keeps the image smaller. But this design is failure-prone and hard to maintain, especially as the number of RUN commands increases. It is also easy to forget the "\" line-continuation character, which breaks the build. Moreover, the development and production images both take up space on the local machine.
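For contrast, the slimmed-down production Dockerfile in the builder pattern might look like the following sketch. It assumes the compiled app binary has already been copied out of the development image into the build context (for example by a helper script); the alpine base image and the /root/ working directory are assumptions borrowed from the multi-stage examples later in this article.

```dockerfile
# Production image for the builder pattern (illustrative sketch).
# Assumes ./app was compiled in the development image and extracted
# into the build context before this Dockerfile is built.
FROM alpine:latest

# CA certificates so the app can make HTTPS requests
RUN apk --no-cache add ca-certificates

WORKDIR /root/

# Copy the pre-built binary from the local build context
COPY app .

CMD ["./app"]
```

Keeping this file, the development Dockerfile, and the extraction step in sync is the maintenance burden that the builder pattern imposes.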
The remedy is to use multi-stage builds, which we discuss next.
What are multi-stage builds in Docker?
Multi-stage builds in Docker are builds in which we use multiple "FROM" statements in a single Dockerfile. Each FROM instruction can use a different base image and begins a new stage of the build. In multi-stage builds, we can selectively copy artifacts from one stage to another, thus leaving behind everything that we do not need in the final image. Let us consider the following Dockerfile.
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
The above file is an example of a multi-stage build. It has two FROM instructions, which start two build stages based on the "golang" and "alpine" images. Thus we need only a single Dockerfile and just run "docker build".
$ docker build -t alexellis2/href-counter:latest .
Here we need neither create any intermediate images nor extract any artifacts to the local machine. The result is a tiny production image with a significant reduction in complexity.
So how does the above Dockerfile work?
In the above Dockerfile, the second "FROM" instruction starts a new build stage with "alpine:latest" as its base. The statement "COPY --from=0" copies the built artifact from the earlier stage into the new stage. Note that neither the Go SDK nor any other intermediate artifact is included in the final image.
How to name build stages?
The stages are not named in multi-stage builds by default. We usually refer to them by their integer number, starting with 0 (zero) for the first 'FROM' instruction and so on. However, we can name each stage by adding "AS <NAME>" to the FROM instruction. Then we can use this stage name in the COPY instruction. This way, COPY doesn't break even if we decide to reorder the instructions in the Dockerfile at a later point.
As an example, in the following code, we name the first stage "golang_builder".
FROM golang:1.7.3 AS golang_builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=golang_builder /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
So as shown in the above example, we can conveniently name various stages in multi-stage builds and later use these stage names instead of stage numbers.
How do we stop a specific build stage?
When a Dockerfile has many stages, we do not necessarily have to build every one of them. We can specify a target build stage, and the build stops once that stage is complete, skipping the stages after it. It is as shown in the following command.
$ docker build --target golang_builder -t alexellis2/href-counter:latest .
Here, the above command uses the previous Dockerfile we have shown and stops at the stage named golang_builder. Here are a few scenarios where stopping at a target stage can be very useful:
- Debugging a specific build stage. For this, we stop the build at that particular stage.
- Using a debug stage with all debugging symbols or tools enabled, alongside a lean production stage.
- Using a testing stage in which the app is populated with test data, while building for production from a different stage that uses real data.
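The debug-versus-production scenario can be sketched as a single Dockerfile with two alternative final stages. This is only an illustration under stated assumptions: the stage names "debug" and "production", the extra curl and strace packages, and the image tags in the comments are hypothetical and not part of the original example.

```dockerfile
# Build stage: compiles the Go binary (same as the earlier example)
FROM golang:1.7.3 AS golang_builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

# Hypothetical debug stage: the app plus extra troubleshooting tools.
# Build it with:
#   docker build --target debug -t href-counter:debug .
FROM alpine:latest AS debug
RUN apk --no-cache add ca-certificates curl strace
WORKDIR /root/
COPY --from=golang_builder /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]

# Hypothetical lean production stage. As the last stage in the file,
# it is what a plain "docker build" produces when no --target is given.
FROM alpine:latest AS production
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=golang_builder /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
```

Both final stages copy the same binary from golang_builder, so the debug and production images differ only in the tooling installed around the app.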
Using an external image as a “stage.”
When it comes to multi-stage builds, we are not limited to copying from stages created earlier in the Dockerfile. We can also copy from a separate image, using a local image name, a tag available locally or on a Docker registry, or an image ID. We achieve it using the instruction "COPY --from". The Docker client then pulls the image if required and copies the artifact from there. For example, consider the following instruction:
COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf
In the above instruction, we copy the file /etc/nginx/nginx.conf from an external image, nginx:latest, rather than from an earlier stage of the Dockerfile.
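To show the instruction in context, here is a minimal sketch of a complete Dockerfile that copies from an external image. The alpine:latest base is an assumption chosen for illustration; any base image would do.

```dockerfile
# Illustrative sketch: the alpine base image is an assumption
FROM alpine:latest

# nginx:latest is not a stage defined in this file. Docker resolves it
# as an image, pulling it if it is not available locally, and copies
# the file out of that image's filesystem.
COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf
```

Nothing else from the nginx image ends up in the result; only the copied file does.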
How to use a previous stage as a new stage in multi-stage Docker builds?
Yet another feature of multi-stage builds is that we can pick up where a previous stage left off. We do this by referring to the previous stage in the "FROM" directive. Consider the following multi-stage build Dockerfile:
FROM alpine:latest as builder
RUN apk --no-cache add build-base
FROM builder as build1
COPY source1.cpp source.cpp
RUN g++ -o /binary source.cpp
FROM builder as build2
COPY source2.cpp source.cpp
RUN g++ -o /binary source.cpp
In the above Dockerfile, we have a builder stage which is the first stage. In subsequent FROM directives, we pick up from this stage to go further with COPY and RUN.
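One way to make use of the two derived stages is to add a final stage that collects both binaries. The sketch below repeats the stages above and appends such a stage; the final stage, the destination paths binary1 and binary2, and the libstdc++ runtime package are assumptions added for illustration.

```dockerfile
# Shared base stage with the compiler toolchain
FROM alpine:latest AS builder
RUN apk --no-cache add build-base

# Both build stages start from the builder stage's filesystem
FROM builder AS build1
COPY source1.cpp source.cpp
RUN g++ -o /binary source.cpp

FROM builder AS build2
COPY source2.cpp source.cpp
RUN g++ -o /binary source.cpp

# Hypothetical final stage: contains the two binaries but not the
# compiler toolchain.
FROM alpine:latest
# Assumed to be needed because g++ links the C++ runtime dynamically
# by default
RUN apk --no-cache add libstdc++
COPY --from=build1 /binary /usr/local/bin/binary1
COPY --from=build2 /binary /usr/local/bin/binary2
```

Because build1 and build2 both begin from builder, the build-base installation runs once and is shared by both stages.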
What are the benefits of multi-stage builds in Docker?
Some of the benefits of having multi-stage builds in Docker are as follows:
- Multi-stage builds are ideal for deploying production-ready applications.
- Multi-stage builds work with only one Dockerfile.
- They allow us to build smaller images, with the Dockerfile separated into distinct build stages.
- We have a uniform syntax to learn.
- Multi-stage builds work on local machines as well as on the CI (Continuous Integration) server.
When not to use Multi-stage builds?
So far in this article, we have discussed the situations when we use multi-stage builds. While multi-stage builds undoubtedly provide us consistency across build and execution environments, there are some aspects we should consider and decide "not to" use multi-stage builds. We summarise some of these aspects below.
- Suppose we want to keep our Dockerfile simple and easy to read. In that case, we may choose not to use multi-stage builds, especially in a development environment where developers are not used to such complexity. This is because multi-stage builds increase both the length and the logical complexity of the Dockerfile.
- The advantages of multi-stage builds are minimal when there are only a few containers. Multi-stage builds make a difference only when there are many containers, as in CI/CD.
So we should try out multi-stage builds considering the number of containers we are using, the complexity of Dockerfile, etc.
- Multi-stage builds work with only one Dockerfile. In this Dockerfile, we use multiple FROM statements to define the various stages of the build.
- Using multi-stage builds, we can limit the size of the image we create. In single-stage builds, with each instruction executed, a new layer gets added to the image, making it bulky.
- With multi-stage builds, we can name a particular stage, stop the build at a specific stage, use an external image, or switch between stages.