NashTech Blog

Docker Multistage Builds: How to Optimize Your Images

nhannguyenh

Multistage Docker image builds allow you to simplify your Dockerfiles and improve build efficiency. They let you reference more than one base image in your Dockerfile and copy only the content you need into your final image.

In this article, I will examine Docker’s multistage build features in detail. I will also show how to create a multistage Dockerfile and discuss some key best practices for reducing build times. Let’s begin by learning exactly how multistage builds differ from regular builds.

What are multistage Docker image builds?

Docker images are filesystem templates that define the initial state of Docker containers. They’re like blueprints that contain the binaries, source code, and runtimes needed by the containerized application, as well as any dependencies and other related files.

Docker images are created from Dockerfiles. A Dockerfile is a list of instructions that assemble an image’s filesystem by copying files and running commands. Dockerfiles usually start with a FROM instruction that references an existing image to use as the build’s starting point. 

The rest of the instructions are then applied on top of this base image:

FROM httpd:alpine
COPY build/ /usr/local/apache2/htdocs

In the example above, the Dockerfile selects the httpd:alpine image as its base image. It then copies the contents of the build/ path in your working directory to /usr/local/apache2/htdocs within the image’s filesystem.

Each FROM instruction starts a new build stage with its own filesystem and layer history. Stages are isolated unless you explicitly copy artifacts between them. The sample Dockerfile above contains only one stage, but you can create multiple stages by writing several FROM instructions. This allows you to use more than one base image in your build:

FROM first-image:latest AS build
COPY files-to-build/ /build
RUN build-script --output /out

FROM second-image:latest AS final
COPY --from=build /out/build/ /app
COPY extra-files/ /app

The sample Dockerfile above contains two distinct stages: build and final. Each stage uses a different base image. 

The first stage builds some output that’s then copied into the second stage’s environment, but the other files from first-image:latest aren’t included. This helps reduce the file size of the output image. The COPY --from instruction specifies the name of the stage that contains the files you’re copying.
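Stage names are optional. As a minimal sketch reusing the placeholder images and paths from the example above, an unnamed stage can be referenced by its zero-based position in the Dockerfile instead:

```dockerfile
# The first stage has no AS name, so later stages refer to it by index (0).
FROM first-image:latest
COPY files-to-build/ /build
RUN build-script --output /out

FROM second-image:latest
# --from=0 means "the stage started by the first FROM in this file".
COPY --from=0 /out/build/ /app
```

Named stages are usually the safer choice, because an index silently points at a different stage if you later insert or reorder FROM instructions.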

Use cases and benefits for multistage Docker builds

Multistage builds improve both the Docker build process and Dockerfile maintainability in several ways:

  • Easy access to resources from multiple base images: Multistage builds allow you to use resources from several base images in a single Docker build, such as a build system, testing tools, and then a separate runtime environment.
  • Run multi-step build processes to produce a final image: This approach enables you to model your full build process in one Dockerfile. For instance, you can fetch dependencies and build your source code in one stage, then copy the compiled output into a final stage that uses a smaller base image.
  • Reduce Dockerfile complexity: Using multiple named stages can help you organize and simplify your Dockerfiles. Whereas complex build processes historically required several Dockerfiles and the use of intermediary build helper images, multistage builds enable you to wrap everything into a single Dockerfile.
  • Improve build efficiency: Multistage builds can reduce image sizes and increase build efficiency. Your final image can use a lightweight base image, then selectively copy the files it needs from earlier build stages. Docker’s layer caching system can reuse unchanged layers within each stage, as long as the instruction order and inputs remain the same.

Multistage builds are a good fit whenever your image build process involves more than one base image, multiple steps, or large build tools that you don’t need to keep in the final image. By using a multistage build, you can stick to a single Dockerfile while still optimizing build times and layer cache efficiency.
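To make the build, test, and runtime split concrete, here is a rough sketch for a hypothetical Go module. The image tags, the /out/app path, and the presence of go.mod and go.sum in the build context are all assumptions made for illustration, not part of the original example:

```dockerfile
# Stage 1: compile with the full Go toolchain (tag is an assumption).
FROM golang:1.22 AS build
WORKDIR /src
# Copy the module files first so the dependency download layer can be cached.
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a static binary that does not depend on glibc.
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: run the tests, reusing the build stage's toolchain and sources.
FROM build AS test
RUN go test ./...

# Stage 3: small runtime image that receives only the compiled binary.
FROM alpine:3.20 AS final
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

With BuildKit, a plain docker build . builds only the stages the last stage depends on, so the test stage is skipped unless you request it explicitly with docker build --target test .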

How to create multistage Docker images

Let’s look at how to use a multistage Dockerfile in a realistic scenario.

# Stage 1: Build
FROM maven:3.8.5-openjdk-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn -B dependency:go-offline
COPY src ./src
RUN mvn -B clean package -DskipTests

# Stage 2: Runtime
FROM openjdk:17-jdk-alpine AS runtime
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Here’s a deeper breakdown of how the two stages work:

Build Stage (build)

  • Base image: maven:3.8.5-openjdk-17
    • Includes the JDK and Maven CLI, ideal for building Java applications.
  • WORKDIR /app sets the working directory for subsequent commands.
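  • COPY pom.xml . followed by RUN mvn -B dependency:go-offline
    • Resolves the dependencies and plugins declared in pom.xml and downloads them into the local Maven repository (-B runs Maven in non-interactive batch mode).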
    • Because only pom.xml is copied at this point, this layer can be heavily cached: if source code changes but dependencies don’t, Docker can skip re-downloading them.
  • COPY src ./src brings in the application’s source code.
  • RUN mvn -B clean package -DskipTests compiles and packages the application into a JAR under target/.

By copying pom.xml first and resolving dependencies before copying the source code, we allow Docker to cache the dependency layer. As a result, dependency downloads are skipped unless pom.xml changes, significantly speeding up rebuilds during development.
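Layer caching only helps while pom.xml is unchanged. As a hedged sketch of one complementary option, BuildKit cache mounts can keep the local Maven repository between builds even when the dependency layer is invalidated. This assumes BuildKit is enabled and that Maven runs as root in the build image, so its repository lives under /root/.m2:

```dockerfile
# syntax=docker/dockerfile:1
FROM maven:3.8.5-openjdk-17 AS build
WORKDIR /app
COPY pom.xml .
# The cache mount persists /root/.m2 across builds; it is not stored in a layer.
RUN --mount=type=cache,target=/root/.m2 mvn -B dependency:go-offline
COPY src ./src
# Mount the same cache again so packaging can see the downloaded artifacts.
RUN --mount=type=cache,target=/root/.m2 mvn -B clean package -DskipTests
```

Because the cache lives outside the image layers, every RUN instruction that needs the downloaded artifacts has to declare the same mount.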

Runtime Stage (runtime)

  • Base image: openjdk:17-jdk-alpine
    • Much smaller than the Maven image and suitable for production use.
  • COPY --from=build /app/target/*.jar app.jar copies only the packaged JAR from the build stage.
  • EXPOSE 8080 documents that the container listens on port 8080.
  • ENTRYPOINT ["java", "-jar", "app.jar"] defines the default container process.

Why This Structure Is Efficient

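  • The build tooling (Maven and its local repository cache) is present only in the build stage, not in the final image.
  • The final image includes only:
    • The Java runtime from the base image (a full JDK with this tag; a JRE-only base image would be smaller still)
    • Your app.jar
    • Minimal OS layers from Alpine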
  • Dependency download is cached as long as pom.xml doesn’t change; source code changes don’t invalidate the dependency layer.
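If the application needs only a Java runtime, the final stage could be trimmed further. The following is a sketch under assumptions rather than a recommendation from the original example: the eclipse-temurin:17-jre-alpine tag and the user name app are illustrative choices, so verify the tag before relying on it.

```dockerfile
# Stage 1: Build (same steps as the example above)
FROM maven:3.8.5-openjdk-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn -B dependency:go-offline
COPY src ./src
RUN mvn -B clean package -DskipTests

# Stage 2: Runtime on a JRE-only base image (assumed tag)
FROM eclipse-temurin:17-jre-alpine AS runtime
WORKDIR /app
# Create an unprivileged system user and group (hypothetical name "app").
RUN addgroup -S app && adduser -S -G app app
COPY --from=build /app/target/*.jar app.jar
# Run the application as the unprivileged user instead of root.
USER app
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Only the final stage changes here, so the build stage's cached layers are still reused as before.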

Conclusion

Multistage Docker builds use multiple Dockerfile FROM instructions to reference content from more than one base image. You can selectively copy just the files you need from each stage into your final image. This allows you to implement complex build processes using a single Dockerfile, without making your final image excessively large.

Adopting multistage Dockerfiles can improve the speed, simplicity, and ease of maintenance of your Docker builds. However, it’s also important to implement other Dockerfile best practices to ensure your builds run as smoothly as possible. 

In modern CI/CD pipelines, multistage Docker builds are no longer just an optimization; they have become a common default for producing small, secure, and production-ready images.
