Optimizing Docker Images with Multi-Stage Builds: A Lean and Secure Approach

Wed Nov 1, 2023

Introduction

Multi-stage builds let you use several build stages in a single Dockerfile, so the final Docker image contains only what the application needs at runtime. The result is smaller, more efficient images.

This is particularly useful when you're working with complex applications or software stacks where the development and build environment differs from the production environment.

Multi-stage builds help to separate the build tools and dependencies from the final production image, resulting in a smaller and more secure final image.

Prerequisites

Before diving into this guide on multi-stage builds, make sure you have the following prerequisites:

Basic Understanding of Docker: This blog assumes that you have foundational knowledge of what Docker is and its core concepts.
Docker Server: To follow along with the hands-on examples in this blog, you'll need access to a server running Docker 17.05 or later, the first release that supports multi-stage builds.

If you're new to Docker, I recommend starting with my earlier blog posts.

Understanding Multi-Stage Builds

Multi-stage builds, introduced in Docker 17.05, address a common problem in containerization: image size. Single-stage Dockerfiles can produce large, bloated images because the final image includes all the build tools, libraries, and dependencies needed during the build process. These components are unnecessary at runtime and can significantly increase the image's size.

Multi-stage builds offer a solution to this problem by allowing you to use multiple "stages" within a single Dockerfile. Each stage represents a distinct phase of the build process, and you can copy artifacts from one stage to another. This separation of concerns enables you to create a minimal runtime image while keeping all the necessary build tools and dependencies in earlier stages.

Benefits of Multi-Stage Builds

Smaller Image Sizes: The primary advantage of multi-stage builds is a smaller image. Because the final stage copies in only the artifacts it needs, build-time files and dependencies never reach the runtime image. Smaller images are faster to push and pull, which shortens deployment times and reduces storage costs.
Improved Security: Reducing the attack surface by excluding build tools and development dependencies from the runtime image enhances security. Fewer components in the final image mean fewer potential vulnerabilities to exploit.
Streamlined Build Process: A single Dockerfile describes both how to build the application and how to package it for production, so there is no need to maintain separate build and runtime Dockerfiles or scripts that move artifacts between them. Developers work with a consistent, well-defined set of tools and dependencies during development and testing, and the separation of build and runtime stages ensures that only the essentials reach the final image.

How to Implement Multi-Stage Builds

1. Defining Stages with FROM: Each FROM instruction in a Dockerfile starts a new build stage with its own base image. The RUN, COPY, and other instructions that follow it apply to that stage until the next FROM.

FROM base_image_1 AS stage_name_1
# Define instructions for the first stage

FROM base_image_2 AS stage_name_2
# Define instructions for the second stage

# Additional stages if needed

base_image_1 and base_image_2 are the base images for the first and second stages, respectively. You can choose any Docker image as your base, depending on your project's requirements.
stage_name_1 and stage_name_2 are user-defined names for the stages. These names are optional but helpful for referencing the stages later in your Dockerfile.
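Naming is optional because every stage can also be referenced by its zero-based index, where 0 is the first FROM in the file. The sketch below borrows the base images and JAR name from the Maven example later in this post and assumes a Maven project in the build context.

```dockerfile
# Stage 0: unnamed, so it can only be referenced by its index
FROM maven:3.8.3-openjdk-11
WORKDIR /app
COPY . .
RUN mvn -B package

# Stage 1: the final runtime image
FROM openjdk:11-jre-slim
WORKDIR /app
# "--from=0" refers to the first FROM in this Dockerfile
COPY --from=0 /app/target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-cp", "my-app-1.0-SNAPSHOT.jar", "com.mycompany.app.App"]
```

Indices shift whenever a stage is added above, which is why named stages are usually easier to maintain.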

2. Separation of Concerns: Each stage represents a specific phase in your application's build process. For example, the first stage might install dependencies and compile the code, while the second stage builds the final runtime image. This separation lets each stage keep only what its phase needs.
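The split does not have to stop at two stages, and a stage can even use an earlier stage as its base image. The sketch below divides the Maven example from later in this post into dependency, build, and runtime stages. It assumes the standard Maven layout (pom.xml plus a src/ directory); the stage names deps, build, and runtime are our own choice. Note that dependency:go-offline may not fetch every plugin a build needs, so the build stage still runs Maven online.

```dockerfile
# Stage "deps": resolve Maven dependencies; this layer is rebuilt only when pom.xml changes
FROM maven:3.8.3-openjdk-11 AS deps
WORKDIR /app
COPY pom.xml .
RUN mvn -B dependency:go-offline

# Stage "build": starts from "deps", so the downloaded dependencies are already present
FROM deps AS build
COPY src ./src
RUN mvn -B package

# Stage "runtime": only a JRE and the packaged JAR
FROM openjdk:11-jre-slim AS runtime
WORKDIR /app
COPY --from=build /app/target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-cp", "my-app-1.0-SNAPSHOT.jar", "com.mycompany.app.App"]
```

Because pom.xml is copied before the sources, a change to the Java code alone reuses the cached deps layer instead of downloading the dependencies again.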

3. Copying Artifacts Between Stages: To share files or artifacts between stages, use the COPY instruction with the --from=<stage_name> flag. Let us see an example:

FROM base_image_1 AS stage_name_1
# Build your application
RUN some_build_command
# Generate build artifacts

FROM base_image_2 AS stage_name_2
# Copy artifacts from the first stage
COPY --from=stage_name_1 /path/to/artifacts /destination

In this example, the second stage copies the build artifacts created in the first stage by referencing stage_name_1. This way, you can effectively use only the necessary files in the final image without carrying over build tools or intermediate files.
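The --from flag is not limited to stages defined in the same Dockerfile: it also accepts an image name, which is handy for pulling a single file out of a published image. The nginx image and configuration path below are purely illustrative.

```dockerfile
FROM openjdk:11-jre-slim
WORKDIR /app
# Here --from names an image rather than a stage: Docker pulls nginx:latest
# and copies this one file out of it; nothing else from nginx ends up in this image
COPY --from=nginx:latest /etc/nginx/nginx.conf /app/nginx.conf
```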

4. Final Stage: The Dockerfile typically ends with the final stage, which produces the runtime image. This stage usually starts from a minimal base image designed for production, which keeps the image small.
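Because the final stage is simply the last one in the file, you can also stop a build at an earlier stage with docker build --target <stage_name>. The sketch below reuses the images and JAR name from the Maven example later in this post and adds an optional test stage; the stage names build, test, and runtime are our own.

```dockerfile
# Build stage: compile and package without running the tests
FROM maven:3.8.3-openjdk-11 AS build
WORKDIR /app
COPY . .
RUN mvn -B -DskipTests package

# Test stage: no later stage depends on it, so BuildKit skips it
# unless it is requested explicitly with: docker build --target test .
FROM build AS test
RUN mvn -B test

# Final stage: being last in the file, it is what a plain "docker build ." produces
FROM openjdk:11-jre-slim AS runtime
WORKDIR /app
COPY --from=build /app/target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-cp", "my-app-1.0-SNAPSHOT.jar", "com.mycompany.app.App"]
```

With BuildKit, the default builder in current Docker releases, only the stages the target depends on are built; the legacy builder builds every stage that precedes the target.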

Maven Hello World Project

Project repo: https://github.com/sampathshivakumar/my-app.git

It is a simple Maven project for Java applications, including a source directory, a pom.xml file, and a sample Java class for a basic "Hello World" program.

Single-stage vs multi-stage Dockerfiles

Let us build Docker images for this project using both a single-stage and a multi-stage Dockerfile so that we can clearly see the advantages of the multi-stage approach.

Single-stage Dockerfile

# Use an official Maven image as a parent image
FROM maven:3.8.3-openjdk-11 AS builder

# Set the working directory in the container
WORKDIR /app

# Copy the project's pom.xml and source code
COPY ./ /app

# Build the Maven project
RUN mvn clean package

# Use the same builder image to run the Java application

# Define the CMD to run your Java application
CMD ["java", "-cp", "target/my-app-1.0-SNAPSHOT.jar", "com.mycompany.app.App"]

Built with this single-stage Dockerfile, the Docker image is 681 MB.

Let us run a Docker container from this image to see the output.

Multi-stage Dockerfile

# Use an official Maven image as a parent image
FROM maven:3.8.3-openjdk-11 AS builder

# Set the working directory in the container
WORKDIR /app

# Copy the project's pom.xml and source code
COPY ./ /app

# Build the Maven project
RUN mvn clean package

# Use an official OpenJDK image as a parent image
FROM openjdk:11-jre-slim

# Set the working directory in the container
WORKDIR /app

# Copy the JAR file built in the "builder" stage into this image
COPY --from=builder /app/target/*.jar /app/

# Define the CMD to run your Java application
CMD ["java", "-cp", "my-app-1.0-SNAPSHOT.jar", "com.mycompany.app.App"]

The multi-stage Dockerfile produces an image of 223 MB, roughly a third of the 681 MB single-stage image, because the final image contains only the JRE and the application JAR rather than the full Maven and JDK toolchain.
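Leaving the build tools behind shrinks the attack surface, but the container from the multi-stage example still runs as root by default. A sketch of a final stage that drops root privileges follows; it assumes the Debian-based openjdk:11-jre-slim image, which ships useradd, and appuser is a hypothetical account name.

```dockerfile
FROM maven:3.8.3-openjdk-11 AS builder
WORKDIR /app
COPY ./ /app
RUN mvn clean package

FROM openjdk:11-jre-slim
WORKDIR /app
# Create an unprivileged system account ("appuser" is a hypothetical name)
RUN useradd --system --no-create-home appuser
# Copy the JAR file built in the "builder" stage
COPY --from=builder /app/target/*.jar /app/
# Later instructions and the running container use appuser instead of root
USER appuser
CMD ["java", "-cp", "my-app-1.0-SNAPSHOT.jar", "com.mycompany.app.App"]
```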

Conclusion

In this blog, we've explored what a multi-stage Dockerfile is and its advantages over a single-stage Dockerfile when building container images for your applications.

I hope you enjoyed reading this blog and found it informative. If you have any questions or topics you'd like us to cover in future blogs, please don't hesitate to connect with me on LinkedIn.

Thank you for joining us on this Docker journey.

Sampath Siva Kumar Boddeti
AWS & Terraform Certified