Docker has become an immensely popular tool in the world of software development, and for good reason. It provides an excellent way to create, manage, and deploy containers, which in turn enables developers to run applications that work the same way in development and production. However, creating Docker images can sometimes be a pain, as anybody who has accidentally broken the apt cache and triggered an hours-long build by making a minor change can tell you.
The standard Dockerfile approach to creating container images can cause:

- Non-hermetic, irreproducible builds
- Bloated images full of software you never asked for
- Vulnerabilities and unexplained "dark matter" files that confuse scanning tools
In this blog post, we'll:

- Examine the issues with Dockerfile-based builds
- Survey four alternatives: ko, rules_oci, Nix, and apko
- Offer guidance on picking the right tool for your situation
The Issues with Dockerfiles
A Dockerfile is effectively a shell script that runs to set up a container image. Consider this example, adapted from the Docker Library buildpack-deps image:
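The sketch below is illustrative only, loosely modeled on buildpack-deps rather than copied from the Docker Library source; the package list is a hypothetical selection:

```dockerfile
# Illustrative sketch in the spirit of buildpack-deps;
# not the exact Docker Library source.
FROM ubuntu:22.04

RUN set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        gnupg \
        wget \
    ; \
    rm -rf /var/lib/apt/lists/*
```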
Dockerfiles have a few well-known issues that can make them a less-than-ideal solution for crafting containers:
Non-hermetic. A hermetic build declares its inputs explicitly, which allows pre-fetching of inputs and builds that can run offline. Dockerfiles often contain implicit dependencies, because they fetch dependencies online at build time (without explicitly pinning an exact file hash). The internet changes constantly, as new packages are published or servers have outages. In our example, the exact packages installed will depend on the state of the Ubuntu package repository at the time the Docker build occurs.
Dockerfile builds can be made hermetic with a disciplined approach, but you'll always be fighting to detect and fix reproducibility regressions, because hermeticity is not something the tool or the Dockerfile spec is designed to enforce; even the recommended "best practices" have hermeticity problems.
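For instance, a disciplined Dockerfile might pin the base image by digest and pin exact package versions. The digest and version string below are placeholders, not real values:

```dockerfile
# Hedged sketch: pinning narrows non-determinism but doesn't eliminate it.
# <digest> and <exact-version> are placeholders, not real values.
FROM ubuntu@sha256:<digest>

RUN apt-get update && apt-get install -y --no-install-recommends \
        curl=<exact-version> \
    && rm -rf /var/lib/apt/lists/*
# Even so, the pinned version can disappear from the mirror, and
# `apt-get update` still reads mutable online state at build time.
```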
Ultimately, without hermeticity, a Docker image can't be reproducible, which causes problems for caching as well as verifiability. Further, builds must happen online, increasing the risk to the build infrastructure.
What’s in your image? Dockerfiles usually start with a base image, like Ubuntu, which comes with a lot of software. This may lead to hundreds of vulnerabilities in your final image due to software you don’t even care about. In many cases, these vulnerabilities are not exploitable, but it only takes one—and manually checking each can be a pain.
Further, Dockerfile builds can produce software "dark matter": files with an unexplained source (for instance, files that don't come from a package manager). These can confuse Software Composition Analysis (SCA) tools like grype, snyk, and trivy. Beyond vulnerabilities, these extra files cause bloat, leading to larger images and extra bandwidth and storage costs. Dockerfile-based builds can generate Software Bills of Materials (SBOMs), but those SBOMs miss many dependencies, even though more information about each package is available at build time than at any later point. Docker build tooling does support creating build provenance attestations, which can play a useful role in a secure software supply chain.
Even multi-stage Dockerfiles, which can keep build dependencies out of the final image, share many of these same issues.
Alternatives to Dockerfiles
ko is a CNCF project designed specifically for Go applications. To compile a Go app and place it directly into a container, a user simply runs ko build—in many cases, no configuration is required! This tool doesn't depend on Docker, which makes it faster, more reliable, and portable. It also produces reproducible images with Software Bills of Materials (SBOMs) by default and results in very minimal images.
When ko works for your application, it's an incredible tool. But it only works for pure-Go applications; if your container needs another service, unusual libraries, or cgo, you may be out of luck. There are ko-like projects for Java (Jib) and .NET (dotnet publish) as well, which share the same strengths and limitations.
rules_oci (like its predecessor, rules_docker, created by Chainguard CTO Matt Moore) leverages Bazel, an open-source universal build tool. Users define their container image in a Bazel rule:
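A minimal sketch of what this looks like, assuming rules_oci is already set up in the workspace; the target names (`:app_tar`, `@distroless_base`) are hypothetical:

```starlark
# BUILD.bazel — illustrative sketch; target names are hypothetical.
load("@rules_oci//oci:defs.bzl", "oci_image")

oci_image(
    name = "image",
    base = "@distroless_base",  # assumes a base image pulled via oci_pull
    entrypoint = ["/app"],
    tars = [":app_tar"],        # a tar layer containing the compiled binary
)
```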
If your application already builds in Bazel, it's easy to pull the build targets into the final container image and configure how it runs. And because it uses Bazel, it's fast (with good caching), reproducible, and can run across a large build cluster.
However, “If your application already builds in Bazel” is a big ask. While Bazel supports many programming languages with first-party integrations, and extensions allow it to use any language, Bazel can be a pain to run. It can work really well in enormous, really complex monorepos (like inside Google, where Bazel came from). But “partly-Bazel” codebases are difficult to manage, and the complexity of setting it up (especially when integrating with external software) can be overwhelming.
Nix is a build tool that emphasizes reproducibility. It's based on academic theory, as described in Eelco Dolstra's PhD thesis. Nix’s dockerTools library provides excellent support for building Docker images:
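A minimal `default.nix` sketch using dockerTools; the image name and command are illustrative:

```nix
# default.nix — illustrative sketch using nixpkgs' dockerTools.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildImage {
  name = "hello";
  tag = "latest";
  config = {
    # Nix resolves this store path and pulls hello's runtime
    # closure (and nothing else) into the image.
    Cmd = [ "${pkgs.hello}/bin/hello" ];
  };
}
```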
Nix builds images from the Nix store, ensuring that only runtime dependencies end up in the final image, and the result is always bit-for-bit reproducible. If you have multiple related images, they can even share layers, reducing disk usage.
The primary downside to Nix is its steep learning curve and high cost of adoption. Docker builds in Nix are written using the Nix programming language. The Nix language is a full-blown programming language, not just a configuration format like YAML. Nix is lazy and functional, which can be confusing for anybody who doesn’t write Haskell on a daily basis. Nix can be pedantic: it differentiates between build-time tools producing build-time artifacts, build-time tools producing run-time artifacts, build-time tools producing run-time dependencies, and run-time dependencies themselves. Multiple attempts have been made to improve the learning curve with tools like Flox and Fleek, but none has caught on quite yet.
apko is a build tool from Chainguard designed specifically for creating base images. It's what we use to produce all of our Chainguard Images, and it's instrumental in being able to effectively maintain so many high-quality images. apko uses the APK package format used by Alpine Linux, and follows a radical principle: all of the contents of the container image must come from APK packages. In practice, this isn't a big constraint, as tooling like melange makes it easy to create APKs. Wolfi uses melange to build the thousands of packages it provides. But limiting image creation to assembling packages and configuring Docker metadata comes with huge benefits:

- Builds are declarative, hermetic, and reproducible
- Every file in the image is accounted for by a package, so SBOMs are complete
- Images contain only the packages you list, keeping them minimal
Here’s a (lightly abridged) example of a configuration file for an Nginx image:
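A sketch of what such a configuration might look like; the package names, user IDs, and paths below are illustrative rather than the exact Chainguard source:

```yaml
# Illustrative apko configuration; package names and IDs are assumptions.
contents:
  repositories:
    - https://packages.wolfi.dev/os
  packages:
    - nginx
    - wolfi-baselayout

entrypoint:
  command: /usr/sbin/nginx -c /etc/nginx/nginx.conf

accounts:
  groups:
    - groupname: nginx
      gid: 10000
  users:
    - username: nginx
      uid: 10000
  run-as: nginx
```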
We can see that apko is much simpler than something like Bazel or Nix: there’s no new programming language, just YAML (and the full description of the configuration format runs about 250 lines). In cases where apko itself isn’t a good fit, it can be used to create base images to be used with any of the tools described above, even Dockerfile builds.
Declarative approaches to building containers, such as those provided by the alternatives mentioned above, offer several advantages over traditional Dockerfiles. They're generally faster, provide better caching, and result in more minimal images. They do involve learning and installing new software, which is a real cost.
So what should you do? The following guidance is appropriate in most cases:

- Building a pure-Go application? Use ko (or Jib for Java, dotnet publish for .NET).
- Already building with Bazel? Use rules_oci.
- Comfortable with Nix? Its dockerTools works well.
- Need a base image, or can your image be assembled entirely from packages? Use apko.
- Otherwise, a carefully written (ideally multi-stage) Dockerfile on a minimal base image is still a reasonable choice.
As software development evolves, it's important to consider new and improved ways to craft containers that address the limitations of Dockerfiles. By exploring alternative tools like ko, rules_oci, Nix, and apko, you can create more efficient, reliable, and secure container images for your applications.