One indicator of good production hygiene is the “freshness” of deployed software. Stale software metastasizes into technical debt, and can ultimately become a source of vulnerability. This applies to a range of contexts, including library dependencies, base images, and the workloads deployed to your clusters.
In this post, we talk about a practice at Google dubbed the “Build Horizon,” which imposes a maximum age on deployed build artifacts, and how you can leverage Chainguard Enforce to detect violations of this kind of policy.
Generally, our philosophy on dependencies is to embrace the Principle of Ephemerality and, wherever possible, automate pulling in new dependencies through your standard production qualification process. For library dependencies, tools like Dependabot are great. For image references (e.g. base images), our own Carlos Panato put together a GitHub Action we call digesta-bot, which sends us automated pull requests to update them.
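Under the hood, the idea is straightforward: resolve each image tag to its current digest (e.g. with `crane digest`) and rewrite any pinned references that have drifted. Here is a rough sketch of the rewrite step, with the digest resolution mocked out and the image reference and digests made up:

```shell
#!/bin/sh
set -eu

# In real automation this would come from: crane digest "$IMAGE"
NEW_DIGEST="sha256:1111111111111111111111111111111111111111111111111111111111111111"

# A Dockerfile pinned to a stale digest (made-up reference).
cat > Dockerfile <<'EOF'
FROM cgr.dev/chainguard/static:latest@sha256:0000000000000000000000000000000000000000000000000000000000000000
EOF

# Swap whatever digest the tag is currently pinned to for the fresh one.
sed -i.bak -E "s|@sha256:[0-9a-f]{64}|@${NEW_DIGEST}|" Dockerfile
cat Dockerfile
```

A tool like digesta-bot wraps this loop in a scheduled workflow and opens a pull request whenever the rewrite changes a file.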
However, even with automation to help, things slip through! We recently discovered a leftover service running on one of our own “staging” clusters: we had renamed the service from “foo” to “bar”, and “foo” was never cleaned up. This (and some fun new upstream features in sigstore/policy-controller) gave us the perfect excuse to put together the “build horizon” policy I had been itching to write, and get some defense-in-depth against stale artifacts!
This policy works by accessing the container image’s “config” using the new fetchConfigFile functionality in sigstore/policy-controller. Let’s look at an example of such a config, which you can fetch with crane.
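As a representative sketch (not the exact output for any real image), `crane config <image>` returns JSON along these lines:

```json
{
  "architecture": "amd64",
  "os": "linux",
  "created": "2022-10-20T14:05:00Z",
  "config": {
    "Entrypoint": ["/usr/bin/app"],
    "User": "65532",
    "Env": ["PATH=/usr/sbin:/usr/bin:/sbin:/bin"]
  }
}
```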
The container’s config contains a lot of interesting information, including the default entrypoint, user, and environment for launching the container. For this policy, however, we are after the “created” timestamp. When fetchConfigFile is specified, the input passed to the policy contains a field named config that maps each platform/architecture (e.g. linux/amd64) to its config JSON. In Rego, you would access a single architecture with input.config["linux/amd64"], or act on all architectures with input.config[_].
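Concretely, the input the policy sees has roughly this shape (timestamps illustrative):

```json
{
  "config": {
    "linux/amd64": { "created": "2022-10-20T14:05:00Z" },
    "linux/arm64": { "created": "2022-10-20T14:05:00Z" }
  }
}
```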
We favor the (also new!) Rego support over CUE for this policy because Rego has better time functions. Leveraging the above, we can write the following to check that an image was built within the past 30 days:
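A sketch of such a Rego policy, assuming policy-controller’s convention of a sigstore package with an isCompliant entry point (the exact rule shape may differ across policy-controller versions):

```rego
package sigstore

# Fail closed: an image is non-compliant unless shown to be fresh.
default isCompliant = false

isCompliant {
	# The horizon: 30 days ago, in nanoseconds since the Unix epoch.
	horizon := time.add_date(time.now_ns(), 0, 0, -30)

	# Parse the RFC 3339 "created" timestamp from an architecture's config.
	# Note: input.config[_] succeeds if ANY architecture is fresh; add
	# negation if you need every architecture to pass.
	created := time.parse_rfc3339_ns(input.config[_].created)

	created > horizon
}
```

time.now_ns, time.add_date, and time.parse_rfc3339_ns are standard OPA built-ins, which is exactly the kind of date arithmetic that is awkward to express in CUE.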
We can wrap this in a ClusterImagePolicy to control which images it applies to and how severely violations are treated:
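A sketch of such a ClusterImagePolicy, with a placeholder image glob and authority (adapt both, and treat the field layout as an approximation for your policy-controller version):

```yaml
apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: build-horizon
spec:
  images:
    - glob: "**"            # placeholder: scope this to your images
  mode: warn                # use "enforce" to block admission outright
  authorities:
    - static:
        action: pass        # placeholder: use your real signing authority
  policy:
    fetchConfigFile: true   # exposes input.config to the policy below
    type: rego
    data: |
      package sigstore
      default isCompliant = false
      isCompliant {
        horizon := time.add_date(time.now_ns(), 0, 0, -30)
        created := time.parse_rfc3339_ns(input.config[_].created)
        created > horizon
      }
```

The mode field controls the severity: warn surfaces violations without blocking, while enforce rejects non-compliant images at admission.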
One “gotcha” with this policy is that it will always trip for naively built reproducible images, since many reproducible builds use the Unix epoch as their creation timestamp. Take, for example, the Google distroless images, which suffered from this until recently.
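You can check this yourself with crane and jq; on a naively built reproducible image, the command below prints the Unix epoch (1970-01-01T00:00:00Z):

```shell
# Print just the creation timestamp from the image's config.
crane config gcr.io/distroless/static | jq -r '.created'
```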
However, many reproducible build tools support an environment variable called SOURCE_DATE_EPOCH, which lets users align the artifact’s timestamp with the timestamp of the source commit it was built from. For example, for Chainguard Enforce we build our images with ko and set SOURCE_DATE_EPOCH to the commit’s timestamp.
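For example (the ko import path is a placeholder; ko reads SOURCE_DATE_EPOCH from the environment):

```shell
# Use the latest commit's timestamp as the image's "created" time,
# so building the same commit always yields the same image config.
export SOURCE_DATE_EPOCH="$(git log -1 --format=%ct)"
ko build ./cmd/app   # placeholder import path
```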
This means that rebuilding a particular commit produces the same image today as it will a year from now, while the config’s “created” timestamp still advances (even if the binary does not) as new commits are made.
You can use this policy with sigstore/policy-controller today to block new deployments of stale images. With Chainguard Enforce, the policy is also continuously evaluated against every matching image the platform has ingested into your Evidence Lake, including base images and multi-architecture variants as well as the images running directly in your workloads. Even when an image is fresh at admission time, Enforce’s continuous verification will send a notification (e.g. post to Slack, open a GitHub issue) if a deployed image later falls out of compliance, so that corrective action can be taken.
For example, when we enabled this policy, Chainguard Enforce immediately flagged the stale “foo” component mentioned above, and our automation opened an issue that we used to track cleaning things up.