
Fully bootstrapping Go from source in Wolfi

Ariadne Conill, Principal Software Engineer
August 11, 2023

In our previous blog post, we discussed how Wolfi is one of the few distributions with full provenance all the way back to a purely source-based build of Java. Having a complete understanding of a language ecosystem’s provenance is important, as Ken Thompson’s seminal lecture, Reflections on Trusting Trust, highlights. In it, he demonstrates how it is possible to install a backdoor into a compiler, compile that compiler, remove the backdoor from the source code, and still have the backdoor persist in future compilations of the compiler. This works because the backdoored compiler binary is modified to reinsert the backdoor into any future version of itself that it compiles. As a result, there is a significant risk of tampering whenever pre-built binaries are a component of a supply chain. But there is a solution: full-source bootstrapping.
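To make the attack concrete, here is a toy model in Go. It is only a sketch: the `Binary` and `Compiler` types, the `compilerMarker` string, and the helper names are hypothetical and do not come from Thompson's lecture or any real toolchain. It shows that once a compiler binary is tainted, rebuilding from clean source with that binary keeps the taint, while a build rooted in a trusted compiler does not.

```go
package main

import (
	"fmt"
	"strings"
)

// Binary is a toy stand-in for a compiled program.
type Binary struct {
	Source     string
	Backdoored bool
}

// Compiler is a toy compiler: a function from source text to a Binary.
type Compiler func(source string) Binary

// compilerMarker is a hypothetical marker that identifies compiler sources.
const compilerMarker = "package compiler"

// honest compiles exactly what the source says and nothing more.
func honest(source string) Binary {
	return Binary{Source: source, Backdoored: strings.Contains(source, "BACKDOOR")}
}

// asCompiler "runs" a compiled compiler. A backdoored binary reinserts the
// backdoor whenever it recognizes that it is compiling a compiler.
func asCompiler(b Binary) Compiler {
	if !b.Backdoored {
		return honest
	}
	return func(source string) Binary {
		out := honest(source)
		if strings.Contains(source, compilerMarker) {
			out.Backdoored = true
		}
		return out
	}
}

func main() {
	tamperedSrc := compilerMarker + " // BACKDOOR"
	cleanSrc := compilerMarker

	stage0 := honest(tamperedSrc)          // the attacker builds the tampered source once
	stage1 := asCompiler(stage0)(cleanSrc) // clean source, tainted compiler binary
	stage2 := asCompiler(stage1)(cleanSrc) // the taint survives another generation

	fromSource := honest(cleanSrc) // a build rooted in a trusted compiler

	fmt.Println(stage1.Backdoored, stage2.Backdoored, fromSource.Backdoored)
}
```

In this model, inspecting `cleanSrc` never reveals the problem, because the backdoor lives only in the binaries. That is why the root of the build chain matters.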

A full-source bootstrap is one where an entire language ecosystem is bootstrapped purely from source code. For Java, this was a very complicated process – we had to build an initial “bootstrap JDK” from various free software components in order to build a patched version of the initial OpenJDK release. Thankfully, for Go it is a lot simpler.

Alternative Go compilers

Most developers who use Go are familiar with the official Go toolchain, which is built around a compiler named gc, which is itself written in Go. This is the compiler that normally runs whenever you invoke a Go build. It is highly parallelized, able to build many source files at the same time, and is generally considered the preferred Go compiler. But since it is written in Go, you need a pre-existing Go compiler in order to build it. This makes our objective fairly straightforward: we need to arrange for a pre-existing Go compiler that can be used to build the official Go toolchain.

Thankfully, the Go developers understand this bootstrapping problem. In many cases, distributions simply download the pre-built Go binaries provided by Google in order to bootstrap, but there are two alternatives, both built on the C++ gofrontend project: gcc-go and gollvm. These implementations, being written in C++, do not require a Go compiler in order to build, which solves our bootstrapping problem: if we build gcc-go, we can then build the normal Go toolchain with it.

But there are some gotchas. As with all other alternative compilers, the Go compilers built on gofrontend lag behind the official toolchain. For example, the Go compiler included with GCC 13 only implements Go 1.18.

This isn’t a problem, though. Wolfi ships a Go 1.19 version stream. Assuming that gcc-go is sufficiently complete to build the official Go toolchain, we can build Go 1.19 with it, keep that around for bootstrapping the 1.20 version stream, and so on.
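The resulting chain can be sketched as a small Go program. This is an illustration under one assumption: that the first version stream is built by the seed compiler and every later stream is built by the one before it. The `Step` type, the `bootstrapPlan` function, and the stream labels are hypothetical and are not Wolfi package names.

```go
package main

import "fmt"

// Step records which compiler builds which toolchain stream.
type Step struct {
	Builder string
	Target  string
}

// bootstrapPlan chains the streams: the first stream is built by the seed
// compiler, and each later stream is built by the stream before it.
func bootstrapPlan(seed string, streams []string) []Step {
	plan := make([]Step, 0, len(streams))
	builder := seed
	for _, s := range streams {
		plan = append(plan, Step{Builder: builder, Target: s})
		builder = s
	}
	return plan
}

func main() {
	// Labels are illustrative only.
	for _, st := range bootstrapPlan("gcc-go (GCC 13)", []string{"Go 1.19", "Go 1.20"}) {
		fmt.Printf("%s builds %s\n", st.Builder, st.Target)
	}
}
```

Every step in such a plan traces back to the seed, so the only compiler that has to be built without Go is the C++-based one.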

And just like that, we have another language ecosystem that is bootstrapped purely from source code, ensuring that it is protected from toolchain-level tampering. The work continues, but the multitude of Go-based applications in our Chainguard Images collection have already been rebuilt with this toolchain, allowing our users to reap the benefits of having full provenance for those applications all the way back to the toolchain sources.

If you or your company is interested in container images that are built with full provenance for Go applications, contact us to learn more about Chainguard Images.
