Fully bootstrapping Java from source in Wolfi

Ariadne Conill, Principal Software Engineer and Josh Wolf, Software Engineer
June 2, 2023

Providing proof of the origin of Java distributions, from the source code all the way to the final binary package, is a significant challenge for projects that produce distributions of OpenJDK. As a company that focuses on software supply chain security, and also produces distributions of OpenJDK, we have been working to create a chain of OpenJDK packages which originates from an alternative JDK built from GNU Classpath, the GNU Compiler for Java, and the Eclipse Compiler for Java. This enables us to provide full provenance from pure source code for the entire Java ecosystem in Wolfi.

Our journey begins with OpenJDK, which is the official source release of Java managed by Oracle. Because of Oracle’s restrictions on their own binary distribution of Java, many alternative distributions of Java based on OpenJDK have popped up. As is common when bootstrapping programming languages, each release of OpenJDK must be built by the previous release. In other words, OpenJDK 17, the current LTS version, must be built by OpenJDK 16, which is not itself an LTS version. Accordingly, if you start from OpenJDK 8, the oldest LTS version of OpenJDK, you have to build OpenJDK 9 through 16 to get to OpenJDK 17.
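To make the chain concrete, here is a minimal Python sketch that enumerates the build steps between two releases. The function name `bootstrap_chain` is purely illustrative, and the sketch assumes the simple rule described above (release N is built by release N-1); it is not part of any Wolfi tooling.

```python
def bootstrap_chain(start: int, target: int) -> list:
    """Return (builder, built) pairs for every step from start to target.

    Assumes each OpenJDK release is built by the release just before it.
    """
    if target < start:
        raise ValueError("target must not be older than start")
    return [(version, version + 1) for version in range(start, target)]


steps = bootstrap_chain(8, 17)
for builder, built in steps:
    print(f"OpenJDK {builder} builds OpenJDK {built}")
```

Starting from OpenJDK 8, this yields nine build steps, eight of which produce intermediate releases (9 through 16) that exist only to reach OpenJDK 17.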

But OpenJDK 8 itself has to be built with OpenJDK 7, which in turn has to be built with Java 1.5.0 or newer. Sun never provided freely redistributable source code for Java 1.5.0 (retroactively renamed Java 5), so how do we overcome that? The answer is complicated: we must assemble a JDK that reimplements Sun’s Java 1.5 from various pieces of source code. Enter the java-gcj-compat package.

To build the java-gcj-compat package, we make use of software from the GNU Classpath project, such as the GNU Compiler for Java, and software from the Eclipse Foundation, such as the Eclipse Compiler for Java. The GNU Classpath project provides a mostly complete implementation of the Java 1.5 class libraries, while the GNU Compiler for Java is used to build the Eclipse Compiler for Java, which itself provides a Java compiler capable of compiling source code leveraging Java 1.5 features.

Now that we understand the components needed to build a freely redistributable replacement for the Sun Java 1.5 JDK, we can look at the steps needed to build them and put them together. Obtaining them is fairly straightforward: the necessary GNU Classpath components were bundled together with GCC until they were removed in GCC 7, and the Eclipse Compiler for Java can be downloaded from the Eclipse Foundation website.

But Wolfi ships with GCC 13, which is much newer than GCC 7, so we had to introduce a gcc-6 package to Wolfi. Its sole purpose is to provide the GNU Compiler for Java and GNU Classpath infrastructure, which build the Eclipse Compiler for Java and act as a JVM. These pieces are then stitched together into something resembling a standard Java 1.5 JDK in the aforementioned java-gcj-compat package. In order to build GCC 6 with modern libraries, we had to include a few patches, which were taken from Alpine’s GCC 6 tree.

Once we have our Java 1.5 JDK, we can then build OpenJDK 7. A problem with the OpenJDK 7 release is that Sun did not have the legal authority to release all of the Java source code as part of the OpenJDK release, meaning that the OpenJDK 7 release is incomplete and cannot be built on its own to produce a complete JDK. To overcome this problem, the GNU Classpath project produced a new project which combined their own GNU Classpath source code with the source code of OpenJDK: the IcedTea distribution. This provided replacements of the embargoed components of Java, allowing a complete build of Java 7 from source.
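The early stages described so far form a small dependency graph, and a build order can be derived from it with a topological sort. The sketch below uses Python's standard `graphlib` module. Only `gcc-6` and `java-gcj-compat` are package names taken from this post; `ecj` and `openjdk-7` are illustrative labels and may not match the real Wolfi package names.

```python
from graphlib import TopologicalSorter

# Map each stage to the stages it is built with.
# "ecj" and "openjdk-7" are hypothetical labels used for illustration only.
build_deps = {
    "gcc-6": set(),                       # provides GCJ and GNU Classpath
    "ecj": {"gcc-6"},                     # Eclipse Compiler for Java, built by GCJ
    "java-gcj-compat": {"gcc-6", "ecj"},  # stitched-together Java 1.5 JDK
    "openjdk-7": {"java-gcj-compat"},     # built via IcedTea
}

# static_order() yields each stage only after everything it depends on.
order = list(TopologicalSorter(build_deps).static_order())
for stage in order:
    print(stage)
```

Because every stage here depends on the one before it, there is only one valid order; a real package graph would include many more build dependencies than this sketch shows.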

From there, the bootstrap process is fairly straightforward: we use the OpenJDK 7 package we built to build OpenJDK 8, then OpenJDK 8 to build OpenJDK 9, and OpenJDK 9 to build OpenJDK 10. At that point, however, we hit a problem in glibc on x86_64 systems with the AVX512VL extension: string concatenation is implemented using AVX512 instructions on these systems, and something about the OpenJDK 9 JVM tickled the AVX512 implementation in just the right way to cause memory corruption.

Based on the memory corruption happening at exactly 32-byte increments, we eventually concluded that there was a defect in the EVEX implementation of string concatenation and disabled the use of EVEX-optimized routines in glibc. After that, we had no significant problems bootstrapping OpenJDK 10 and later, all the way through to 17.
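A 32-byte stride is a useful clue because 32 bytes is the width of a 256-bit vector register. As a rough illustration of how such a pattern can be spotted, the Python sketch below compares an expected buffer with a corrupted one and checks whether the corrupted regions start at offsets spaced by a multiple of 32. This is not the tooling we used; the function names and the sample data are made up for the example.

```python
def run_starts(expected: bytes, actual: bytes) -> list:
    """Return the offset where each run of mismatching bytes begins."""
    starts = []
    previous_bad = False
    for offset, (want, got) in enumerate(zip(expected, actual)):
        bad = want != got
        if bad and not previous_bad:
            starts.append(offset)
        previous_bad = bad
    return starts


def on_stride(starts: list, stride: int = 32) -> bool:
    """True if there are at least two runs and every gap is a multiple of stride."""
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return len(starts) >= 2 and all(gap % stride == 0 for gap in gaps)


# Synthetic data: flip one byte every 32 bytes to imitate the pattern.
expected = bytes(range(256))
corrupted = bytearray(expected)
for offset in (5, 37, 69):
    corrupted[offset] ^= 0xFF

starts = run_starts(expected, bytes(corrupted))
print(starts, on_stride(starts))
```

A regular stride like this points at vectorized code rather than a stray pointer write, which is the kind of evidence that narrows the search to a specific family of optimized routines.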

A brief note on default-jvm and JAVA_HOME

Alongside the changes to demonstrate a full from-source bootstrap of Java in Wolfi, we have introduced some changes to allow multiple JDKs to be installed side by side. This means that you should either add the preferred JDK’s JAVA_HOME to your container’s path or, if you want a system-wide JDK, install one of the many `-default-jvm` variants of the JDK, e.g. openjdk-8-default-jvm to select OpenJDK 8 as the system-wide JRE and JDK.
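When several JDKs are installed side by side, a process launcher can pick one explicitly by setting `JAVA_HOME` and putting that JDK's `bin` directory first on `PATH`. The Python sketch below shows one way to build such an environment. The helper name `jdk_environment` and the install path `/usr/lib/jvm/java-17-openjdk` are assumptions for illustration; check where the JDK actually lives in your image.

```python
import os
from typing import Optional


def jdk_environment(java_home: str, base_env: Optional[dict] = None) -> dict:
    """Return a copy of the environment that prefers the JDK at java_home."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    bin_dir = os.path.join(java_home, "bin")
    existing = env.get("PATH", "")
    # Prepend so this JDK's tools win over any other JDK already on PATH.
    env["PATH"] = bin_dir + (os.pathsep + existing if existing else "")
    return env


# Hypothetical install location, used only for this example.
env = jdk_environment("/usr/lib/jvm/java-17-openjdk", {"PATH": "/usr/bin"})
print(env["JAVA_HOME"])
print(env["PATH"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` to launch `java` or `javac` from the chosen JDK without changing the system-wide default.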

Finally, if you’re interested in this level of obsession for correctness in your software supply chain, reach out and explore a Chainguard Images catalog subscription.
