Software Bill of Materials documents, or SBOMs, have become a hot topic in software supply chain security, with everyone bringing their own solution to the table in response to President Biden’s executive order last year. While vendors advertise that these solutions are easy to use and accurate, the output of these tools varies significantly because of differences in how they generate SBOMs and which data sources they use. To demonstrate how even gold standard SBOM generation tools miss components, we use Syft to scan a custom image and find that it misses the binary we added.
The two main ways SBOMs get generated
There are two main ways of generating an SBOM: developers can generate one as part of their build process, or after the fact with a Software Composition Analysis (SCA) tool. Most SBOMs today are generated by SCA tools, though build systems are becoming more aware of the importance of SBOMs and have been adding direct support for them. SCA tools work by analyzing a set of build artifacts and generating a software component inventory for them. Build systems, on the other hand, know up front which artifacts they produce and which dependencies they consume, which provides a higher level of accuracy. Once the SCA tool or build system finishes collecting this software component inventory, the inventory is used to generate the SBOM.
Why SCA tools don’t see locally built software
A common misconception is that SCA tools can see everything in an image, recording all components. In reality, SCA tools work by scanning various package management databases to reconstruct what components are in a given image. This approach produces very useful data, but it only gives the whole picture when every component in the image is registered with a package management system.
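To make the database-driven approach concrete, here is a minimal Python sketch of a scanner that builds its inventory purely from a package database. It assumes a simplified record layout in the style of Alpine's `/lib/apk/db/installed` file (`P:` for the package name, `V:` for the version, `F:` for a directory, `R:` for a file in that directory). The sample entries and the `parse_installed_db` and `owner_of` helpers are made up for illustration; this is not how Syft itself is implemented.

```python
# Illustrative sample in the style of Alpine's /lib/apk/db/installed.
# The package entries below are made up for this sketch.
SAMPLE_DB = """\
P:ca-certificates-bundle
V:20220614-r0
F:etc/ssl/certs
R:ca-certificates.crt

P:tzdata
V:2022c-r0
F:usr/share/zoneinfo
R:UTC
"""


def parse_installed_db(text):
    """Return {package name: {"version": str, "files": set of paths}}."""
    packages = {}
    # Records are separated by blank lines.
    for record in text.strip().split("\n\n"):
        name, version, files, current_dir = None, None, set(), ""
        for line in record.splitlines():
            key, _, value = line.partition(":")
            if key == "P":
                name = value
            elif key == "V":
                version = value
            elif key == "F":  # directory owned by the package
                current_dir = value
            elif key == "R":  # file inside the most recent F: directory
                files.add(f"{current_dir}/{value}" if current_dir else value)
        if name:
            packages[name] = {"version": version, "files": files}
    return packages


def owner_of(path, packages):
    """Return the package that recorded `path`, or None if nothing did."""
    wanted = path.lstrip("/")
    for name, info in packages.items():
        if wanted in info["files"]:
            return name
    return None


packages = parse_installed_db(SAMPLE_DB)
inventory = sorted(packages)
print(inventory)
print(owner_of("/usr/sbin/nginx", packages))
```

A file copied into the image by hand never gets an entry in the database, so a scanner that works this way has no record to attribute it to: `owner_of` returns `None` for the hypothetical `/usr/sbin/nginx` path.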
As an example, we will create a Docker image that combines a static nginx binary with Chainguard’s ghcr.io/distroless/static base image. The Dockerfile looks like this:
If we build the image and tag it on our local Docker daemon as nginx-static:latest, we can scan it with Syft. Will Syft see the nginx binary we added?
As can be seen in the screenshot, nginx is missing from the software component inventory because the package is not recorded in the package management database.
This is not a design flaw with Syft or other SCA tools, but rather a reality of how they are designed to work: they can only see what has been recorded in a database somewhere, or in the case of some binaries, embedded inside the binaries themselves.
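A check like this can also be scripted rather than read off a screenshot. The Python sketch below assumes the SBOM was exported in Syft's JSON format and that the document has a top-level `artifacts` list whose entries carry a `name` field. The sample document is a hand-written, trimmed-down stand-in rather than real scan output, and `missing_components` is a hypothetical helper.

```python
import json

# Hypothetical, trimmed-down stand-in for Syft JSON output.
# Real output contains many more fields per artifact.
SAMPLE_SYFT_JSON = """
{
  "artifacts": [
    {"name": "ca-certificates-bundle", "version": "20220614-r0", "type": "apk"},
    {"name": "tzdata", "version": "2022c-r0", "type": "apk"}
  ]
}
"""


def missing_components(sbom, expected):
    """Return the expected component names that the SBOM does not list."""
    found = {artifact["name"] for artifact in sbom.get("artifacts", [])}
    return sorted(set(expected) - found)


sbom = json.loads(SAMPLE_SYFT_JSON)
missing = missing_components(sbom, ["nginx", "tzdata"])
print(missing)
```

Comparing the SBOM against a list of components you know you shipped is a cheap way to catch gaps like the missing nginx binary before anyone relies on the SBOM.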
How to make SCA tools see locally built software
The best way to give SCA tools visibility into locally built software is to build it as packages, which allows these artifacts to be managed and inventoried like any other system component. However, build systems do not always support this.
In these cases, an easy way to make software artifacts visible to scanners is to wrap them in a package, by taking prebuilt artifacts and manually generating the necessary metadata about them. When you take this approach, you can record provenance information about each artifact, and the system package manager can manage the artifacts as if they were any other system component, allowing them to be inventoried by SCA tools. As an example, it is easy to do this with Melange.
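Independent of any particular packaging tool, the kind of metadata that gets generated by hand for a prebuilt artifact can be sketched in a few lines of Python. The record layout, the `describe_artifact` helper, the placeholder version and the example origin URL are all assumptions made for illustration. This is not Melange's configuration format or any package manager's real schema.

```python
import hashlib
import json


def describe_artifact(name, version, origin, payload):
    """Build a minimal, hypothetical metadata record for a prebuilt artifact."""
    return {
        "name": name,
        "version": version,
        "origin": origin,  # provenance: where the prebuilt artifact came from
        "sha256": hashlib.sha256(payload).hexdigest(),  # ties the record to these exact bytes
        "size": len(payload),
    }


# Placeholder bytes stand in for the contents of the real binary.
payload = b"placeholder bytes standing in for the nginx binary"
record = describe_artifact(
    name="nginx",
    version="0.0.0",  # placeholder, not a real nginx version
    origin="https://example.invalid/nginx",  # hypothetical download location
    payload=payload,
)
print(json.dumps(record, indent=2))
```

Recording a checksum alongside the origin means the entry describes one specific binary, which is what lets a package manager, and in turn an SCA tool, treat it as an inventoried component.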
An alternative to wrapping prebuilt dependencies in packages would be to create virtual packages representing the artifact. While this does not enable capture of provenance information concerning artifacts, it does at least give SCA tools some visibility. This technique has been proposed in the Alpine community in the form of versioned virtual packages.
Note that wrapping prebuilt software artifacts is a poor supply chain security practice and should be avoided whenever possible. Instead, artifacts should be built as packages or composable OCI layers using cloud-native build tools like Melange.
A comprehensive SBOM is a valuable part of any organization’s security toolbox. By looking at SCA tools critically and understanding their theory of operation, we can generate more complete and accurate SBOMs.