It used to be common to hear that software bill of materials (SBOM) efforts—efforts to generate and consume ingredient lists of software—have a “chicken and egg” problem: SBOM generation tools are only valuable if SBOM consumption tools exist and are widely used, and vice versa. Now that SBOM generation tools have become relatively plentiful, it’s common to hear that SBOM consumption tools are the new frontier. But what if SBOM chickens are laying bad SBOM eggs? In that case, despite the existence of many SBOM generation tools (and, by implication, many SBOMs), SBOM consumption tools would struggle to parse ill-formed and incomplete SBOMs, and the goal of software transparency would remain distant.
To perform a preliminary assessment of SBOM quality, this blog post describes the creation of a dataset (bom-shelter) of 50+ SBOMs drawn from open source software projects and the application of two SBOM quality tools to these documents. While the dataset is small because in-the-wild SBOMs aren’t yet common, we hope that others will point out or donate open source project SBOMs. The main results include:

- Most, but not all, of the analyzed SBOMs conformed to their specification, and a handful contained nearly complete package ID, version, and license information.
- Many SBOMs, however, lacked basic data: license information was frequently absent, and none of the parseable SPDX SBOMs satisfied the NTIA “minimum elements.”
In other words, it’s possible for open source projects to produce high-quality SBOMs that downstream consumers can use. Some do. But many SBOMs in this admittedly small dataset (additions welcome!) lacked even rudimentary information. Some open source project SBOM eggs are bad. Improving SBOM quality—and not only making SBOMs everywhere—should therefore also be on the SBOM agenda.
Methodology: “bom-shelter” and Two SBOM Quality Tools
Any assessment of SBOMs ideally needs a large and representative sample of SBOMs. To the best of our knowledge, that doesn’t exist. The small collections that do exist are often used for testing SBOM tools or showcasing SBOM adoption. We therefore offer bom-shelter, a project that contains 50+ SPDX and CycloneDX SBOM documents found in the wild, in a variety of formats and for a wide selection of open source software projects. Detailed metadata on these SBOMs is also available. The main source of these SBOMs was Sourcegraph’s code search functionality, using a variety of searches based on common SBOM naming conventions (e.g. spdx.json). These SBOMs aren’t necessarily representative of all SBOMs, but they are, at least, diverse and suggestive of the quality of open source project SBOMs.
The next step in assessing SBOMs is selecting SBOM quality tools. Two early-stage tools stood out: SBOM Scorecard and NTIA Conformance Checker. The SBOM Scorecard tool, an eBay open source project, can assess a number of SBOM formats and determine whether the SBOMs have package IDs (via a PURL or CPE), package versions, and licenses. These data elements are helpful for assessing security risks and legal compliance, two typical uses of SBOMs. The tool assigns a score to each data element: the percentage of packages within the SBOM that contain the required information. The NTIA Conformance Checker, an SPDX project, determines whether an SPDX SBOM contains the so-called “minimum elements”: supplier name, component name, version of the component, other unique identifiers, dependency relationships, SBOM author, and timestamp. Both tools are at an admittedly early stage, though hopefully use and testing of these tools (like this analysis) can help improve them.
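To make the per-element scoring concrete, here is a minimal Python sketch of a Scorecard-style score for a CycloneDX JSON SBOM: for each data element, the fraction of components that carry it. This illustrates the scoring idea only; it is not SBOM Scorecard’s actual implementation, and the toy document below is invented for the example.

```python
import json

def scorecard(sbom_json: str) -> dict:
    """Per-element scores for a CycloneDX JSON SBOM: the fraction of
    components that carry an ID (purl or cpe), a version, and license data."""
    components = json.loads(sbom_json).get("components", [])
    if not components:
        return {"id": 0.0, "version": 0.0, "license": 0.0}
    n = len(components)
    return {
        "id": sum(1 for c in components if c.get("purl") or c.get("cpe")) / n,
        "version": sum(1 for c in components if c.get("version")) / n,
        "license": sum(1 for c in components if c.get("licenses")) / n,
    }

# Toy CycloneDX-style document: two components, one missing license data.
doc = json.dumps({"components": [
    {"name": "a", "version": "1.0", "purl": "pkg:pypi/a@1.0",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "b", "version": "2.0",
     "cpe": "cpe:2.3:a:b:b:2.0:*:*:*:*:*:*:*"},
]})
print(scorecard(doc))  # → {'id': 1.0, 'version': 1.0, 'license': 0.5}
```

A score of 0.5 for licenses here means half the packages carry license data, which is exactly how the averages reported below should be read.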
SBOM Scorecard Results
The SBOM Scorecard tool (at commit 455e3) was applied to 25 SBOM documents (at commit b9c73). Given a current limitation of the tool, the analysis only used JSON-formatted SBOMs. Additionally, to avoid duplication, this analysis only selected one SBOM per open source project present in the bom-shelter dataset.
Twenty-one of these 25 SBOMs conformed to a valid specification, indicating broad but not universal compliance. Table 1 provides the average SBOM Scorecard score for three different data elements across the 25 SBOMs. To aid interpretation: a score of .5 for a particular type of SBOM data would indicate that, on average, 50 percent of packages contain that data.
The relative presence of package ID information (does each package contain a CPE or a PURL?) and package version information suggests that these SBOMs emphasize component matching. Of note, nearly four-fifths of the SBOMs lacked package license information, and two-fifths lacked any package version information. The lack of license information is, to us, surprising.
But some of these SBOM eggs appear to be, as measured by the SBOM Scorecard tool, of notably high quality. Four SBOMs stood out: they conformed to the specification, had either PURLs or CPEs for all packages, and had nearly complete information on package versions and licenses.
NTIA Conformance Checker Results
The analysis then pointed the NTIA Conformance Checker tool, which is only compatible with SPDX documents, at 24 SPDX documents. Because some of the SPDX documents in this dataset describe the same project in different formats (e.g. JSON vs. tag-value), the analysis used only the SBOM in the format originally published by each project.
Seventeen of these SPDX SBOMs could be parsed. Of those, none complied with the NTIA minimum elements. For instance, a number of these SBOMs had packages without suppliers. The NTIA minimum elements appear to be a high bar for open source project SBOMs to clear.
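A check like the one the NTIA Conformance Checker performs can be sketched in a few lines of Python against an SPDX 2.x JSON document: flag any missing minimum element, including the per-package supplier field that tripped up many of these SBOMs. This is an illustrative simplification, not the tool’s implementation, and the sample document is invented for the example.

```python
import json

def missing_ntia_elements(sbom_json: str) -> list[str]:
    """Report NTIA minimum elements absent from an SPDX 2.x JSON SBOM."""
    doc = json.loads(sbom_json)
    missing = []
    creation = doc.get("creationInfo", {})
    if not creation.get("creators"):
        missing.append("SBOM author")
    if not creation.get("created"):
        missing.append("timestamp")
    for pkg in doc.get("packages", []):
        name = pkg.get("name", "?")
        if not pkg.get("supplier"):
            missing.append(f"supplier for {name}")
        if not pkg.get("versionInfo"):
            missing.append(f"version for {name}")
    if not doc.get("relationships"):
        missing.append("dependency relationships")
    return missing

# Sample SPDX-style document: one package with a version but no supplier.
doc = json.dumps({
    "creationInfo": {"created": "2023-01-01T00:00:00Z",
                     "creators": ["Tool: example-generator"]},
    "packages": [{"name": "left-pad", "versionInfo": "1.3.0"}],
    "relationships": [{"relationshipType": "DESCRIBES"}],
})
print(missing_ntia_elements(doc))  # → ['supplier for left-pad']
```

Because conformance is all-or-nothing, a single omission like this is enough to fail the check, which helps explain why none of the parseable SBOMs passed.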
Limits of SBOM Quality Analysis
This analysis has at least two limitations. First, the SBOM dataset contained in bom-shelter is not particularly large, though hopefully others can contribute additional SBOMs, growing its size and diversity. The dataset is also not necessarily representative of all SBOMs, though it is hopefully similar to the SBOMs found among open source software projects.
Second, there is not currently a consensus on what a “good” SBOM is. The usefulness of the information in an SBOM depends on the particular goals of an SBOM producer and consumer. License information could be essential to one consumer and superfluous to another. That said, the tools used in this analysis focused on SBOM data that is widely viewed as useful for either remediating vulnerable components (package ID and version), license compliance (package license information), or contract compliance (NTIA minimum elements).
The SBOM Chicken and Bad Egg Problem
The ability of this project to identify and collect dozens of SBOMs from a variety of open source projects does suggest that the original SBOM chicken and egg problem has been cracked. The results also imply, however, that there is a new SBOM chicken and egg problem: SBOM chickens laying bad SBOM eggs.
What to do? Perhaps most importantly, improving SBOM quality—and not only making SBOMs everywhere—should be on the SBOM agenda. Software teams can analyze the quality of the SBOMs they depend upon, identify deficiencies, and then make tooling and process changes to improve their SBOM quality. Improving SBOM generator tools should also be part of this initiative. In sum, if software transparency is to be realized through SBOMs, SBOM quality will need to become a key concern.
Join me on January 18th for a discussion with Chainguard's Head of Open Source Tracy Miranda and eBay's Principal Architect Justin Abrahms about what makes SBOMs high quality. We will look at the results of a recent SBOM quality study and take a deeper look at emerging tooling for measuring SBOM quality. Register today!
To follow along with more Chainguard Labs research and recommendations on topics like SBOMs, or to share a related topic you’d like the team to dig into, check out our research blog page and website, or sign up for Chainmail, our monthly newsletter, to get the latest delivered to your inbox.