Make SBOMs, not GuessBOMs: Why we need to shift left on SBOM generation

Tracy Miranda, Head of Open Source
January 26, 2023

Software Bills of Materials (SBOMs) continue to be one of the most talked-about tools for securing the software supply chain because they promise insight into your software inventory, which allows for better vulnerability management. However, SBOMs have drawn their fair share of criticism across the industry, especially regarding quality. In particular, many have questioned how realistic a picture today’s SBOMs give of the components that make up any given artifact.

SBOM quality, or completeness, is greatly affected by when an SBOM is generated. SBOMs are typically created at one of two key stages:

  • During the build process
  • Post-build

Up until now, the vast majority of SBOMs have been generated using the second method: post-build, typically with software composition analysis (SCA) tools. However, this method has a key flaw, as Chainguard’s Ariadne Conill points out in the post “Not all SBOMs are created equal.” The post explains that SCA tools rely on components being recorded somewhere (e.g., a database) and will miss any software that bypasses that metadata store (e.g., files copied in as part of a Dockerfile).
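To make this failure mode concrete, here is a minimal Python sketch of a purely metadata-based scanner. It is a deliberately simplified, hypothetical model: the `package_db` mapping, the file paths, and both function names are invented for illustration and are not taken from any real SCA tool, and some real scanners layer extra heuristics on top. It only shows why a file that no recorded package claims never appears in the report.

```python
# Hypothetical model of a metadata-based (post-build) scanner.
# package_db maps each recorded package to the files it installed;
# anything on disk that no package claims is invisible to the scanner.

def scan_from_metadata(package_db: dict[str, list[str]]) -> list[str]:
    """Return the components a metadata-only scanner would report."""
    return sorted(package_db)

def untracked_files(filesystem: set[str], package_db: dict[str, list[str]]) -> set[str]:
    """Return files present in the image but owned by no recorded package."""
    owned = {path for files in package_db.values() for path in files}
    return filesystem - owned

# Hypothetical image: curl was installed through the package manager,
# while a Go toolchain was copied in directly (e.g. by a Dockerfile COPY).
package_db = {
    "curl": ["/usr/bin/curl", "/usr/lib/libcurl.so.4"],
}
filesystem = {
    "/usr/bin/curl",
    "/usr/lib/libcurl.so.4",
    "/usr/local/go/bin/go",     # copied in, never recorded
    "/usr/local/go/bin/gofmt",  # copied in, never recorded
}

print(scan_from_metadata(package_db))                   # ['curl']
print(sorted(untracked_files(filesystem, package_db)))  # ['/usr/local/go/bin/go', '/usr/local/go/bin/gofmt']
```

The scanner reports only `curl`, even though a whole toolchain sits in the image: the copied files exist but were never written to the metadata store.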

SCA and other scanner tools miss key components surprisingly often. Chainguard CEO Dan Lorenc recently examined some popular Docker images to look at this issue. At the time, he found that scanners could not detect Golang in the Golang image, Redis in the Redis image, or WordPress in the WordPress image. Some of these issues have since been addressed, but the episode shows that getting scanners to identify every software component is often a game of whack-a-mole.

Anecdata is great, but how widespread is this issue? To quantify the problem, Chainguard Labs recently published a study on software dark matter: files that exist in an artifact but are not tracked by typical tools such as a package manager or an SBOM. The study found that the analyzed containers were made up of roughly 32% software dark matter on average, and that this figure rises to 63% when containers are weighted by their number of files. Depending on how it is measured, then, software dark matter makes up roughly 30% to 60% of a container.
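Weighting can shift the figure this much because large containers with many untracked files dominate a file-weighted average. The sketch below illustrates the difference using invented counts for three hypothetical images. It assumes “weighted by the number of files” means pooling all files across containers, and it does not reproduce the study’s data or methodology.

```python
from statistics import mean

# Hypothetical per-container counts: (image name, total files, untracked files).
# "Untracked" means no package manager or SBOM accounts for the file.
containers = [
    ("image-a", 100, 10),    # 10% dark matter
    ("image-b", 300, 60),    # 20% dark matter
    ("image-c", 1300, 780),  # 60% dark matter
]

def unweighted_dark_matter(rows):
    """Average the per-container fractions: every container counts equally."""
    return mean(untracked / total for _, total, untracked in rows)

def file_weighted_dark_matter(rows):
    """Pool all files together: containers with more files count for more."""
    total_files = sum(total for _, total, _ in rows)
    total_untracked = sum(untracked for _, _, untracked in rows)
    return total_untracked / total_files

print(f"unweighted:    {unweighted_dark_matter(containers):.1%}")     # 30.0%
print(f"file-weighted: {file_weighted_dark_matter(containers):.1%}")  # 50.0%
```

With these made-up numbers the unweighted average is 30.0% while the file-weighted figure is 50.0%, because image-c contributes most of the files and also has the highest share of dark matter.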

By its nature, the SCA process can at best produce a guess at the ‘ingredients’ of a software artifact. This is why we use the term “GuessBOMs”: SBOMs generated by trying to reverse-engineer software artifacts have severe limitations. As was pointed out in a recent Chainguard Twitter Space, SCA tools have been around for a while; if they worked well enough, this problem would already be solved and we wouldn’t need SBOM standards.

The alternative to creating GuessBOMs with SCA tools is to generate SBOMs at build time. This is the stage at which the United States National Telecommunications and Information Administration (NTIA) recommends generating SBOMs:

“Those requesting SBOM data should try to obtain it from the instance of the build since the instance of the build captures the details of the software as built, including reflecting any changes made by the compiler or other tools.” – NTIA’s The Minimum Elements For a Software Bill of Materials

Traditionally there have been very few tools that can generate SBOMs at build time, but we expect this to start changing in 2023. For an SBOM to be useful, it has to contain more information than you can derive from the final output. That makes it critical to create SBOMs as part of the build, where you have access to the source code and more information about the process. We see few build-time SBOMs today largely because SBOM use is still fairly limited, especially for determining inventory for vulnerability management.
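The advantage of build time is that the builder performs every step itself, so it can record each component as it is added instead of inferring it afterwards. The following is a minimal sketch assuming a hypothetical `Build` class, hypothetical component names and versions, and a simplified, non-standard JSON output. It is not apko’s implementation; a real tool would emit a standard format such as SPDX or CycloneDX.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Build:
    """Hypothetical builder that records every component as it adds it."""
    components: list[dict] = field(default_factory=list)

    def install_package(self, name: str, version: str) -> None:
        # Real package installation would happen here; the builder
        # records the component at the moment it installs it.
        self.components.append({"name": name, "version": version, "source": "package"})

    def copy_artifact(self, name: str, version: str, origin: str) -> None:
        # A direct copy (like a Dockerfile COPY) is recorded too, because
        # the builder performs it and knows exactly what was added.
        self.components.append({"name": name, "version": version, "source": origin})

    def sbom(self) -> str:
        # Simplified, non-standard shape used only for illustration.
        return json.dumps({"components": self.components}, indent=2)

build = Build()
build.install_package("libexample", "1.2.3")
build.copy_artifact("example-tool", "0.4.0", "copied from build context")
print(build.sbom())
```

The copied artifact, exactly the kind of file a metadata-based scanner misses, appears in the output because recording it is part of the build step itself.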

One tool making strides toward generating more complete SBOMs is apko, a command-line tool from Chainguard that lets users build container images using a declarative, YAML-based language. Apko works hand in hand with a tool called melange as part of a secure software factory that produces an SBOM for each image, listing all the packages inside it. Apko is also used to build Chainguard Images with complete SBOMs at build time. This new generation of build tools produces more complete SBOMs and also takes steps toward ensuring SBOM integrity by signing them with Sigstore.

Not wanting to be left behind, SCA tools are also adapting to support build-time SBOMs, which is great to see. For example, Trivy now includes functionality for scanning an existing SBOM.

SBOMs are a great vehicle for vulnerability management, but only if they give us an accurate picture of the underlying software. Build time is the optimal point for generating complete SBOMs. As an industry, we need to raise the bar on quality and not settle for the hacky guesswork of GuessBOMs. The term “GuessBOMs” gives us the language we need to start this conversation and drive improvements. More complete, higher-quality SBOMs will go a long way toward helping SBOMs live up to expectations and become truly effective tools for software vulnerability management.
