Dealing with multiple SBOMs
Based on this Twitter thread
TL;DR: As SBOMs gain traction, we have more and more tools to generate them, and those tools produce different kinds of SBOMs. Not all SBOMs are equal, so how do we reconcile the different results?
Two SBOMs do not need to look the same for you to validate the artifacts they describe. As long as the data inside them is correct, you should be able to do it.
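To make that concrete, here is a minimal Python sketch that compares file digests across two SBOMs with different shapes. The documents are hand-written, heavily simplified fragments that borrow field names from SPDX (`files`, `checksums`, `checksumValue`) and CycloneDX (`components`, `hashes`, `alg`, `content`); real documents carry far more data, and the helper names are hypothetical.

```python
# Hypothetical, heavily simplified fragments: only the hash-related fields of
# an SPDX-style and a CycloneDX-style document are modeled here.
APP_SHA = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
EMPTY_SHA = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

SPDX_DOC = {
    "files": [
        {"fileName": "./bin/app",
         "checksums": [{"algorithm": "SHA256", "checksumValue": APP_SHA}]},
        {"fileName": "./lib/util.so",
         "checksums": [{"algorithm": "SHA256", "checksumValue": EMPTY_SHA}]},
    ]
}
CDX_DOC = {
    "components": [
        {"type": "file", "name": "bin/app",
         "hashes": [{"alg": "SHA-256", "content": APP_SHA}]},
        {"type": "file", "name": "lib/util.so",
         "hashes": [{"alg": "SHA-256", "content": "0" * 64}]},  # deliberately wrong
    ]
}


def normalize(name):
    """Drop a leading './' so both documents spell paths the same way."""
    return name[2:] if name.startswith("./") else name


def spdx_digests(doc):
    """Map file name -> SHA-256 hex digest for an SPDX-style document."""
    out = {}
    for f in doc.get("files", []):
        for c in f.get("checksums", []):
            if c.get("algorithm") == "SHA256":
                out[normalize(f["fileName"])] = c["checksumValue"].lower()
    return out


def cyclonedx_digests(doc):
    """Map component name -> SHA-256 hex digest for a CycloneDX-style document."""
    out = {}
    for comp in doc.get("components", []):
        for h in comp.get("hashes", []):
            if h.get("alg") == "SHA-256":
                out[normalize(comp["name"])] = h["content"].lower()
    return out


def reconcile(a, b):
    """For names present in both maps, split them into agreeing and conflicting."""
    shared = a.keys() & b.keys()
    agree = sorted(n for n in shared if a[n] == b[n])
    conflict = sorted(n for n in shared if a[n] != b[n])
    return agree, conflict


agree, conflict = reconcile(spdx_digests(SPDX_DOC), cyclonedx_digests(CDX_DOC))
print("agree:", agree)        # agree: ['bin/app']
print("conflict:", conflict)  # conflict: ['lib/util.so']
```

The two documents never need to share a layout; what has to match is the digest of each artifact they both describe. A conflict means at least one tool got an attribute wrong.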
BUT! Tooling is far from ideal for now. We are working on it!
SBOM Generation
Will two tools produce the same SBOM? The answer is almost certainly no. Should they produce the same SBOM? No. Can you still validate an SBOM? Yes!
“When” and “How” Matter
As Matt Moore pointed out in the thread, some data in the SBOM can vary depending on when and how you collect the information.
Some tools generate them by observing what goes into the build. Some of those are build tools.
— Matt Moore ⛓🚀 (@mattomata) April 29, 2022
Others produce them from the result artifact itself, which can be a lossy transformation.
Lots of sources of variation 😞
At build time, you can observe certain things. If you describe an artifact after it has been built, you observe a different (possibly smaller) set of things.
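One way to see this difference is to diff the package identifiers reported by a build-time SBOM and by a scan of the finished artifact. The Python sketch below is hypothetical: the package URLs are made-up placeholders, and real tools also disagree on how they name the same package, which a plain set comparison would not catch.

```python
def coverage_diff(build_time, post_build):
    """Split package identifiers by which SBOM observed them."""
    build_set, post_set = set(build_time), set(post_build)
    return {
        "both": sorted(build_set & post_set),
        "build_only": sorted(build_set - post_set),
        "post_build_only": sorted(post_set - build_set),
    }


# Hypothetical package URLs; example.com names are placeholders.
build_sbom = [
    "pkg:golang/example.com/app@v1.0.0",
    "pkg:golang/example.com/lib@v1.2.0",
    "pkg:golang/example.com/codegen@v0.3.0",  # used during the build only
]
scan_sbom = [
    "pkg:golang/example.com/app@v1.0.0",
    "pkg:golang/example.com/lib@v1.2.0",
    "pkg:apk/example/base-utils@1.0.0",  # found in the base image
]

for bucket, purls in coverage_diff(build_sbom, scan_sbom).items():
    print(bucket, purls)
```

Neither list is "the" right one; each reflects what could be observed at that point in time.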
Tool output also varies widely because of semantics and expression. Think about the many ways to describe a container image: there are manifests, layers, packages in the base images, and files in the individual layers.
Which SBOM is the right one?
Are two SBOMs wrong because they differ from one another? Absolutely not! While what gets described varies, the inherent attributes of the objects being described are immutable. For example, the SHA of a file is the same whether you describe it inside a tarball or inside a container layer. Is an SBOM wrong if it gets those attributes wrong? Absolutely, yes! A big problem is that tools sometimes do get them wrong, which is an ongoing and concerning issue. Is an SBOM wrong if it is "less complete"? Well, I hate to do this, but "it depends" (no SBOM pun there, ha!).
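A short Python sketch of that invariance, using only the standard library: it hashes a file's bytes directly, then packs the same bytes into an in-memory tar archive (container image layers are tar archives too, usually compressed), reads them back out, and hashes them again. The file name and contents are arbitrary placeholders.

```python
import hashlib
import io
import tarfile


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def pack_and_extract(content: bytes, member_name: str):
    """Write `content` into an in-memory tarball and read it back.

    Returns (archive_bytes, extracted_bytes).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    archive = buf.getvalue()  # closing the TarFile leaves buf open
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        extracted = tar.extractfile(member_name).read()
    return archive, extracted


payload = b"#!/bin/sh\necho hello\n"
archive, extracted = pack_and_extract(payload, "usr/bin/hello")

# The file keeps its digest wherever it is described...
print(sha256_hex(payload) == sha256_hex(extracted))  # True
# ...while the archive wrapped around it has a digest of its own.
print(sha256_hex(payload) == sha256_hex(archive))    # False
```

This is why digests are a good anchor for comparing SBOMs: the wrapper changes, the file does not.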
If all you need is to check the integrity of a file, then you could have a super minimal SBOM describing just that one file. Add a checksum and you’re done. The problem is that such an SBOM is hardly useful for anything else. It’s like one of those "useless machines" that have a single purpose (like this box).
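Here is what such a single-purpose SBOM might look like, as a Python sketch. It borrows SPDX 2.3 JSON field names (`files`, `fileName`, `checksums`, `algorithm`, `checksumValue`) but deliberately omits fields a valid SPDX document requires, such as `creationInfo`; `minimal_sbom` and `verify` are hypothetical helpers.

```python
import hashlib
import json


def minimal_sbom(file_name: str, content: bytes) -> str:
    """Describe exactly one file by name and SHA-256 digest (not valid SPDX)."""
    doc = {
        "spdxVersion": "SPDX-2.3",
        "files": [{
            "SPDXID": "SPDXRef-File-1",
            "fileName": file_name,
            "checksums": [{
                "algorithm": "SHA256",
                "checksumValue": hashlib.sha256(content).hexdigest(),
            }],
        }],
    }
    return json.dumps(doc, indent=2)


def verify(sbom_json: str, file_name: str, content: bytes) -> bool:
    """Check `content` against the SHA-256 recorded for `file_name`."""
    doc = json.loads(sbom_json)
    actual = hashlib.sha256(content).hexdigest()
    for f in doc.get("files", []):
        if f.get("fileName") != file_name:
            continue
        for c in f.get("checksums", []):
            if c.get("algorithm") == "SHA256":
                return c.get("checksumValue") == actual
    return False  # the file is not described, so nothing can be verified


sbom = minimal_sbom("./app", b"hello")
print(verify(sbom, "./app", b"hello"))  # True
print(verify(sbom, "./app", b"hell0"))  # False
```

That is the whole trick and also the whole limitation: this document can answer "is this the same file?" and nothing else.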
Sometimes all you need is a list of dependencies and I think this use of SBOMs is what most people are looking for today. It’s a valid use case, but only a tiny piece of what an SBOM can do for you.
💡Hint: Read this post I wrote: What an SBOM Can Do for You
You may not know the final mission of your SBOM when you generate it. The consumer will use it for processes or tools you may not even be aware are out there. Perhaps they need to check licensing, compliance, or integrity, or analyze structure, and so on. That's a good reason to have richer data in your documents.
The SBOM is not for you. Think about the Voyager Golden Record: make it easy for the recipient to decode your message. NASA did not create a record to make its own tools look hot, did it? You need to send aliens easily parsable instructions on how to invade the planet.
Making an SBOM Useful
There are several ways to go about creating an SBOM for someone else.
Micro SBOMs can be made "more complete" if they are part of an SBOM system where linked documents draw the whole picture. We are working in a supply chain, so an SBOM system should ideally grow organically, with a new SBOM created each time a link adds value to the software or transforms it.
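As a rough illustration of micro SBOMs drawing the whole picture, the Python sketch below walks links from the SBOM of a final artifact back through the SBOMs of earlier links in the chain, merging what each one describes and reporting links it cannot resolve. The `describes`/`links` structure and the IDs are hypothetical; this is not how any particular SBOM format expresses cross-document references.

```python
def assemble(start_id, store):
    """Walk links between micro SBOMs and merge what they describe.

    `store` maps an SBOM id to a dict with hypothetical keys:
    "describes" (artifact identifiers) and "links" (ids of SBOMs
    produced by earlier links in the supply chain).
    Returns (sorted artifacts, sorted unresolved SBOM ids).
    """
    seen, missing, artifacts = set(), [], set()
    stack = [start_id]
    while stack:
        sbom_id = stack.pop()
        if sbom_id in seen:  # guard against cycles and shared ancestors
            continue
        seen.add(sbom_id)
        doc = store.get(sbom_id)
        if doc is None:
            missing.append(sbom_id)  # a broken bread-crumb trail
            continue
        artifacts.update(doc.get("describes", []))
        stack.extend(doc.get("links", []))
    return sorted(artifacts), sorted(missing)


STORE = {
    "sbom:image": {"describes": ["oci-image"], "links": ["sbom:build", "sbom:base"]},
    "sbom:build": {"describes": ["bin/app"], "links": ["sbom:source"]},
    "sbom:source": {"describes": ["src.tar.gz"], "links": []},
    # "sbom:base" was never published, so the trail stops there.
}

print(assemble("sbom:image", STORE))
# (['bin/app', 'oci-image', 'src.tar.gz'], ['sbom:base'])
```

The unresolved list matters as much as the merged one: it shows exactly where someone forgot to leave bread crumbs.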
Make sure to leave bread crumbs along the way so that people can find the SBOMs!
💡Hint: Use cosign to attach them! https://www.sigstore.dev/
This question from @elinesterov is also important:
Does SBOM contain information about how it was collected? Or is it inferred by some artifacts?
— ꩜ Eli Nesterov (@elinesterov) April 29, 2022
It can, but that information is not always there.
My colleague @AikasVille thinks we should have a way to grade SBOMs. "Distance to origin" could be one way to do it.
If an SBOM is intended for a purpose and misses the required data, well, it is useless. Since you can get wildly expressive in an SBOM, there are efforts to define what most folks would need. For example, NTIA defined the now-famous "Minimum Elements" (read the PDF). But the problem is that the tooling to make things work in the way I just argued is simply not there yet.
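As a sketch of the kind of check such tooling could run, the Python snippet below tests a component record against the seven data fields listed in the NTIA Minimum Elements. The flat record layout and its keys are hypothetical: real SPDX or CycloneDX documents spread these fields across packages, relationships, and document metadata, and the NTIA document also covers automation support and practices and processes, which a field check cannot capture.

```python
# NTIA Minimum Elements data fields, keyed by hypothetical record keys.
MINIMUM_DATA_FIELDS = {
    "supplier": "Supplier Name",
    "name": "Component Name",
    "version": "Version of the Component",
    "unique_ids": "Other Unique Identifiers",
    "dependencies": "Dependency Relationship",
    "author": "Author of SBOM Data",
    "timestamp": "Timestamp",
}


def missing_minimum_elements(record):
    """Return the NTIA field labels that are absent or blank in `record`.

    An empty list for "dependencies" counts as present: it can legitimately
    mean the component has no dependencies.
    """
    missing = []
    for key, label in MINIMUM_DATA_FIELDS.items():
        value = record.get(key)
        if value is None or value == "":
            missing.append(label)
    return missing


component = {
    "name": "example-lib",
    "version": "1.2.0",
    "unique_ids": ["pkg:generic/example-lib@1.2.0"],
    "dependencies": [],
    "author": "Example Build System",
}
print(missing_minimum_elements(component))  # ['Supplier Name', 'Timestamp']
```

Passing a check like this says the SBOM has the minimum; it says nothing about whether the values are correct.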
What now?
We recently added simple (dumb!) file validation to the Kubernetes SBOM tool and are working on validating other kinds of artifacts, but, well... Help Wanted™
Other efforts are going on at the same time: linking micro SBOMs, making standards more interoperable, and matching tools to make them understand each other. IT IS A LOT.
I have been working with one of my smartest friends (and coworker!) John Speed Meyers to check how bad SBOM variance is. It IS WILD. But there is hope!
I'll try to share some results soonish (not because they are super secret; we just need to compile something decent).