This Shit Is Hard: Vulnerability Scanner Integration

Jason Hall, Principal Software Engineer, and Dan Luhring, Staff Software Engineer

In our first This Shit is Hard post, we walked through what we call the Chainguard Factory — what it takes for Chainguard to keep the stream of packages and images flowing smoothly into customers' environments, touching on the infrastructure and teams we've built to make that possible.


In this installment of "This Shit Is Hard," we wanted to talk a bit about what it takes to be confident that the software we're shipping to customers is not only as close to "CVE-free" as possible, but also that customers' supported scanners will accurately report our images as CVE-free when they are. We’re in the business of building trusted open source software, so this can’t be overstated.


It turns out to be harder than you might think, especially if you want to do it right – like we do!


Scanners


[Image: movie poster depicting a zombie, with red text that reads "Scanners".]

Before we can talk about how Chainguard integrates with vulnerability scanners, let's start with what a scanner does.


At a very high level, a vulnerability scanner parses some software artifact (a compiled ELF executable, a JAR file, a Python wheel, an APK package, an OCI container image, etc., often a nesting doll of many different packaging formats) to find clues about what software is included in that artifact.


For example, by default, compiled Go programs include information embedded in their ELF structure that lists the Go modules and versions that the package depends on. A scanner might parse this ELF structure, determine it's a Go program, and read the module dependencies to see that the program depends on some particular version of some Go module. This step may also gather information about what compiler or tooling was used to build the artifact, in case that's relevant.
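As a rough illustration of that discovery step, the sketch below uses Go's standard debug/buildinfo package to read the module list and toolchain version embedded in a compiled Go binary. The helper name moduleVersions is made up for this example, and a real scanner handles far more than this (other packaging formats, stripped binaries, nested archives).

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
)

// moduleVersions reads the build information embedded in the Go binary at
// path. It returns the Go toolchain version that built the binary and a map
// of dependency module path -> version.
// (Hypothetical helper for illustration; not any particular scanner's code.)
func moduleVersions(path string) (string, map[string]string, error) {
	info, err := buildinfo.ReadFile(path)
	if err != nil {
		return "", nil, fmt.Errorf("reading build info from %s: %w", path, err)
	}
	mods := make(map[string]string, len(info.Deps))
	for _, dep := range info.Deps {
		mods[dep.Path] = dep.Version
	}
	return info.GoVersion, mods, nil
}

func main() {
	// Inspect this program's own binary; any Go binary path works here.
	self, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	goVersion, mods, err := moduleVersions(self)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("built with:", goVersion)
	for path, version := range mods {
		fmt.Printf("%s %s\n", path, version)
	}
}
```

The go version -m command prints the same embedded information from the command line.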


Next, after the scanner has collected all it can about what software is included in the artifact, it matches those packages and versions against a database of known vulnerable packages and versions. The most common source of information about known vulnerable packages is the National Vulnerability Database, but there are plenty of other databases, and some scanners also have proprietary, curated databases of known vulnerabilities.


Using our Go program example, the scanner may find that the program depends on the golang.org/x/net module (which provides the golang.org/x/net/html package) at version v0.31.0, and match that against its database of known vulnerabilities. It will find that this version of the module was vulnerable to CVE-2024-45338, a relatively innocuous denial-of-service issue caused by non-linear parsing in the html package, which was fixed in v0.33.0.
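The matching step boils down to a version comparison. Here is a minimal sketch, assuming plain vMAJOR.MINOR.PATCH versions; parseVersion and isAffected are illustrative names, and real matchers need full semantic-version handling (pre-releases, pseudo-versions) plus the affected ranges from the vulnerability database.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVersion turns "v0.31.0" into [0 31 0]. It is deliberately simplified:
// it rejects anything that is not exactly three dot-separated integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unsupported version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unsupported version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// isAffected reports whether installed is strictly older than fixed.
func isAffected(installed, fixed string) (bool, error) {
	a, err := parseVersion(installed)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(fixed)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i], nil
		}
	}
	return false, nil // same as the fixed version: not affected
}

func main() {
	// Versions taken from the CVE-2024-45338 example in the text.
	for _, v := range []string{"v0.31.0", "v0.33.0"} {
		affected, err := isAffected(v, "v0.33.0")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("golang.org/x/net %s affected: %t\n", v, affected)
	}
}
```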


By the way, to make matters more confusing, CVE-2024-45338 is also tracked in Go's vulnerability database as GO-2024-3333 and in GitHub's advisory database as GHSA-w32m-9786-jp63. It may go by other IDs in other systems, but they're all the same underlying vulnerability.
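One way a tool might cope with those overlapping IDs is to collapse them onto a single canonical ID before reporting. The sketch below does this for the three IDs named above; the function names and the choice of the CVE as the canonical ID are assumptions for illustration only.

```go
package main

import "fmt"

// aliasGroups lists IDs that refer to the same underlying vulnerability.
// The first ID in each group is treated as canonical (an arbitrary choice).
var aliasGroups = [][]string{
	{"CVE-2024-45338", "GO-2024-3333", "GHSA-w32m-9786-jp63"},
}

// canonicalIDs maps every known ID to the first ID in its group.
func canonicalIDs(groups [][]string) map[string]string {
	canon := make(map[string]string)
	for _, g := range groups {
		for _, id := range g {
			canon[id] = g[0]
		}
	}
	return canon
}

// dedupe collapses IDs that refer to the same vulnerability, preserving
// first-seen order. IDs with no known alias pass through unchanged.
func dedupe(ids []string, canon map[string]string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, id := range ids {
		if c, ok := canon[id]; ok {
			id = c
		}
		if !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	return out
}

func main() {
	reported := []string{"GO-2024-3333", "GHSA-w32m-9786-jp63", "CVE-2024-45338"}
	fmt.Println(dedupe(reported, canonicalIDs(aliasGroups)))
}
```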


There's a third step after finding and matching, where if the scanner can determine that the package came from a known distro (like Chainguard), it will consult that distro's list of known security advisories. This gives distros an opportunity to enrich the scanner's findings with more context, for example to say:


  • We acknowledge the vulnerability is present; we're investigating.

  • The vulnerable code is present, but can't be fixed without the upstream project taking some action :(

  • The vulnerability has been fixed, and should no longer be reported.

  • We determined the vulnerable code could not be exploited in the artifact — this might happen when a vulnerability only affects software running on Windows, and since Chainguard doesn't build software for Windows (yet???), we know that none of our artifacts include the vulnerable code.


Those last two are really important, since they have the effect of hiding vulnerabilities in supported scanners, so they're the main mechanism a distro has to suppress false-positive vulnerabilities in scanner output. This is how distros ensure a high signal-to-noise ratio, which is kind of our whole thing.
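To make the suppression mechanism concrete, here is a minimal sketch of how a scanner might apply distro advisories to its raw findings. The status names, types, and placeholder vulnerability IDs are invented for this example and mirror the four bullets above; real feeds use their own vocabularies, and a real "fixed" status only suppresses a finding when the installed package version is at or above the fixed version.

```go
package main

import "fmt"

// Status is a hypothetical, simplified advisory state.
type Status string

const (
	UnderInvestigation Status = "under-investigation"
	FixNotPlanned      Status = "blocked-on-upstream"
	Fixed              Status = "fixed"
	NotAffected        Status = "not-affected"
)

// Finding is one package/vulnerability pair reported by the matching step.
type Finding struct {
	Package string
	VulnID  string
}

// suppresses reports whether an advisory status hides a finding.
// Only the last two states from the list above do.
func suppresses(s Status) bool {
	return s == Fixed || s == NotAffected
}

// filterFindings drops findings that a distro advisory marks as fixed or
// not affected. Findings without any advisory are kept.
func filterFindings(findings []Finding, advisories map[Finding]Status) []Finding {
	var kept []Finding
	for _, f := range findings {
		if s, ok := advisories[f]; ok && suppresses(s) {
			continue
		}
		kept = append(kept, f)
	}
	return kept
}

func main() {
	// Placeholder package names and IDs, not real advisory data.
	findings := []Finding{
		{"pkg-a", "VULN-EXAMPLE-1"},
		{"pkg-b", "VULN-EXAMPLE-2"},
		{"pkg-c", "VULN-EXAMPLE-3"},
	}
	advisories := map[Finding]Status{
		{"pkg-a", "VULN-EXAMPLE-1"}: UnderInvestigation,
		{"pkg-b", "VULN-EXAMPLE-2"}: NotAffected,
	}
	for _, f := range filterFindings(findings, advisories) {
		fmt.Println(f.Package, f.VulnID)
	}
}
```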


Scanners and distros have a few different formats for expressing this additional context. Chainguard generates an Alpine-style security.json file, and about a year ago added support for the common OSV format. We maintain separate security.json feeds for free packages in Wolfi and for private packages available to Chainguard customers, and one unified OSV feed. You can see these advisories in our Console, or on OSV.dev.
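For a feel of what consuming an OSV-format advisory involves, the sketch below parses a single record and pulls out the fixed versions. The field names follow the public OSV schema, but the record itself (ID, package name, version) is invented for illustration and is not taken from any real feed; the ecosystem string is likewise an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// sampleOSV is a made-up record shaped like the OSV schema.
const sampleOSV = `{
  "id": "EXAMPLE-0000-0001",
  "aliases": ["CVE-2024-45338"],
  "affected": [{
    "package": {"ecosystem": "Wolfi", "name": "example-package"},
    "ranges": [{
      "type": "ECOSYSTEM",
      "events": [{"introduced": "0"}, {"fixed": "1.2.3-r1"}]
    }]
  }]
}`

// osvRecord models only the subset of OSV fields this sketch needs.
type osvRecord struct {
	ID       string   `json:"id"`
	Aliases  []string `json:"aliases"`
	Affected []struct {
		Package struct {
			Ecosystem string `json:"ecosystem"`
			Name      string `json:"name"`
		} `json:"package"`
		Ranges []struct {
			Type   string `json:"type"`
			Events []struct {
				Introduced string `json:"introduced"`
				Fixed      string `json:"fixed"`
			} `json:"events"`
		} `json:"ranges"`
	} `json:"affected"`
}

// fixedVersions returns package name -> fixed versions found in the record.
func fixedVersions(data []byte) (map[string][]string, error) {
	var rec osvRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return nil, fmt.Errorf("parsing OSV record: %w", err)
	}
	fixed := make(map[string][]string)
	for _, aff := range rec.Affected {
		for _, r := range aff.Ranges {
			for _, ev := range r.Events {
				if ev.Fixed != "" {
					name := aff.Package.Name
					fixed[name] = append(fixed[name], ev.Fixed)
				}
			}
		}
	}
	return fixed, nil
}

func main() {
	fixed, err := fixedVersions([]byte(sampleOSV))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for name, versions := range fixed {
		fmt.Println(name, "fixed in", versions)
	}
}
```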


It's worth calling out at this point that distros like Chainguard gain significant powers by integrating with scanners. You could build all these packages yourself (that's hard too!), but being able to indicate to scanners that some vulnerabilities should be silenced requires even more investment of time and resources, and usually requires some demonstration that many users would benefit. It also takes considerable engagement between distros and scanners to become a trusted source of advisory data.


That's not something that just anyone can do – and we've done that already, for multiple popular scanners.


And there are lots of scanners out there. Different scanners may have different behaviors around how they find clues about components in software artifacts, or how well they match those findings against vulnerability databases, or how thorough and searchable those databases might be, or which distros' advisory feeds they know about, and how well they surface that extra context to their users. Chainguard just wants to make sure, as much as possible, that vulnerable code is fixed where possible, as fast as possible, and that whatever scanner is being used will agree that the vulnerable code is fixed, if indeed it is.


At Chainguard, we go to pretty extreme lengths to ensure that a variety of scanners can integrate with Chainguard's artifacts and advisory feeds, to produce accurate, actionable findings. We built this Vulnerability Scanner Support repository to help vulnerability scanners test and check their work. We've gotten lots of positive feedback from our scanner partners that we're able to actually explain the concepts and provide automated tests to detect regressions (Plug: if you're a scanner that doesn't support Chainguard yet, reach out! We'd love to help).


How We Do It


Okay, so a scanner has found some vulnerable software in an artifact we're shipping to customers. Time to get to business.


Our first indication usually comes from our own internal scanning. We use Grype internally because it's open source, highly extensible, and familiar to us, and we run it with some extensions that make it extra sensitive and verbose. We constantly scan our packages with the latest Grype vuln DB, so we're notified as soon as possible when a vulnerability is found. When that happens, the first thing we do is update our advisories to note when we saw the vulnerability for the first time.


Once a vulnerability is detected, our automation gets to work attempting to fix it. In the example above where a Go program depends on a vulnerable version of the x/net/html package, the automation may first try to update that dependency to a fixed version, and rebuild the program. If that works — if the package builds and still passes tests — we've patched the vulnerability. If not, we may try other automated remediations, or drop the detection event into a prioritized queue of tickets that end up getting processed by real live humans. These humans research the vulnerability, determine if a fix is feasible, and update advisory data directly based on their findings.
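The try-then-escalate flow described above can be sketched roughly as follows. This is not Chainguard's actual automation: the Remediation type, the remediate function, and the stubbed build step are all hypothetical, and a real system would also revert a failed attempt before trying the next one.

```go
package main

import (
	"errors"
	"fmt"
)

// errBuildFailed stands in for any build or test failure.
var errBuildFailed = errors.New("build or tests failed")

// Outcome is the result of the automated remediation pass.
type Outcome string

const (
	Patched Outcome = "patched"
	Queued  Outcome = "queued-for-human-review"
)

// Remediation is one automated attempt, such as bumping a dependency.
type Remediation struct {
	Name  string
	Apply func() error // hypothetical: edits the package's build definition
}

// remediate tries each attempt in order. After an attempt applies cleanly,
// it rebuilds and runs tests; the first attempt that passes wins. If none
// pass, the detection is handed off to the human queue.
func remediate(attempts []Remediation, buildAndTest func() error) (Outcome, string) {
	for _, a := range attempts {
		if err := a.Apply(); err != nil {
			continue // could not even apply this change; try the next one
		}
		if err := buildAndTest(); err != nil {
			continue // change applied but the package no longer builds or passes
		}
		return Patched, a.Name
	}
	return Queued, ""
}

func main() {
	attempts := []Remediation{
		{Name: "bump golang.org/x/net to v0.33.0", Apply: func() error { return nil }},
	}
	// Stub: pretend the rebuilt package fails its tests.
	outcome, via := remediate(attempts, func() error { return errBuildFailed })
	fmt.Println(outcome, via)
}
```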


If we're successful in updating the vulnerable dependency, the package now depends on a not-vulnerable version of the previously-vulnerable dependency. We wait until a subsequent scan of the updated package finds that the vulnerable dependency is indeed gone, and update our advisories to note the fix in an updated revision of the artifact. The image containing that package is rebuilt, image tags are updated, and the next time a customer pulls the updated image, they'll get the newly vulnerability-free package.


Not only that, but the next time they scan that updated image, their scanner should tell them exactly the same thing: the vulnerable dependency is gone!


But not only that!! Because we record the fixed artifact version in our advisory feed, scanners that understand that feed can surface that context to users who haven't yet updated their image, by telling them that the fix is available in a newer revision. Customers may even have automation to detect this state and start pulling the updated image, depending on the vulnerabilities that were fixed.


Oh yeah, did we mention, sometimes we're even able to remediate vulnerabilities before scanners are aware of them?! Good times.


Why This Shit Is Hard


First, this is hard just because it's so darn complex. There are tens of thousands of known vulnerabilities each year, tracked in multiple vulnerability databases, feeding into dozens of supported scanners, scanning and matching thousands of package-versions in our 1400+ images and counting, built from many different tools in various languages. Chainguard builds thousands of images every day, and scans them constantly to note when vulnerabilities are found and fixed, so the advisory feed updates frequently – there were more than 22,000 advisories in the last six months alone! Making sure it all works smoothly is a lot to keep track of, to say the least!


In part it's hard because we want to support as many scanner implementations as our customers may ever use, and they use a lot of them. Sure, it would be much easier on us if Chainguard controlled the whole end-to-end scanning experience, and provided a scanner that said everything was fine, or at least had one simpler well-supported path end-to-end. But we think that's a bit like grading your own exam. You shouldn't trust us when we say a vulnerability is fixed — you should trust whatever scanner you use, and we should support that as best we can.


Finally, it's hard because it's important. When Chainguard updates its advisory feed to say that a vulnerability is fixed, scanners trust that, and customers trust that. A bad-faith supplier could do much less work and just trivially mark every vulnerability as fixed. But even more likely, all that aforementioned complexity could easily breed subtle bugs that lead to mistakes, which in turn lead to a fundamental loss of trust among our customers and our scanner partners.


If you’re curious which scanners we’re working with already, check out our scanner integration page. And if you want to learn more about how Chainguard products can help your organization reduce engineering toil and improve security, get in touch today.
