Software dark matter is the enemy of software transparency
The universe and software might be more similar than you imagine. Roughly 27 percent of the universe is dark matter, invisible matter that we can’t observe directly with optical or radio telescopes. Much like cosmological dark matter, software dark matter comprises packages that exist but are effectively unseen: software that is untracked by typical tools like a package manager or a software bill of materials (SBOM). According to our own estimates, based on several hundred popular open source software containers, software dark matter makes up, on average, 32 percent of the files in the analyzed containers.
And just like cosmological dark matter poses problems for understanding the universe, software dark matter complicates the job of anyone seeking software transparency—that elusive goal currently associated with SBOM enthusiasts who seek a world in which complete and accurate knowledge about software is normal. Unfortunately, software dark matter has more tangible effects on software users than the physical equivalent: the more software dark matter present in a container, for instance, the more challenging it is for software analysis tools to find and correctly identify that software. And when software analysis tools can’t correctly identify software, there’s a greater chance that scanning tools will fail to find software vulnerabilities that are present, undermining one of the central goals of software transparency.
TL;DR
We performed an analysis to quantify the percentage of files within 350 popular open source software containers that are software dark matter. The analysis used a tool that we wrote and open-sourced, darkfiles, for measuring software dark matter. The findings include:
Popular open source containers are, on average, composed of 32 percent software dark matter. Using an average weighted by the number of files in a container, the estimate rises to 63 percent. These percentages suggest that software dark matter is a pervasive phenomenon and that software developers will need to find methods of either avoiding or coping with software dark matter.
Approximately 30 percent of the analyzed containers included less than one percent software dark matter. The data and analysis suggest that building and using containers without blind spots is possible but not uniformly practiced.
After further defining software dark matter and presenting and analyzing a software dark matter dataset, this piece calls for reducing software dark matter to enable software transparency.
What Is Software Dark Matter?
Software dark matter refers to files that are not tracked by operating system (OS) package managers (like `apt` or `apk`), which renders these files and the packages they represent invisible, or at least complicated to find, for software composition analysis and security scanning tools. Tools like darkfiles can therefore be used to perform a straightforward calculation: what percentage of a container’s files are tracked by the underlying OS package manager and, by subtraction, what percentage are dark matter.
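To make the calculation concrete, here is a minimal sketch in Python. It is not the darkfiles implementation; the function name `dark_matter_percent` and the example paths are hypothetical, and it assumes you already have two lists: every file in the image, and every file the OS package manager claims to own.

```python
def dark_matter_percent(all_files, tracked_files):
    """Return the percentage of files in `all_files` that no package owns.

    Both arguments are iterables of absolute path strings. This is an
    illustrative helper, not part of darkfiles.
    """
    all_set = set(all_files)
    if not all_set:
        # An empty image has no files, so it has no dark matter either.
        return 0.0
    # Files present in the image but absent from the package database.
    dark = all_set - set(tracked_files)
    return 100.0 * len(dark) / len(all_set)


# Hypothetical image: two files installed by packages, two copied in by hand.
image_files = ["/bin/sh", "/usr/bin/curl", "/opt/app/server", "/opt/app/lib.so"]
package_files = ["/bin/sh", "/usr/bin/curl"]
print(dark_matter_percent(image_files, package_files))  # 50.0
```

In practice the hard part is producing the two input lists, for example by walking the image filesystem and reading the package manager’s database; the arithmetic itself is a set difference.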
Why Does Software Dark Matter Matter?
Software dark matter makes the job of software analysis tools harder, both conceptually and technically. This matters because when software analysis tools fail to find and correctly identify software components, security scanning tools become more likely to miss known software security vulnerabilities. The same blind spots also make it easier for attackers to slip in malicious, unwanted software.
It’s like trying to find and identify goods that were loaded onto a container ship but never entered in the shipping manifest: these goods are likely to be overlooked or forgotten. Of course, there are technical tricks that scanners and other tools can use to find this dark matter, but it’s a complicated endeavor in comparison to checking a package manifest. The implication is that SBOMs and other means of representing dependency information will likely be incomplete and wrong in a world of pervasive software dark matter, which raises the question…
How Much Software Dark Matter Is in Popular Open Source Containers?
Before advocates of software transparency declare war on software dark matter, it’s worth understanding how common software dark matter is. A reasonable starting point, though not the final word, is an assessment of the most popular open source containers on Docker Hub. This set of software artifacts represents containers that are commonly used and very likely underpin a wide set of important software applications. This analysis therefore selected 350 containers from among the 1000 most popular container images (script for collecting popular images). These 350 images had either an Alpine-based or Debian-based operating system, a requirement imposed by the current implementation of the darkfiles tool (script for identifying OS for containers). All images were then analyzed with darkfiles.
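The OS filtering step can be sketched as follows. This is not the script used for the analysis; it is a hedged illustration that assumes each image ships an `/etc/os-release` file and that its `ID` and `ID_LIKE` fields are enough to decide whether an image is Alpine-based or Debian-based. The function names are hypothetical, and treating `ID_LIKE=debian` (as on Ubuntu) as Debian-based is an assumption, not a statement about what darkfiles supports.

```python
SUPPORTED_FAMILIES = {"alpine", "debian"}


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


def os_family(os_release_text):
    """Return "alpine" or "debian" if the image matches, else None."""
    fields = parse_os_release(os_release_text)
    # Check the distribution's own ID first, then the distributions it derives from.
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for candidate in candidates:
        if candidate in SUPPORTED_FAMILIES:
            return candidate
    return None


print(os_family('NAME="Alpine Linux"\nID=alpine\n'))          # alpine
print(os_family('NAME="Ubuntu"\nID=ubuntu\nID_LIKE=debian\n'))  # debian
print(os_family('NAME="Fedora Linux"\nID=fedora\n'))           # None
```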
Figure 1 represents the percentage of software dark matter (along the horizontal axis) in this sample of popular container images. The vertical axis represents the percent of this sample that has a particular amount of software dark matter.

The software dark matter graph reveals that approximately thirty percent of the images in this popular Docker Hub image sample have less than one percent software dark matter. Some maintainers are therefore already building images with little to no software dark matter, although the practice appears to be far from widespread.
While there is a concentration of containers with high (90 percent or more) software dark matter, the distribution is relatively even, with a wide range of software dark matter percentages. Treating each container equally, the mean software dark matter percentage is 32 percent and the median is 10 percent. If, however, containers are weighted by the number of files in the container, the mean software dark matter percentage is 63 percent.
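The gap between the two means comes from how each container is counted. The sketch below uses made-up numbers, not the study’s dataset, and hypothetical function names. It shows that weighting each container’s percentage by its file count is equivalent to dividing total dark files by total files, so large containers with a lot of dark matter pull the weighted figure up.

```python
def unweighted_mean_percent(containers):
    """Average the per-container percentages, one vote per container."""
    percents = [100.0 * dark / total for dark, total in containers]
    return sum(percents) / len(percents)


def file_weighted_mean_percent(containers):
    """Average weighted by file count: total dark files over total files."""
    total_dark = sum(dark for dark, _ in containers)
    total_files = sum(total for _, total in containers)
    return 100.0 * total_dark / total_files


# Hypothetical sample of (dark_files, total_files) pairs:
# a small, clean image and a large image that is mostly dark matter.
sample = [(10, 1000), (9000, 10000)]

print(f"{unweighted_mean_percent(sample):.1f}")     # 45.5
print(f"{file_weighted_mean_percent(sample):.1f}")  # 81.9
```

A file-weighted mean well above the unweighted mean, as in the findings above, is consistent with larger containers tending to carry more dark matter.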
It bears mentioning that many aspects of software dark matter are still unexplored, including what explains the prevalence of this phenomenon and whether these findings are similar across programming language and package manager ecosystems. On a more technical note, our analysis didn’t consider non-system package managers like pip: we don’t know what fraction of this dark matter these tools detect.
Less Software Dark Matter → More Software Transparency
Software transparency is rightly en vogue. Companies and individuals alike have experienced the downsides of depending on inscrutable software and want reliable means to detect known vulnerable software components and avoid tampering. SBOM advocacy epitomizes this demand for software transparency. And while advocates of software transparency acknowledge a wide range of challenges, less appreciated is that software dark matter, whether in containers or elsewhere, will pose a challenge for software transparency.
Fortunately, software dark matter need not be a permanent obstacle, though we’ll leave approaches to reducing or coping with software dark matter to another day. In short, SBOM advocates, and anyone else committed to software transparency, already have a lot on their plate, but unfortunately it’s time to add yet another challenge: software dark matter.