Announcing Chainguard Libraries for Python: Malware-Resistant Dependencies Built Securely from Source

Jason van Zyl, Senior Manager, Engineering, and Patrick Smyth, Staff Developer Relations Engineer

We’re excited to announce the early access release of Chainguard Libraries for Python, a malware-resistant index of Python dependencies built securely from source. Chainguard Libraries for Python enables application security teams to mitigate the growing number of serious malware attacks on popular Python projects, like Ultralytics and PyTorch, at the build and distribution stages of the open source software supply chain. This release of a secure, malware-resistant index of Python libraries represents the latest milestone on our mission to build the safe source for open source and comes closely on the heels of our March launch of Chainguard Libraries for Java.


Chainguard builds the entire dependency tree for every library from source in the Chainguard Factory, our SLSA L2-certified infrastructure with verifiable provenance. And importantly, Chainguard is tackling not only pure Python libraries, but also Python wheels that include native code or re-bundle unknown, untraceable versions of operating system libraries like OpenSSL without any transparent package metadata. Here, Chainguard stands alone in isolating and rebuilding native dependencies and bundled libraries from source to combat malware, guarantee traceable provenance, and ensure that the Python ecosystem is simpler for application developers and platform engineering teams to navigate. Chainguard Libraries for Python retain full compatibility across Linux operating systems (Chainguard OS, RHEL, Debian, and Ubuntu). So while we recommend using Chainguard Containers for additional operational security, Chainguard Libraries are compatible with a wide variety of popular container providers, operating systems, and host environments.


With Chainguard Libraries for Python, enterprise application security teams now have a secure, standardized source for their engineers to safely consume language libraries without compromising supply chain security. In this blog post, we dive into the motivations behind building Python libraries from source and the value we’re delivering to customers.


Combating Malware in the Python Ecosystem


Supply chain attacks targeting the Python ecosystem are growing in both frequency and severity. There are three main root causes for these supply chain attacks:


  • There are many points of attack along the open source software supply chain, with the build and distribution stages being most susceptible.

  • Traditional delivery mechanisms for open source Python libraries (i.e., public registries) are optimized for publisher convenience at the expense of enterprise-grade security.

  • Contrary to popular belief, public package indices do not thoroughly vet their hosted artifacts or provide assurance these artifacts actually match their source code.


Figure 1: The package lifecycle and types of supply chain attacks.

Two high-profile attacks have shaken the Python ecosystem and pushed organizations to look for a safer way to consume language libraries. First, in late December 2022, a nightly build of PyTorch was compromised because it pulled in a malicious torchtriton dependency. When that torchtriton package was imported, a malicious binary started exfiltrating sensitive data. This was a targeted malware attack against one of the most widely used projects in AI/ML.


Second, in December 2024, a compromised GitHub Actions workflow and a subsequently leaked PyPI API token resulted in two malicious versions of the Ultralytics YOLO library being published to PyPI. This was another targeted malware attack, and it affected one of the most widely used computer vision projects (roughly 61M annual downloads). These are just two of the many attacks at the build and distribution stages of the software supply chain.


Figure 2: Examples of supply chain attacks at each stage of the package lifecycle.

To combat malware and supply chain attacks in the Python ecosystem at their root, Chainguard is taking a fundamentally different approach: rebuilding the most popular Python packages – and their full dependency trees – entirely from source in our SLSA L2-hardened environment. In securing every stage of the supply chain – source ingestion, build, tests, and distribution – Chainguard substantially reduces the risk from supply chain threat vectors like hijacked build processes, tainted release pipelines, and compromised distribution points. Chainguard’s approach ensures that the distributed package and the source code match, so application developers can safely consume Python dependencies while application security teams can fill a critical gap in their tool kit to combat malware. To verify our malware-mitigation thesis, Chainguard analyzed ~3k malicious Python packages sourced from the Backstabber’s Knife Collection. Our early results showed that ~98% of these malicious libraries would have been avoided by enterprises relying on Chainguard Libraries as their sole source for Python dependencies. 
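To make the idea of "the distributed package matches the source" concrete, here is a minimal sketch that compares the pure Python files inside a wheel against the files expected from a source checkout. It is an illustration only, not Chainguard's build or verification process. The function names, the `source_files` mapping, and the demo file names are all hypothetical, and the check says nothing about compiled code, which can only be verified by rebuilding it.

```python
import hashlib
import io
import zipfile


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def diff_against_source(wheel, source_files: dict[str, bytes]) -> dict[str, str]:
    """Compare every .py file in a wheel with the expected source bytes.

    `source_files` (a hypothetical input for this sketch) maps
    wheel-relative paths to the bytes found in the source checkout.
    Returns {path: reason} for each file that does not match.
    """
    problems = {}
    # A wheel is a zip archive, so the standard library can read it.
    with zipfile.ZipFile(wheel) as archive:
        for name in archive.namelist():
            if not name.endswith(".py"):
                continue
            if name not in source_files:
                problems[name] = "not present in source"
            elif sha256_hex(archive.read(name)) != sha256_hex(source_files[name]):
                problems[name] = "content differs from source"
    return problems


# Made-up source tree and a made-up wheel that was altered after the build.
DEMO_SOURCE = {
    "pkg/__init__.py": b"VERSION = '1.0'\n",
    "pkg/core.py": b"def run():\n    return 1\n",
}


def build_tampered_wheel() -> io.BytesIO:
    """Build an in-memory wheel with one modified file and one extra file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("pkg/__init__.py", DEMO_SOURCE["pkg/__init__.py"])
        archive.writestr("pkg/core.py", b"def run():\n    return 2\n")
        archive.writestr("pkg/_telemetry.py", b"import os\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for path, reason in diff_against_source(build_tampered_wheel(), DEMO_SOURCE).items():
        print(f"{path}: {reason}")
```

A check like this only catches tampering in files that ship as readable source. Wheels that contain compiled extensions need a trusted rebuild from source, which is the gap the approach described above is meant to close.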


Here’s what Chainguard customers, like MAN Energy Solutions and Paylocity, have to say about the promise of Chainguard Libraries for Python:


“MAN Energy Solutions enables its customers to achieve sustainable value creation in the transition towards a carbon neutral future. As a global provider of large-scale industrial machinery and energy solutions, software supply chain security is a top priority,” said Carsten Skov, Senior DevOps Engineer, MAN Energy Solutions. “Chainguard Containers have already helped us ensure that our containerized analytics workloads are built and run securely by default. Now, we’re excited about the potential of Chainguard Libraries for Python to further strengthen our software supply chain by mitigating the risks posed by unverified dependencies and malware in the Python ecosystem. Securing these workloads plays a key role in ensuring that the MAN-CEON Digital Ecosystem continues to meet the requirements of ISO/IEC 27001:2022 and ABS Cyber Safety Certification.”


“At Paylocity, application security is core to the modern HR, payroll and spend management software we’re building,” said Joe Christian, Senior Engineering Manager, Application Security at Paylocity. “Chainguard already helps us reduce our attack surface while giving our teams confidence in what they’re shipping. We see promise in Chainguard Libraries for Python to ensure developers can build securely from the very first line of code.”


Isolating Native Dependencies to Simplify Python Packaging


Python is a complex ecosystem where many projects include native code for performance optimization. This reliance on C/C++ extensions also creates a tight coupling between the compiled binary wheel and the host OS – a delicate marriage that often introduces friction for Python developers. Suddenly, your Python developer is spending less time building products and platforms and more time as a build-system janitor just to keep their Python applications running.


To get around these challenges, developers and platform engineers often bundle the exact native dependencies their Python library requires directly into the wheel. However, manually curating your own mini OS for every library to divorce Python from its native requirements is labor-intensive, brittle, and risky. A single missed patch or mismatched compiler flag can trigger crashes, dramatically slow down shipping velocity, and discourage developers from upgrading their libraries, which means they miss out on the newest features, security patches, and performance optimizations.


With Chainguard Libraries for Python, we have done the hard work to identify and build the appropriate system dependencies directly from source. That means we isolate, rebuild, and pin every system dependency to ensure the best developer experience. In eliminating complexity, Chainguard makes secure and efficient development the easy choice.


The Python Ecosystem and Software Dark Matter


In many cases, project maintainers themselves re-bundle the required native libraries to simplify the end developer experience. However, by baking unknown and untraceable binaries directly into Python wheels without providing the corresponding metadata, project maintainers inadvertently obscure these binaries from downstream developers. At Chainguard, we call these obscured components “software dark matter” because they cannot be identified by SCA scanners, which means that the vulnerabilities they carry go undetected in your production systems (an issue the Python Software Foundation has also noted).


By identifying, isolating, and rebuilding vendored (i.e., re-bundled) dependencies from source, Chainguard mitigates software supply chain attacks even in the software dark matter your scanners can’t see. This approach allows us to provide verifiable provenance for the source origin of every artifact in your production systems.
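As a rough way to see how much compiled code a wheel carries, the sketch below opens a wheel as a zip archive and lists the members that look like native binaries. It is a simple heuristic for illustration, not a description of how Chainguard identifies vendored dependencies. The helper names and the demo file names are hypothetical, and the `*.libs/` check assumes the wheel was repaired with auditwheel, which copies external shared libraries into a directory with that suffix.

```python
import io
import zipfile
from pathlib import PurePosixPath

# File suffixes that usually indicate compiled code (illustrative, not exhaustive).
NATIVE_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def is_native(name: str) -> bool:
    """Return True if a wheel member looks like a compiled binary."""
    filename = PurePosixPath(name).name
    # Versioned shared objects look like "libssl.so.3", so also check for ".so.".
    return filename.endswith(NATIVE_SUFFIXES) or ".so." in filename


def find_native_members(wheel) -> list[str]:
    """List the compiled binaries inside a wheel (a wheel is a zip archive)."""
    with zipfile.ZipFile(wheel) as archive:
        return sorted(n for n in archive.namelist() if is_native(n))


def find_vendored_libs(wheel) -> list[str]:
    """Narrow the list to binaries that sit under a '*.libs/' directory.

    Assumption: the wheel was repaired with auditwheel, which places
    bundled shared libraries in a directory whose name ends in '.libs'.
    """
    return [
        n for n in find_native_members(wheel)
        if any(part.endswith(".libs") for part in PurePosixPath(n).parts[:-1])
    ]


def build_demo_wheel() -> io.BytesIO:
    """Build a tiny in-memory stand-in for a wheel (all file names are made up)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo/__init__.py", "")
        archive.writestr("demo/_speedups.cpython-312-x86_64-linux-gnu.so", b"\x7fELF")
        archive.writestr("demo.libs/libssl-abc123.so.3", b"\x7fELF")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    wheel = build_demo_wheel()
    print(find_native_members(wheel))
    print(find_vendored_libs(wheel))
```

Both functions also accept a path to a wheel file on disk. Listing the files is the easy part: the names alone do not reveal which upstream version of a library was bundled or how it was built, which is why these components are hard for scanners to account for.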


Getting Started with Chainguard Libraries for Python


We’re excited to hear your feedback as you begin building with Chainguard Libraries for Python, which retain full compatibility across Linux operating systems (Chainguard OS, RHEL, Debian, and Ubuntu). Your feedback plays a key role in shaping Chainguard’s future plans to develop additional capabilities to deliver even more value to customers.


If you’d like to learn more about how Chainguard Libraries can transform your software supply chain, reach out today. Existing Chainguard Containers customers can get started with Chainguard Libraries by reaching out to their account team and exploring our documentation.
