Cyber resiliency in practice: Lessons from recent supply chain attacks
What cyber resiliency actually means
"Cyber resiliency", while often referenced, has a precise definition worth holding onto: an organization's ability to anticipate, withstand, recover from, and adapt to attacks on cyber-enabled systems. The operative word is and. Resilience is the full lifecycle — Protect, Detect, Respond, and Recover — executed as an integrated capability, not four separate checkbox exercises.
Attack Case Studies
Before we dive into the lessons and best practices, here are some examples of recent software supply chain security attacks.
Trivy
In late February 2026, a threat actor known as TeamPCP (also tracked as DeadCatx3, PCPcat, ShellForce, and CipherForce) exploited a GitHub Actions misconfiguration in the Trivy project to extract a privileged access token that survived incident remediation. On March 19, TeamPCP force-pushed 76 of 77 version tags in aquasecurity/trivy-action and all 7 tags in aquasecurity/setup-trivy, silently redirecting trusted version references to malicious commits. None of the techniques were novel. Long-lived tokens, incomplete credential rotation, mutable Git tags, service account abuse — these are all well-documented. That's exactly what makes this incident instructive.
Axios
In late March 2026, the npm ecosystem was hit by a major supply chain attack targeting axios. A DPRK-aligned threat actor gained access to the lead maintainer's npm account through a targeted social engineering campaign and published two malicious versions, axios@1.14.1 and axios@0.30.4, directly to the npm registry. Both the latest and legacy dist-tags pointed to the compromised versions, meaning a default npm install axios would resolve to a backdoored release. The malicious versions injected a dependency called plain-crypto-js@4.2.1, whose sole purpose was to execute a postinstall script that contacted a command-and-control server to deliver second-stage payloads. Afterward, the malware deleted itself and replaced its own package.json with a clean version to evade forensic detection.
LiteLLM
On March 24, 2026, LiteLLM, a Python proxy library for routing requests across LLM providers with roughly 3.4 million daily downloads, was hit by a supply chain attack. Threat actor TeamPCP obtained the maintainer's PyPI publishing credentials through a prior compromise of Trivy, an open source security scanner used in LiteLLM's own CI/CD pipeline. The compromised versions, litellm==1.82.7 and litellm==1.82.8, were live on PyPI for approximately 40 minutes before being quarantined. The malicious versions deployed a three-stage payload: a credential harvester targeting over 50 categories of secrets, a Kubernetes lateral-movement toolkit, and a persistent backdoor that provides ongoing remote code execution. Version 1.82.8 also included a malicious .pth file that executed automatically on every Python process startup; that release was uploaded directly to PyPI, bypassing the normal release process entirely.
Namastex
In late April 2026, multiple npm packages from Namastex Labs, an agentic AI tooling company, were compromised in an active supply-chain campaign. The first poisoned release surfaced on April 21, 2026, when a tainted version of pgserve hit the registry, quickly expanding into @automagik/genie, @fairwords/websocket, @fairwords/loopback-connector-es, and other packages, with at least 16 confirmed compromised packages in total. The injected malicious code collects tokens, credentials, API and SSH keys, and secrets for cloud services, CI/CD systems, Kubernetes, and LLM platforms, exfiltrating stolen data to both a conventional webhook and an ICP canister endpoint. And crucially, it extracts npm tokens from the victim's machine, identifies packages they can publish, injects the payload into those packages, and republishes them, creating a self-propagating worm. Security research firm Socket noted a strong overlap in techniques with TeamPCP's earlier CanisterWorm attacks, but stopped short of attributing them with confidence given the available evidence.
First principles
When incidents like the recent Trivy or axios attacks occur, it’s important to adhere to these key cyber resilience principles.
Self-organization
Technical controls matter far less when the organizational substrate to act on them doesn't exist. Before the next incident, your organization needs a recognized incident event commander with pre-delegated authority, genuine ELT buy-in (not just a signed policy), and broad stakeholder representation across legal, comms, and engineering. Communication architecture matters too: run a leadership channel focused on business impact and decision points in parallel with a technical channel focused on investigation findings. Collapsing them degrades both.
Practice and readiness
An IR plan that has never been exercised is at best a hypothesis. Incident Readiness Reviews surface gaps in tooling, runbook accuracy, and access provisioning before an incident forces them into the open. Tabletop exercises stress-test the human dimensions: decision-making under uncertainty and cross-team communication. The Trivy incident highlights a gap that tabletops routinely expose: incomplete credential rotation under time pressure. A tabletop scenario asking "We think we've rotated everything… have we?" would have been uncomfortable. It would also have been cheaper than what followed.
Measure twice, then once more, and only then cut
Pressure to respond quickly is real and often counterproductive. Incomplete rotation under time pressure and premature scope declarations are both symptoms of moving faster than your evidence supports.
Know your own network architecture, including your SDLC
Most organizations have reasonable visibility into production. Far fewer have mapped their software delivery pipeline with the same rigor. CI/CD infrastructure, service accounts, build-time secrets, and artifact registries are all attack surfaces.
Wisdom is learning from other people's mistakes, not just your own
The techniques TeamPCP used were not new: the tj-actions/changed-files compromise earlier in 2025 (CVE-2025-30066) exposed the same credential-in-CI risk class, affecting over 23,000 repositories. The organizations least affected by Trivy were those that had already acted on prior warnings.
Protective policies and controls
There are several policies and controls you can implement to further protect yourself against software supply chain compromises.
OpenID Connect for GitHub Actions
The Trivy compromise was fundamentally a credential problem. Static, long-lived tokens are copied into secrets stores, exported to logs, and forgotten. When they’re stolen, they persist until explicitly revoked. OIDC for GitHub Actions replaces them with short-lived federated credentials scoped to a single workflow run. There is no token to steal. As Chainguard CEO Dan Lorenc put it: "If you only do one thing, delete all of your tokens and replace them with federated short-lived auth."
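As a concrete sketch, here is what an OIDC-based workflow can look like. The cloud role ARN and region below are illustrative placeholders, not values from any of the incidents above; the `id-token: write` permission is what lets the job request a short-lived federated token from GitHub.

```yaml
# Sketch: authenticate to a cloud provider via OIDC instead of a stored secret.
# The role ARN and region are placeholders -- substitute your own.
name: build
on:
  push:
    branches: [main]

permissions:
  id-token: write   # allows the job to request an OIDC token from GitHub
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-ci-role
          aws-region: us-east-1
      # Subsequent steps receive short-lived credentials scoped to this run.
      # There is no static cloud secret stored in the repository to steal.
```

The credentials expire with the workflow run, so a leaked log or compromised runner yields nothing durable.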
Signed commits
Signed commits provide cryptographic evidence of authorship. In a service account compromise, a gap in signing continuity is a detectable signal.
Branch and tag protection; no admin bypass
The force-push of 76 version tags was possible because the repository permitted it. Branch and tag protection rules that apply to administrators as well (not just standard users) would have meaningfully constrained what TeamPCP could do, even with valid credentials.
Source code provenance
Git tags are mutable by design, and the Trivy incident showed exactly why that matters. Any workflow pinned to a tag name gets whatever commit that tag currently points to. Pinning to cryptographic commit SHAs and verifying provenance through signed attestations ensures you're consuming what was actually built and reviewed. Chainguard customers running Trivy through provenance-verified images were unaffected by the attack because the digest they resolved against had not changed.
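In workflow terms, the fix is small. The sketch below pins a third-party action to a full commit SHA; the SHA shown is an illustrative placeholder, not the real trivy-action commit, and the trailing comment records which tag it corresponded to at pin time.

```yaml
# Sketch: pin third-party actions to an immutable commit SHA, not a mutable tag.
# The 40-character SHA below is a placeholder for illustration only.
steps:
  - uses: aquasecurity/trivy-action@0123456789abcdef0123456789abcdef01234567  # v0.x.y at pin time
    with:
      scan-type: fs
```

If the tag is later force-pushed to a malicious commit, a SHA-pinned workflow keeps resolving the commit you reviewed.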
DNS sinkhole
The Trivy infostealer needed to call home. DNS sinkholes are a well-established control that intercepts queries to known-malicious or newly registered domains before C2 communication can be established. Critically, a sinkhole doesn't just block. It also generates a detection event. A workload that suddenly queries suspicious infrastructure it has never touched before is a signal worth investigating, even before any other indicator is available.
“Pwn Request” protection
GitHub Actions' pull_request_target trigger runs workflows with write permissions and access to secrets, even when the triggering PR comes from a fork. This is the "pwn request" vulnerability documented by Endor Labs: an attacker submits a PR that causes a privileged workflow to execute attacker-controlled code. Auditing all public repo workflows for this trigger pattern closes an entire class of initial access paths.
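For concreteness, this is the shape of the risky pattern to audit for. The workflow and script names are illustrative; the key combination is the `pull_request_target` trigger plus a checkout of the fork's head commit.

```yaml
# Sketch of the "pwn request" anti-pattern. pull_request_target runs in the
# context of the base repository -- with secrets -- even for PRs from forks.
on: pull_request_target

jobs:
  risky:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Checking out the PR head brings attacker-controlled code into a
          # privileged context. This is the step auditors should look for.
          ref: ${{ github.event.pull_request.head.sha }}
      - run: ./build.sh   # fork-controlled script runs with secrets in scope
```

Safer alternatives are to use the plain `pull_request` trigger for anything that executes PR code, or to split privileged steps into a separate workflow that never checks out fork commits.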
Technical Investigation
Automated threat intelligence with retrospective hunting
Early detection of supply chain compromise events comes from teams already subscribed to feeds, both open (Step Security, Endor Labs, Upwind) and commercial, with automation that applies new IOCs to historical telemetry. The gap between "this happened" and "this affected us" compresses from days to minutes.
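The retro hunt itself is conceptually simple, as the hedged sketch below shows: when new IOCs land, replay them against historical telemetry. The indicator values and log records here are illustrative, not real IOCs.

```python
# Sketch of a retro hunt: apply newly published IOCs to historical telemetry.
# Indicator values and events below are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: str
    host: str
    indicator: str  # e.g. a queried domain or a file hash seen on the host

def retro_hunt(events, new_iocs):
    """Return historical events matching any newly published indicator."""
    iocs = {i.lower() for i in new_iocs}
    return [e for e in events if e.indicator.lower() in iocs]

history = [
    Event("2026-03-19T14:02:11Z", "ci-runner-7", "evil-c2.example"),
    Event("2026-03-19T14:05:40Z", "dev-laptop-3", "updates.example.org"),
]
hits = retro_hunt(history, ["evil-c2.example"])
for e in hits:
    print(f"{e.timestamp} {e.host} matched {e.indicator}")
```

In practice the "events" side is a SIEM query over months of DNS, process, and registry telemetry, but the matching logic is exactly this.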
AI-augmented investigation
It’s mid 2026 as of this writing, and purely manual log analysis no longer scales to the telemetry volume a supply chain incident generates. Embrace agentic AI as an incident response force multiplier: use it to triage alerts, summarize logs, and draft timelines, with human analysts validating conclusions before they drive containment decisions.
GitHub log ingestion
GitHub's audit logs are among the most underutilized data sources in security programs. Tag force-pushes, workflow runs, secret access, and unusual service account activity are all recorded. Ingesting these into a SIEM and building detections against high-risk event types provides the CI/CD visibility most endpoint-centric programs lack entirely.
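A minimal detection over an exported audit log can be sketched as follows. The action names in the high-risk set are illustrative examples; map them to the event types your audit log stream actually emits.

```python
# Sketch: flag high-risk entries in an exported GitHub audit log (JSON lines).
# The action names in HIGH_RISK are illustrative -- tune to your own log schema.
import json

HIGH_RISK = {
    "protected_branch.destroy",
    "repo.change_merge_setting",
    "org.disable_two_factor_requirement",
}

def flag_high_risk(log_lines):
    """Yield parsed audit events whose action is in the high-risk set."""
    for line in log_lines:
        event = json.loads(line)
        if event.get("action") in HIGH_RISK:
            yield event

sample = [
    '{"action": "protected_branch.destroy", "actor": "svc-ci", "repo": "org/app"}',
    '{"action": "repo.create", "actor": "alice", "repo": "org/tools"}',
]
alerts = list(flag_high_risk(sample))
```

The same filter expressed as a SIEM detection rule gives you alerting rather than batch review.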
Measure the window of exposure
Correlate GitHub audit log timestamps, artifact registry publish events, domain registration dates from WHOIS, and available OSINT to construct a precise forensic timeline. Domain registration dates establish when C2 infrastructure was stood up relative to the initial compromise. Pre-positioning looks different from improvisation.
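Once the individual timestamps are collected, assembling the timeline is mechanical. The sketch below uses invented timestamps purely for illustration; the point is sorting heterogeneous events into one sequence and computing the exposure window.

```python
# Sketch: assemble a forensic timeline from timestamped events and compute
# the window of exposure. All timestamps below are illustrative.
from datetime import datetime

def parse(ts):
    """Parse an ISO-8601 UTC timestamp ending in 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

events = {
    "c2_domain_registered": parse("2026-03-10T08:00:00Z"),
    "malicious_tag_push":   parse("2026-03-19T14:02:00Z"),
    "tags_reverted":        parse("2026-03-19T16:30:00Z"),
}

timeline = sorted(events.items(), key=lambda kv: kv[1])
exposure = events["tags_reverted"] - events["malicious_tag_push"]
print(f"exposure window: {exposure}")
for name, ts in timeline:
    print(ts.isoformat(), name)
```

Here the C2 domain predating the tag push by nine days would suggest pre-positioning rather than improvisation.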
Cryptographic hash comparison, not tag resolution
Determining whether a given environment was affected requires comparing the commit SHA resolved at build time against the known-good hash, not the tag name. Compromises can be selectively applied across registries: the Docker image, GitHub release, and third-party repo may carry different payloads. Hash comparison is the correct verification primitive. For binary analysis, malcontent — Chainguard's open source tool — provides behavioral diff capability, surfacing what new capabilities were introduced between a known-good and suspect binary without requiring full reverse engineering.
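The verification step reduces to a set-membership check, as in this sketch. The tag and digests are illustrative placeholders, not real Trivy hashes.

```python
# Sketch: verify a build against known-good commit SHAs rather than tag names.
# The tag and 40-character digests below are illustrative placeholders.
KNOWN_GOOD = {
    "v0.28.0": "0123456789abcdef0123456789abcdef01234567",
}

def verify_resolved_commit(tag, resolved_sha):
    """True only if the SHA the build actually resolved matches the
    known-good SHA recorded for that tag at pin time."""
    expected = KNOWN_GOOD.get(tag)
    return expected is not None and resolved_sha.lower() == expected

# A tag that silently moved to a new commit fails verification:
assert verify_resolved_commit("v0.28.0", "0123456789abcdef0123456789abcdef01234567")
assert not verify_resolved_commit("v0.28.0", "deadbeef" * 5)
```

The known-good table is what you record at pin time; during an incident, you replay build logs against it rather than trusting whatever the tag points to today.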
Evidence of execution, not just binary discovery
Finding a malicious binary on disk measures exposure, not impact. Shell history inspection and running strings against the binary for embedded IOCs establish capability without requiring execution. C2 traffic analysis (DNS query logs with source-process attribution and URL filtering logs) provides direct evidence of execution. Data connection analysis, including active and historical connections with transfer sizes, establishes whether exfiltration occurred. User correlation fills gaps where automated evidence is ambiguous.
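The strings-against-IOCs step can be sketched in a few lines. This is a rough stand-in for `strings | grep`; the ELF-like byte blob and the C2 domain are invented for illustration.

```python
# Sketch: extract printable strings from a suspect binary and check them
# against known IOCs. The blob and the C2 domain are illustrative only.
import re

def extract_strings(data, min_len=6):
    """Return printable ASCII runs of at least min_len bytes."""
    pattern = rb"[ -~]{%d,}" % min_len
    return [m.group().decode() for m in re.finditer(pattern, data)]

def match_iocs(strings, iocs):
    """Return the extracted strings containing any known indicator."""
    return sorted({s for s in strings for ioc in iocs if ioc in s})

blob = b"\x00\x7fELF\x00https://evil-c2.example/stage2\x00/usr/bin/env\x00"
found = match_iocs(extract_strings(blob), ["evil-c2.example"])
```

An embedded C2 URL establishes what the binary could do; whether it did requires the DNS and connection evidence described above.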
CI/CD visibility and anomaly detection
Tracking dependency versions at build time, alerting on unexpected changes to workflow files or pinned actions, and detecting deviations from normal pipeline execution patterns catch compromises before they reach production. The Trivy incident exposed a detection gap most security programs share: CI/CD pipelines are largely invisible to the tools that monitor everything else.
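A basic build-time drift check looks like the sketch below. The package names and versions are illustrative; in practice `previous` comes from the last known-good build's resolved dependency snapshot.

```python
# Sketch: alert when a dependency resolved at build time differs from the
# last known-good build. Package names and versions are illustrative.
def diff_resolved(previous, current):
    """Return {name: (old, new)} for dependencies that changed between builds."""
    changes = {}
    for name, version in current.items():
        old = previous.get(name)
        if old != version:
            changes[name] = (old, version)
    return changes

last_good = {"axios": "1.14.0", "left-pad": "1.3.0"}
this_build = {"axios": "1.14.1", "left-pad": "1.3.0"}

drift = diff_resolved(last_good, this_build)
for name, (old, new) in drift.items():
    print(f"ALERT: {name} changed {old} -> {new} without a lockfile update")
```

In the axios incident, a default install resolving to a version nobody had pinned is exactly the deviation this kind of check surfaces.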
Streamlined credential revocation
If revoking a service account credential requires a ticket and a change window, your containment speed is structurally limited. Revocation capability (both the technical mechanism and the organizational permission to use it) needs to be pre-authorized and tested before you need it.
Cyber resiliency is the key
Resiliency starts with prevention. A healthy organization cannot rely solely on incident response; prevention shrinks an organization’s attack surface to a more manageable size. The Trivy incident and the supply chain compromises that followed are not an argument against open source tooling or GitHub Actions. They are an argument for treating provenance, integrity, and credential hygiene as first-class engineering concerns rather than afterthoughts. Chainguard’s “secure by design” product catalog makes prevention easy.
Get in touch with our team to learn more about our cyber resilience practices and how we can help you sleep better at night knowing your open source artifacts are safe.