August 21, 2025

Guest Post: Resiliency by Design and the Importance of Internal Developer Platforms

Gaurav Saxena, Director of Engineering, Automotive Company

At Assemble 2025, I had the privilege of sharing insights from my journey in platform engineering, particularly how internal developer platforms (IDPs) can be the unsung heroes of cloud resiliency. As someone who’s spent a good deal of time in the automotive software world, I’ve seen firsthand what happens when systems buckle under complexity, which is why I believe resiliency by design is essential. My goal in this talk was simple: to reframe internal platforms as a critical layer of infrastructure stability.

The Problem We Don’t Talk About Enough

In today’s cloud-native ecosystem, we’ve embraced speed and scale, often at the expense of simplicity. Over the years, I’ve watched toolchains multiply, ownership fragment, and outages increase. The result? Skyrocketing mean time to recovery (MTTR), reputational hits, and teams spending more time firefighting than building.

If that sounds familiar, you’re not alone. I believe the first step toward solving this is to acknowledge the cost of complexity. From service sprawl to siloed teams, it’s become clear that something foundational has to change.

The IDP: A Platform for Stability, Not Just Developer Happiness

I’ve seen many companies treat internal platforms like convenience layers for devs. But if you look under the hood, a well-designed IDP is your best defense against chaos.

Here’s how I define a good IDP:

A centralized development environment that supports consistency
Seamless CI/CD integration and automation
Declarative infrastructure, because humans shouldn't manage config drift
Built-in observability and governance
And most importantly: not opinionated about your stack, just about your outcomes

At my organization, we think of the IDP as an orchestration layer across everything—from Git to Kubernetes to cloud providers. It acts as both the interface and control plane, making it easier for engineers to ship while ensuring platform teams can uphold resilience and security.

Making the Right Things Easy

When I talk about “resiliency by design,” what I really mean is: you can’t bolt on good behaviors later. You need to bake them into your platform from day one.

So we’ve focused on:

Standardized disaster recovery and failover processes
Safe-by-default deployment and rollback strategies
Centralized observability to catch issues early
Guardrails, not gates, that guide teams toward resilient practices

Our philosophy? Build a platform where the easy path is also the right path.
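
To make "guardrails, not gates" concrete, here is a minimal sketch in Python. It is not our platform's code: the DeploySpec type, its fields, and the apply_guardrails function are hypothetical names invented for illustration. The idea is that anything a team leaves unspecified gets the resilient default, while explicit risky choices are allowed but surfaced as warnings.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DeploySpec:
    """Hypothetical deployment request a team hands to the platform."""
    service: str
    replicas: int = 1
    strategy: Optional[str] = None          # None means the team did not choose
    auto_rollback: Optional[bool] = None    # None means the team did not choose
    health_check_path: Optional[str] = None


def apply_guardrails(spec: DeploySpec) -> Tuple[DeploySpec, List[str]]:
    """Fill in safe defaults and collect warnings instead of rejecting the request."""
    warnings: List[str] = []

    # Safe by default: unspecified settings get the resilient choice.
    if spec.strategy is None:
        spec = replace(spec, strategy="rolling")
    if spec.auto_rollback is None:
        spec = replace(spec, auto_rollback=True)

    # Guardrails, not gates: risky choices are surfaced, not blocked.
    if spec.auto_rollback is False:
        warnings.append("auto_rollback is disabled; a failed rollout needs manual recovery")
    if spec.replicas < 2:
        warnings.append("fewer than 2 replicas; the service has no redundancy")
    if spec.health_check_path is None:
        warnings.append("no health check configured; rollout cannot verify readiness")

    return spec, warnings


if __name__ == "__main__":
    final_spec, notes = apply_guardrails(DeploySpec(service="telemetry-api", replicas=3))
    print(final_spec)
    for note in notes:
        print("warning:", note)
```

A team that sets nothing beyond the service name still ends up with a rolling deployment and automatic rollback, which is the "easy path is the right path" idea in miniature.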

Why Abstraction Is the Superpower

I often say: developers don’t think in terms of infrastructure. They think in terms of applications. And if we want to enable them to move fast and stay safe, we have to meet them where they are.

That’s why we’re big fans of abstraction models, like the Open Application Model. It gives us a way to expose stable APIs for deployment and operations—things like scaling, identity, traffic management—without tying ourselves to any specific cloud or tool. We can evolve the underlying infrastructure without needing every team to coordinate or relearn everything.
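
As a rough illustration of that separation, the Python sketch below keeps an application-level description apart from the backend that renders it. It is loosely inspired by the idea of describing applications rather than infrastructure; it is not the Open Application Model schema, and AppSpec and render_kubernetes are hypothetical names. The output follows the general shape of a Kubernetes Deployment manifest.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical application-level description that developers write."""
    name: str
    image: str
    port: int
    min_instances: int = 2  # scaling expressed in application terms


def render_kubernetes(app: AppSpec) -> Dict[str, Any]:
    """One possible backend: translate the app spec into a Deployment manifest."""
    labels = {"app": app.name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app.name, "labels": labels},
        "spec": {
            "replicas": app.min_instances,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": app.name,
                            "image": app.image,
                            "ports": [{"containerPort": app.port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # The image reference is a placeholder, not a real registry path.
    app = AppSpec(name="route-planner", image="registry.example.com/route-planner:1.4.2", port=8080)
    manifest = render_kubernetes(app)
    print(manifest["kind"], manifest["spec"]["replicas"])
```

Because developers only ever touch AppSpec, a platform team could add a second renderer for a different target without any application team changing what they write.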

Using Chainguard’s Secure Software Factory

I’ve been closely following the work Chainguard is doing to build a secure software supply chain, and I’ve incorporated many of their tools and practices into the way we run CI/CD at my organization.

Chainguard’s approach, the Chainguard Factory, offers a blueprint for how to bake trust and resilience into every part of the build and deploy lifecycle. Tools like melange and apko let us build packages from source and assemble them into minimal, verifiable OCI images, which we then sign and attest before they’re deployed.

Here’s how we leverage it:

Our pipelines use Chainguard’s container images as trusted, minimal base layers.
We enforce image signing and attestation policies in our Kubernetes clusters via admission controllers.
Runtime policy enforcement ensures that only images built through our pipelines get deployed.

What I love about this model is how it integrates security directly into the development workflow. Developers aren’t forced to think about image provenance or signature enforcement—it’s all handled as part of the platform. And since the CI/CD engine is programmable, we can apply different policies based on risk levels, environments, or change type.
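
Here is a minimal sketch of what "different policies based on environment" could look like, written in Python. It is an illustration under stated assumptions, not Chainguard's API or the configuration of any particular admission controller: ImageEvidence, REQUIRED_CHECKS, and evaluate are hypothetical names, and the evidence is assumed to have been collected by earlier pipeline steps.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ImageEvidence:
    """Hypothetical facts gathered about an image before deployment."""
    reference: str
    signature_verified: bool
    provenance_attested: bool
    built_by_platform_pipeline: bool


# Hypothetical policy table: which checks each environment requires.
REQUIRED_CHECKS: Dict[str, List[str]] = {
    "dev": ["signature_verified"],
    "staging": ["signature_verified", "provenance_attested"],
    "prod": ["signature_verified", "provenance_attested", "built_by_platform_pipeline"],
}


def evaluate(image: ImageEvidence, environment: str) -> Tuple[bool, List[str]]:
    """Return (allowed, failed_checks) for the given environment."""
    # Unknown environments fall back to the strictest policy.
    required = REQUIRED_CHECKS.get(environment, REQUIRED_CHECKS["prod"])
    failed = [check for check in required if not getattr(image, check)]
    return (not failed, failed)


if __name__ == "__main__":
    # The image reference is a placeholder, not a real registry path.
    image = ImageEvidence(
        reference="registry.example.com/telemetry-api:2.0.1",
        signature_verified=True,
        provenance_attested=True,
        built_by_platform_pipeline=False,
    )
    for env in ("dev", "staging", "prod"):
        allowed, failed = evaluate(image, env)
        print(env, "allowed" if allowed else f"denied: {failed}")
```

Keeping the policy in a data table rather than in branching logic means tightening a rule for one environment is a one-line change that developers never have to see.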

This process spans source control, identity services, scan evaluation, policy validation, and secure deployment. It’s an end-to-end system that brings clarity, control, and resiliency by default.

Final Thoughts

At the end of the day, our job as platform engineers isn’t to control every variable. It’s to create an environment where resilient outcomes are the natural result of everyday work.

Chainguard has made that much easier. And if there’s one message I wanted people to take away from my talk, it’s this: don’t wait for your first major incident to start thinking about resiliency. Design it in from the start. Your developers and your customers will thank you.
