This Shit is Hard: Getting software to run on robots (and then getting the robots to work)

Eric Timmons, Chief Engineer, Asylon Robotics

Eric Timmons is Chief Engineer at Asylon Robotics, where he leads software development across autonomous ground and air platforms, cloud infrastructure, and CI/CD systems.


Chainguard’s “This Shit is Hard” series showcases the difficult engineering work we’ve tackled to deliver best-in-class outcomes for customers using our products. Now, we are expanding this series to include the difficult engineering work our customers do to provide real-world outcomes for their businesses. Today, we’re discussing how Asylon Robotics balances hardware and software to deliver robots and drones for its commercial and government customers.


Most of my career has been spent on robots that go places humans don't want to, or can’t, go: flying platforms, ground vehicles, and eventually autonomous underwater systems, including a deployment to explore underwater volcanic vents off the coast of Santorini. Each environment has its own physics, its own failure modes, its own set of things that will ruin your day in ways no simulator predicted. What I've learned from all of it is that the closer software gets to physical hardware, the less any of your assumptions hold.

I've been with Asylon since 2015. Asylon builds and integrates robotic dogs and drones for perimeter security, including commercial deployments at corporate campuses, third-party logistics facilities, and critical infrastructure sites, as well as government contracts. I'm Chief Engineer, which in practice means I own software for the robot dog and the drone, our CI/CD pipelines, cloud infrastructure, and, increasingly, on-premises infrastructure for customers who can't or won't rely on the cloud.

People who haven't worked in this space tend to think the hard part is the software. It is. But the hardware has its own hard, and the two problems don't stack. They compound.

The embedded software problem isn't just software

The robot platforms we deploy on are sophisticated pieces of hardware maintained by their own manufacturers. Boston Dynamics' Spot, our primary commercial platform, is about as current as you're going to find in embedded systems. But "current" in embedded hardware is a relative term. Embedded systems frequently require integrating with older software versions and dependency chains you don't control.

We've dealt with this carefully. Asylon firewalls its systems so that, even if a third-party hardware platform is compromised, an attacker can't use it as a hop point into our own infrastructure. It's a threat model that's easy to wave away until it isn't.

The dependency problem runs deeper than security. When you're integrating with hardware at this level, you're often stuck with specific library versions, kernel configurations, and build toolchains. The security posture of the stack you can actually use is constrained by the hardware, not by what you'd choose in a greenfield environment. You patch what you can and build walls around what you can't.

LTE is the deployment model. LTE is also fragile.

For nearly all our commercial customers, the robot operates over a Long-Term Evolution (LTE) cellular connection. This is by design. Asking a customer to put a robot on their corporate network involves security reviews, firewall exceptions, procurement cycles, and conversations that can stretch for months. LTE sidesteps that complexity. The robot shows up, the robot works.

What LTE doesn't sidestep is the physical world. One of our sites sits near a concert venue. When a show goes on, we know it immediately: LTE bandwidth degrades as tens of thousands of people all try to use their phones simultaneously. The team has to switch carriers in real time to find usable bandwidth and keep the robot operational. It's a very specific kind of alert: not a software alert, not a hardware alert, but a Taylor Swift alert.
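To make the carrier decision concrete, here is a minimal Python sketch of the kind of rule an operator or a script might apply. It is an illustration, not Asylon's tooling: the `CarrierSample` type, the `pick_carrier` function, the bandwidth floor, and the switch margin are all hypothetical names and values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CarrierSample:
    """One uplink throughput measurement for a carrier (hypothetical data)."""
    name: str
    uplink_mbps: float


def pick_carrier(
    samples: List[CarrierSample],
    current: str,
    min_uplink_mbps: float,
    switch_margin: float = 1.25,
) -> Optional[str]:
    """Return the carrier to use, or None if no carrier meets the floor.

    The current carrier is kept while it meets the floor, unless another
    carrier beats it by at least `switch_margin`. The margin is an assumed
    guard against flapping between two similar links.
    """
    usable = [s for s in samples if s.uplink_mbps >= min_uplink_mbps]
    if not usable:
        return None

    best = max(usable, key=lambda s: s.uplink_mbps)
    cur = next((s for s in usable if s.name == current), None)
    if cur is not None and best.uplink_mbps < cur.uplink_mbps * switch_margin:
        return current
    return best.name


if __name__ == "__main__":
    # Hypothetical readings taken while a nearby event congests carrier-a.
    readings = [CarrierSample("carrier-a", 0.5), CarrierSample("carrier-b", 4.0)]
    print(pick_carrier(readings, current="carrier-a", min_uplink_mbps=2.0))
```

The margin is the main design choice here: switching carriers has a cost, so a small advantage on another link is usually not worth a reconnect.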

The deeper problem is over-the-air (OTA) updates across a distributed fleet. Pushing software to a robot over a congested LTE connection, reliably, without bricking the device, is genuinely hard. Container image size matters here in a way it doesn't in a data center. We use Chainguard Containers and have seen a non-negligible improvement in OTA update performance as a result. Smaller, more tailored images transfer faster over constrained connections.
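A back-of-the-envelope calculation shows why image size dominates on a constrained link. The Python sketch below computes an idealized transfer time; the image sizes and link speed are made-up numbers for illustration, not measurements from our fleet, and the formula ignores protocol overhead, retries, and layer caching.

```python
def transfer_seconds(image_mb: float, link_mbps: float) -> float:
    """Idealized time to move `image_mb` megabytes over `link_mbps` megabits/s."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    # 1 byte = 8 bits, so megabytes * 8 gives megabits.
    return image_mb * 8 / link_mbps


if __name__ == "__main__":
    congested_link_mbps = 2.0  # hypothetical congested LTE throughput
    for label, size_mb in [("larger image", 800.0), ("smaller image", 200.0)]:
        minutes = transfer_seconds(size_mb, congested_link_mbps) / 60
        print(f"{label}: {size_mb:.0f} MB takes about {minutes:.1f} min")
```

The relationship is linear, so under these assumptions an image a quarter of the size spends a quarter of the time exposed to a connection that may drop mid-transfer.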

There is no substitute for real hardware

We have drone simulation environments. They're useful. They are not the same as real hardware in a real environment, and pretending otherwise will cost you.

We test at our HQ facility before anything goes near a customer site. We keep a stockpile of robot dogs and drones specifically for pre-deployment testing. Our systems use A/B partitioning across all hardware: every device can switch between the new release and the previous one, which means a bad deploy is typically recoverable in the field without a technician on-site.
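For readers who haven't worked with A/B partitioning, the general pattern can be sketched in a few lines of Python. This is a generic illustration of trial boots with automatic fallback, not a description of our bootloader: `SlotState`, the function names, and the retry count are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotState:
    """Boot bookkeeping for a device with two system slots, A and B."""
    active: str = "A"              # last slot known to be healthy
    pending: Optional[str] = None  # slot holding a release under trial
    tries_left: int = 0            # trial boots remaining for `pending`


def other_slot(slot: str) -> str:
    return "B" if slot == "A" else "A"


def stage_update(state: SlotState, max_tries: int = 3) -> None:
    """Record that a new release was written to the inactive slot."""
    state.pending = other_slot(state.active)
    state.tries_left = max_tries


def choose_boot_slot(state: SlotState) -> str:
    """Pick the slot for this boot, falling back once trials run out."""
    if state.pending is not None and state.tries_left > 0:
        state.tries_left -= 1
        return state.pending
    state.pending = None  # trial exhausted or nothing staged
    return state.active


def mark_healthy(state: SlotState) -> None:
    """Called after the new release passes its health checks."""
    if state.pending is not None:
        state.active = state.pending
        state.pending = None
        state.tries_left = 0


if __name__ == "__main__":
    state = SlotState()
    stage_update(state, max_tries=2)
    # The new release never reports healthy, so the third boot falls back.
    print([choose_boot_slot(state) for _ in range(3)])
```

The key property is that the previous release is never overwritten until the new one has proven itself, which is what makes recovery possible without someone on-site.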

But no process eliminates the problems that only surface in production. The checklist helps. The test environment helps. And then you deploy to a site, something happens that wasn't on your checklist, and you learn.

USB 3 eats GPS

Our GPS rollout is a good example of why real-world testing matters. When we started rolling out GPS to more of our robots, development on an early hardware revision went smoothly. We validated, tested, and moved to the latest hardware revision. Everything failed.

The root cause took time to find: an unshielded USB 3 cable running approximately six inches below the GPS antenna. USB 3 electromagnetic emissions are strong enough to destroy GPS reception at that distance. No software test would have caught this. It's not in any linter. There's no unit test for "Does the physical cable placement in this hardware revision interfere with RF reception?"

The fix involved moving the antenna slightly and adding extra shielding. This is where it got complicated.

Form factor is a design constraint, and design constraints have downstream effects

Asylon is deliberately conservative about the physical size and appearance of our robots, especially the robot dog. When a robot is patrolling a corporate campus or a parking lot, aesthetics are a real requirement. A robot that looks like it belongs in a factory, wearing a roll cage, is going to create friction with the people and customers at the sites we're deployed to. That friction has business consequences.

Aesthetic discipline creates engineering consequences. Smaller compute systems run hotter. Thermal management is a real engineering problem. And the form-factor constraint that keeps the robot looking clean is the same one that put the GPS antenna close to the body in the first place, which is what made the USB 3 interference so difficult to work around.

For compute, we typically run on an NVIDIA Jetson Orin NX, a small-form-factor device with GPU capabilities for edge classification. More powerful modules like the AGX or Thor are too large and draw too much power for our use case. So we run local classifiers on board for what needs low latency, and stream video to the cloud for more intensive processing. That split is driven by physics, not preference.

Get into the real world fast

If you're coming into embedded systems or robotics from a more traditional software background, here's what I'd tell you: low-level tinkering builds intuition that you can't get any other way. Understanding how software and computers actually work, not just what the abstractions promise, matters enormously when the computer is attached to a physical object moving through the real world.

And get into production as fast as you can, even if the product isn't perfect. The USB 3 and GPS problem happened because we shipped real hardware to real environments and learned something we could not have learned in simulation. Every real-world deployment changes how you design everything that follows.

The problems are hard. They stay hard. But the only way through them is through them.
