
The First Step on Your AI Journey Is Encoding What You Already Know

Every company has this person. The engineer who every production change flows through because they're the only one who knows the constraints. If you're thinking about putting AI agents anywhere near your infrastructure, encoding what that person knows is the first problem to solve.

Paul Stack

02 Jun 2026 • 5 min read

The whiteboard knows everything. The system should too.

Every company has this person. The engineer who every production change flows through because they're the only one who knows the constraints. They know which regions are safe to deploy into and which ones have been flaky since that incident last March. They know the services have to restart in a specific order or the queue backs up. They know there's a resource limit that isn't documented anywhere because it was discovered during an outage three years ago and the fix was a Slack message that's long since buried.

If you've read The Phoenix Project, you know this person as Brent. The engineer who's involved in everything because nothing works without them. The book treated it as an organisational problem to route around. It is. But the deeper problem is that Brent's knowledge has no home outside of Brent. And if you're thinking about putting AI agents anywhere near your infrastructure, that's the first problem to solve.

This person isn't slow. They're essential. And that's worse, because there's no obvious fix. Every change needs their eyes on it because nobody else carries the full picture of how production actually behaves.

If you've been that person, you know what it's like. You can't take a week off without your phone buzzing. You can't focus on the platform work you've been wanting to do for six months because someone needs you to check a deployment. "Can you just look at this before we ship it" becomes the soundtrack of your week.

What undocumented knowledge actually looks like

It's a mix of things, and that's what makes it hard to deal with.

Some of it is hard constraints. The parameters within which systems can be safely deployed. Which availability zones are reliable. What the maximum replica count is before you hit account limits. Which database migrations require a maintenance window. These are facts someone discovered, verified, and now carries around.

Then there's the stuff you only learn from incidents. The payment provider's API has aggressive rate limits that aren't in their documentation. The monitoring system takes four minutes to register new instances, so don't page on the gap. A specific microservice will deadlock if it receives traffic before the cache warms up. There's a config value that looks like it should be ten but has to be seven because of something that happened eighteen months ago that nobody wrote down.

In most organisations, all of this lives in the heads of one or two people.

The solutions that don't quite work

Teams try to solve this. Write it down. Put it in the wiki. Build a runbook. Everyone attempts it.

It works for about a month. Then someone updates a resource limit but doesn't update the wiki. Someone discovers a new constraint during an incident at 3am and fixes it but never documents it. The runbook says step three is optional but it hasn't been optional since February, and the person who knows that is on holiday. Documentation describes what was true when someone last bothered to update it.

Some teams go further. They build internal APIs and remote data sources that expose deployment parameters. You can query the system for the safe regions, the resource limits, the config values. That's better. But you still need to know enough about the internals to ask the right questions. The knowledge doesn't transfer. It just gets a thinner wrapper.

The ops engineer is still the bottleneck. They've gone from being the person who checks every change to the person who maintains the APIs that check every change. The knowledge is still in their head. It's just been proxied.

What encoding actually means

Documenting knowledge and encoding it are different things.

A wiki page says "don't deploy to us-east-1a." Nobody reads it until after the deployment fails. An encoded constraint just prevents it. A runbook says "the payment service restarts before the gateway." An encoded workflow enforces that ordering and won't let you skip it, even if you've never seen the runbook. A Slack message from 2024 says "max seven concurrent jobs against the billing API." An encoded schema validates the parameter before execution begins.
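To make the difference concrete, here is a minimal sketch of the first and third examples as a validated request type. It is an illustration in plain Python, not swamp's API: the names `BillingJobRequest`, `ConstraintViolation`, `BLOCKED_ZONES`, and `MAX_BILLING_JOBS` are hypothetical, and the lower bound of one job is an assumption. Only the blocked zone and the limit of seven come from the examples above.

```python
from dataclasses import dataclass

# Hypothetical values taken from the wiki-page and Slack-message examples.
BLOCKED_ZONES = frozenset({"us-east-1a"})
MAX_BILLING_JOBS = 7


class ConstraintViolation(ValueError):
    """Raised when a request breaks an encoded constraint."""


@dataclass(frozen=True)
class BillingJobRequest:
    zone: str
    concurrent_jobs: int

    def __post_init__(self) -> None:
        # Validation runs when the request is constructed, so an invalid
        # request can never reach the code that executes it.
        if self.zone in BLOCKED_ZONES:
            raise ConstraintViolation(f"zone {self.zone!r} is blocked for deployments")
        # The lower bound of 1 is an assumption made for this sketch.
        if not 1 <= self.concurrent_jobs <= MAX_BILLING_JOBS:
            raise ConstraintViolation(
                f"concurrent_jobs must be between 1 and {MAX_BILLING_JOBS}, "
                f"got {self.concurrent_jobs}"
            )
```

Constructing `BillingJobRequest(zone="us-east-1a", concurrent_jobs=3)` raises `ConstraintViolation` before any deployment code runs, whether or not the caller has ever seen the wiki page.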

Think about what happens at 2am when someone who joined three months ago is handling their first real incident. They haven't read the runbook. They definitely don't know about the Slack message. If the constraints are encoded, the system won't let them make the mistake regardless.

An agent operating against the same extensions gets the same safety net. It discovers what's available, what parameters exist, what constraints apply. It doesn't need to have lived through the incident that established the constraint. It just needs access to the extension that encodes it.
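One way to picture that discovery step is a catalogue that describes each operation's parameters and constraints in a machine-readable form. The sketch below is a hypothetical illustration in plain Python, not how swamp extensions are actually defined: `ParameterSpec`, `CATALOGUE`, `list_operations`, and `describe` are invented names, and the entries reuse the zone and concurrency examples from earlier.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """One parameter an operation accepts, plus the constraint on it."""
    name: str
    type: str
    constraint: str


# Hypothetical catalogue. The lower bound of 1 is an assumption.
CATALOGUE = {
    "billing-job": (
        ParameterSpec("zone", "string", "must not be us-east-1a"),
        ParameterSpec("concurrent_jobs", "integer", "between 1 and 7 inclusive"),
    ),
}


def list_operations() -> list[str]:
    """Return the names of every operation a caller can discover."""
    return sorted(CATALOGUE)


def describe(operation: str) -> str:
    """Return an operation's parameters and constraints as JSON."""
    specs = CATALOGUE[operation]
    return json.dumps([asdict(spec) for spec in specs], indent=2)
```

An agent and a new engineer would call the same two functions and get the same answer, which is the point: neither needs to know where the limit of seven came from to be bound by it.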

This is what swamp is built to do. Model types with typed schemas define the constraints. Workflows enforce the ordering. Pre-flight checks validate before anything mutates. The knowledge that used to live in one person's head becomes part of the system anyone can run.
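The shape of that idea, sketched in plain Python rather than in swamp's own syntax, looks roughly like this. Everything here is hypothetical (`Step`, `PreflightFailed`, `run_workflow`, `RESTART_WORKFLOW`, and the stand-in checks and actions); the only detail taken from the text is that the payment service restarts before the gateway.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Step:
    name: str
    preflight: Callable[[], bool]  # read-only check, must not change anything
    action: Callable[[], None]     # the mutating part of the step


class PreflightFailed(RuntimeError):
    """Raised when a check fails before any action has run."""


def run_workflow(steps: Sequence[Step]) -> list[str]:
    # Phase 1: run every pre-flight check. Nothing has been mutated yet.
    failed = [step.name for step in steps if not step.preflight()]
    if failed:
        raise PreflightFailed("pre-flight failed for: " + ", ".join(failed))
    # Phase 2: run the actions strictly in the declared order.
    completed = []
    for step in steps:
        step.action()
        completed.append(step.name)
    return completed


# Stand-in actions that only record what they would do.
events: list[str] = []

# The ordering lives in the workflow definition, not in anyone's memory.
RESTART_WORKFLOW = (
    Step("payment-service", lambda: True, lambda: events.append("restart payment-service")),
    Step("gateway", lambda: True, lambda: events.append("restart gateway")),
)
```

Calling `run_workflow(RESTART_WORKFLOW)` restarts the payment service first because the tuple says so. If any check returns `False`, the function raises before a single action runs.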

The bottleneck dissolves

When the knowledge is encoded, the ops engineer stops being the gatekeeper. The system enforces what they used to enforce manually. An engineer on another team can ship a change to production with confidence because the system won't let them violate constraints they didn't even know existed.

The ops engineer gets their time back. That platform work they've been putting off for six months? They can actually do it now. They still understand the system best, and they still encode new constraints when incidents reveal them. They're just not the person who has to be in the room for every deployment anymore.

It compounds too. Every constraint they encode makes the next deployment safer for everyone, so their leverage grows each time they add to the system instead of being consumed by it.

This isn't about replacing you

Every good ops engineer I've worked with has the same instinct: build systems that work without you in the room. That's always been the goal. Encoding is the version of it that actually sticks, because the system enforces what you encoded rather than relying on someone reading what you wrote down.

The thing I keep coming back to is that the knowledge was always the expensive part. The incident at 2am that revealed a constraint. The three hours debugging why a config value has to be seven and not ten. The slow realisation that a provider has rate limits their documentation never mentions. That learning cost real time and real stress. Right now it evaporates every time someone changes teams or a wiki page goes stale.

Encoding what you know means that knowledge helps every deployment, whether you're in the room, on holiday, or have moved to a different team entirely. And if you're thinking about where agents fit, they discover whatever constraints exist in the system the same way a new engineer would if every constraint were enforced rather than buried in a Slack thread from 2024.

But that's a consequence, not the reason to start. The reason is simpler: the knowledge you spent years acquiring deserves a better home than a message nobody will scroll back far enough to find.