Your Agent Is Starving
Five people, $3,000/month, zero lines written manually. GitHub just announced per-token billing for Copilot. The era of flat-rate pricing absorbing your agent's inefficiency is ending. Your agent isn't the bottleneck. What you're feeding it is.
I wrote a post a while back called The Vibes Don't Scale. The argument: AI code is bad because the process around it is bad. Give agents structure, conventions, architecture, and they produce fundamentally better output. Build the machine.
Building the machine is hard, though. Conventions files, workflow stages, review pipelines, design docs for every subsystem, skill quality testing, a UAT gate for compiled binaries. We do all of that for swamp. It's not a one-time effort; it's an ongoing part of how we work, in place of manually writing code. We can invest in it because we're a tools company and this is literally what we build. Most engineering teams have a product to ship and aren't going to pause for weeks to construct agent infrastructure from scratch.
Which is why we're building swamp to be that infrastructure. Not just for us, but for any team that wants agents to produce good output without building the trust model themselves.
This isn't just about writing code either. The same problem shows up whenever an LLM drives any kind of automation. An agent building a workflow without knowing your conventions burns the same kind of tokens as one writing code without knowing your architecture. The thinking transfers to any work you hand to a model, which is what swamp is built to support.
This post is about what that tooling does to your costs and output, and why the gap between "I know agents need structure" and "my agents actually have structure" is where the money goes.
The cost of starvation
Five of us build swamp. Each developer runs Claude Max at $200/month, and none of us hit the usage cap, so $1,000 for the team. Our CI pipeline costs roughly $2,000/month on top of that: four AI code reviews on every PR, adversarial security testing, skill quality gates, trigger evals, and UAT runs against compiled binaries. $3,000/month total for a team of five, zero lines of code written manually.
That number surprises people. They're spending more than that on a single developer using agents without structure.
A starving agent explores when it should already know. Reads files to understand project structure, greps for patterns to figure out conventions, opens five files to trace an import path it could have been told about. Tries an approach, hits a lint error, backtracks, tries another. Gets the architecture wrong, produces code that fails review, regenerates. Every one of those steps is tokens you're paying for. Every rejected generation is money spent twice.
What tooling solves that markdown files don't
You can write a CLAUDE.md. You should. But a conventions file is one layer of a problem that has at least four.
Process. Without a defined workflow, the agent guesses at what to do next at every step. Swamp's issue lifecycle removes that: triage, plan, adversarial review, implementation. The agent never decides whether to plan. It plans. All of its capacity goes toward the actual work.
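In sketch form, the lifecycle is just a fixed sequence (illustrative Python, not swamp's implementation; only the stage names come from this post):

```python
from enum import Enum

class Stage(Enum):
    # The fixed issue lifecycle: the agent never chooses what comes next.
    TRIAGE = 1
    PLAN = 2
    ADVERSARIAL_REVIEW = 3
    IMPLEMENTATION = 4

def next_stage(current: Stage) -> Stage | None:
    # Advance deterministically; there is no "should I plan?" decision to burn tokens on.
    order = list(Stage)
    i = order.index(current)
    return order[i + 1] if i + 1 < len(order) else None
```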
Conventions. A CLAUDE.md helps, but conventions need to be loaded in context, enforced in review, and evolved as failure modes emerge. Swamp's convention system is part of a pipeline where PR review checks code against conventions, violations get caught before merge, and new conventions get added when new failures appear. They compound because the tooling keeps them alive.
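A toy version of that loop, assuming conventions are simple predicates over a diff (in practice the enforcement is an AI review pass against the conventions file, so these lambda checks are stand-ins):

```python
from typing import Callable

Rule = tuple[str, Callable[[str], bool]]  # (convention name, check over a diff)

conventions: list[Rule] = [
    ("no bare excepts", lambda diff: "except:" in diff),
    ("no print debugging", lambda diff: "print(" in diff),
]

def review(diff: str) -> list[str]:
    # Violations get caught before merge.
    return [name for name, broken in conventions if broken(diff)]

def learn(rule: Rule) -> None:
    # A new failure mode becomes a new rule; the suite only gets stricter.
    conventions.append(rule)
```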
Architecture. Design docs tell agents how subsystems work, why boundaries exist, what invariants must hold. But they need to be loaded on demand, not dumped wholesale into the context window. Swamp's skill system does targeted loading: the agent working on vaults gets vault context, the one working on workflows gets workflow context.
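The loading itself can be this simple in principle (the subsystem names come from this post; the doc paths are hypothetical):

```python
from pathlib import Path

# Hypothetical doc locations, keyed by subsystem.
DESIGN_DOCS = {
    "vaults": Path("docs/design/vaults.md"),
    "workflows": Path("docs/design/workflows.md"),
}

def load_context(subsystem: str) -> str:
    # Load the one doc the task needs instead of dumping everything into the window.
    doc = DESIGN_DOCS.get(subsystem)
    return doc.read_text() if doc else ""
```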
Trust. The trust model controls what the agent can do without asking. Local operations are pre-approved. Publication and destruction require human approval. When the agent knows its boundaries, it works freely until the one checkpoint that matters.
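In sketch form (the action names are illustrative, not swamp's actual trust model):

```python
from enum import Enum

class Decision(Enum):
    PROCEED = "proceed"      # pre-approved: local, reversible work
    ASK_HUMAN = "ask_human"  # the one checkpoint that matters

NEEDS_APPROVAL = {"publish", "delete_remote", "force_push"}  # illustrative set

def check(action: str) -> Decision:
    # The agent works freely until it hits publication or destruction.
    return Decision.ASK_HUMAN if action in NEEDS_APPROVAL else Decision.PROCEED
```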
Each layer helps on its own. Together they're how you get to $3,000/month for a team of five.
The gap nobody talks about
Every engineering leader I talk to understands that agents need structure. The problem is the distance between understanding and doing something about it.
Building this from scratch means writing conventions and keeping them current, designing workflow stages and enforcing them, creating review pipelines, writing design docs, building targeted context loading, setting up trust boundaries, testing compiled output separately from source. Each piece is tractable on its own. Together they're a significant engineering effort, and none of it is your product.
That's where money disappears. Teams know their agents need structure but don't have time to build it. Agents keep exploring, keep guessing, keep burning through tokens on navigation and retries. The bill climbs. Someone eventually asks whether agents are worth it at all.
They are, but only if the agents are well fed. For most teams, that means tooling.
What falls out when agents are well fed
When we got the structure right, several things happened.
Costs went down, not up. Running four AI reviews, adversarial testing, skill quality gates, and UAT on every PR sounds expensive. It costs less than running agents without structure, because the pipeline catches problems that would otherwise turn into reverts, rework, and regeneration cycles. And when an agent creates a swamp model, workflow, or automation, that artifact runs deterministically from that point on. No LLM in the loop. The token cost is one-time, at creation. After that, users run the automation at will with zero token spend.
Output got consistent. The variance dropped. When agents follow the same workflow, conventions, and architectural context every time, the quality floor rises. You stop getting the occasional good PR mixed with three that miss the point.
The system learned. Every convention we added, every design doc we refined, every workflow constraint we tightened made the next hundred tasks better. The UAT gate catches regressions in compiled binaries. Those become new test cases. The suite gets harder to pass, and what passes it is more trustworthy. This compounds in a way that individual agent sessions don't.
Humans focused on what matters. We stopped reviewing code line by line and started reviewing plans. The agent handles implementation. The human handles judgment: is this the right approach, does the plan miss edge cases, does this fit the direction of the project.
Cheaper tokens won't save you
Token economics are not settling down. OpenAI's latest models are more expensive than their predecessors. Opus is more token-hungry. Anthropic experimented with dropping Claude Code from their lowest price tier. Yesterday, GitHub announced that Copilot is moving to usage-based billing, replacing flat-rate premium requests with token-based AI Credits starting June 1st. Their reasoning was blunt: "a quick chat question and a multi-hour autonomous coding session can cost the user the same amount. GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable."
That's the direction. This isn't just individual developer pricing either: Business and Enterprise plans are moving to the same model. Token efficiency is becoming an org-level cost conversation. The era of flat-rate pricing absorbing your agent's inefficiency is ending.
When every token is metered, structure isn't optional. An agent that doesn't understand your architecture produces the wrong change whether that costs ten dollars or ten cents. And the time cost never changes. A starving agent that takes an hour of retries still wasted an hour of your day.
Structure isn't a hedge against current pricing. It's the only way to make agents economically viable as they get more powerful and more expensive. We use agents to build the machine that makes agents efficient. The machine builds the machine. The teams that have it will scale up. The teams that don't will keep paying more for the same output.
Feed your agent
If your agent costs are climbing and the output isn't improving, the model isn't the problem. The context around it is.
A starving agent spends tokens figuring things out. It reads files to understand structure, guesses at conventions, retries after avoidable mistakes. You pay for all of that.
A well-fed agent starts with context. It knows the workflow, the conventions, the architecture. It executes instead of exploring. That's where the cost difference comes from.
Start with a conventions file and a workflow definition. That alone cuts the most expensive waste. Add design docs for your core subsystems and the architecture mistakes drop. Add a review pipeline and the quality floor rises.
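If you're rolling this yourself, the first version can be as blunt as a context preamble assembled from those files before the agent touches any source (a minimal sketch; CLAUDE.md aside, the paths are hypothetical):

```python
from pathlib import Path

def build_preamble(subsystem: str | None = None) -> str:
    # Everything the agent should know before it reads a single source file.
    parts = [
        Path("CLAUDE.md").read_text(),          # conventions
        Path("docs/workflow.md").read_text(),   # the fixed stages
    ]
    if subsystem:  # targeted, not wholesale
        parts.append(Path(f"docs/design/{subsystem}.md").read_text())
    return "\n\n".join(parts)
```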
Or use tooling that gives you all of that upfront.
That's the bet I'm making with swamp: it's an adaptive automation platform for AI agents. You install the CLI, describe the problem you're trying to solve, and the agent builds the models, workflows, and automations for you. More importantly, it gives the agent structure from the start, so it spends far fewer tokens figuring things out.
The effect is simple: agents stop exploring and start executing. Token usage drops. Swamp is deterministic output for a probabilistic system.
The models and workflows run deterministically once created. An agent can run them, you can run them manually, and CI can run them on a schedule. No LLM is required at runtime when you want to cut costs, but one is available when you need it. The token cost is paid to create and maintain the automation. After that, you run it as often as you want, wherever you want.
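The shape of that runtime, sketched (a hypothetical spec format, not swamp's; the point is that nothing here calls a model):

```python
import json
from pathlib import Path

# Deterministic handlers keyed by step type; the names are hypothetical.
HANDLERS = {
    "shell": lambda step: print("would run:", step["cmd"]),
    "http": lambda step: print("would call:", step["url"]),
}

def run_workflow(spec_path: str) -> None:
    # Replay the saved spec step by step. The tokens were spent once,
    # when the agent created it; every run after that is free.
    spec = json.loads(Path(spec_path).read_text())
    for step in spec["steps"]:
        HANDLERS[step["type"]](step)
```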
Swamp gives agents the context they need so your team doesn't have to build it from scratch. Whether the agent is writing a feature, building a data pipeline, or creating an operational workflow, the pattern is the same.
Your agent isn't the bottleneck. What you're feeding it is.