# Deterministic Automation for a Probabilistic System

> Source: <https://stack72.dev/deterministic-automation-for-a-probabilistic-system/>
> Published: 2026-05-26 14:30:56+00:00


AI agents are probabilistic. The same prompt doesn't always produce the same output. That's not a problem to avoid. It's a problem to engineer around. Typed schemas, validated execution, deterministic workflows. The agent reasons freely. The system keeps it honest.

I've spent more than a decade building infrastructure tooling. Puppet. Chef. Ansible. CloudFormation. Terraform. Pulumi. Kubernetes. Helm. Argo CD. I didn't just use these tools. I helped build some of them, wrote providers for others, and watched each generation try to solve the problems the last one left behind. Every few years the tooling shifts and we learn the new thing. Each shift was genuinely better than what came before.

And every single one promised the same thing. One platform. Every system. Your infrastructure. I know because I helped make that promise. Every vendor deck. Every conference keynote. Every README.

None of them delivered. They worked great for the systems they were built for, on the assumptions they were built around. Everything else was a workaround.

## The 200% problem

To use any IaC tool effectively, you don't learn one thing. You learn two.

You learn the abstraction the tool creates: the resource schema, the syntax, the lifecycle semantics. And you learn the underlying cloud API it models: what the real service does, what the tool exposes versus hides, where the abstraction leaks. That's 200% of the learning cost. Every time. For every system.

Your Terraform expertise in the AWS provider doesn't transfer to Azure. Azure has different resource models, different naming, different quirks, and a completely different provider to learn on top. GCP is different again. Your on-prem VMware might not have a maintained provider at all. Your internal developer platform has no Terraform support.

Then Kubernetes enters the picture. Helm charts for your apps, Terraform for your cloud resources, Argo CD for GitOps sync, custom operators for your platform abstractions. Each one has its own API surface, its own schema, its own failure modes. Each one charges the 200% tax again.

Every new system is a new education, twice over. The knowledge doesn't transfer. You start from zero every time.

## Declarative is powerful, but it relinquishes control

IaC solved a real problem and it continues to solve it. Describe desired infrastructure state, version it, review it, apply it repeatably. That methodology works. Millions of teams run on it every day and they should keep running on it.

But the methodology has a trade-off that matters more now than it used to. When you run `terraform apply`, you hand control to the reconciler. It decides the order of operations, how to handle dependencies, what to create, modify, or destroy, and in what sequence. You can interrupt it or use `-target` to apply in stages, but within a single run you can't pause between operations to inspect what happened and decide whether to continue. You can't validate that step three produced what you expected before step four runs against it.

If it fails partway through, the world is in a state that's neither your starting point nor your destination. Recovery is on you.
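For contrast with a single reconciler run, here is a minimal TypeScript sketch of the alternative: run one step at a time and check each result before the next step is allowed to build on it. `Step`, `runVerified`, and the resource values are hypothetical illustrations, not Terraform or swamp APIs.

```typescript
// Hypothetical sketch: execute steps one at a time and verify each result
// before the next step is allowed to build on it. Names are illustrative.
interface Step {
  name: string;
  run: () => unknown;
  // Returns an error message when the result is not what was expected.
  verify: (result: unknown) => string | undefined;
}

interface RunReport {
  completed: string[];
  haltedAt?: { step: string; reason: string };
}

function runVerified(steps: Step[]): RunReport {
  const completed: string[] = [];
  for (const step of steps) {
    const result = step.run();
    const problem = step.verify(result);
    if (problem !== undefined) {
      // Stop here: later steps never run against an unexpected result.
      return { completed, haltedAt: { step: step.name, reason: problem } };
    }
    completed.push(step.name);
  }
  return { completed };
}

// Usage: the subnet step returns an unexpected CIDR, so the cluster step never runs.
const report = runVerified([
  { name: "create-vpc", run: () => "vpc-1", verify: (r) => (r === "vpc-1" ? undefined : "no vpc id") },
  {
    name: "create-subnet",
    run: () => "10.1.0.0/24",
    verify: (r) => (r === "10.0.1.0/24" ? undefined : `unexpected subnet ${String(r)}`),
  },
  { name: "create-cluster", run: () => "cluster-1", verify: () => undefined },
]);
// report.completed is ["create-vpc"]; report.haltedAt names "create-subnet"
```

Because the runner stops at the first unexpected result, the report records exactly which steps completed, which is the information you need when the world is halfway between start and destination.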

That was an acceptable trade-off when humans reviewed every plan and watched every output. You were in the loop even if you weren't in control.

## What changes when agents enter the picture

Now add AI agents to this equation. Agents are probabilistic. The same prompt doesn't always produce the same output. That's fine for writing documentation. It's not fine for provisioning production infrastructure.

The loss of control that was acceptable when a human was watching every apply becomes a real problem when an agent is driving. You're not watching anymore. The reconciler doesn't know how to ask for help. And the agent itself is introducing a second layer of unpredictability on top of the first.

Most teams adopting agents for infrastructure get this wrong from the start. They either point an agent at their infrastructure with raw API access, or they get it to generate IaC code and hope the output is correct. Either way the agent is reasoning about unstructured problems without guardrails. Sometimes it's brilliant. Sometimes it misses something critical. You don't know which run you're getting until it's done.

The problem isn't that the agent is bad. The problem is letting probabilistic output touch real infrastructure without a deterministic layer around it.

## The deterministic box

Swamp's core unit of work is a model: a typed interaction with an external system, validated against a schema.

Every model has a type, written in TypeScript with Zod schemas defining what inputs it accepts, what methods it exposes, and what the outputs look like. An agent doesn't write code to interact with AWS or Kubernetes or your internal platform. It writes a definition that fills in the specifics for a model type: which VPC CIDR, which Kubernetes namespace, which parameters. The type system validates that definition against the schema at creation time. Mismatches surface immediately, not at 2am during execution.
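To make "validated at creation time" concrete, here is a minimal sketch of a definition check for a hypothetical `vpc` model type. It assumes nothing about swamp's real model API, and to stay dependency-free it hand-rolls in plain TypeScript what a Zod schema and its `safeParse` result would normally do: accept untrusted, agent-written input and return either a typed value or a list of errors.

```typescript
// Hypothetical sketch: a hand-rolled stand-in for a Zod schema on an
// illustrative "vpc" model type. None of these names come from swamp.

// The inputs this hypothetical model type accepts.
interface VpcInput {
  name: string;
  cidr: string;
  region: string;
}

// Mirrors the success/failure shape of a schema parse result.
type ValidationResult<T> = { ok: true; value: T } | { ok: false; errors: string[] };

// Loose IPv4 CIDR check: four octets of 0-255 plus a /0-32 prefix.
function isCidr(value: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(value);
  if (!match) return false;
  const octets = match.slice(1, 5).map(Number);
  return octets.every((octet) => octet <= 255) && Number(match[5]) <= 32;
}

// Validate an untrusted, agent-written definition when it is created.
function validateVpcDefinition(raw: unknown): ValidationResult<VpcInput> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["definition must be an object"] };
  }
  const { name, cidr, region } = raw as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof name !== "string" || name === "") errors.push("name: expected a non-empty string");
  if (typeof cidr !== "string" || !isCidr(cidr)) errors.push("cidr: expected an IPv4 CIDR such as 10.0.0.0/16");
  if (typeof region !== "string" || region === "") errors.push("region: expected a non-empty string");
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: { name: String(name), cidr: String(cidr), region: String(region) } };
}

// Usage: a /33 prefix is rejected before anything runs.
const parsed = validateVpcDefinition({ name: "core", cidr: "10.0.0.0/33", region: "us-east-1" });
if (!parsed.ok) console.log(parsed.errors); // logs the single cidr error
```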

Extensions package these model types and publish them to a registry. A Kubernetes extension knows how to query pod state, inspect Services, and validate ConfigMaps. An AWS extension knows EC2, S3, and IAM. You can write extensions for your internal systems too. The extension is the contract, and agents discover what's available through the CLI before they start working.

When work spans multiple models, workflows wire them together as a DAG of jobs. Execution order is deterministic: a weighted topological sort that produces identical ordering for identical inputs. Data flows between steps through typed CEL expressions, not string interpolation. If a step references data that doesn't exist, the expression fails loudly rather than passing blank values through.
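A weighted topological sort can be made deterministic by breaking every tie the same way. The sketch below uses Kahn's algorithm under an assumed rule: among jobs whose dependencies are satisfied, the lowest weight runs first and ties break on job name. How swamp actually assigns and compares weights isn't described here, so treat `Job`, `weight`, and `deterministicOrder` as hypothetical.

```typescript
// Hypothetical sketch of a deterministic weighted topological sort (Kahn's
// algorithm). Assumption: among jobs whose dependencies are satisfied, the
// lowest weight runs first and ties break on job name.
interface Job {
  name: string;
  weight: number;
  dependsOn: string[];
}

function deterministicOrder(jobs: Job[]): string[] {
  const byName = new Map<string, Job>(jobs.map((job): [string, Job] => [job.name, job]));
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();

  for (const job of jobs) {
    indegree.set(job.name, job.dependsOn.length);
    for (const dep of job.dependsOn) {
      if (!byName.has(dep)) throw new Error(`${job.name} depends on unknown job ${dep}`);
      dependents.set(dep, [...(dependents.get(dep) ?? []), job.name]);
    }
  }

  // Weight first, then name: the result never depends on input order.
  const compare = (a: string, b: string): number =>
    byName.get(a)!.weight - byName.get(b)!.weight || (a < b ? -1 : a > b ? 1 : 0);

  const ready = jobs.filter((job) => job.dependsOn.length === 0).map((job) => job.name);
  const order: string[] = [];
  while (ready.length > 0) {
    ready.sort(compare);
    const next = ready.shift()!;
    order.push(next);
    for (const child of dependents.get(next) ?? []) {
      const remaining = indegree.get(child)! - 1;
      indegree.set(child, remaining);
      if (remaining === 0) ready.push(child);
    }
  }

  // Jobs left unvisited can only be part of a cycle.
  if (order.length !== jobs.length) throw new Error("workflow contains a cycle");
  return order;
}

// Usage: reversing the input does not change the resulting order.
const jobs: Job[] = [
  { name: "fetch-pods", weight: 1, dependsOn: [] },
  { name: "fetch-events", weight: 0, dependsOn: [] },
  { name: "check-selectors", weight: 0, dependsOn: ["fetch-pods"] },
  { name: "report", weight: 0, dependsOn: ["check-selectors", "fetch-events"] },
];
const forward = deterministicOrder(jobs); // fetch-events, fetch-pods, check-selectors, report
const backward = deterministicOrder([...jobs].reverse()); // same order
```

Sorting the ready set on every iteration is O(n² log n) in the worst case. A priority queue would be faster, but the point here is the tie-break rule, not throughput.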

Every execution produces versioned data artifacts. The agent reasons about state by querying what exists rather than tracking what changed. Pre-flight checks run before anything mutates: policy validation, dependency checks, live API verification. Problems surface before execution begins, not after half your infrastructure is in an unknown state.
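The pre-flight idea can be sketched as a gate: run every check first, collect every failure, and only call the mutating function if the list is empty. The check names and the `subnet-123` value below are hypothetical, and real checks would call live APIs, so they would likely be asynchronous. This synchronous version only shows the ordering guarantee.

```typescript
// Hypothetical sketch: every pre-flight check runs before any mutation,
// and the mutation only happens if all of them pass. Names are illustrative.
interface PreflightCheck {
  name: string;
  // Returns a failure reason, or undefined when the check passes.
  run: () => string | undefined;
}

interface PreflightReport {
  ok: boolean;
  failures: { check: string; reason: string }[];
}

function runPreflight(checks: PreflightCheck[]): PreflightReport {
  const failures: { check: string; reason: string }[] = [];
  // Run all checks rather than stopping at the first, so every problem surfaces at once.
  for (const check of checks) {
    const reason = check.run();
    if (reason !== undefined) failures.push({ check: check.name, reason });
  }
  return { ok: failures.length === 0, failures };
}

function executeIfSafe(checks: PreflightCheck[], mutate: () => void): PreflightReport {
  const report = runPreflight(checks);
  if (report.ok) mutate();
  return report;
}

// Usage: the dependency check fails, so the mutation never runs.
const applied = { count: 0 };
const preflight = executeIfSafe(
  [
    { name: "policy", run: () => undefined },
    { name: "dependency", run: () => "subnet subnet-123 not found" },
  ],
  () => {
    applied.count += 1;
  },
);
// preflight.ok is false, preflight.failures has one entry, applied.count is still 0
```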

The probabilistic AI generates structured configuration. Swamp validates it against the schemas and catches errors at the boundaries. Then the agent runs the models and workflows it just created, and from that point forward the execution is deterministic. It's not reasoning about what to do anymore. It's running a system that already knows. And because those models and workflows are now the deterministic system, you can run exactly the same flow yourself without an agent. Same inputs, same execution, same output.
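One of those boundaries is the data flowing between steps, which the workflow description above says fails loudly on a missing reference instead of passing blank values through. The sketch below is not CEL. It is a hypothetical dotted-path resolver that shows the same behaviour, next to a naive string interpolation that silently substitutes an empty string for a mistyped key. `resolveReference`, `interpolate`, and the step data are illustrative assumptions.

```typescript
// Hypothetical sketch (not CEL): resolve a dotted reference against prior
// step outputs and throw when any segment is missing.
type StepData = { [key: string]: unknown };

function resolveReference(data: StepData, path: string): unknown {
  let current: unknown = data;
  const walked: string[] = [];
  for (const segment of path.split(".")) {
    walked.push(segment);
    // Own properties only, so names like "constructor" don't resolve by accident.
    if (typeof current !== "object" || current === null || !Object.prototype.hasOwnProperty.call(current, segment)) {
      throw new Error(`unresolved reference: ${walked.join(".")} does not exist`);
    }
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// For contrast: naive interpolation turns a missing key into an empty string.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\$\{(\w+)\}/g, (_match, key: string) => vars[key] ?? "");
}

// Usage
const outputs = {
  steps: { fetchPods: { output: { namespace: "debug-challenge", count: 4 } } },
};
const count = resolveReference(outputs, "steps.fetchPods.output.count"); // 4
// resolveReference(outputs, "steps.fetchPod.output.count") throws:
//   unresolved reference: steps.fetchPod does not exist
const silent = interpolate("kubectl get pods -n ${ns}", { namespace: "debug-challenge" });
// silent is "kubectl get pods -n " with the namespace quietly missing
```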

## Four faults, one agent, zero misses

Picture the kind of situation that gets handed to an on-call engineer at 11pm. A Kubernetes cluster with four applications: frontend, auth, api, and metrics. The cluster is up, the nodes are healthy, but nothing is working. You know something is wrong. You don't know where to start. And there are four separate faults across four separate apps, each with a different root cause.

An image tag that doesn't exist. A ConfigMap reference to a ConfigMap that was never created. A service selector that doesn't match the pod labels by one word. A targetPort that doesn't match the container port. Four faults across four different layers of the stack. The kind of scenario where each symptom points in a different direction and a skilled engineer is looking at hours of careful debugging.

We built Kubernetes debugging extensions for swamp: model types that know how to query pod state, read events, inspect Service endpoints, validate ConfigMap references, check image availability, and compare selectors against pod labels. Each operation returns structured data rather than raw kubectl output to parse. The extensions encoded what an experienced Kubernetes engineer knows to look for. The agent wrote definitions against those types, the workflow wired them into a debugging sequence, and the system executed it.
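Two of the four faults, the selector mismatch and the wrong targetPort, can be caught by comparing fields that both objects already expose. The sketch below assumes simplified types that mirror a small subset of Kubernetes Service and Pod fields. It is not the schema used by the swamp extensions. It also treats a Pod's declared `containerPort` as the port the app listens on, which is a heuristic, because Kubernetes doesn't require a numeric targetPort to be declared. Each check returns structured findings instead of text to parse.

```typescript
// Hypothetical sketch of two checks a debugging extension might encode.
// The types mirror a small subset of Kubernetes Service and Pod fields.
interface ServiceSpec {
  name: string;
  selector: Record<string, string>;
  ports: { port: number; targetPort: number | string }[];
}

interface PodSpec {
  name: string;
  labels: Record<string, string>;
  containerPorts: { name?: string; containerPort: number }[];
}

interface Finding {
  service: string;
  check: "selector" | "targetPort";
  message: string;
}

// A Service selects a Pod only if every selector key/value appears in the Pod's labels.
function selects(selector: Record<string, string>, labels: Record<string, string>): boolean {
  return Object.entries(selector).every(([key, value]) => labels[key] === value);
}

function checkService(svc: ServiceSpec, pods: PodSpec[]): Finding[] {
  const matched = pods.filter((pod) => selects(svc.selector, pod.labels));
  if (matched.length === 0) {
    // Without matching pods there is nothing to compare ports against.
    return [{ service: svc.name, check: "selector", message: `selector ${JSON.stringify(svc.selector)} matches no pods` }];
  }
  const findings: Finding[] = [];
  for (const { targetPort } of svc.ports) {
    // targetPort may be a port number or the name of a container port.
    const served = matched.some((pod) =>
      pod.containerPorts.some((p) =>
        typeof targetPort === "number" ? p.containerPort === targetPort : p.name === targetPort,
      ),
    );
    if (!served) {
      findings.push({ service: svc.name, check: "targetPort", message: `targetPort ${targetPort} is not exposed by any selected pod` });
    }
  }
  return findings;
}

// Usage: one selector fault and one targetPort fault, each reported as data.
const pods: PodSpec[] = [
  { name: "api-6f7c9", labels: { app: "api" }, containerPorts: [{ containerPort: 8080 }] },
  { name: "auth-5d8b2", labels: { app: "auth" }, containerPorts: [{ name: "http", containerPort: 3000 }] },
];
const apiFindings = checkService({ name: "api", selector: { app: "api-server" }, ports: [{ port: 80, targetPort: 8080 }] }, pods);
const authFindings = checkService({ name: "auth", selector: { app: "auth" }, ports: [{ port: 80, targetPort: 8080 }] }, pods);
// apiFindings holds one "selector" finding; authFindings holds one "targetPort" finding
```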

It found all four. The agent had no prior understanding of the cluster. The prompt was roughly "figure out what's wrong with my GKE cluster deployment, it's in the debug-challenge namespace, analyse the issues and propose a workflow to resolve them." That's it. The agent had access to the extensions, and the extensions knew what to look for. It got structured data back at every step. It couldn't ask the wrong questions, because the schema constrained which questions exist. And it checked every layer in the right order, regardless of which fault it hit first.

## We measured it

We didn't just assert that the guardrails make a difference. We benchmarked it.

The same Kubernetes debugging challenge, run against two approaches: swamp with typed extensions, and a raw agent running kubectl commands directly with no scaffolding.

The raw agent is faster. It's also a perfect demonstration of what probabilistic actually means in practice. When it works, it moves quickly, fires kubectl commands, reads the output, and reasons about what it sees. But run it again and you might get a different answer. Sometimes it misses a fault entirely. Sometimes it misdiagnoses one. Sometimes it finds the right symptoms and draws the wrong conclusion. Same cluster, same faults, different results from run to run, because it's reasoning about raw CLI output, unstructured and ambiguous, with nothing to constrain what it looks for or how it interprets what it finds.

Swamp with extensions is slightly slower. And it gets it right consistently. Every run. Same faults found, same remediation steps, same structured output. That's what deterministic means. The methodology and results are published so others can reproduce them.

Speed without reliability isn't useful in production. You don't want a faster wrong answer at 2am. Now imagine what happens when the extensions can also pull telemetry data, logs, and traces into the same typed context. The agent isn't just checking resource state. It's correlating what went wrong with why.

## The knowledge survives

There's a consequence of encoding debugging knowledge in typed extensions that goes beyond any single run.

Right now your most critical systems knowledge lives in the heads of one or two people. The engineer who knows that us-east-1a has been unreliable since March. The one who remembers the payment provider's rate limits. The one who debugged that database password rotation issue at 2am and knows the ordering matters.

When that knowledge lives in model types with typed schemas, it stops depending on who's in the room. The AZ exclusion becomes a validated constraint in the compute model type. The rate limit becomes a bounded parameter in the schema. The startup ordering becomes an explicit dependency in a workflow DAG. An agent can discover these constraints through the CLI, and the system enforces them at execution time regardless of who runs it.
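As a rough illustration, the AZ exclusion and the rate limit from the examples above could become checks that run on every definition, whoever wrote it. For brevity, this hypothetical sketch puts both constraints on one input type. The ceiling of 50 requests per second is an assumed value, and none of the names come from swamp. In Zod, similar rules might be written with `.refine()` and `z.number().int().min(1).max(50)`.

```typescript
// Hypothetical sketch: team knowledge as schema constraints. The excluded
// zone echoes the example in the text; the ceiling of 50 is an assumed value.
const EXCLUDED_ZONES: ReadonlySet<string> = new Set(["us-east-1a"]);
const MAX_PAYMENT_REQUESTS_PER_SECOND = 50;

interface ComputeInput {
  availabilityZone: string;
  paymentRequestsPerSecond: number;
}

// Returns every violated constraint, so the caller sees all problems at once.
function validateCompute(input: ComputeInput): string[] {
  const errors: string[] = [];
  if (EXCLUDED_ZONES.has(input.availabilityZone)) {
    errors.push(`availabilityZone: ${input.availabilityZone} is excluded`);
  }
  const rps = input.paymentRequestsPerSecond;
  if (!Number.isInteger(rps) || rps < 1 || rps > MAX_PAYMENT_REQUESTS_PER_SECOND) {
    errors.push(`paymentRequestsPerSecond: must be an integer from 1 to ${MAX_PAYMENT_REQUESTS_PER_SECOND}`);
  }
  return errors;
}

// Usage: both constraints are enforced no matter who, or what, wrote the input.
const violations = validateCompute({ availabilityZone: "us-east-1a", paymentRequestsPerSecond: 80 });
// violations has two entries: the excluded zone and the out-of-range rate
```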

New engineers benefit from knowledge they never had to acquire the hard way. The person who originally knew it can move on without the team losing the capability.

## The old methodology was built for human authors

IaC was designed for a world where humans were the only authors and consumers of automation. That was the right design for that world, and it still works for teams operating that way.

The new era is different. Describe what you want. Agents write definitions against typed schemas, workflows wire them into deterministic execution sequences, and the system validates everything at creation and execution time. That's not because providers go away. It's because the knowledge of how to talk to them lives in typed, reusable extensions, published to a registry and discoverable through a CLI, rather than in your team's heads or in provider-specific modules that charge the 200% tax every time.

Agents do the work. Typed schemas make the work deterministic. You define the world they work in.

And you already know what you want to build. You've always known. The bottleneck was translating that knowledge into provider-specific code across a dozen tools. When you start expressing intent to an agent instead, you get feedback immediately. The output either matches what you wanted or it doesn't. Each iteration teaches you to be more precise about what you actually need. You get better at describing intent, the agent gets better inputs, and the deterministic system underneath means you can trust what comes out the other side.

The skill that matters now isn't learning another API. It's learning to say what you mean.
