Source: home.robusta.dev

How to Drive an LLM

A startup engineer used Claude Code to autonomously implement remote tool calling across ten Kubernetes clusters in one hour without human intervention, demonstrating that coding agent velocity depends on building harnesses that let agents self-verify and fix issues. The company argues that removing human bottlenecks from the agent loop is the key to startup speed in 2026.

4 min read · Published Jun 19, 2026

I've been thinking about why some teams get dramatically more out of coding agents than others, and I'm increasingly convinced the answer has less to do with the actual models than people think.

Last week, right before an hour-long call, one of our engineers told Claude to implement a feature she'd designed that morning: remote tool calling across remote agents. By the end of the call it was working, with ten agent instances running on ten Kubernetes clusters and one querying the others. And while she'd put time into the initial plan, she didn't need to nudge the agent along after that. It one-shot the whole thing.

This only works because her Claude Code can deploy large amounts of test infrastructure on its own, hit the edge cases we hadn't designed for, fix them, and verify each fix live, so it just kept going until the feature actually worked, without stopping for a human.

We call this machinery a harness [1]: the environment that lets an agent spin up our full stack, exercise a feature end to end, take screenshots and actually look at them, or do whatever else a human would need to do to verify the work. Building harnesses is easy, so long as you're persistent. Run the agent, watch where it stops, and fix that stop, but the right way: instead of running the command or pasting in the error yourself, give the agent the visibility to find the problem on its own, so next time it gets there without you. Then run it again. There are always more stops than you think, and you don't get the fast autonomous loop until you've worked through them all. A prerequisite for all this is running Claude Code (or your own favorite coding agent) in a sandboxed environment where you can safely auto-approve every tool call.
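The run, watch, and fix-the-stop cycle above can be pictured as a small loop. What follows is a minimal sketch rather than our actual harness: `CheckResult`, `run_until_verified`, `attempt_fix`, and `verify` are hypothetical names made up for illustration, and the "agent" is a stub that fixes one problem per turn. The point it tries to show is that the evidence from a failed check goes straight back to the agent, with no human relaying it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    passed: bool
    evidence: str  # logs, error text, or a screenshot path the agent can inspect


def run_until_verified(
    attempt_fix: Callable[[str], None],
    verify: Callable[[], CheckResult],
    max_turns: int = 10,
) -> List[CheckResult]:
    """Alternate verifying and fixing until the check passes or turns run out.

    Nothing in this loop waits for a person: each failed check hands its
    evidence directly to the (hypothetical) agent callback.
    """
    history: List[CheckResult] = []
    for _ in range(max_turns):
        result = verify()
        history.append(result)
        if result.passed:
            break
        attempt_fix(result.evidence)
    return history


# Toy stand-ins: a list of open problems plays the role of the real system.
open_problems = ["timeout on cluster 3", "missing auth header"]


def toy_verify() -> CheckResult:
    if open_problems:
        return CheckResult(False, open_problems[0])
    return CheckResult(True, "all checks green")


def toy_agent_fix(evidence: str) -> None:
    open_problems.remove(evidence)  # pretend the agent fixed what it saw


for turn in run_until_verified(toy_agent_fix, toy_verify):
    print(turn.passed, turn.evidence)
```

In a real setup `verify` would deploy the stack and exercise the feature end to end, and `attempt_fix` would hand the evidence to the coding agent. Every place where `verify` cannot produce evidence on its own is one of the "stops" described above.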

A few examples of what this looks like for us:

Frontend work. This is the obvious one, and the one most people are already doing. Your coding agent needs a browser, login credentials for your app, and the ability to screenshot or record what it does, so it can check its own work and show you. We've had the most success when the agent can stand up a frontend connected to a real backend, like a staging or seeded environment, and can modify and run both together.

Testing AI agents. Our product is an AI SRE agent that groups and investigates massive volumes of production alerts, so the thing we most need to test is an agent itself, and testing that is non-trivial. Like most companies, we do this with evals: automated test cases that score the agent's output. But unlike most companies, we don't build a feature and then run evals afterward. Instead, Claude Code has the full setup to provision a real cloud environment and run the evals itself to check its own work as it goes. For most features it writes a failing (red) eval first, then iterates until it's green.

Testing Slack bots. You can tag our SRE agent in Slack or Teams to investigate an alert, so we have to test that whole surface too. End to end, that means spinning up a Slack workspace, installing the app, and driving a browser logged in as a Slack user, so our coding agent can post a message, trigger our Slack bot (which is itself an agent, so the harness also needs its own LLM API key), and read what it said back, all through a real Slack UI.
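To make the red-then-green eval workflow concrete, here is a minimal sketch. It is not our eval framework: `EvalCase`, `score`, and `is_green` are hypothetical names, the phrase-matching scoring rule is an assumption chosen for brevity, and the two "agents" are stubs standing in for the product before and after a feature lands.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    name: str
    prompt: str
    required_phrases: List[str]  # assumed scoring rule: each phrase should appear


def score(case: EvalCase, output: str) -> float:
    """Return the fraction of required phrases found in the agent's output."""
    if not case.required_phrases:
        return 1.0
    text = output.lower()
    hits = sum(1 for phrase in case.required_phrases if phrase.lower() in text)
    return hits / len(case.required_phrases)


def is_green(case: EvalCase, agent: Callable[[str], str], threshold: float = 1.0) -> bool:
    """Run the agent on the case's prompt and compare its score to the threshold."""
    return score(case, agent(case.prompt)) >= threshold


# The eval is written first, so it starts out red.
case = EvalCase(
    name="oom-root-cause",
    prompt="Why is the checkout pod restarting?",
    required_phrases=["OOMKilled", "memory limit"],
)


def agent_before(prompt: str) -> str:
    return "I could not determine the cause."


def agent_after(prompt: str) -> str:
    return "The container was OOMKilled; raise its memory limit."


print("before:", "green" if is_green(case, agent_before) else "red")
print("after:", "green" if is_green(case, agent_after) else "red")
```

Real evals usually need richer scoring than substring checks, but the shape stays the same: the failing case exists before the feature does, and the coding agent reruns it after each change until it passes.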

For startups like us, competing and winning against bigger, established players, velocity is everything, and in 2026, velocity has one major variable: how often a human has to step in and unblock the agent. Every time the agent stops and waits for a person, the loop runs at human speed: minutes or hours per turn, orders of magnitude slower. Take the human out and the same loop runs all night, without you. Here's a tip for getting started: the next time you're about to copy-paste something to the agent (an error, a log, a screenshot), stop and ask what it would take for the agent to see that itself. There are usually several missing pieces. Pick the easiest one and build that first. Then keep doing that until you're out of the loop, and you'll be done. Good luck, and happy looping.

[1] Technically the harness is Claude Code, but we're misappropriating the term in a way we find useful.

Natan Yellin, CEO. Natan has been writing software for over 15 years. He regularly posts on LinkedIn.
