The Core of a Coding Agent Is 128 Lines of Python. So I Built One From Scratch.

A developer built a coding agent from scratch in 128 lines of Python, demonstrating that the core loop powering tools like Claude Code and Cursor is surprisingly simple. The agent autonomously reads files, runs tests, diagnoses failures, fixes code, and re-runs tests without hard-coded steps. The project is open source under the MIT license.

128 lines of Python. That's the entire core of a coding agent: the loop that powers tools like Claude Code and Cursor. I didn't believe it either, so I built one from scratch.

Then I pointed it at a failing test, and it read the file, ran the test, saw the traceback, fixed the code, and re-ran it, choosing every step itself. No one hard-coded that.

It's open source (MIT), with a phased roadmap you can follow:

👉 https://github.com/osama96gh/coding-agent-from-scratch

I use coding agents every day. As an AI engineer, I think they're the breakout use case for LLMs right now. But using something and understanding it are different things.

Reading a production agent's source to learn the core is a trap: the essential logic is buried under prompt caching, retries, telemetry, and elaborate scaffolding. You can't see the engine for the bodywork.

So I built just the engine. No optimizations. Just the essence.

The numbers surprised me enough that I re-counted:

| Piece | Size |
|---|---|
| Entire REPL + agent loop + permission gate (`main.py`) | 128 lines |
| The system prompt that steers all behavior (`prompts.py`) | 19 lines |
| Tools: read, list, grep, edit, write, run bash | 6 files, smallest is 35 lines |
| Whole project, incl. 2 swappable providers + streaming | ~1,300 lines |

The thing that feels like magic (an agent autonomously reading files, running your tests, fixing the failure, re-running) comes out of about a hundred lines of orchestration. The intelligence lives in the model. Your job is plumbing.

Strip away the streaming, the permission gate, and the UI, and the heartbeat of the whole thing is this:

```python
conversation.append({"role": "user", "content": user_input})

while True:  # keep going until the model stops asking for tools
    turn = llm.call(conversation, tools=TOOL_SCHEMAS, system=SYSTEM_PROMPT)
    conversation.append(turn.to_message())

    if not turn.tool_calls:  # plain text → the model is done
        break

    for call in turn.tool_calls:  # otherwise, run each tool it asked for…
        result = run_tool(call.name, call.args)
        conversation.append({
            "role": "tool",
            "id": call.id,
            "name": call.name,
            "content": result,
        })
    # …then loop, so the model sees the results and decides what's next
```

That's it. That's the agent. The model decides which tool to call and in what order (say, read `main.py`, then run `pytest`); the loop just keeps turning until the model stops asking.

An agent is just an LLM, a loop, and some tools. Everything else in this repo is refinement on top of those three.

This is also where "it can debug itself" comes from, for free. When the shell tool feeds exit codes and stderr back into the conversation, the model sees the failure on the next turn and proposes a fix. Nobody wrote `if tests fail, edit the code`. It falls out of the loop.

One file each: `read_file`, `list_files`, `grep`, `edit_file`, `write_file`, `run_bash`. Each is just a function plus a JSON schema describing its arguments, and that schema is all the model needs to know the tool exists and how to call it. "Tool calling" sounds advanced; it's really "here's a function signature, fill in the arguments."

`run_bash` alone is almost a superpower (with a shell you can stand in for most of the others), which is exactly why an agent needs a permission gate.

These refinements sit on top of the core, and they're where most of the line count goes. The permission gate is one of them: `git status` runs unprompted while `git push` still stops to ask. The difference between an assistant and `rm -rf` roulette.

That failing-test run from the top? I never scripted it.
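
What the loop does contribute to a run like that is the text each tool hands back. The sketch below combines the two ideas described above: a permission gate in front of the shell, and a shell tool that reports exit codes and stderr. It is an illustration under assumptions, not the repo's implementation: `SAFE_PREFIXES`, `needs_approval`, the `ask` parameter, and the result format are all hypothetical names and choices made for this example.

```python
import shlex
import subprocess

# Hypothetical allowlist: commands that only read state run without asking.
SAFE_PREFIXES = [("git", "status"), ("git", "diff"), ("git", "log"), ("ls",), ("cat",)]


def needs_approval(command: str) -> bool:
    """Return True unless the command starts with an allowlisted prefix."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: be conservative and ask
    # Shell operators could chain a harmless command with a harmful one.
    if not tokens or any(op in command for op in (";", "&", "|", ">", "<", "`", "$(")):
        return True
    return not any(tuple(tokens[: len(prefix)]) == prefix for prefix in SAFE_PREFIXES)


def run_bash(command: str, ask=input, timeout: int = 60) -> str:
    """Run a shell command and return text the model can read on its next turn."""
    if needs_approval(command):
        answer = ask(f"Run `{command}`? [y/N] ")
        if answer.strip().lower() != "y":
            return "denied: the user did not approve this command"
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout}s"
    # Exit code and stderr go back into the conversation, so a failing test
    # is visible to the model without any special-case logic in the loop.
    return f"exit code: {proc.returncode}\nstdout:\n{proc.stdout}\nstderr:\n{proc.stderr}"
```

With this shape, `run_bash("git status")` runs straight away, while `run_bash("git push")` waits for a `y` first. A denial is returned as ordinary text, so the model can read it and choose another step.
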
The model chose to read, run, diagnose, fix, and re-run entirely on its own: the same shape of behavior I pay for in Claude Code every day, out of ~128 lines I could read in a single sitting.

The gap between "toy" and "real" is smaller than the hype suggests. The production polish (caching, retries, sandboxing, a thousand handled edge cases) is genuine, hard engineering. But the core that makes an agent an agent is within any engineer's reach in an afternoon.

The repo is a phased roadmap. Each phase runs on its own and teaches one concept, so you always have a working agent:

- `read_file`
- `list_files`, `grep`
- `edit_file`, `write_file`
- `run_bash`: where it gets powerful and dangerous

A learning project: build a simple but real coding agent (think a tiny Claude Code / Cursor / Codex), step by step, from nothing, to understand how complex AI agents are actually structured under the hood. The one-sentence mental model: An agent is just an LLM, a loop, and some tools. Everything else is refinement.

This repository is an educational, from-scratch Python implementation of a terminal coding agent. It shows the core mechanics behind modern AI coding tools: a model-driven agent loop, tool calling, file exploration, targeted code edits, shell command execution, permission checks, streaming responses, usage reporting, context compaction, and pluggable OpenAI/Gemini providers. It is meant to be read, modified, and learned from. It is not a production coding agent, but a small reference implementation for understanding how production coding agents are structured under the hood.

Build it, break it, extend it (a new tool, a web UI, a third provider), and tell me how it goes. The fastest way to stop an AI tool from feeling like magic is to build a small one yourself.
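
As a starting point for the "new tool" extension, here is a rough sketch of the pattern described earlier: a function plus a JSON schema for its arguments. Everything in it is an assumption made for illustration, not taken from the repo. That covers the `count_lines` tool, the `TOOLS` registry, and the schema layout, which will vary by provider. The names `TOOL_SCHEMAS` and `run_tool` are borrowed from the loop above, but their bodies here are guesses.

```python
from pathlib import Path


def count_lines(path: str) -> str:
    """Hypothetical tool: report how many lines a text file has."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        # Errors come back as text so the model can read them and react.
        return f"error: {exc}"
    return f"{path}: {len(text.splitlines())} lines"


# The schema is what the model sees: a name, a description, typed arguments.
COUNT_LINES_SCHEMA = {
    "name": "count_lines",
    "description": "Count the lines in a text file.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File to count."},
        },
        "required": ["path"],
    },
}

# Assumed registry layout: tool name -> function, plus the list of schemas.
TOOLS = {"count_lines": count_lines}
TOOL_SCHEMAS = [COUNT_LINES_SCHEMA]


def run_tool(name: str, args: dict) -> str:
    """Dispatch a tool call by name, turning failures into readable text."""
    func = TOOLS.get(name)
    if func is None:
        return f"error: unknown tool {name!r}"
    try:
        return func(**args)
    except TypeError as exc:
        return f"error: bad arguments for {name}: {exc}"
```

Returning errors as strings instead of raising keeps the loop simple: a bad tool name or a missing file becomes one more tool result for the model to read.
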