{"slug": "i-built-an-ai-agent-that-rewrites-its-own-code-in-150-lines", "title": "I Built an AI Agent That Rewrites Its Own Code (in ~150 lines)", "summary": "A developer built a tiny Darwin Gödel Machine in ~150 lines of code that improves its own code by editing itself and keeping only changes that verifiably increase test scores. The program started by passing 1 of 8 tasks and improved to passing all 8, with all fixes written by the program itself. It runs on a laptop in under a second without any API keys or special hardware.", "body_md": "*A tiny Darwin Gödel Machine that edits itself and keeps only changes that verifiably score higher.*\n\n**TL;DR:** I built a small program that improves *itself*. It looks at the tasks it's failing, edits its own code to fix them, and keeps a change only if the change actually makes it score better on a test. It goes from passing **1 of 8** tasks to **8 of 8**, and nobody wrote those fixes but the program itself. It runs on a laptop in under a second. No fancy hardware, no API key.\n\nNormally, software only gets better when *we* make it better. You write code, you find a bug, you fix it, you ship again. The program never improves on its own.\n\nPeople have wanted \"software that improves itself\" for decades. The classic version (called a \"Gödel Machine\") had one rule that made it impossible to build: before the program could change a line of its own code, it had to *mathematically prove* the change would help. Proving that about real code is basically impossible, so the idea never worked in practice.\n\nIn 2025, researchers found a way around it with the **Darwin Gödel Machine**. They dropped the \"prove it first\" rule and replaced it with something every engineer already trusts:\n\nTry the change. Run the tests. If the score went up, keep it. If not, throw it away.\n\nThat's it. It's basically how we all work: make an edit, run the test suite, keep what passes. The twist is that *the program* is the one making the edits. 
In the original paper, this let an AI coding assistant improve its own tooling and jump from solving **20%** to **50%** of SWE-bench, a hard benchmark of real GitHub issues.\n\nI wanted to actually see this happen, so I built the tiniest version I could.\n\n| | Start | After improving itself |\n|---|---|---|\n| What it can do | only `uppercase` | learned 6 more skills on its own |\n| Test score | 🔴 1 / 8 | 🟢 8 / 8 |\n| Who wrote the fixes? | n/a | the program did |\n\n```\nStart:  ███░░░░░░░░░░░░░░░░░░░░░  1/8   (only knows: uppercase)\n+reverse            ██████░░░░░░░░░░░░  2/8\n+dedup_csv          █████████░░░░░░░░░  3/8\n+sum_csv            ████████████░░░░░░  4/8\n+sort_csv           ███████████████░░░  5/8\n+title              ██████████████████  6/8\n+normalize_inputs   ████████████████████  8/8   ← one fix unlocked TWO tasks\n✅ SOLVED 8/8\n```\n\nThere are only three pieces.\n\n**1. The \"agent\" is just a bag of skills.** Each skill is a tiny function: uppercase text, reverse it, sort a list, and so on. It starts out knowing almost nothing.\n\n**2. A test with known answers.** Every task has a correct answer, so checking the score is a plain equality check: `output == expected`. No human grading it, no second AI judging it. Just: did it get the right answer or not? (This \"write a checker, then measure\" idea is the same trick behind today's reasoning models.)\n\n**3. The loop.** Over and over: look at what's failing, add one skill to try to fix it, re-run the test, and **keep the change only if the score went up.** It also saves every improved version, so it can branch off any of them later instead of getting stuck.\n\n```\nnew_version = old_version + add_a_skill(things_it_is_failing)\nif score(new_version) > score(old_version):   # did the test score actually improve?\n    keep(new_version)                          # yes -> save it and build on it\n```\n\nOne of the skills it adds, \"clean up the input\" (trim weird spacing), does **nothing** by itself. 
But the agent had earlier learned a \"title-case\" skill that kept breaking on messy text like `\" the quick fox \"`. The moment it adds the cleanup step, **two stuck tasks start passing at once**. That's the +2 jump at the end.\n\nThis is the whole point in miniature: the agent isn't just adding features. It's making itself *better at getting better*. A boring little fix becomes the stepping stone that makes later fixes work. The real research sees the same thing at full scale: the AI invents helpers like \"try a few solutions and pick the best one,\" which then make *every* future fix more effective.\n\nFor ten years, the way to make AI better was: make the model bigger. The newer idea is to make it **improve itself while it runs**:\n\nThe common thread: improvement is shifting from *us retraining the model* to *the program improving itself*, with a simple test telling it whether each change was good. Software that edits itself starts to feel less like a fixed program and more like something that grows.\n\n```\ngit clone https://github.com/Shridhar-2205/living-software\ncd living-software/01-self-rewriting-agent\npython demo_cli.py     # watch the score climb 1/8 → 8/8\npytest -q              # the same claims, as automated tests\n```\n\nOne honest note on safety: a *real* self-rewriting agent runs code it wrote itself, which is risky. In my version the \"edits\" come from a fixed list of safe skills, so nothing dangerous ever runs. The *loop* matches the research, but the *risk* of executing untrusted generated code is removed. (The real one runs inside a sandbox for exactly this reason.)\n\nThe old dream needed a mathematical proof before changing any code. The new version just needs a test. If you can write a check that says \"this got better,\" you can let a program improve itself and watch it find clever fixes you never wrote.\n\n*Written by **Shridhar Shah**, a Senior Software Engineer at Outshift by Cisco. 
I work on AI agents, search, and how they \"think.\" This is part 1 of a 6-post series, \"Toward Living Software,\" about software that starts to act a little bit alive. GitHub · LinkedIn*\n\nSource: Zhang, Hu, Lu, Lange, Clune, \"Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents,\" arXiv:2505.22954 (2025). Reports SWE-bench 20.0% → 50.0%.", "url": "https://wpnews.pro/news/i-built-an-ai-agent-that-rewrites-its-own-code-in-150-lines", "canonical_source": "https://dev.to/shridhar_shah2297/i-built-an-ai-agent-that-rewrites-its-own-code-in-150-lines-3jjo", "published_at": "2026-06-27 21:36:53+00:00", "updated_at": "2026-06-27 22:04:06.847050+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "ai-agents", "ai-research", "developer-tools"], "entities": ["Darwin Gödel Machine", "Gödel Machine"], "alternates": {"html": "https://wpnews.pro/news/i-built-an-ai-agent-that-rewrites-its-own-code-in-150-lines", "markdown": "https://wpnews.pro/news/i-built-an-ai-agent-that-rewrites-its-own-code-in-150-lines.md", "text": "https://wpnews.pro/news/i-built-an-ai-agent-that-rewrites-its-own-code-in-150-lines.txt", "jsonld": "https://wpnews.pro/news/i-built-an-ai-agent-that-rewrites-its-own-code-in-150-lines.jsonld"}}