How to Keep AI Coding Agents from Hallucinating: A Guide to Harness Engineering

Developer Masih Moafi introduced a technique called Harness Engineering to prevent AI coding agents from hallucinating and losing focus. The approach uses structured, repository-local control layers with Markdown-based rules to guide agents, reducing context window pollution and scope drift. Moafi released a public repository of battle-tested configuration templates and demonstrated the method by building a machine learning research project autonomously.

AI coding agents like Claude Code, Devin, or open-source equivalents like OpenClaw are incredibly powerful. They can navigate directories, write tests, refactor modules, and submit PRs. Yet, if you drop them into a raw repository without boundaries, they suffer from context window pollution , agent amnesia , and scope drift . A simple bug-fix refactor can trigger a 6-hour loop where the agent rewires half the project, deletes unrelated tests, and gets stuck in "process theater." To fix this, we need Harness Engineering . An Agent Harness is a structured, repository-local control layer designed to guide and verify the agent's work. Instead of feeding your LLM a monolithic prompt, you embed a lightweight system of record and physical feedback loops directly inside the workspace. I have packaged the exact, battle-tested Markdown-based context rules I use to steer and constraint my local agents into a public repository: MasihMoafi/harnesses-I-use . Rather than complex code, this repo shares raw configuration rule sheets: AGENTS.md CODEX CODING GUIDELINES.md TERMINAL AND GIT RULES.md git add -A , and change safety using Ubuntu pkexec for root commands instead of raw CLI password prompts . SESSION HANDOFF RULES.md ARTIFACT RULES.md abbn.md ctu = continue, fmy = familiarize, ver = verify to save token count and maintain short, high-efficiency communication.This harness approach is heavily inspired by Andrej Karpathy's open-source education repos like micrograd and Karpathy’s projects are celebrated because they strip away bloat. They focus on clear, reproducible mathematical baselines and avoid over-engineering. We applied that same philosophy to agent-driven code generation. The core rules of our harness require: To test this, I used this exact harness to build a comparative machine learning research project: Sensor Fault Diagnosis . The agent was given a realistic synthetic sensor dataset and tasked with: By restricting the agent to a single control surface a structured manual and enforcing strict keep/discard criteria, the agent completed the pipeline and wrote the final report entirely autonomously . Without a harness, the agent would have bloated the repository with decorative dashboard scripts or fake performance metrics. The harness kept it grounded. If you are building code with AI agents, stop writing 2,000-word system prompts. Start building repository harnesses. Check out the templates and configurations: 👉 MasihMoafi/harnesses-I-use https://github.com/MasihMoafi/harnesses-I-use For more of my work, experiments, and research, check out my website: 👉 masihmoafi.tech https://masihmoafi.tech