# How to Keep AI Coding Agents from Hallucinating: A Guide to Harness Engineering

> Source: <https://dev.to/masihmoafi/how-to-keep-ai-coding-agents-from-hallucinating-a-guide-to-harness-engineering-12mm>
> Published: 2026-06-14 10:05:48+00:00

AI coding agents (like Claude Code, Devin, or open-source equivalents like OpenClaw) are incredibly powerful. They can navigate directories, write tests, refactor modules, and submit PRs.

Yet, if you drop them into a raw repository without boundaries, they suffer from **context window pollution**, **agent amnesia**, and **scope drift**. A simple bug-fix refactor can trigger a 6-hour loop where the agent rewires half the project, deletes unrelated tests, and gets stuck in "process theater."

To fix this, we need **Harness Engineering**.

An **Agent Harness** is a structured, repository-local control layer designed to guide and verify the agent's work. Instead of feeding your LLM a monolithic prompt, you embed a lightweight system of record and physical feedback loops directly inside the workspace.
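One way to picture such a feedback loop is a scope guard that flags any changed file outside the paths a task is allowed to touch. The sketch below is only an illustration, not code from the harness repository: the `ALLOWED_PATTERNS` list, the function names, and the example paths are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical allow-list: glob patterns the current task may touch.
ALLOWED_PATTERNS = ["src/parser/*", "tests/test_parser*.py"]


def out_of_scope(changed_files, allowed_patterns):
    """Return the changed paths that match none of the allowed patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


def check_scope(changed_files, allowed_patterns=ALLOWED_PATTERNS):
    """Return (ok, offenders); a caller can refuse the change when ok is False."""
    offenders = out_of_scope(changed_files, allowed_patterns)
    return (not offenders, offenders)


if __name__ == "__main__":
    # Example input; in practice the list could come from `git diff --name-only`.
    ok, offenders = check_scope(["src/parser/lexer.py", "tests/test_billing.py"])
    print("in scope" if ok else f"scope drift: {offenders}")
```

Because the check is a plain function over a list of paths, it can run before every commit without the agent needing to remember any instruction from its prompt.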

I have packaged the exact, battle-tested Markdown-based context rules I use to steer and constrain my local agents into a public repository: **MasihMoafi/harnesses-I-use**.

Rather than complex code, this repo shares raw configuration rule sheets:

- `AGENTS.md`
- `CODEX_CODING_GUIDELINES.md`
- `TERMINAL_AND_GIT_RULES.md`: rules for terminal and Git usage (including `git add -A`) and change safety (using Ubuntu `pkexec` for root commands instead of raw CLI password prompts).
- `SESSION_HANDOFF_RULES.md`
- `ARTIFACT_RULES.md`
- `abbn.md`: abbreviations (`ctu` = continue, `fmy` = familiarize, `ver` = verify) to save token count and maintain short, high-efficiency communication.

This harness approach is heavily inspired by **Andrej Karpathy's** open-source education repos, such as **micrograd**.

Karpathy’s projects are celebrated because they strip away bloat. They focus on clear, reproducible mathematical baselines and avoid over-engineering.

We applied that same philosophy to agent-driven code generation. The core rules of our harness require:

To test this, I used this exact harness to build a comparative machine learning research project: **Sensor Fault Diagnosis**.

The agent was given a realistic synthetic sensor dataset and tasked with:

By restricting the agent to a single control surface (a structured manual) and enforcing strict keep/discard criteria, the agent completed the pipeline and wrote the final report **entirely autonomously**.
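A keep/discard criterion can be as small as a single comparison against a baseline. The sketch below is an assumption about what such a gate might look like, not the rule used in the project: the `Result` type, the `score` field, and the `min_gain` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    score: float  # assumed convention: higher is better (e.g. validation accuracy)


def keep_or_discard(baseline: Result, candidate: Result, min_gain: float = 0.01) -> str:
    """Keep the candidate only if it beats the baseline by at least min_gain."""
    if candidate.score - baseline.score >= min_gain:
        return "keep"
    return "discard"


if __name__ == "__main__":
    baseline = Result("baseline", 0.80)
    for candidate in (Result("variant_a", 0.85), Result("variant_b", 0.805)):
        print(candidate.name, keep_or_discard(baseline, candidate))
```

Writing the threshold down in the repository means a candidate is judged by the same number every run, which leaves less room for decorative output or inflated metrics.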

Without a harness, the agent would have bloated the repository with decorative dashboard scripts or fake performance metrics. The harness kept it grounded.

If you are building code with AI agents, stop writing 2,000-word system prompts. Start building repository harnesses.

Check out the templates and configurations:

👉 [MasihMoafi/harnesses-I-use](https://github.com/MasihMoafi/harnesses-I-use)

For more of my work, experiments, and research, check out my website:

👉 [masihmoafi.tech](https://masihmoafi.tech)
