{"slug": "i-versioned-the-way-i-think-then-i-forced-it-to-comply", "title": "I Versioned the Way I Think. Then I Forced It to Comply.", "summary": "A developer built a four-layer configuration system to enforce coding principles in Claude Code, moving beyond the suggestion-based CLAUDE.md file. The system includes a brain (CLAUDE.md), references, versioned skills, and guardrails—a hook script that blocks forbidden actions like reading .env files or bypassing git hooks. The guardrails are fail-open and provide the only layer the model cannot ignore, ensuring external verifiability over self-assessment.", "body_md": "One morning I pasted four principles into my `CLAUDE.md`\n\n, the global instruction file Claude Code reads at the start of every session. \"Think before you code\", \"simplicity first\", that kind of maxim you see fly by on X, credited to Andrej Karpathy. I felt clever for about a day.\n\nThen I watched Claude read the file, nod, and carry on exactly as before. A `CLAUDE.md`\n\nis a suggestion box. The model nods, then does whatever it wants. If I wanted it to code my way, writing it down wasn't going to cut it. I had to enforce it.\n\nWhat follows is what that frustration turned into: a config in four layers, reinstallable in one command, and a discovery that runs through everything else. The only rigor that counts is the one a model can't grant itself.\n\nMy config has four floors, from softest to hardest.\n\nThe **brain** is `CLAUDE.md`\n\n: how I work, not the docs for my code. The rule that sums it up lives inside it: \"what not to add: anything Claude rediscovers by reading the code.\" It holds my design principles, my stance on orchestrating subagents (I size up, I delegate, I verify: \"I stay the brain, they're the hands\"), and one line that becomes the thread running through the whole thing.\n\nThe **references**: a `go-best-practices.md`\n\nfile the brain points to in plain text whenever Go is involved.\n\nThe **skills**: ten of them. A skill is a folder with a playbook that Claude loads on demand for a specific job: review code, write an article, distill a book. Mine are packaged as a [marketplace, in a public GitHub repo](https://github.com/ohugonnot/claude-skills), with a changelog and a version number. That's the real differentiator: versioned tooling, not just rules scribbled in a file.\n\nThe **guardrails**, finally. And this is the only layer that reliably changes behavior. The first three, the model can read and ignore. The fourth, it can't.\n\nThe four config layers, from softest (the model can ignore) to hardest (the model is bound by the guardrail) Brain CLAUDE.md: how I work References go-best-practices.md Skills 10 versioned skills Guardrails hooks: settings.json + guard.sh read, then ignored enforced\n\nThree layers the model can ignore, one it can't.\n\nMy `CLAUDE.md`\n\nsaid, in capitals: never read `.env`\n\nfiles. A suggestion. The model honored it when convenient.\n\nSo I wrote a hook. A hook is a script Claude Code runs automatically at a precise point in its cycle. `guard.sh`\n\nattaches to `PreToolUse`\n\n, intercepts every `Read`\n\nand every `Bash`\n\nbefore it runs, and decides:\n\n```\ncase \"$tool\" in\n  Read)\n    is_secret_path \"$path\" && block \"reading a secrets file ($path).\"\n    ;;\n  Bash)\n    printf '%s' \"$cmd\" | grep -Eq -- '--no-verify|core\\.hooksPath=' \\\n      && block \"bypassing git hooks is forbidden.\"\n    ;;\nesac\n```\n\n`block`\n\nexits with code 2. For Claude Code, a `PreToolUse`\n\nhook exiting with 2 is a veto: the tool never runs. Reading a `.env`\n\n, bypassing git hooks with `--no-verify`\n\n: these are no longer things I ask it not to do, they're things that don't happen.\n\nThe hook is *fail-open*. The slightest doubt, a missing `jq`\n\n, a malformed JSON, and it allows. A bug in the guardrail has never blocked a legitimate tool. It only shuts the door on a confirmed violation.\n\nHere's the `CLAUDE.md`\n\nline that feeds everything else:\n\nExternal rigor: the only proof that counts is verifiable from the outside (a test, a command, a grep, real output), never the model's self-assessment. \"Are you sure?\" → an LLM always says yes; demand a fact, not a claim.\n\nAsk a model if it's sure, it says yes. Always. So my skills are built so the proof is executable, not declared.\n\n`feature-loop`\n\nships a feature in a loop, but before trusting a critical test, it mutates it, checks that it **goes red**, then restores it. A test that stays green after mutation tests nothing: it gets rewritten. The reason is spelled out in the skill: 76% of LLM-written tests miss the red-to-green transition. A test that always passes is as reassuring as it is dishonest.\n\n`rediger-cir`\n\n, my skill for French R&D tax-credit dossiers, pushes the same logic to the extreme. No reference enters a dossier without resolving through a metadata API: the DOI has to resolve, the journal can't be predatory, the paper has to actually say what you make it say. The golden rule fits in one sentence: never ask the model that just hallucinated a citation whether that citation is real. It's precisely the one you shouldn't ask.\n\nIf I had to keep a single decision out of this whole config, it'd be this one.\n\n`senior-review`\n\nreviews code before merge. The obvious trap would be to ask the model that just wrote the code to review itself. Except a model judging its own output over-rates it: self-preference bias is documented and causal (Panickssery, 2024). An honest reviewer has to be blind. My reviewer agents run in a clean context, never see the implementation prompt, and when the author is a known model, they're from a different family. The developer's name never enters the prompt.\n\nThen comes the gate that does the cleanup: no finding without a receipt. Every flag cites a `file:line`\n\nand passes a check, a grep, a sandbox run, a failing test. An unverifiable finding is dropped silently. Models over-flag by reflex; grounding each flag in a real execution removes around 60% of false positives. The goal isn't to find the most problems. It's to surface only the real ones.\n\n`feature-loop`\n\n, `senior-review`\n\nand `rediger-cir`\n\n, I've already dissected. But they don't live alone. The ten skills form a chain: four cover the life cycle of a dev task, from scoping to commit, and each knows when to hand off to the next. The other six are orthogonal, pulled out when needed.\n\nSkill\n\nIts marrow\n\n`issue-mr`\n\nScope. Turns a fuzzy task into a clean issue, branch and PR. An analyse mode settles the design before a line is written.\n\n`feature-loop`\n\nBuild. A quality loop where author, tester and reviewer are separate agents: an objective gate (build, tests, red-check) before any review, a mandatory live smoke test.\n\n`senior-review`\n\nReview. Blind reviewers per dimension, no finding without a receipt, an adversarial panel on the critical ones.\n\n`branch-wrap-up`\n\nClose out. Proposes a conventional commit, push and PR, captures knowledge. Never commits without my go-ahead.\n\n`code-mentor`\n\nTeach. Slows down at each decision, explains the why, makes me predict before it reveals. The opposite of copy-paste.\n\n`skill-builder`\n\nDesign. The principles for a skill that actually triggers and stays focused: one anchor word, a predictable process.\n\n`tech-article`\n\nWrite. An article that sounds human, not AI. The one you're reading came out of it.\n\n`book-distill`\n\nDistill. A faithful reading note of a book, every quote verified word-for-word against the text.\n\n`rediger-cir`\n\nJustify. A French R&D tax-credit dossier where every reference resolves through an API, zero hallucination.\n\n`vide-contexte`\n\nRemember. Before a context reset, persists the non-obvious insights into memory files.\n\nOne thread runs through all of them: the author is never the reviewer, the objective gate comes before the model's judgment, and nothing commits without my consent. These aren't ten tools, they're one way of working, declined ten times.\n\nThe brain of the config comes down to a few rules I apply to myself as much as to the model.\n\nThe sixth rule you already know: the proof is external. It's the one that turns all the rest from good intentions into guarantees.\n\nWhile building this config, I studied the big repos in the space. The numbers, checked against the GitHub API the day I'm writing this, are dizzying. \"Everything Claude Code\" tops out at 222,793 stars. Next to it, a very clean skills repo by Matt Pocock has 148,737, and another, sold as \"Karpathy's skills\", 183,652.\n\nThat last one deserves a warning, because it illustrates the theme better than any test. The repo is nothing official: it's a third party's reconstruction of a single tweet, duplicated into four formats. The label says Karpathy. The source says a copied tweet. Trusting the label is exactly the mistake my whole config tries to make impossible.\n\nAs for the 222,793-star behemoth, its traction comes from distribution, not code: a won Anthropic hackathon, threads with millions of views, an umbrella name, a README turned into a landing page. Under the hood, two or three solid hooks, whose `.env`\n\n-blocking and `--no-verify`\n\nidea I borrowed, by the way, a \"learning\" loop disabled by default, and internal counters that contradict each other from one section to the next.\n\n[My own repo](https://github.com/ohugonnot/claude-skills) sits at zero stars. And its guardrails are stricter than the average of the one with 222,793. The lesson isn't bitter, it's useful: stars measure a well-told story, not rigor. The two have almost nothing to do with each other.\n\nBefore publishing, I had my own skills audited. The verdict: the scientific references were 98% real, but four falsely precise numbers and one misattributed quote had slipped into the batch. Fixed. Even an author who thinks he's rigorous can't certify himself alone.\n\nThis article is the final proof. I had one of the ten skills write it, from the public repo, on the strength of a brief I'd written myself. That brief claimed \"Everything Claude Code\" was a lone outlier, five times bigger than everything else. The check against the GitHub API, right before writing, showed there's actually a whole pack of six-figure repos. My own brief was wrong, and only a call to an external source caught it.\n\nThat's the whole point. The model doesn't know it's wrong. Neither do I, half the time. The only thing that knows is the test that goes red, the DOI that resolves, the `file:line`\n\nyou can open. A Claude Code config worth having isn't a list of good intentions. It's the place where proof stops being a claim.", "url": "https://wpnews.pro/news/i-versioned-the-way-i-think-then-i-forced-it-to-comply", "canonical_source": "https://dev.to/ohugonnot/i-versioned-the-way-i-think-then-i-forced-it-to-comply-ddk", "published_at": "2026-06-28 12:37:30+00:00", "updated_at": "2026-06-28 13:04:40.641608+00:00", "lang": "en", "topics": ["developer-tools", "large-language-models", "ai-agents", "ai-products"], "entities": ["Claude Code", "CLAUDE.md", "Andrej Karpathy", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/i-versioned-the-way-i-think-then-i-forced-it-to-comply", "markdown": "https://wpnews.pro/news/i-versioned-the-way-i-think-then-i-forced-it-to-comply.md", "text": "https://wpnews.pro/news/i-versioned-the-way-i-think-then-i-forced-it-to-comply.txt", "jsonld": "https://wpnews.pro/news/i-versioned-the-way-i-think-then-i-forced-it-to-comply.jsonld"}}