{"slug": "prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it", "title": "\"Prove your AI-written code — or get the exact input that breaks it\"", "summary": "A developer released ishvacerto, an open-source Python tool that verifies AI-generated code by running it against doctests, user tests, or a reference implementation, returning a counterexample if the code is wrong or abstaining if it cannot be verified. On the HumanEval benchmark, it produced zero false alarms on correct solutions and flagged a bug in the benchmark itself. The tool runs entirely locally with no dependencies and is available on GitHub.", "body_md": "AI coding assistants are fast, and they ship confident bugs. The output looks right, the explanation sounds right, and the failing case turns up in production. The missing piece isn't a smarter generator; it's something that can *check* the generated code and refuse to bluff when it can't.\n\n[ishvacerto](https://github.com/ishvaproducts-png/ishvacerto) is that gate. Give it a function and a way to check it (its own doctests, your tests, or a reference implementation), and it returns exactly one of three answers: the code is verified against every check it could run, it is **REFUTED** with a concrete counterexample, or it **abstains** because there was nothing it could check. A refutation looks like this:\n\n`REFUTED [doctest] fn=square counterexample: square(3) (got 6, expected 9)`\n\nThe whole promise lives in that third answer.\n\n
**Never wrong, sometimes silent.** It verifies what it can check and abstains on the rest, which is exactly why it never false-alarms on correct code.\n\n```\npip install ishvacerto\n```\n\n```python\nfrom pathlib import Path\n\nfrom ishvacerto import verify, verify_against_reference\n\ncode = Path(\"f.py\").read_text()                # source of the function to check\nai_code = Path(\"ai_generated.py\").read_text()  # candidate source (assumed to be a string)\nref = Path(\"reference.py\").read_text()         # reference source (assumed to be a string)\n\nprint(verify(code))                                  # uses the code's own doctests\nprint(verify(code, tests=[(\"f(3)\", \"9\")]))          # against your tests\nprint(verify_against_reference(ai_code, ref, \"f\"))  # where does it diverge from a reference?\n```\n\nFrom the command line (exits `1` on REFUTED, so it gates CI directly):\n\n```\nishvacerto my_function.py\nishvacerto --ref reference.py --entry my_func ai_generated.py   # differential\nishvacerto --json my_function.py                                # machine-readable\n```\n\nYou can reproduce the headline numbers yourself with a script in the repo:\n\n```\npython benchmarks/humaneval_gate.py\n```\n\nOn the real **HumanEval** benchmark (164 problems), the gate produces **0 false alarms** on the canonical correct solutions, finds a checkable doctest spec in **76 of 164** problems (~46%), and abstains on the rest. It even flags HumanEval's *own* wrong doctest (problem 47) as a spec/code conflict rather than a false alarm, catching a bug in the benchmark instead of blaming the code.\n\nCoverage grows with the spec or reference you give it. Next on the roadmap is a **reference proposer**: for code that ships with no tests, it would retrieve a verified reference for the same task, widening coverage while keeping false alarms at zero.\n\nThe differential mode is the fun part: it generates inputs, runs both the candidate and the reference on them, and reports the **first input where they disagree**. Input generation is signature-agnostic: it produces generic argument tuples, lets the reference filter out the invalid ones, and abstains if it can't exercise the function with at least one valid input.\n\nPure Python **standard library**, **zero dependencies**, **13/13** tests passing, CI green on Python **3.9 / 3.11 / 3.12**, MIT licensed.\n\n
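The differential mode described above can be sketched in a few lines. This is a minimal, in-process illustration under stated assumptions, not ishvacerto's implementation: `POOL`, `first_divergence`, and the maximum arity of 3 are hypothetical, and unlike the real tool (which uses a subprocess timeout) it would hang on a candidate that loops forever. The design choice it does mirror is the one in the text: the reference decides which generic argument tuples are valid inputs, and the search abstains when none are.

```python
import copy
import itertools

# Hypothetical pool of generic argument values; the real tool's generator may differ.
POOL = [0, 1, -1, 2, -7, 10, 0.5, '', 'a', 'abc', [], [3, 1, 2], None, True]


def first_divergence(candidate, reference, max_arity=3):
    '''Search generic argument tuples for the first input where two functions disagree.

    Returns ('refuted', args, expected, got), ('agree', n_checked) or ('abstain',).
    '''
    checked = 0
    for arity in range(max_arity + 1):
        for args in itertools.product(POOL, repeat=arity):
            try:
                # Deep copies keep in-place mutation from leaking between calls.
                expected = reference(*copy.deepcopy(args))
            except Exception:
                continue  # the reference rejects this input, so it is not a valid test
            checked += 1
            try:
                got = candidate(*copy.deepcopy(args))
            except Exception as exc:
                got = f'raised {type(exc).__name__}'
            if got != expected:
                return ('refuted', args, expected, got)
    # If no input was valid for the reference, make no claim rather than guess.
    return ('agree', checked) if checked else ('abstain',)


def ref_abs(x):
    return x if x >= 0 else -x


def ai_abs(x):
    return x  # forgets the negative case


print(first_divergence(ai_abs, ref_abs))  # ('refuted', (-1,), 1, -1)
```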
It **runs entirely on your machine**: no account, no cloud, no telemetry, and your code never leaves the box. There's also a VS Code extension that shows the counterexample inline.\n\nOne caveat: the subprocess timeout guards against hangs, but it is **not** a security sandbox. Only verify code whose source you trust (such as your own assistant's output), or run it in a container.\n\nIt doesn't compete with your AI coder; it makes its output **safe to ship**.\n\n⭐ MIT, free, and the measurements are reproducible: [https://github.com/ishvaproducts-png/ishvacerto](https://github.com/ishvaproducts-png/ishvacerto)\n\n```\npip install ishvacerto\n```\n\n", "url": "https://wpnews.pro/news/prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it", "canonical_source": "https://dev.to/ishvatheguru/prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it-5bon", "published_at": "2026-06-24 04:54:39+00:00", "updated_at": "2026-06-24 05:13:31.228630+00:00", "lang": "en", "topics": ["developer-tools", "ai-tools"], "entities": ["ishvacerto", "HumanEval", "GitHub", "VS Code"], "alternates": {"html": "https://wpnews.pro/news/prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it", "markdown": "https://wpnews.pro/news/prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it.md", "text": "https://wpnews.pro/news/prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it.txt", "jsonld": "https://wpnews.pro/news/prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it.jsonld"}}