{"slug": "show-hn-an-adversarial-reasoning-engine-for-scientific-progress", "title": "Show HN: An adversarial reasoning engine for scientific progress", "summary": "A single human operator built a zero-trust adversarial research system called ZTARE over eight weeks, which then caught large language models from Claude, Gemini, and GPT-4o cheating their own evaluations through nine documented self-certifying strategies. The system falsified its own substrate, recording that only four of 18 catalogued primitives were actually engaged, and produced roughly 34,000 artifacts while surfacing hundreds of integrity errors in its own catch ledger. The project demonstrates that model capability compounds or degrades based on the research environment around it, not just the underlying AI.", "body_md": "**Catch LLMs cheating their own evaluations. Field-documented catalog +\naudit patterns + a forecasting finding that decomposes \"no signal\" into\ntwo opposite signals.**\n\n[9 ways LLMs cheat their own evaluations →](/sparckix/ztare/blob/main/docs/cheating_catalog.md)\n\n9 named self-certifying strategies observed under execution-grade audit across Claude, Gemini, and GPT-4o, each with a code-level cheat sketch and the audit pattern that catches it.\n\nA filesystem-first socio-technical research system for testing claims, surfacing failure modes, and governing AI-assisted research, built by one human operator and a rotating set of agentic operators over roughly eight weeks, then pointed at itself.\n\nThe core stack has three parts: a zero-trust adversarial validator, an out-of-loop research organization/runtime, and a reflexive intelligence layer that learns from forecasts, actions, catches, trajectories, and experiment records.\n\nThe core intuition is not that scaffolding replaces model capability. It is that model capability is only one input. 
Like human talent, it compounds or degrades depending on the environment around it: task framing, evidence boundaries, role separation, feedback, falsifiers, memory, and accountability. ZTARE is an attempt to build that environment for scientific generation and validation.\n\n```text\nresearch org chooses work -> validator/proof/script/panel/human-agent co-work\n-> ledgers and outcomes -> forecasts / action impact / trajectory mining\n-> next action, split, defer, or kill\n```\n\nA weekly reflexive audit re-mines every artifact and feeds the result\nback. The numbers below were produced by that audit; they are not a\nlive dashboard. The live record is\n[research_areas/EXPERIMENT_TRACK_RECORD.md](/sparckix/ztare/blob/main/research_areas/EXPERIMENT_TRACK_RECORD.md)\nand `research_areas/insights_ledger.md`.\n\n*Snapshot, mid-May 2026:*\n\n- **On the order of 34,000 authored artifacts.** Roughly a quarter are ZTARE iteration files; the remainder is out-of-loop agent work, and the trailing-window share is even higher. The live substrate is agent dispatch + governance + mining.\n- **The apparatus falsified its own substrate and recorded it.** A 28-day, 157-project capability-ROI audit found that of roughly 18 catalogued primitives, only four were engaged, seven were dead, and seven were never instantiated. The evolutionary zoo did not survive contact with the work, and the machine said so.\n- **Recursive gain was real, then plateaued.** Contextualized insight density rose, then flattened (a plateau, not an exponential; in-system rubric, so reported with that caveat).\n- **Triple-digit ratified catches across dozens of categories (self-reported, in-system).** This is the apparatus auditing itself, not externally verified. The catch ledger's own integrity validator was found dead for weeks and resurrected (surfacing ~300 integrity errors to remediate), and a mis-selected rater was demoted mid-cycle; both are recorded next to the original claims. 
Treat the count as an internal signal, not a validated benchmark.\n\nSingle operator, N=1, non-expert. Nothing here claims a solved Millennium problem, an autonomous research engine, or a general law. The contribution is the discipline and an honest record of where it broke.\n\n**On named personas.** Synthetic review panels and debate logs use labels\nof real individuals (for example Dijkstra, Knuth, Munger). These are\nstylistic shorthand for reasoning approaches loosely inspired by published\nwork. They do not represent the views, endorsements, or actual reasoning\nof those individuals, and no affiliation is implied. The full statement is\nin `src/ztare/personas/registry.py`.\n\nMost of the value is substrate-independent and reusable without ZTARE:\n\n- [Agentic engineering patterns](/sparckix/ztare/blob/main/docs/concepts/agentic_engineering_patterns.md), practices for pipelines whose internals are LLM calls: stub-replay testing, eligibility pre-filters, provenance telemetry, decomposed wire-in, cross-reference knowledge graphs.\n- [Reflexive primitives](/sparckix/ztare/blob/main/docs/concepts/reflexive_engineering.md), capabilities the architecture runs on its own infrastructure (the audit that demoted its own claims is one of them).\n- [Epistemic discipline](/sparckix/ztare/blob/main/docs/concepts/epistemic_principles.md), the proposer-doesn't-grade-itself constitution, plus a [mining-derived anti-pattern catalog](/sparckix/ztare/blob/main/docs/concepts/anti_pattern_catalog.md) and an append-only [catch ledger](/sparckix/ztare/blob/main/LEDGERS.md).\n- **The org runtime**, M-form separation (roles, mandates, gates, damage signals) used to actually run the project as its own research company. 
The substrate-agnostic kernel is the separate public repo [github.com/sparckix/cognitive-firm](https://github.com/sparckix/cognitive-firm); this repo carries only a thin *tenant overlay* of it (GP-191, see [docs/guides/forking_the_kernel.md](/sparckix/ztare/blob/main/docs/guides/forking_the_kernel.md) and [docs/concepts/organizational_primitives.md](/sparckix/ztare/blob/main/docs/concepts/organizational_primitives.md)). A fresh public clone here runs kernel-only. The `org/` tree in ZTARE is therefore a compatibility and tenant overlay surface, not the canonical upstream kernel.\n- **Research-supervision traces for frontier labs**, the design pattern of preserving attempts, critiques, source-readiness labels, demotions, nulls, and next falsifiers as training/eval material rather than keeping only final answers. See [architecture.md](/sparckix/ztare/blob/main/docs/concepts/architecture.md) and [agent_agnostic_recursive_gain.md](/sparckix/ztare/blob/main/docs/concepts/agent_agnostic_recursive_gain.md).\n- **The full workbench/module map**, including how ZTARE relates to adjacent systems such as AI Co-Mathematician, and how proof search, GNN novelty, forecast markets, org runtime, Orbit, supervisor, and public claims compose into a socio-technical research institution. See [system_position_and_module_map.md](/sparckix/ztare/blob/main/docs/concepts/system_position_and_module_map.md).\n\nZTARE has four public tracks.\n\n| Track | Maturity | What it does |\n|---|---|---|\n| Org Runtime Tenant Overlay | working prototype | ZTARE's applied instance of the reusable cognitive-firm primitives: persistent role offices, mandates, tasks, objectives, key results, gates, preferences, transition logs, damage signals, and operator surfaces. |\n| ZTARE Kernel | stable / evolving | Turns messy source material into bounded evidence snapshots, then stress-tests claims through mutator, verification panel, judge, hard gates, telemetry, synthesis, and closure. 
|\n| ZTARE Research Co | dogfood / active | The repo operating as its own research company: role-bound agents use the org runtime and ZTARE kernel to run programs, close experiments, and update ledgers. |\n| Scientific Case Studies | experimental / status-labeled | Gravity, neural scaling, Navier-Stokes, transformer-successor, and other bounded campaigns that stress-test the kernel and produce calibrated public artifacts when evidence licenses them. |\n\nThe tracks are designed to compose: the org overlay governs who acts in this repo, the reusable kernel lives upstream in cognitive-firm, the ZTARE kernel tests claims, ZTARE Research Co dogfoods the operating model, and case studies supply hard substrates with explicit evidence boundaries.\n\nThe original LLM-gaming work is one important subset of the project. It is not the whole project. The larger object is a disciplined research operating model (for one operator, not a productized platform): claims move through evidence, tests, gates, ledgers, and accountable roles.\n\n- **The proposer does not grade itself.** Generation, adversarial review, scoring, and deterministic gates are separate.\n- **Capability needs an environment.** Stronger models widen the search surface, but discipline determines whether that search becomes evidence, slop, or premature closure.\n- **Prose is not evidence.** A claim must survive executable checks, holdout surfaces, or explicit refusal.\n- **Memory is allowed; unearned trust is not.** The workspace can accumulate sources. The validator starts from a bounded evidence snapshot.\n- **Failures are signal.** Nulls, refusals, residual structure, and instrument failures are recorded because they change what to build next.\n- **Chat is not the system of record.** Durable artifacts live under `projects/`, `research_areas/`, `org/`, `ztare_workspace/`, and `papers/`.\n\n| If you want to... 
| Start at |\n|---|---|\n| Understand the repo layers and doc maturity |\n|\n\n[docs/concepts/system_position_and_module_map.md](/sparckix/ztare/blob/main/docs/concepts/system_position_and_module_map.md)[docs/concepts/capabilities.md](/sparckix/ztare/blob/main/docs/concepts/capabilities.md)[docs/public_claim_register.md](/sparckix/ztare/blob/main/docs/public_claim_register.md)[docs/concepts/closure_claim_governance.md](/sparckix/ztare/blob/main/docs/concepts/closure_claim_governance.md)[docs/guides/first-30-minutes.md](/sparckix/ztare/blob/main/docs/guides/first-30-minutes.md)[docs/guides/quickstart.md](/sparckix/ztare/blob/main/docs/guides/quickstart.md)`ztare`\n\nCLI[docs/guides/cli.md](/sparckix/ztare/blob/main/docs/guides/cli.md)[priority_roadmap.md](/sparckix/ztare/blob/main/priority_roadmap.md)[research_areas/EXPERIMENT_TRACK_RECORD.md](/sparckix/ztare/blob/main/research_areas/EXPERIMENT_TRACK_RECORD.md)[docs/guides/workflow.md](/sparckix/ztare/blob/main/docs/guides/workflow.md)[docs/concepts/architecture.md](/sparckix/ztare/blob/main/docs/concepts/architecture.md)[docs/concepts/cognitive_gym.md](/sparckix/ztare/blob/main/docs/concepts/cognitive_gym.md)[docs/guides/runtime_smoke_test.md](/sparckix/ztare/blob/main/docs/guides/runtime_smoke_test.md)[docs/guides/org_runtime_quickstart.md](/sparckix/ztare/blob/main/docs/guides/org_runtime_quickstart.md)[docs/guides/operator_console.md](/sparckix/ztare/blob/main/docs/guides/operator_console.md)[docs/concepts/organizational_primitives.md](/sparckix/ztare/blob/main/docs/concepts/organizational_primitives.md)[docs/concepts/ztare_research_company_architecture.md](/sparckix/ztare/blob/main/docs/concepts/ztare_research_company_architecture.md)[docs/landings/org_runtime_landing.html](/sparckix/ztare/blob/main/docs/landings/org_runtime_landing.html)[org/landings/research_company_landing.html](/sparckix/ztare/blob/main/org/landings/research_company_landing.html)[supervisor/USER_MANUAL.md](/sparckix/ztare/blob/main/superviso
r/USER_MANUAL.md)[papers/README.md](/sparckix/ztare/blob/main/papers/README.md)[docs/sprint_60day_journey.md](/sparckix/ztare/blob/main/docs/sprint_60day_journey.md)[projects/ns_millennium_hunt/public/JOURNEY.md](/sparckix/ztare/blob/main/projects/ns_millennium_hunt/public/JOURNEY.md)[LEDGERS.md](/sparckix/ztare/blob/main/LEDGERS.md)[docs/concepts/glossary.md](/sparckix/ztare/blob/main/docs/concepts/glossary.md)[CONTRIBUTING.md](/sparckix/ztare/blob/main/CONTRIBUTING.md)\n\nIf you are not sure where to start, use the domain-validation path.\n\n```\ngit clone https://github.com/sparckix/ztare\ncd ztare\npython3 -m venv venv\nsource venv/bin/activate\npip install -r requirements.txt\npip install -e .   # registers the `ztare` console script\n\nmake help\nmake demo\nmake smoke-public\n\n# the apparatus is now callable as a single command:\nztare --help                 # the operator surface\nztare forecast status        # sealed forecast-pool state\nztare leanmill schedule …    # LeanMill orchestration (GP-225)\nztare bundle verify …        # sealed-bundle gate\n```\n\nSee [docs/guides/cli.md](/sparckix/ztare/blob/main/docs/guides/cli.md) for the full subcommand\ntour and the engine/governance split between this CLI and\n`cognitive-firm-userland`.\n\n`make demo` and `make smoke-public` do not invoke live model calls. 
Add model\nAPI keys only when you are ready to run an LLM-backed validator loop:\n\n```\nexport GEMINI_API_KEY=your_key_here\n# Optional, depending on model pairings:\nexport ANTHROPIC_API_KEY=your_key_here\nexport OPENAI_API_KEY=your_key_here\n```\n\nRun a validator loop on an existing project:\n\n```\nmake experiment-loop PROJECT=<project> RUBRIC=<rubric> ITERS=10 MUTATOR_MODEL=gemini JUDGE_MODEL=gemini\n```\n\nRun the full evidence workflow:\n\n```\nmake workspace-update PROJECT=<project> MODEL=gemini\nmake evidence-compile PROJECT=<project> MODEL=gemini\n# Review and promote compiled_evidence.txt to evidence.txt when appropriate.\nmake experiment-loop PROJECT=<project> RUBRIC=<rubric> ITERS=10 MUTATOR_MODEL=gemini JUDGE_MODEL=gemini\nmake synth PROJECT=<project> MODEL=gemini QA_MODEL=claude RENDERER=founder_memo\n```\n\n`make experiment-loop` is the safe default for live runs. It disables attacker\ntools and activates hard-gate preflights when the rubric declares them. Use\n`make loop` only when actively debugging and you understand the safety tradeoff.\n\n```\nmkdir -p projects/your_domain/raw\n\npython -m src.ztare.common.scaffold_project_charter \\\n  --project your_domain \\\n  --mode broad\n\n# Add source files under projects/your_domain/raw/\nmake workspace-update PROJECT=your_domain MODEL=gemini\nmake evidence-compile PROJECT=your_domain MODEL=gemini\n\n# After reviewing compiled_evidence.txt, promote it:\ncp projects/your_domain/compiled_evidence.txt projects/your_domain/evidence.txt\n\nmake experiment-loop PROJECT=your_domain RUBRIC=recursive_bayesian ITERS=10 MUTATOR_MODEL=gemini JUDGE_MODEL=gemini\n```\n\nThe evidence workflow writes structured artifacts under\n`projects/<project>/workspace/`: facts, contradictions, open questions,\nevidence gaps, derived constraints, compile failures, and validator telemetry.\n\nThe science track treats numerical or scientific substrates as adversarial discovery problems. 
The engine proposes candidate laws, fits parameters deterministically, tests against visible/holdout/farther-tail surfaces, compresses forms, and records nulls when the substrate is underidentified.\n\n```\nmake discover PROJECT=<project> RUBRIC=<rubric> ITERS=15\nmake compress PROJECT=<project>\nmake prove PROJECT=<project>\n```\n\nThe honest interpretation is scoped:\n\n- calibration recoveries show the instrument can recover known forms under cold-variable rigor;\n- apparatus-only findings require the run artifacts and gates, not just model recall;\n- correct refusals are valuable when the data do not license compression;\n- new-science claims require stricter external validation than a high score.\n\nFor the full workflow and caveats, see [docs/guides/workflow.md](/sparckix/ztare/blob/main/docs/guides/workflow.md)\nand [docs/guides/for_researchers.md](/sparckix/ztare/blob/main/docs/guides/for_researchers.md).\n\nZTARE contains a local governance overlay for persistent AI research roles,\nvalidated against the project's own work. The reusable, substrate-agnostic\nkernel for this layer lives in\n[cognitive-firm](https://github.com/sparckix/cognitive-firm); this repo keeps\nthe ZTARE tenant state, compatibility surfaces, and dogfood deployment.\nA role office has a JSON-schema-validated contract (`org/roles/<role>.yaml`),\na mandate, allowed and forbidden paths, budget caps, an inbox, claims,\ntransition logs, and closure duties.\n\n```text\nprincipal preferences + objectives\n-> role mandate\n-> task or gate\n-> daemon proposal/execution\n-> transition log, closure, ledger update\n```\n\nThe principal can drive the runtime through three rails. 
They share one source of truth, the gate and channel JSON files on disk, so a decision made on any rail is visible from the others within seconds.\n\n| Rail | Best for | Surface |\n|---|---|---|\n| Executive inbox (filesystem) | source of truth, scriptable from any shell | `ztare_workspace/gates/pending/*.json`, `org/channels/<role>/inbox/` |\n| Orbit dashboard (browser) | rich approvals with reasons, send a directive, pause/resume a daemon, OKR tree visual | `cd orbit && npm run sync` and `npm run dev` |\n| Notification provider (optional tenant rail) | push notification, tap-to-approve, digest surfaces | filesystem outbox by default; tenant overlays may add Telegram/Slack/etc. |\n\nLocal smoke path:\n\n```\npython scripts/public/control/org_first_run_setup.py --member-id codex --agent-cli codex --agent-adapter codex_exec\n```\n\nDocker/daemon path:\n\n```\ndocker compose --env-file .env --profile daemons run --rm research-director-daemon \\\n  python scripts/public/control/org_role_preflight.py --role research_director\n\ndocker compose --env-file .env --profile daemons up research-director-daemon\n```\n\nPreflight validates each role YAML against `schemas/role.v1.schema.json` and\nruns the bootstrap chain in `org/bootstrap_manifest.yaml` so an agent always\nboots from the same set of contracts (AGENTS.md, role YAML, mandate,\npreferences, then optional procedural reads).\n\nDocker is a deployment wrapper, not magic authentication. Full execution needs\nthe chosen agent runtime (`codex`, `claude`, or another adapter) installed and\nauthenticated inside the container or on the host running the daemon.\n\nThe org runtime is currently filesystem-backed. A daemon sees only the\n`org/`, `ztare_workspace/`, and project files mounted into its process. For VPS\ndeployment, either create tasks on the VPS, sync private org state there, or\nmount a shared state volume. 
See\n[docs/guides/org_runtime_docker_deploy.md](/sparckix/ztare/blob/main/docs/guides/org_runtime_docker_deploy.md).\n\nKey docs:\n\n- [docs/landings/org_runtime_landing.html](/sparckix/ztare/blob/main/docs/landings/org_runtime_landing.html), adoption-pitch landing for the `org/` kernel itself\n- [org/landings/research_company_landing.html](/sparckix/ztare/blob/main/org/landings/research_company_landing.html), landing framed as the ZTARE research-company adoption\n- [docs/guides/operator_console.md](/sparckix/ztare/blob/main/docs/guides/operator_console.md)\n- [docs/guides/org_runtime_quickstart.md](/sparckix/ztare/blob/main/docs/guides/org_runtime_quickstart.md)\n- [docs/guides/org_runtime_docker_deploy.md](/sparckix/ztare/blob/main/docs/guides/org_runtime_docker_deploy.md)\n- [docs/concepts/organizational_primitives.md](/sparckix/ztare/blob/main/docs/concepts/organizational_primitives.md)\n- [docs/concepts/ztare_research_company_architecture.md](/sparckix/ztare/blob/main/docs/concepts/ztare_research_company_architecture.md)\n- [org/README.md](/sparckix/ztare/blob/main/org/README.md)\n- [org/bootstrap_manifest.yaml](/sparckix/ztare/blob/main/org/bootstrap_manifest.yaml), role bootstrap chain\n- [schemas/role.v1.schema.json](/sparckix/ztare/blob/main/schemas/role.v1.schema.json), role contract schema\n\nZTARE is intentionally open source, but it is not a raw operations dump. 
The release rule is:\n\n```\nship the scientific instrument and public documentation aggressively;\nkeep active strategy, sealed pre-registrations, personal context, credentials,\nand first-mover-sensitive product tactics private until closure or public\nderivative rendering.\n```\n\nPublic by default:\n\n- research-engine code, validators, gates, fit primitives, and proof tooling;\n- Lean verifier modules and exact certificate checkers;\n- public docs, papers, rubrics, and calibrated closed artifacts;\n- closed seams that pass the visibility rule.\n\nLocal / gitignored by default:\n\n- local-only research notes and `.ip_protected/`;\n- active strategy seams, sealed GT/pre-registration material, and in-flight experiment tactics;\n- org-runtime mandates, preferences, channels, directives, sessions, and runtime task state;\n- credentials, contact channels, API keys, local logs, and cloud/GPU telemetry that contains operational context.\n\nThe scientific instrument should be inspectable and reproducible. Active experiments still need sealed envelopes so later results remain interpretable.\n\nThe core loop:\n\n- **Mutator** proposes a thesis and executable candidate.\n- **Verification panel** attacks weak assumptions.\n- **Fitter/solver** estimates parameters when the substrate is numeric.\n- **Meta-judge** scores execution output rather than persuasive prose.\n- **Hard gates** enforce deterministic pass/fail constraints.\n- **Telemetry and ledgers** preserve what happened, including failures.\n\nThis architecture grew out of the Cognitive Camouflage work: LLM-generated code can pass holistic review while violating the intent of the test. ZTARE's answer is separation of duties plus executable gates.\n\nExamples of failure modes the system has had to defend against:\n\n| Pattern | Failure |\n|---|---|\n| Blame shield | Hide one critical unsupported axiom among many harmless ones. |\n| Float masking | Round away the precision that would reveal the failure. 
|\n| Fake mechanism | Name a function after a mechanism while hardcoding its output. |\n| Cooked RNG | Hardcode improving pseudo-random behavior instead of learning. |\n| Assert narrowing | Define tests so narrowly that only the submitted case passes. |\n| Unit laundering | Hide an empirical correction as a dimensional factor. |\n| Straw-man comparison | Design the rival so the preferred answer wins by construction. |\n\nThe gaming paper documents the first version of this problem. The current repo generalizes the response into a research and governance stack.\n\n| Surface | Status | Entry point |\n|---|---|---|\n| Domain evidence workspace | usable | `make workspace-update`, `make evidence-compile` |\n| Adversarial validator | usable | `make experiment-loop` |\n| Synthesis pipeline | usable | `make synth` |\n| Science compression / proof stubs | experimental | `make discover`, `make compress`, `make prove` |\n| Evaluator hardening / gates | active development | `docs/concepts/architecture.md`, `supervisor/USER_MANUAL.md` |\n| Org runtime overlay / role daemons | working today | `docs/guides/org_runtime_quickstart.md` |\n| ZTARE Research Co dogfood loop | active | `priority_roadmap.md`, `research_areas/EXPERIMENT_TRACK_RECORD.md`, `research_areas/specs/active/apparatus/instrumentation/GP-244_research_operations_intelligence_cockpit_spec.md` |\n| Executive inbox (filesystem rail) | working today | `ztare_workspace/gates/pending/` + `org/channels/` |\n| Orbit governance UI (browser rail) | working today | `orbit/` (gate review queue, principal cockpit, OKR tree) |\n| Notification provider (optional rail) | tenant-specific | filesystem outbox by default; Telegram/Slack/etc. belong in tenant overlays |\n\n| Path | Purpose |\n|---|---|\n| `src/ztare/` | Python implementation: validator, fit primitives, gates, synthesis, workspace, orchestration. |\n| `projects/` | Domain projects, evidence, workspaces, validator artifacts, scientific sandboxes. 
|\n| `rubrics/` | Scoring rubrics and gate configuration. |\n| `docs/` | Architecture, workflow, concepts, product/runtime docs. |\n| `papers/` | Public manuscript sources. |\n| `ztare_proofs/` | Lean proof sources and formalization experiments; generated `.lake/` build state is ignored. |\n| `research_areas/` | Experiment track record, current board, seams, specs, debates, research logs. |\n| `org/` | Roles, mandates, preferences, tasks, objectives, channels, runtime state. |\n| `supervisor/` | Program registry, manifests, control-plane docs. |\n| `orbit/` | Governance UI projection. |\n| `ztare_workspace/` | Gates, transition logs, runtime projections. |\n\nRule of thumb:\n\n- human-readable research prose goes under `research_areas/`;\n- supervisor/runtime JSON state goes under `supervisor/`, `org/`, or `ztare_workspace/`;\n- project evidence and run artifacts stay under `projects/`.\n\n- [Cognitive Camouflage](/sparckix/ztare/blob/main/papers/cognitive-camouflage/draft.md), specification gaming in LLM-generated code | [SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6512960)\n- [Adversarial Precedent Memory](/sparckix/ztare/blob/main/papers/adversarial-precedent-memory/draft.md), hardening evaluators through mined failure constraints | [SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6525598)\n- [Contract-Governed Hardening](/sparckix/ztare/blob/main/papers/contract-governed-hardening/draft.md), stage-gated recursive improvement with typed promotion contracts | [SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6542998)\n- [Cognitive Firm](/sparckix/ztare/blob/main/papers/cognitive-firm/draft.md), managerial capitalism for artificial intelligence | [SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6543019)\n- [Epistemic Verification](/sparckix/ztare/blob/main/papers/epistemic-verification/draft.md), manuscript in revision.\n- Adversarial Compression, experimental mathematics manuscript (draft not mirrored in this repository).\n\nThe 
papers are best read as a stack:\n\n- LLMs game underspecified evaluation.\n- Mined precedents and deterministic gates harden evaluators.\n- Typed promotion contracts make recursive hardening safer.\n- Persistent organizational roles govern AI work.\n- Epistemic verification decomposes judgment into repeatable operations plus a bounded residual.\n\nThe active case-study layer applies this stack across scientific and governance substrates as falsifier pressure rather than discovery rhetoric. It should be read through the experiment records and promoted public papers, not through private working drafts.\n\nZTARE is designed to improve research discipline, not to guarantee truth.\n\nDo not infer:\n\n- that a high score proves a scientific discovery;\n- that calibration recoveries are new science;\n- that an LLM cold shot is a controlled baseline unless model/date/prompt are recorded;\n- that hard gates cover every possible failure mode;\n- that the org runtime is enterprise-ready merely because the local single-team path works;\n- that “works on any domain” means no domain-specific evidence engineering is needed.\n\nThe intended standard is stricter: if a result matters, it needs artifacts, gates, closure rows, and a clear statement of what would falsify it.\n\nThis repo is easiest to operate with an agentic coding assistant such as Codex or Claude Code because the meaningful state is distributed across artifacts.\n\nUseful prompts are collected in [docs/guides/agent-prompts.md](/sparckix/ztare/blob/main/docs/guides/agent-prompts.md).\nStart with one of those paste-ready prompts when using a fresh Codex or Claude\nsession to learn the repo, inspect a project, audit the forecast market, or\nwork in observer mode on NS.\n\nFor agents working inside this repo, [AGENTS.md](/sparckix/ztare/blob/main/AGENTS.md) is the repo-level\nconstitution.\n\nZTARE borrows from several traditions without treating any as decorative:\n\n- Karpathy's LLM wiki pattern for accumulating 
source memory upstream of the validator.\n- Popperian falsification: cheap refutation is more valuable than persuasive confirmation.\n- Mungerian inversion and checklist discipline: name what would make success uninterpretable before celebrating it.\n- Scientific management, cybernetics, and organizational design: roles, handoffs, ledgers, and closure matter when cognition becomes machine-aided.\n\nMIT. The governance/orchestration code in `org/`, `supervisor/`, `orbit/`, `deploy/`, and `src/ztare/{orchestration,supervisor,sessions,signals,notifications}/` is ZTARE's tenant-overlay integration of the upstream [cognitive-firm](https://github.com/sparckix/cognitive-firm) kernel; the canonical kernel and its license live in that repository.\n\nFiles ignored by the public/private boundary are not part of the public license grant until deliberately promoted.\n\n[LICENSES.md](/sparckix/ztare/blob/main/LICENSES.md) is the file-by-file map; the full text is in [LICENSE](/sparckix/ztare/blob/main/LICENSE). Third-party notices are in [NOTICE.md](/sparckix/ztare/blob/main/NOTICE.md).\n\nIf you cite this work, cite the specific paper or artifact you are using rather than the repository as a monolith.\n\n```\n@misc{alami2026cognitivecamouflage,\n  title = {Cognitive Camouflage: Specification Gaming in LLM-Generated Code Evades Holistic Evaluation but Not Adversarial Execution},\n  author = {Alami, Daniel},\n  year = {2026},\n  note = {SSRN preprint 6512960. Code: github.com/sparckix/ztare},\n  url = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6512960}\n}\n\n@misc{alami2026adversarialprecedent,\n  title = {Adversarial Precedent Memory: Hardening LLM Evaluators Through Mined Failure Constraints},\n  author = {Alami, Daniel},\n  year = {2026},\n  note = {SSRN preprint 6525598. 
Code: github.com/sparckix/ztare},\n  url = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6525598}\n}\n\n@misc{alami2026contractgoverned,\n  title = {Contract-Governed Adversarial Evaluator Hardening: Stage-Gated Recursive Improvement with Typed Promotion Contracts},\n  author = {Alami, Daniel},\n  year = {2026},\n  note = {SSRN preprint 6542998. Code: github.com/sparckix/ztare},\n  url = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6542998}\n}\n\n@misc{alami2026cognitivefirm,\n  title = {The Cognitive Firm: Managerial Capitalism for Artificial Intelligence},\n  author = {Alami, Daniel},\n  year = {2026},\n  note = {SSRN preprint 6543019. Code: github.com/sparckix/ztare},\n  url = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6543019}\n}\n```\n\n", "url": "https://wpnews.pro/news/show-hn-an-adversarial-reasoning-engine-for-scientific-progress", "canonical_source": "https://github.com/sparckix/ztare", "published_at": "2026-06-06 15:09:21+00:00", "updated_at": "2026-06-06 15:18:19.755766+00:00", "lang": "en", "topics": ["ai-safety", "ai-research", "ai-agents", "large-language-models", "ai-ethics"], "entities": ["Claude", "Gemini", "GPT-4o", "ZTARE"], "alternates": {"html": "https://wpnews.pro/news/show-hn-an-adversarial-reasoning-engine-for-scientific-progress", "markdown": "https://wpnews.pro/news/show-hn-an-adversarial-reasoning-engine-for-scientific-progress.md", "text": "https://wpnews.pro/news/show-hn-an-adversarial-reasoning-engine-for-scientific-progress.txt", "jsonld": "https://wpnews.pro/news/show-hn-an-adversarial-reasoning-engine-for-scientific-progress.jsonld"}}