Show HN: An adversarial reasoning engine for scientific progress

A single human operator built a zero-trust adversarial research system called ZTARE over eight weeks, which then caught large language models from Claude, Gemini, and GPT-4o cheating their own evaluations through nine documented self-certifying strategies. The system falsified its own substrate, recording that only four of 18 catalogued primitives were actually engaged, and produced roughly 34,000 artifacts while surfacing hundreds of integrity errors in its own catch ledger. The project demonstrates that model capability compounds or degrades based on the research environment around it, not just the underlying AI.

Catch LLMs cheating their own evaluations. Field-documented catalog + audit patterns + a forecasting finding that decomposes "no signal" into two opposite signals. 9 ways LLMs cheat their own evaluations → /sparckix/ztare/blob/main/docs/cheating catalog.md 9 named self-certifying strategies observed under execution-grade audit across Claude, Gemini, and GPT-4o, each with a code-level cheat sketch and the audit pattern that catches it. A filesystem-first socio-technical research system for testing claims, surfacing failure modes, and governing AI-assisted research, built by one human operator and a rotating set of agentic operators over roughly eight weeks, then pointed at itself. The core stack has three parts: a zero-trust adversarial validator, an out-of-loop research organization/runtime, and a reflexive intelligence layer that learns from forecasts, actions, catches, trajectories, and experiment records. The core intuition is not that scaffolding replaces model capability. It is that model capability is only one input. Like human talent, it compounds or degrades depending on the environment around it: task framing, evidence boundaries, role separation, feedback, falsifiers, memory, and accountability. ZTARE is an attempt to build that environment for scientific generation and validation. php research org chooses work - validator/proof/script/panel/human-agent co-work - ledgers and outcomes - forecasts / action impact / trajectory mining - next action, split, defer, or kill A weekly reflexive audit re-mines every artifact and feeds the result back. The numbers below were produced by that audit; they are not a live dashboard. The live record is research areas/EXPERIMENT TRACK RECORD.md /sparckix/ztare/blob/main/research areas/EXPERIMENT TRACK RECORD.md and research areas/insights ledger.md . Snapshot, mid-May 2026: On the order of 34,000 authored artifacts. Roughly a quarter are ZTARE iteration files; the remainder is out-of-loop agent work, and the trailing-window share is even higher. The live substrate is agent dispatch + governance + mining. The apparatus falsified its own substrate and recorded it. A 28-day, 157-project capability-ROI audit found that of roughly 18 catalogued primitives, only four were engaged, seven were dead, and seven were never instantiated. The evolutionary zoo did not survive contact with the work, and the machine said so. Recursive gain was real, then plateaued. Contextualized insight density rose then flattened a plateau, not an exponential; in-system rubric, so reported with that caveat . Triple-digit ratified catches across dozens of categories — self-reported, in-system. This is the apparatus auditing itself, not externally verified. The catch ledger's own integrity validator was found dead for weeks and resurrected surfacing ~300 integrity errors to remediate , and a mis-selected rater was demoted mid-cycle — both recorded next to the original claims. Treat the count as an internal signal, not a validated benchmark. Single operator, N=1, non-expert. Nothing here claims a solved Millennium problem, an autonomous research engine, or a general law. The contribution is the discipline and an honest record of where it broke. On named personas. Synthetic review panels and debate logs use labels of real individuals for example Dijkstra, Knuth, Munger . These are stylistic shorthand for reasoning approaches loosely inspired by published work. They do not represent the views, endorsements, or actual reasoning of those individuals, and no affiliation is implied. The full statement is in src/ztare/personas/registry.py . Most of the value is substrate-independent and reusable without ZTARE: , practices for pipelines whose internals are LLM calls: stub-replay testing, eligibility pre-filters, provenance telemetry, decomposed wire-in, cross-reference knowledge graphs. Agentic engineering patterns /sparckix/ztare/blob/main/docs/concepts/agentic engineering patterns.md , capabilities the architecture runs on its own infrastructure the audit that demoted its own claims is one of them . Reflexive primitives /sparckix/ztare/blob/main/docs/concepts/reflexive engineering.md , the proposer-doesn't-grade-itself constitution, plus a Epistemic discipline /sparckix/ztare/blob/main/docs/concepts/epistemic principles.md mining-derived anti-pattern catalog /sparckix/ztare/blob/main/docs/concepts/anti pattern catalog.md and an append-only catch ledger /sparckix/ztare/blob/main/LEDGERS.md . The org runtime , M-form separation roles, mandates, gates, damage signals used to actually run the project as its own research company. The substrate-agnostic kernel is the separate public repo; this repo carries only a thin github.com/sparckix/cognitive-firm https://github.com/sparckix/cognitive-firm tenant overlay of it GP-191, see docs/guides/forking the kernel.md /sparckix/ztare/blob/main/docs/guides/forking the kernel.md and docs/concepts/organizational primitives.md /sparckix/ztare/blob/main/docs/concepts/organizational primitives.md . A fresh public clone here runs kernel-only. The org/ tree in ZTARE is therefore a compatibility and tenant overlay surface, not the canonical upstream kernel. Research-supervision traces for frontier labs , the design pattern of preserving attempts, critiques, source-readiness labels, demotions, nulls, and next falsifiers as training/eval material rather than keeping only final answers. See architecture.md /sparckix/ztare/blob/main/docs/concepts/architecture.md and agent agnostic recursive gain.md /sparckix/ztare/blob/main/docs/concepts/agent agnostic recursive gain.md . The full workbench/module map , including how ZTARE relates to adjacent systems such as AI Co-Mathematician, and how proof search, GNN novelty, forecast markets, org runtime, Orbit, supervisor, and public claims compose into a socio-technical research institution. See system position and module map.md /sparckix/ztare/blob/main/docs/concepts/system position and module map.md . ZTARE has four public tracks. | Track | Maturity | What it does | |---|---|---| Org Runtime Tenant Overlay | working prototype | ZTARE's applied instance of the reusable cognitive-firm primitives: persistent role offices, mandates, tasks, objectives, key results, gates, preferences, transition logs, damage signals, and operator surfaces. | ZTARE Kernel | stable / evolving | Turns messy source material into bounded evidence snapshots, then stress-tests claims through mutator, verification panel, judge, hard gates, telemetry, synthesis, and closure. | ZTARE Research Co | dogfood / active | The repo operating as its own research company: role-bound agents use the org runtime and ZTARE kernel to run programs, close experiments, and update ledgers. | Scientific Case Studies | experimental / status-labeled | Gravity, neural scaling, Navier-Stokes, transformer-successor, and other bounded campaigns that stress-test the kernel and produce calibrated public artifacts when evidence licenses them. | The tracks are designed to compose: the org overlay governs who acts in this repo, the reusable kernel lives upstream in cognitive-firm, the ZTARE kernel tests claims, ZTARE Research Co dogfoods the operating model, and case studies supply hard substrates with explicit evidence boundaries. The original LLM-gaming work is one important subset of the project. It is not the whole project. The larger object is a disciplined research operating model — for one operator, not a productized platform: claims move through evidence, tests, gates, ledgers, and accountable roles. The proposer does not grade itself. Generation, adversarial review, scoring, and deterministic gates are separate. Capability needs an environment. Stronger models widen the search surface, but discipline determines whether that search becomes evidence, slop, or premature closure. Prose is not evidence. A claim must survive executable checks, holdout surfaces, or explicit refusal. Memory is allowed; unearned trust is not. The workspace can accumulate sources. The validator starts from a bounded evidence snapshot. Failures are signal. Nulls, refusals, residual structure, and instrument failures are recorded because they change what to build next. Chat is not the system of record. Durable artifacts live under projects/ , research areas/ , org/ , ztare workspace/ , and papers/ . | If you want to... | Start at | |---|---| | Understand the repo layers and doc maturity | | docs/concepts/system position and module map.md /sparckix/ztare/blob/main/docs/concepts/system position and module map.md docs/concepts/capabilities.md /sparckix/ztare/blob/main/docs/concepts/capabilities.md docs/public claim register.md /sparckix/ztare/blob/main/docs/public claim register.md docs/concepts/closure claim governance.md /sparckix/ztare/blob/main/docs/concepts/closure claim governance.md docs/guides/first-30-minutes.md /sparckix/ztare/blob/main/docs/guides/first-30-minutes.md docs/guides/quickstart.md /sparckix/ztare/blob/main/docs/guides/quickstart.md ztare CLI docs/guides/cli.md /sparckix/ztare/blob/main/docs/guides/cli.md priority roadmap.md /sparckix/ztare/blob/main/priority roadmap.md research areas/EXPERIMENT TRACK RECORD.md /sparckix/ztare/blob/main/research areas/EXPERIMENT TRACK RECORD.md docs/guides/workflow.md /sparckix/ztare/blob/main/docs/guides/workflow.md docs/concepts/architecture.md /sparckix/ztare/blob/main/docs/concepts/architecture.md docs/concepts/cognitive gym.md /sparckix/ztare/blob/main/docs/concepts/cognitive gym.md docs/guides/runtime smoke test.md /sparckix/ztare/blob/main/docs/guides/runtime smoke test.md docs/guides/org runtime quickstart.md /sparckix/ztare/blob/main/docs/guides/org runtime quickstart.md docs/guides/operator console.md /sparckix/ztare/blob/main/docs/guides/operator console.md docs/concepts/organizational primitives.md /sparckix/ztare/blob/main/docs/concepts/organizational primitives.md docs/concepts/ztare research company architecture.md /sparckix/ztare/blob/main/docs/concepts/ztare research company architecture.md docs/landings/org runtime landing.html /sparckix/ztare/blob/main/docs/landings/org runtime landing.html org/landings/research company landing.html /sparckix/ztare/blob/main/org/landings/research company landing.html supervisor/USER MANUAL.md /sparckix/ztare/blob/main/supervisor/USER MANUAL.md papers/README.md /sparckix/ztare/blob/main/papers/README.md docs/sprint 60day journey.md /sparckix/ztare/blob/main/docs/sprint 60day journey.md projects/ns millennium hunt/public/JOURNEY.md /sparckix/ztare/blob/main/projects/ns millennium hunt/public/JOURNEY.md LEDGERS.md /sparckix/ztare/blob/main/LEDGERS.md docs/concepts/glossary.md /sparckix/ztare/blob/main/docs/concepts/glossary.md CONTRIBUTING.md /sparckix/ztare/blob/main/CONTRIBUTING.md If you are not sure where to start, use the domain-validation path. git clone https://github.com/sparckix/ztare cd ztare python3 -m venv venv source venv/bin/activate pip install -r requirements.txt pip install -e . registers the ztare console script make help make demo make smoke-public the apparatus is now callable as a single command: ztare --help the operator surface ztare forecast status sealed forecast-pool state ztare leanmill schedule … LeanMill orchestration GP-225 ztare bundle verify … sealed-bundle gate See docs/guides/cli.md /sparckix/ztare/blob/main/docs/guides/cli.md for the full subcommand tour and the engine/governance split between this CLI and cognitive-firm-userland . make demo and make smoke-public do not invoke live model calls. Add model API keys only when you are ready to run an LLM-backed validator loop: export GEMINI API KEY=your key here Optional, depending on model pairings: export ANTHROPIC API KEY=your key here export OPENAI API KEY=your key here Run a validator loop on an existing project: make experiment-loop PROJECT=<project RUBRIC=<rubric ITERS=10 MUTATOR MODEL=gemini JUDGE MODEL=gemini Run the full evidence workflow: make workspace-update PROJECT=<project MODEL=gemini make evidence-compile PROJECT=<project MODEL=gemini Review and promote compiled evidence.txt to evidence.txt when appropriate. make experiment-loop PROJECT=<project RUBRIC=<rubric ITERS=10 MUTATOR MODEL=gemini JUDGE MODEL=gemini make synth PROJECT=<project MODEL=gemini QA MODEL=claude RENDERER=founder memo make experiment-loop is the safe default for live runs. It disables attacker tools and activates hard-gate preflights when the rubric declares them. Use make loop only when actively debugging and you understand the safety tradeoff. mkdir -p projects/your domain/raw python -m src.ztare.common.scaffold project charter \ --project your domain \ --mode broad Add source files under projects/your domain/raw/ make workspace-update PROJECT=your domain MODEL=gemini make evidence-compile PROJECT=your domain MODEL=gemini After reviewing compiled evidence.txt, promote it: cp projects/your domain/compiled evidence.txt projects/your domain/evidence.txt make experiment-loop PROJECT=your domain RUBRIC=recursive bayesian ITERS=10 MUTATOR MODEL=gemini JUDGE MODEL=gemini The evidence workflow writes structured artifacts under projects/<project /workspace/ : facts, contradictions, open questions, evidence gaps, derived constraints, compile failures, and validator telemetry. The science track treats numerical or scientific substrates as adversarial discovery problems. The engine proposes candidate laws, fits parameters deterministically, tests against visible/holdout/farther-tail surfaces, compresses forms, and records nulls when the substrate is underidentified. make discover PROJECT=<project RUBRIC=<rubric ITERS=15 make compress PROJECT=<project make prove PROJECT=<project The honest interpretation is scoped: - calibration recoveries show the instrument can recover known forms under cold-variable rigor; - apparatus-only findings require the run artifacts and gates, not just model recall; - correct refusals are valuable when the data do not license compression; - new-science claims require stricter external validation than a high score. For the full workflow and caveats, see docs/guides/workflow.md /sparckix/ztare/blob/main/docs/guides/workflow.md and docs/guides/for researchers.md /sparckix/ztare/blob/main/docs/guides/for researchers.md . ZTARE contains a local governance overlay for persistent AI research roles, validated against the project's own work. The reusable, substrate-agnostic kernel for this layer lives in cognitive-firm https://github.com/sparckix/cognitive-firm ; this repo keeps the ZTARE tenant state, compatibility surfaces, and dogfood deployment. A role office has a JSON-schema-validated contract org/roles/<role .yaml , a mandate, allowed and forbidden paths, budget caps, an inbox, claims, transition logs, and closure duties. php principal preferences + objectives - role mandate - task or gate - daemon proposal/execution - transition log, closure, ledger update The principal can drive the runtime through three rails. They share one source of truth, the gate and channel JSON files on disk, so a decision made on any rail is visible from the others within seconds. | Rail | Best for | Surface | |---|---|---| | Executive inbox filesystem | source of truth, scriptable from any shell | ztare workspace/gates/pending/ .json , org/channels/<role /inbox/ | | Orbit dashboard browser | rich approvals with reasons, send a directive, pause/resume a daemon, OKR tree visual | cd orbit && npm run sync and npm run dev | | Notification provider optional tenant rail | push notification, tap-to-approve, digest surfaces | filesystem outbox by default; tenant overlays may add Telegram/Slack/etc. | Local smoke path: python scripts/public/control/org first run setup.py --member-id codex --agent-cli codex --agent-adapter codex exec Docker/daemon path: docker compose --env-file .env --profile daemons run --rm research-director-daemon \ python scripts/public/control/org role preflight.py --role research director docker compose --env-file .env --profile daemons up research-director-daemon Preflight validates each role yaml against schemas/role.v1.schema.json and runs the bootstrap chain in org/bootstrap manifest.yaml so an agent always boots from the same set of contracts AGENTS.md, role yaml, mandate, preferences, then optional procedural reads . Docker is a deployment wrapper, not magic authentication. Full execution needs the chosen agent runtime codex , claude , or another adapter installed and authenticated inside the container or on the host running the daemon. The org runtime is currently filesystem-backed. A daemon sees only the org/ , ztare workspace/ , and project files mounted into its process. For VPS deployment, either create tasks on the VPS, sync private org state there, or mount a shared state volume. See docs/guides/org runtime docker deploy.md /sparckix/ztare/blob/main/docs/guides/org runtime docker deploy.md . Key docs: docs/landings/org runtime landing.html /sparckix/ztare/blob/main/docs/landings/org runtime landing.html , adoption-pitch landing for the org/ kernel itself org/landings/research company landing.html /sparckix/ztare/blob/main/org/landings/research company landing.html , landing framed as the ZTARE research-company adoption docs/guides/operator console.md /sparckix/ztare/blob/main/docs/guides/operator console.md docs/guides/org runtime quickstart.md /sparckix/ztare/blob/main/docs/guides/org runtime quickstart.md docs/guides/org runtime docker deploy.md /sparckix/ztare/blob/main/docs/guides/org runtime docker deploy.md docs/concepts/organizational primitives.md /sparckix/ztare/blob/main/docs/concepts/organizational primitives.md docs/concepts/ztare research company architecture.md /sparckix/ztare/blob/main/docs/concepts/ztare research company architecture.md org/README.md /sparckix/ztare/blob/main/org/README.md org/bootstrap manifest.yaml /sparckix/ztare/blob/main/org/bootstrap manifest.yaml , role bootstrap chain schemas/role.v1.schema.json /sparckix/ztare/blob/main/schemas/role.v1.schema.json , role contract schema ZTARE is intentionally open source, but it is not a raw operations dump. The release rule is: ship the scientific instrument and public documentation aggressively; keep active strategy, sealed pre-registrations, personal context, credentials, and first-mover-sensitive product tactics private until closure or public derivative rendering. Public by default: - research-engine code, validators, gates, fit primitives, and proof tooling; - Lean verifier modules and exact certificate checkers; - public docs, papers, rubrics, and calibrated closed artifacts; - closed seams that pass the visibility rule. Local / gitignored by default: - local-only research notes and .ip protected/ ; - active strategy seams, sealed GT/pre-registration material, and in-flight experiment tactics; - org-runtime mandates, preferences, channels, directives, sessions, and runtime task state; - credentials, contact channels, API keys, local logs, and cloud/GPU telemetry that contains operational context. The scientific instrument should be inspectable and reproducible. Active experiments still need sealed envelopes so later results remain interpretable. The core loop: Mutator proposes a thesis and executable candidate. Verification panel attacks weak assumptions. Fitter/solver estimates parameters when the substrate is numeric. Meta-judge scores execution output rather than persuasive prose. Hard gates enforce deterministic pass/fail constraints. Telemetry and ledgers preserve what happened, including failures. This architecture grew out of the Cognitive Camouflage work: LLM-generated code can pass holistic review while violating the intent of the test. ZTARE's answer is separation of duties plus executable gates. Examples of failure modes the system has had to defend against: | Pattern | Failure | |---|---| | Blame shield | Hide one critical unsupported axiom among many harmless ones. | | Float masking | Round away the precision that would reveal the failure. | | Fake mechanism | Name a function after a mechanism while hardcoding its output. | | Cooked RNG | Hardcode improving pseudo-random behavior instead of learning. | | Assert narrowing | Define tests so narrowly that only the submitted case passes. | | Unit laundering | Hide an empirical correction as a dimensional factor. | | Straw-man comparison | Design the rival so the preferred answer wins by construction. | The gaming paper documents the first version of this problem. The current repo generalizes the response into a research and governance stack. | Surface | Status | Entry point | |---|---|---| | Domain evidence workspace | usable | make workspace-update , make evidence-compile | | Adversarial validator | usable | make experiment-loop | | Synthesis pipeline | usable | make synth | | Science compression / proof stubs | experimental | make discover , make compress , make prove | | Evaluator hardening / gates | active development | docs/concepts/architecture.md , supervisor/USER MANUAL.md | | Org runtime overlay / role daemons | working today | docs/guides/org runtime quickstart.md | | ZTARE Research Co dogfood loop | active | priority roadmap.md , research areas/EXPERIMENT TRACK RECORD.md , research areas/specs/active/apparatus/instrumentation/GP-244 research operations intelligence cockpit spec.md | | Executive inbox filesystem rail | working today | ztare workspace/gates/pending/ + org/channels/ | | Orbit governance UI browser rail | working today | orbit/ gate review queue, principal cockpit, OKR tree | | Notification provider optional rail | tenant-specific | filesystem outbox by default; Telegram/Slack/etc. belong in tenant overlays | | Path | Purpose | |---|---| src/ztare/ | Python implementation: validator, fit primitives, gates, synthesis, workspace, orchestration. | projects/ | Domain projects, evidence, workspaces, validator artifacts, scientific sandboxes. | rubrics/ | Scoring rubrics and gate configuration. | docs/ | Architecture, workflow, concepts, product/runtime docs. | papers/ | Public manuscript sources. | ztare proofs/ | Lean proof sources and formalization experiments; generated .lake/ build state is ignored. | research areas/ | Experiment track record, current board, seams, specs, debates, research logs. | org/ | Roles, mandates, preferences, tasks, objectives, channels, runtime state. | supervisor/ | Program registry, manifests, control-plane docs. | orbit/ | Governance UI projection. | ztare workspace/ | Gates, transition logs, runtime projections. | Rule of thumb: - human-readable research prose goes under research areas/ ; - supervisor/runtime JSON state goes under supervisor/ , org/ , or ztare workspace/ ; - project evidence and run artifacts stay under projects/ . Cognitive Camouflage /sparckix/ztare/blob/main/papers/cognitive-camouflage/draft.md , specification gaming in LLM-generated code | SSRN https://papers.ssrn.com/sol3/papers.cfm?abstract id=6512960 Adversarial Precedent Memory /sparckix/ztare/blob/main/papers/adversarial-precedent-memory/draft.md , hardening evaluators through mined failure constraints | SSRN https://papers.ssrn.com/sol3/papers.cfm?abstract id=6525598 Contract-Governed Hardening /sparckix/ztare/blob/main/papers/contract-governed-hardening/draft.md , stage-gated recursive improvement with typed promotion contracts | SSRN https://papers.ssrn.com/sol3/papers.cfm?abstract id=6542998 Cognitive Firm /sparckix/ztare/blob/main/papers/cognitive-firm/draft.md , managerial capitalism for artificial intelligence | SSRN https://papers.ssrn.com/sol3/papers.cfm?abstract id=6543019 Epistemic Verification /sparckix/ztare/blob/main/papers/epistemic-verification/draft.md , manuscript in revision.- Adversarial Compression, experimental mathematics manuscript draft not mirrored in this repository . The papers are best read as a stack: - LLMs game underspecified evaluation. - Mined precedents and deterministic gates harden evaluators. - Typed promotion contracts make recursive hardening safer. - Persistent organizational roles govern AI work. - Epistemic verification decomposes judgment into repeatable operations plus a bounded residual. The active case-study layer applies this stack across scientific and governance substrates as falsifier pressure rather than discovery rhetoric. It should be read through the experiment records and promoted public papers, not through private working drafts. ZTARE is designed to improve research discipline, not to guarantee truth. Do not infer: - that a high score proves a scientific discovery; - that calibration recoveries are new science; - that an LLM cold shot is a controlled baseline unless model/date/prompt are recorded; - that hard gates cover every possible failure mode; - that the org runtime is enterprise-ready merely because the local single-team path works; - that “works on any domain” means no domain-specific evidence engineering is needed. The intended standard is stricter: if a result matters, it needs artifacts, gates, closure rows, and a clear statement of what would falsify it. This repo is easiest to operate with an agentic coding assistant such as Codex or Claude Code because the meaningful state is distributed across artifacts. Useful prompts are collected in docs/guides/agent-prompts.md /sparckix/ztare/blob/main/docs/guides/agent-prompts.md . Start with one of those paste-ready prompts when using a fresh Codex or Claude session to learn the repo, inspect a project, audit the forecast market, or work in observer mode on NS. For agents working inside this repo, AGENTS.md /sparckix/ztare/blob/main/AGENTS.md is the repo-level constitution. ZTARE borrows from several traditions without treating any as decorative: - Karpathy's LLM wiki pattern for accumulating source memory upstream of the validator. - Popperian falsification: cheap refutation is more valuable than persuasive confirmation. - Mungerian inversion and checklist discipline: name what would make success uninterpretable before celebrating it. - Scientific management, cybernetics, and organizational design: roles, handoffs, ledgers, and closure matter when cognition becomes machine-aided. MIT. The governance/orchestration code in org/ , supervisor/ , orbit/ , deploy/ , and src/ztare/{orchestration,supervisor,sessions,signals,notifications}/ is ZTARE's tenant-overlay integration of the upstream cognitive-firm https://github.com/sparckix/cognitive-firm kernel; the canonical kernel and its license live in that repository. Files ignored by the public/private boundary are not part of the public license grant until deliberately promoted. LICENSES.md /sparckix/ztare/blob/main/LICENSES.md is the file-by-file map; the full text is in LICENSE /sparckix/ztare/blob/main/LICENSE . Third-party notices are in NOTICE.md /sparckix/ztare/blob/main/NOTICE.md . If you cite this work, cite the specific paper or artifact you are using rather than the repository as a monolith. @misc{alami2026cognitivecamouflage, title = {Cognitive Camouflage: Specification Gaming in LLM-Generated Code Evades Holistic Evaluation but Not Adversarial Execution}, author = {Alami, Daniel}, year = {2026}, note = {SSRN preprint 6512960. Code: github.com/sparckix/ztare}, url = {https://papers.ssrn.com/sol3/papers.cfm?abstract id=6512960} } @misc{alami2026adversarialprecedent, title = {Adversarial Precedent Memory: Hardening LLM Evaluators Through Mined Failure Constraints}, author = {Alami, Daniel}, year = {2026}, note = {SSRN preprint 6525598. Code: github.com/sparckix/ztare}, url = {https://papers.ssrn.com/sol3/papers.cfm?abstract id=6525598} } @misc{alami2026contractgoverned, title = {Contract-Governed Adversarial Evaluator Hardening: Stage-Gated Recursive Improvement with Typed Promotion Contracts}, author = {Alami, Daniel}, year = {2026}, note = {SSRN preprint 6542998. Code: github.com/sparckix/ztare}, url = {https://papers.ssrn.com/sol3/papers.cfm?abstract id=6542998} } @misc{alami2026cognitivefirm, title = {The Cognitive Firm: Managerial Capitalism for Artificial Intelligence}, author = {Alami, Daniel}, year = {2026}, note = {SSRN preprint 6543019. Code: github.com/sparckix/ztare}, url = {https://papers.ssrn.com/sol3/papers.cfm?abstract id=6543019} }