{"slug": "a-13-kb-text-file-beat-a-smarter-model-benchmarking-ai-codegen-across-5-angular", "title": "A 13 KB text file beat a smarter model: benchmarking AI codegen across 5 Angular state libraries", "summary": "A developer benchmarked AI code generation across five Angular state management libraries and found that a 13 KB text file boosted the worst-performing library's score from 49% to 91%, matching established libraries with years of training data. The test revealed that smaller, targeted context files improved codegen accuracy more effectively than larger documentation dumps, and that AI failures often reflected inconsistent API naming rather than model limitations.", "body_md": "**Disclosure up front:** I maintain one of the five libraries tested (SignalTree), and it's the one that scored *worst* in the cold run — so this isn't a \"look how good my thing is\" post. The cross-library pattern and the fix were interesting enough that I wanted to put the numbers in front of people who use Copilot/Cursor/Claude Code every day. The whole harness is reproducible (one command, link at the bottom); I'd rather it get torn apart than taken on faith.\n\n**What this measures: one-shot generation.** The agent gets the prompt, returns a file, we score it. Real interactive use — Cursor/Copilot with chat back-and-forth, where the model sees its own errors and gets a second try — is a different setting, and the lift could be larger or smaller there. This is the cold-shot case.\n\nNo context provided, just \"write this in library X\":\n\n| Library | Cold score |\n|---|---|\n| Akita | 94% |\n| Elf | 94% |\n| NgRx (classic) | 91% |\n| NgRx SignalStore | 86% |\n| SignalTree | 49% |\n\nThe libraries that have been around for years, with thousands of blog posts and Stack Overflow answers, score in the 90s. The youngest/smallest library in the set scores ~49%. That gap isn't really a quality signal — it's a *corpus* signal. 
The models have simply seen orders of magnitude more Akita than SignalTree. Worth keeping in mind any time you judge a library by how well your AI assistant writes it cold: you're partly measuring its age, not its design.\n\nI shipped a ~13.5 KB `llms.txt` (a plain-text API summary) inside the npm package and re-ran with it in context:\n\n| Mode | SignalTree score |\n|---|---|\n| Cold | 49% |\n| + `llms.txt` (13.5 KB) | 91% |\n| + `llms.txt` + extra notes (~25 KB) | 87% |\n\n+42 percentage points from one small file — enough to pull the least-known library up into the range of the well-established ones. Two things I didn't expect:\n\n| Library | Cold | With SignalTree's context loaded |\n|---|---|---|\n| SignalTree | 49 | 91 |\n| NgRx (classic) | 91 | 88 |\n| NgRx SignalStore | 86 | 80 |\n| Akita | 94 | 85 |\n| Elf | 94 | 87 |\n\n**Practical takeaway: more context is not better.** Past ~15 KB the numbers went down, not up. If you maintain or use a less-common library, a small retrievable context file does more for codegen accuracy than reaching for a \"smarter\" model — primed mid-tier models beat cold top-tier ones in my runs — but dumping your whole docs site in backfires.\n\nThe failures weren't random. Agents kept calling methods that didn't exist, and the pattern pointed straight at my own inconsistency — I'd named predicate accessors two different ways across the API:\n\n```\n// some markers used an is- prefix\nsaveStatus.isLoading()\nusers.isEmpty()\n\n// others used bare names\nprofile.dirty()\nfeed.loading()\n```\n\nAn agent that learned `isLoading()` would confidently try `isDirty()`, which never existed. That's not an AI failure — it's a human one wearing an AI costume. Any developer reading the docs hits the same wall; they just fail more quietly and blame themselves. 
I standardized on bare names (matching `FormControl.dirty`/`.valid`), kept the old names as deprecated aliases, shipped it.\n\nThe generalizable takeaway, and the reason I think this is worth writing up rather than burying in a changelog: **an API surface a model can't keep straight is usually one a human can't either.** Codegen accuracy turns out to be a surprisingly good proxy for naming consistency, and a cheap one to measure.\n\nI'd rather list the holes than have them found, so here are the three I'd lead with:\n\nAnd one that's less a flaw than a \"yeah, obviously\": **cold score ≈ training-data volume is barely a finding** — it's close to a truism once you say it out loud. The only mildly non-obvious part is *how cheaply* a retrievable file substitutes for years of corpus presence.\n\nOne OpenRouter key, ~$15, ~30 minutes:\n\n```\ngit clone https://github.com/JBorgia/signaltree\ncd signaltree\nexport OPENROUTER_API_KEY=sk-or-...\nnode scripts/ai-codegen-benchmark/runner.mjs\n```\n\nPrompts (YAML), scoring rubric, adapters, and per-cell results all live in `scripts/ai-codegen-benchmark/`. The prompts and rubric are the parts most worth disagreeing with — if you spot one that's unfair to a particular library, that's the most useful feedback I can get.\n\nFor those of you using Copilot / Cursor / Claude Code daily: when the generated code for a library is bad, **what's actually fixed it for you** — a custom rules file, pasted docs, an MCP server, something else? 
I'm especially curious whether the \"ship a small context file\" result holds outside my own setup, or whether interactive back-and-forth makes it moot.", "url": "https://wpnews.pro/news/a-13-kb-text-file-beat-a-smarter-model-benchmarking-ai-codegen-across-5-angular", "canonical_source": "https://dev.to/jborgia/a-13-kb-text-file-beat-a-smarter-model-benchmarking-ai-codegen-across-5-angular-state-libraries-3p36", "published_at": "2026-05-30 00:37:51+00:00", "updated_at": "2026-05-30 00:41:43.096985+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "ai-tools", "ai-research"], "entities": ["SignalTree", "Akita", "Elf", "NgRx", "Copilot", "Cursor", "Claude Code", "Stack Overflow"], "alternates": {"html": "https://wpnews.pro/news/a-13-kb-text-file-beat-a-smarter-model-benchmarking-ai-codegen-across-5-angular", "markdown": "https://wpnews.pro/news/a-13-kb-text-file-beat-a-smarter-model-benchmarking-ai-codegen-across-5-angular.md", "text": "https://wpnews.pro/news/a-13-kb-text-file-beat-a-smarter-model-benchmarking-ai-codegen-across-5-angular.txt", "jsonld": "https://wpnews.pro/news/a-13-kb-text-file-beat-a-smarter-model-benchmarking-ai-codegen-across-5-angular.jsonld"}}