Show HN: Gave Claude Code ADHD.. Now it thinks 3x better

Researchers introduced ADHD, a method that fans out parallel divergent branches under different cognitive frames to prevent premature convergence in large language model agents. Across six open-ended engineering problems, ADHD won 5/6 against a single-shot baseline, achieving mean improvements of +5.17 in novelty, +4.17 in breadth, and +7.67 in trap detection on a 0–10 rubric. The method addresses the failure mode where models default to the first plausible answer, making it particularly valuable for creative and design-shaped tasks where the goal is to escape high-probability responses.

Preprint · v0.1 · 2026-05-25

Large language model agents exhibit premature convergence: when asked to ideate on an open-ended design problem, they default to the first plausible candidate and polish it, producing competent but forgettable output. We introduce ADHD, a method that fans out N parallel divergent branches under structurally different cognitive frames (e.g. regulator, speedrunner, biology, $0 budget), with no cross-branch context, then converges via a separate critic pass that scores, clusters, and deepens only the top-K survivors. ADHD differs from Chain-of-Thought along three load-bearing axes: branches are isolated rather than shared, branching is driven by vantage-point reframing rather than next-step variation, and the generator/critic split is enforced mechanically (separate LLM calls with opposite system prompts) rather than promised by a single context. Across six open-ended engineering problems judged by an independent LLM-as-judge, ADHD wins 5/6 against a single-shot baseline at the same model, with mean improvements of +5.17 in novelty, +4.17 in breadth, and +7.67 in trap detection on a 0–10 rubric. We argue ADHD is the right inference-time structure for creative, interdisciplinary, and design-shaped tasks where the failure mode is not wrong but obvious.

A modern LLM, prompted with "give me a few ways to do X", will almost always produce the same three answers a senior practitioner would. This is not a bug at the token level (those are the high-probability completions), but it is a failure at the task level whenever the user's purpose is to escape the high-probability answer. We call this failure mode premature convergence: the model evaluates as it generates, the early tokens anchor the late tokens, and the output is the centroid of the training distribution dressed up as a recommendation.

Premature convergence is most costly in exactly the regimes where ideation matters most: architecture decisions, API and SDK design, debugging fuzzy intermittent failures, refactor planning, naming, positioning, and any task whose deliverable is a set of viable options rather than a single answer. In these tasks the textbook answer is often the trap, and the interesting answer lives in what the original divergent-ideation skill calls "the awkward middle, past the first three" [1].

Existing inference-time methods address adjacent problems. Chain-of-Thought (CoT) [2] makes one head reason more slowly along one path, exposing the intermediate steps so the model does not skip them.

We propose ADHD: a method that produces a wide range of viable options by structurally preventing the generator from converging during divergence, and only converging in a separate, posterior critic pass. ADHD borrows the tree structure of Tree-of-Thought (ToT) [3] but replaces its branching driver (next-step search) with vantage-point reframing, and replaces ToT's intermingled generator/evaluator with two strictly separated LLM calls. The result, on the evaluations we report below, is a method that wins clearly against a single-shot baseline on novelty, breadth, and trap detection: the dimensions premature convergence destroys.

CoT makes one head think slower. ToT makes one head search wider. ADHD makes many heads think differently, in parallel, then has a critic pick.

Chain-of-Thought [2] elicits intermediate reasoning by prompting or fine-tuning the model to "think step by step". It is decisively useful on multi-step problems with verifiable answers (arithmetic, symbolic reasoning), but it is a single linear trace: each step is conditioned on the previous, which is precisely the anchoring dynamic ADHD is designed to break.

Tree-of-Thought [3] generalises CoT to a tree of intermediate "thoughts" with explicit search (BFS or DFS) and an evaluator function that scores partial states.
ToT is the closest neighbour of ADHD, and ADHD can be described as a ToT variant. The differences are not cosmetic: (i) ToT's branches share a single conversational context, so anchoring still occurs across steps, and (ii) ToT's branching driver is next-step search, where ADHD's is vantage-point reframing.

Multi-Agent Debate [6] has multiple instances critique each other across rounds; this can improve factuality but converges aggressively toward consensus, which is the opposite of what ideation needs.

A separate strand of work assigns the model a role ("you are an expert X") to bias output style or domain knowledge. ADHD's cognitive frames superficially resemble this but differ in intent: frames are not chosen for expertise but for structural distortion. The "10-year-old" frame is not asked to be correct; it is asked to ignore convention. The "speedrunner" frame is not asked to be authoritative; it is asked to look for glitches. Frames are vantage-point operators, not credentials.

ADHD operationalises a written skill on divergent ideation [1] that prescribes a divergence/convergence loop with explicit anti-patterns ("convergence disguised as divergence", "weird-for-weird's-sake with no convergence", "refusing to commit"). Our contribution is to turn that prose into a mechanically enforceable runtime: separate LLM calls, isolated branches, and scoring-then-deepening rather than scoring-during-generating.

ADHD is a two-phase loop with a hard mechanical separation between phases. Given a problem p, we select N frames F_1, …, F_N from a library of 15. Each frame drives its own generator call, which produces k ideas. Critically, the N calls do not share context. The regulator branch never reads what the speedrunner branch produced. Anchoring is eliminated by construction, not by prompting.

The frame library is tagged code, design, general, wild. When codeMode is enabled (the default), we bias selection toward engineering-relevant tags but always reserve one slot for a wild frame to preserve range.

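
The frame-selection rule just described (bias toward engineering tags in codeMode, with one slot always reserved for a wild frame) could look roughly like the sketch below. It is not taken from the adhd-agent source: the `Frame` type, `selectFrames`, and `demoLibrary` are hypothetical names, the reading of "engineering-relevant" as the code and design tags is an assumption, and the tags given to the named frames are invented for the demo.

```typescript
// Illustrative only: these types and names are hypothetical, not the adhd-agent API.
type Tag = "code" | "design" | "general" | "wild";

interface Frame {
  name: string;
  tag: Tag;
}

// Assumption: "engineering-relevant" means the code and design tags.
const isEngineering = (f: Frame): boolean => f.tag === "code" || f.tag === "design";

// Fisher-Yates shuffle on a copy; `rng` returns a number in [0, 1).
function shuffle<T>(items: readonly T[], rng: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

function selectFrames(
  library: readonly Frame[],
  n: number,
  codeMode: boolean,
  rng: () => number = Math.random
): Frame[] {
  if (n <= 0) return [];
  // Without codeMode this sketch simply samples uniformly.
  if (!codeMode) return shuffle(library, rng).slice(0, n);
  // Reserve one slot for a randomly chosen wild frame (if the library has one).
  const wild = shuffle(library.filter((f) => f.tag === "wild"), rng).slice(0, 1);
  const rest = shuffle(library.filter((f) => f.tag !== "wild"), rng);
  // Hard priority as a simple stand-in for "bias": engineering tags fill slots first.
  const ranked = rest.filter(isEngineering).concat(rest.filter((f) => !isEngineering(f)));
  return ranked.slice(0, n - wild.length).concat(wild);
}

// Frame names come from the text; the tags assigned here are made up for the demo.
const demoLibrary: Frame[] = [
  { name: "regulator", tag: "general" },
  { name: "speedrunner", tag: "wild" },
  { name: "biology", tag: "wild" },
  { name: "$0 budget", tag: "design" },
  { name: "10-year-old", tag: "wild" },
];

const picked = selectFrames(demoLibrary, 3, true);
console.log(picked.map((f) => `${f.name} [${f.tag}]`).join(", "));
```

A weighted sampler would be a softer form of bias than the hard priority used here; the reserved wild slot is the part the text treats as essential.
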
With the pool of N × k ideas in hand, we run three further kinds of call: score, cluster, and deepen. The final output is the wide set (clustered), a 2–4 idea shortlist with the non-obvious-but-viable pick flagged explicitly, the trap list, the deepened sketches with their child ideas, and one wildcard provocation drawn from the highest-novelty leaf.

Three invariants are load-bearing. Removing any of them collapses ADHD into a method that already exists.

We implement ADHD as a Node/TypeScript library on top of the Claude Agent SDK [8]. The package ships a CLI (adhd "<problem>"), a programmatic API (run(opts) → RunResult), and a frame library that is extensible in five lines per frame.

Each phase uses a system prompt tuned for its posture. Divergence prompts begin with "You are in DIVERGENT mode. You are a generator, not a critic" and enumerate constraints (JSON only, no prose, no ranking, the first three obvious answers are banned, push past them). The scoring prompt begins with "You are in CONVERGENT mode. You are now the critic" and supplies the rubric. The deepen prompt begins with "You are in FOCUS mode". These prompts are designed to be self-evidently incompatible, so the model cannot drift between them within a single call.

The implementation is roughly 600 lines of TypeScript and is released under the MIT licence at github.com/UditAkhourii/adhd (https://github.com/UditAkhourii/adhd). The package is published to npm as adhd-agent (https://www.npmjs.com/package/adhd-agent) and is installable with npm install adhd-agent (library) or npm install -g adhd-agent (CLI binary adhd).

We compare ADHD against a single-shot baseline at the same underlying model. The baseline receives a senior-engineer system prompt and the problem statement and is asked to produce a useful answer with approaches, tradeoffs, and a recommendation. This baseline is deliberately strong: it is what an experienced practitioner would actually do at a chat prompt.

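
To make the method's mechanical generator/critic separation concrete, here is a minimal sketch of the two-phase loop: isolated divergence calls, then separate score, cluster, and deepen calls. It is an illustration under stated assumptions, not the library's code. `CallModel`, `runTwoPhase`, the `SCORE`/`CLUSTER` prefixes, and the JSON shapes are all hypothetical; only the opening words of the three system prompts come from the text, and reusing the critic prompt for the cluster call is a guess.

```typescript
// Hypothetical sketch: `CallModel` stands in for any LLM client; nothing here is the adhd-agent API.
type CallModel = (systemPrompt: string, userPrompt: string) => Promise<string>;

interface Idea { frame: string; text: string; }
interface ScoredIdea extends Idea { score: number; cluster: string; }
interface DeepenedIdea extends ScoredIdea { sketch: string; }

// Opening lines quoted from the text; the rest of each prompt is omitted.
const DIVERGENT = "You are in DIVERGENT mode. You are a generator, not a critic";
const CONVERGENT = "You are in CONVERGENT mode. You are now the critic";
const FOCUS = "You are in FOCUS mode";

async function runTwoPhase(
  problem: string,
  frames: readonly string[],
  topK: number,
  call: CallModel
): Promise<{ pool: ScoredIdea[]; deepened: DeepenedIdea[] }> {
  // Phase 1 (divergence): one call per frame. Each call receives only the
  // problem and its own frame, so no branch can read another branch's output.
  const branches = await Promise.all(
    frames.map(async (frame) => {
      const raw = await call(`${DIVERGENT}. Frame: ${frame}.`, problem);
      return (JSON.parse(raw) as string[]).map((text): Idea => ({ frame, text }));
    })
  );
  const ideas = branches.reduce<Idea[]>((acc, b) => acc.concat(b), []);

  // Phase 2 (convergence): the critic runs in separate calls with a different
  // system prompt. Assumption: the cluster call reuses the critic prompt.
  const payload = JSON.stringify(ideas);
  const scores = JSON.parse(await call(CONVERGENT, `SCORE\n${payload}`)) as number[];
  const clusters = JSON.parse(await call(CONVERGENT, `CLUSTER\n${payload}`)) as string[];
  const pool = ideas.map((idea, i): ScoredIdea => ({
    ...idea,
    score: scores[i] ?? 0,
    cluster: clusters[i] ?? "unclustered",
  }));

  // Deepen only the top-K survivors, one FOCUS call each.
  const survivors = pool.slice().sort((a, b) => b.score - a.score).slice(0, topK);
  const deepened = await Promise.all(
    survivors.map(async (s): Promise<DeepenedIdea> => ({ ...s, sketch: await call(FOCUS, s.text) }))
  );
  return { pool, deepened };
}
```

Because the model client is injected, the isolation property can be checked with a stub that records every prompt it receives: no divergence prompt should contain text produced by another branch.
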
Six open-ended engineering problems were used, chosen to span systems, distributed systems, UX/reliability, debugging, refactor, and naming.

Each pair (ADHD output, baseline output) is scored by an independent LLM-as-judge call with a skeptical staff engineer system prompt. The judge sees both outputs blinded as A/B (in randomised order per problem, recorded for de-biasing), and scores on five dimensions: breadth (range of structurally distinct angles), novelty (non-obvious-but-viable ideas), trap detection (does it name ideas that look good but aren't, with reasons), actionability (does the top pick have a sketch + named risk + first concrete step), and builder usefulness (which is more useful to the engineer who actually has to ship). Each dimension is 0–10. The judge then declares an overall winner of A, B, or tie, and writes a one-line summary.

To reduce same-model bias, the judge system prompt is explicit about adversarial reading and the rubric. A/B labels are de-anonymised only after all six runs are complete. We acknowledge that LLM-as-judge can favour outputs of similar surface character to its own training distribution; we address this in §7 (Discussion).

Aggregate results across the six problems (mean score per dimension):

| Dimension | ADHD | Baseline | Δ |
|---|---|---|---|
| breadth | 9.00 | 4.83 | +4.17 |
| novelty | 7.83 | 2.67 | +5.17 |
| trap detection | 9.50 | 1.83 | +7.67 |
| actionability | 9.50 | 6.50 | +3.00 |
| builder usefulness | 7.67 | 6.83 | +0.83 |

Per-problem overall winners: ADHD wins on lru-100ms, rate-limit-leader, fuzzy-bug, monolith-split, and naming-feature-flag. The baseline wins on llm-hang-cli. Final tally: ADHD 5W / 1L / 0T. Full per-problem verdicts and transcripts are committed to the repository as EVALS.md and bench/results.json.

The largest delta is trap detection (+7.67). The baseline rarely names ideas that look good but are wrong; ADHD's scoring pass explicitly flags traps with reasons.

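
The blinded A/B judging protocol used in this evaluation (randomised order per problem, a recorded key, de-anonymisation only after all runs) could be sketched as follows. The function and type names are hypothetical and are not taken from the released harness; the judge call itself is left out.

```typescript
// Hypothetical helper names; this is not the released bench harness.
type Label = "A" | "B";
type Source = "adhd" | "baseline";
type Verdict = Label | "tie";

interface BlindedPair {
  outputs: Record<Label, string>; // what the judge sees
  key: Record<Label, Source>;     // recorded, but withheld from the judge
}

// Randomise which system appears as A; `rng` returns a number in [0, 1).
function blindPair(adhd: string, baseline: string, rng: () => number = Math.random): BlindedPair {
  return rng() < 0.5
    ? { outputs: { A: adhd, B: baseline }, key: { A: "adhd", B: "baseline" } }
    : { outputs: { A: baseline, B: adhd }, key: { A: "baseline", B: "adhd" } };
}

// Map the judge's A/B/tie verdict back to a system using the recorded key.
function deanonymise(verdict: Verdict, key: Record<Label, Source>): Source | "tie" {
  return verdict === "tie" ? "tie" : key[verdict];
}

// Count wins per system once every verdict has been de-anonymised.
function tally(winners: readonly (Source | "tie")[]): { adhd: number; baseline: number; tie: number } {
  const counts = { adhd: 0, baseline: 0, tie: 0 };
  for (const w of winners) counts[w] += 1;
  return counts;
}
```

Keeping the key outside the judge prompt, and resolving labels only at the end, is what lets position bias be checked afterwards from the recorded order.
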
The novelty delta (+5.17) is driven by the cross-domain frames. The most striking example, on llm-hang-cli, is a first-byte vs chunk-idle dual timer design, distinguishing NEVER CONNECTED, STALLED MID STREAM, and COMPLETED SLOW failure modes, which the baseline did not surface and which is, in our judgement, the correct architecture for streaming LLM clients. It arose from the regulator frame's question "what must be distinguishable in the audit trail?". Similarly, on fuzzy-bug, the biology frame surfaced a "fever-response circuit-breaker" idea that resolves to progressive degradation tiers (Opus → Sonnet → Haiku → cached): concrete, shippable, and not in the baseline.

The breadth delta (+4.17) reflects the cluster pass: the baseline tends to list four or five variations on a single underlying angle, while ADHD surfaces 6–9 structurally different angles per problem.

The one loss, on llm-hang-cli, is informative. The judge wrote: "B [ADHD] explores vastly more creative territory and expertly identifies traps, but A [baseline] delivers a pragmatic, immediately implementable solution that an engineer can ship today." ADHD scored higher on breadth, novelty, and trap detection on this problem, but lost on builder usefulness: the judge preferred the baseline's tighter, polished, ship-today shape over ADHD's richer but rougher pile.

This matches the failure mode we expect. When the problem is well-understood (with a known good answer), a single polished answer beats a wide set with the same answer buried in it. ADHD pays its cost in presentation overhead; that cost is worth it precisely when the wide set contains ideas the polished answer missed. On llm-hang-cli the baseline already knew the right answer; on the other five problems it did not.

A default ADHD run uses ≈10 LLM calls (5 divergence + 1 score + 1 cluster + 3 deepen) versus 1 for the baseline. Wall-clock latency at concurrency 4 is typically 30–90 s.

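
The call budget above is simple arithmetic, sketched here so it can be recomputed for other settings. `RunShape`, `llmCalls`, and `sequentialWaves` are hypothetical names. The wave count assumes the score and cluster calls run one after the other and that each phase waits for the previous one, which the text does not state.

```typescript
// Hypothetical names; the phase ordering in `sequentialWaves` is an assumption.
interface RunShape {
  frames: number; // divergence calls, one per frame
  deepen: number; // deepen calls, one per surviving idea
}

// Total LLM calls: divergence + 1 score + 1 cluster + deepen.
function llmCalls({ frames, deepen }: RunShape): number {
  return frames + 1 + 1 + deepen;
}

// Sequential "waves" of calls when at most `concurrency` calls run at once.
function sequentialWaves({ frames, deepen }: RunShape, concurrency: number): number {
  return Math.ceil(frames / concurrency) + 1 + 1 + Math.ceil(deepen / concurrency);
}

const defaultRun: RunShape = { frames: 5, deepen: 3 };
console.log(llmCalls(defaultRun));           // 10 calls, versus 1 for the baseline
console.log(sequentialWaves(defaultRun, 4)); // 5 waves at concurrency 4
```

Under these assumptions the default run is ten calls but only five sequential waves, so latency grows more slowly than call count as frames are added.
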
We frame this honestly: ADHD is for decision points, not inner loops. The right mental model is: spend US$0.30 to widen a US$50k architecture decision.

Same-model judging. Our LLM-as-judge runs on the same model family as the generator. We mitigated with adversarial system prompts and randomised A/B order, but we cannot exclude familiarity bias entirely. A useful follow-up is cross-model judging (e.g. judge with a different vendor's model) and human ratings on a held-out subset.

Small problem set. Six problems is enough to see consistent direction but not enough to make strong quantitative claims. We released the harness so the set can be extended; adding a new problem is a four-line change.

Frame library is hand-authored. The 15 frames in the current library reflect our judgement about which vantage points produce distinct outputs on engineering problems. A frame can fail silently, producing paraphrases of another frame's ideas, without the harness catching it. Frame-quality evaluation is future work.

Confounded by deepen quality. The actionability delta is partly explained by the deepen pass, which gives ADHD a structural advantage (sketch + risk + first step) that the baseline's free-form prose does not enforce. A fairer ablation would equip the baseline with the same output schema; we expect ADHD's lead to shrink on this dimension but not on breadth, novelty, or trap detection.

Domain. All six problems are engineering-shaped. The frame library is biased toward engineering when codeMode is enabled. Whether ADHD's wins generalise to product strategy, scientific brainstorming, or pure creative writing is plausible but not demonstrated here.

ADHD is the right tool when (i) the problem is open-ended, (ii) the cost of the obvious answer being wrong is high, (iii) the user cannot articulate a ground truth in advance, and (iv) breadth and trap detection are worth a 5–10× LLM-call premium.
It is the wrong tool for lookup questions, bug fixes with a known root cause, and any task where the answer is one search query away. The one-sentence test we propose: if a junior would Google it and find the answer, baseline wins; if a senior would say "hm, let me think about this differently for a minute", ADHD is what replaces that moment.

The most interesting application surface is not the standalone CLI but a callable subroutine inside larger coding agents at decision points. A planning agent at a branch point with high uncertainty, a code-review agent asked "what could go wrong here", a debugging agent stuck after three patches, and a test-generation agent searching for adversarial inputs all benefit from a pause-and-widen step before committing to the next move. The library API (run({...})) is designed for this.

We have argued that LLM coding agents systematically converge prematurely on open-ended ideation tasks, and that this failure is structural rather than capability-bounded. We presented ADHD, an inference-time method that prevents convergence during a divergence phase by running N isolated parallel branches under cognitive-frame distortions, and converges in a separate critic pass that scores, clusters, and deepens only the survivors. ADHD differs from existing tree-of-thought methods along three load-bearing axes: branch isolation, frame-based branching, and mechanical generator–critic separation. On six open-ended engineering problems, ADHD wins 5/6 against a single-shot baseline at the same model, with the largest gains concentrated in trap detection, novelty, and breadth: the dimensions premature convergence destroys. The implementation is small, open-source, and intended to be used as a subroutine inside larger agents at decision points where the cost of the obvious answer is high. The full release is available at github.com/UditAkhourii/adhd (https://github.com/UditAkhourii/adhd).

[1] SKILL.md at the project repository.