# SubQ Releases Its 1.1 Small Model Card as Dangel and Whedon Try to Prove Long Context Can Beat RAG

> Source: <https://runtimewire.com/article/subq-1-1-small-long-context-model-card>
> Published: 2026-06-16 15:38:40+00:00

[Subquadratic](https://subq.ai/?ref=runtimewire) co-founders Justin Dangel and Alexander Whedon are moving SubQ from a provocative May launch claim into a more testable phase: the company on June 16 [released the model card and technical report](https://subq.ai/subq-1-1-small-technical-report?ref=runtimewire) for SubQ 1.1 Small, a long-context model built around what it calls Subquadratic Sparse Attention.

The release matters because SubQ is not pitching a better chatbot. Dangel and Whedon are pitching a different cost structure for enterprise AI: models that can reason across complete codebases, contract sets, financial filings and long-running agent state without first chopping the work into retrieval pipelines and summaries.

That is an architecture bet, not a feature launch. It also fits the founders. Dangel is a repeat operator whose prior companies span insurance and home-based health care, including Ready, the on-demand health care startup that raised a $54 million Series C in 2020. Whedon, Subquadratic's CTO, previously worked as a software engineer at Meta and led generative AI work at Tribe AI, according to prior public reporting. The pairing explains the company's unusual posture: a founder-led infrastructure company selling a research claim directly into the pain of operators who have spent the last two years gluing retrieval, vector search and agents around models that still cannot reliably hold the full problem in memory.

SubQ says 1.1 Small is the second iteration of its SSA model at its smallest size. It is not generally available. Subquadratic says it is deploying the model with select design partners and plans a broader lineup of 2 million to 12 million token models later in 2026. Access remains invite-only via a request form on its [homepage](https://subq.ai/?ref=runtimewire).

### The benchmark claim

The headline number is retrieval at scale. In the June 16 release, SubQ says SubQ 1.1 Small scored 100% on needle-in-a-haystack retrieval at 1 million and 2 million tokens, then 98% at 6 million and 12 million tokens. On Nvidia's RULER benchmark at 128,000 tokens, the company reports 99.12%.

SubQ also reports 85.4% pass@1 on GPQA Diamond, 89.7% pass@4 on LiveCodeBench v6 and 13% on AutomationBench Finance. Those are company-selected benchmarks, but they are not limited to synthetic retrieval. SubQ is trying to show that the model did not trade away general reasoning and coding performance to achieve a giant context window.

The efficiency claim is the bigger swing. In the [technical report](https://subq.ai/docs/subq-1-1-small-model-card.pdf?ref=runtimewire), SubQ says SSA reduces attention FLOPs by 64.5x at a 1 million token context window compared with dense attention, and the June 16 post says SubQ 1.1 Small runs 56x faster than FlashAttention-2 on a single attention layer at that length. At 12 million tokens, SubQ says the model attends to only 0.13% of token pairs, which the company frames as nearly a 1,000x reduction in attention relationships.

SubQ says [Appen](https://www.appen.com/whitepapers/subquadratic-preview-model-benchmark-evaluation?ref=runtimewire) independently evaluated its preview models across long-context retrieval, code generation, business-workflow automation and graduate-level reasoning benchmarks. Appen's own benchmark brief says it was engaged by Subquadratic and reports 100% retrieval accuracy and exact match at the 1 million and 2 million token tiers, plus 98% exact match at 6 million and 12 million tokens on the tested variant. That is useful third-party validation, but it is not the same thing as broad public replication by outside researchers using an openly available model.

### What SubQ says it changed

Standard Transformer attention is expensive because each token attends across every other token. As context length grows, the attention cost grows quadratically. That is the math behind much of the modern AI stack: chunk the corpus, retrieve a few passages, summarize aggressively, then hope the model saw the right pieces.

SubQ's claim is that SSA changes the scaling law by making attention content-dependent and sparse. Instead of processing every token relationship, SSA learns which relationships matter and routes attention there. In the report's framing, the question is not whether retrieval pipelines are useful. It is whether some of the scaffolding around them exists because the model architecture could not afford to look at the whole artifact directly.

The training recipe is part of the story. SubQ says it started with an existing open-weight frontier model, replaced dense attention with SSA, then extended context in stages from 262,000 tokens to 512,000, 1 million and 2 million tokens. The company says it followed that with roughly one trillion tokens of continued pretraining on naturally long artifacts, including books, documents and repository-scale code. The report also says the team ran more than 100 experiments across six to seven model generations.

That detail is more important than the benchmark table. If the architecture actually lowers the cost of million-token experiments, it does not just make inference cheaper. It lets a small company iterate on long-context training recipes that would otherwise be too expensive to test repeatedly. That is the founder bet underneath the release: not only that SubQ has one model with a large context window, but that the company can compound research faster at context lengths where dense attention becomes uneconomic.

### The product wedge is code and enterprise documents

Subquadratic is aiming SubQ at workloads where the pain is obvious: software engineering, legal review, financial diligence and enterprise knowledge work. The company gives examples on its [homepage](https://subq.ai/?ref=runtimewire) that translate the abstract token counts into operator terms: the Python 3.13 standard library at about 5.1 million tokens, and six months of React pull requests at about 7.5 million tokens.

The product surface is still early. SubQ lists a full-context API with a 12 million token context window, streaming, tool use and OpenAI-compatible endpoints. It also lists SubQ Code, a long-context layer for coding agents that plugs into Claude Code, Codex and Cursor, maps codebases, gathers context and redirects expensive model turns. Both are access-gated.

That go-to-market choice is rational. The first buyers for a 12 million token context window are not casual developers. They are teams with painfully large artifacts and enough budget to compare full-context reasoning against existing retrieval systems. But it also means the public cannot yet judge the gap between model-card performance and production behavior.

Subquadratic's funding history raises the stakes. [VentureBeat reported in May](https://venturebeat.com/technology/miami-startup-subquadratic-claims-1-000x-ai-efficiency-gain-with-subq-model-researchers-demand-independent-proof?ref=runtimewire) that the Miami startup had raised $29 million in seed funding from investors including Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar and early investors in Anthropic, OpenAI, Stripe and Brex. The same report cited a reported $500 million valuation. That is a large price for a seed-stage architecture company, and it makes the burden of proof higher, not lower.

### The unanswered question is deployment

The strongest version of the SubQ story is straightforward: Dangel and Whedon are trying to replace a meaningful slice of context-management infrastructure with learned attention that can afford to see the full artifact. If that works, it changes how teams build coding agents, diligence tools, compliance systems and long-memory enterprise agents.

The weaker version is also familiar: a long-context company with striking benchmark charts, limited access, no public pricing, unnamed design partners and no broad independent replication yet. SubQ's June 16 technical report narrows that gap by putting more detail in public, and Appen's evaluation gives the company more than a self-published table. But the commercial test is still ahead.

For now, SubQ 1.1 Small should be read as a proof package, not a finished platform announcement. The numbers are strong enough that serious AI infrastructure teams will pay attention. The missing evidence is whether design partners can use SubQ against real repositories, contracts and financial records with fewer retrieval workarounds, lower latency and predictable cost. That is where Dangel and Whedon's architecture claim either becomes a company or stays a benchmark result.
