{"slug": "subq-a-sub-quadratic-llm-built-for-multi-million-token-reasoning", "title": "SubQ – a sub-quadratic LLM built for multi-million token reasoning", "summary": "Subquadratic launched SubQ, a sub-quadratic sparse-attention LLM with a 12-million-token context window, enabling multi-million token reasoning at linear cost. The model reduces attention compute by nearly 1,000× at 12M tokens. In the published benchmark table, SubQ 1.1 Small scores below GPT-5.5 and Opus 4.8 on GPQA Diamond and LiveCodeBench v6, and above GPT-5.4-nano and Haiku 4.5 on both.", "body_md": "### API\n\nFor developers and teams\n\nThe full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.\n\n- 12M-token context window\n- Streaming and tool use\n- OpenAI-compatible endpoints\n\nSubQ is a sub-quadratic LLM built for *multi-million token reasoning*, allowing agents to work across full repositories, long histories, and persistent state without quality loss.\n\n### Use Cases\n\nReason across millions of tokens in one prompt: entire repos, whole artifacts, and long-running agent state, with room to spare at a *fraction of the cost*.\n\n### Architecture\n\nSubQ is the first model built on a fully sub-quadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter.\n\nSubQ finds and focuses only on those, ensuring compute is used where it matters most.
At 12M tokens, this reduces attention compute by almost 1,000×, changing the way LLMs scale.\n\n### Benchmarks\n\nSubQ has near-perfect performance on single-fact retrieval and multi-task retrieval, both at scale.\n\nSubQ delivers long-context retrieval without compromising on reasoning and knowledge.\n\n| Benchmark | SubQ 1.1 Small | GPT-5.5 | Opus 4.8 | Sonnet 4.6 | GPT-5.4-mini | GPT-5.4-nano | Haiku 4.5 |\n|---|---|---|---|---|---|---|---|\n| Graduate-level science (GPQA Diamond · pass@1) | 85.4 | 93.2 | 92 | 87.5 | 87.5 | 81.7 | 67.2 |\n| Agentic finance (AutomationBench) | 13% | 18% | 16% | 8% | 0% | n/r | 3% |\n| Competitive programming (LiveCodeBench v6 · pass@4) | 89.7 | 92 | 92.2 | 88.9 | 78.6 | 78.2 | 69.7 |\n\nSubQ uses **64.5×** less compute than dense attention, and is **56×** faster than FlashAttention-2 at 1M-token context.\n\n### Products\n\nThe full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.\n\nThe long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster.\n\n### About\n\n**Subquadratic** is a frontier AI research and infrastructure company building a new class of LLMs.
While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level, enabling large-context, multi-modal inference that scales efficiently where Transformers can't.\n\n### Early Access\n\nJoin the private preview.", "url": "https://wpnews.pro/news/subq-a-sub-quadratic-llm-built-for-multi-million-token-reasoning", "canonical_source": "https://subq.ai/", "published_at": "2026-06-18 21:52:58+00:00", "updated_at": "2026-06-18 22:00:35.239900+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "ai-research", "ai-products"], "entities": ["Subquadratic", "SubQ", "GPT-5.5", "Opus 4.8", "Sonnet 4.6", "GPT-5.4-mini", "GPT-5.4-nano", "Haiku 4.5"], "alternates": {"html": "https://wpnews.pro/news/subq-a-sub-quadratic-llm-built-for-multi-million-token-reasoning", "markdown": "https://wpnews.pro/news/subq-a-sub-quadratic-llm-built-for-multi-million-token-reasoning.md", "text": "https://wpnews.pro/news/subq-a-sub-quadratic-llm-built-for-multi-million-token-reasoning.txt", "jsonld": "https://wpnews.pro/news/subq-a-sub-quadratic-llm-built-for-multi-million-token-reasoning.jsonld"}}