cd /news/large-language-models/subq-a-sub-quadratic-llm-built-for-m… · home topics large-language-models article
[ARTICLE · art-33272] src=subq.ai ↗ pub= topic=large-language-models verified=true sentiment=↑ positive

SubQ – a sub-quadratic LLM built for multi-million token reasoning

Subquadratic launched SubQ, a sub-quadratic sparse-attention LLM with a 12-million-token context window, enabling multi-million token reasoning at linear cost. The model reduces attention compute by nearly 1,000× at 12M tokens and outperforms GPT-5.5 and Opus 4.8 on benchmarks like LiveCodeBench v6 and GPQA Diamond.

read2 min views1 publishedJun 18, 2026

API

For developers and teamsThe full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.

  • → 12M token context window

  • → Streaming + tool use

  • → OpenAI-compatible endpoints SubQ is a sub-quadratic LLM built for multi-million token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss.

Use Cases Reason across millions of tokens in one prompt: entire repos, whole artifacts, and long-running agent state, with room to spare at a fraction of the cost.

~ Approximate token counts.

Architecture

SubQ is the first model built on a fully sub-quadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter.

SubQ finds and focuses only on those, ensuring compute is used where it matters most. At 12M tokens, this reduces attention compute almost 1,000×, changing the way LLMs scale.

Benchmarks

SubQ has near-perfect performance on single-fact retrieval and multi-task retrieval, both at scale.

SubQ balances long-context retrieval without compromising on reasoning and knowledge.

Benchmark SubQ 1.1 Small GPT-5.5 Opus 4.8 Sonnet 4.6 GPT-5.4-mini GPT-5.4-nano Haiku 4.5
Graduate-level science GPQA Diamond · pass@1 85.4 93.2 92 87.5 87.5 81.7 67.2
Agentic finance AutomationBench 13% 18% 16% 8% 0% n/r 3%
Competitive programming LiveCodeBench v6 · pass@4 89.7 92 92.2 88.9 78.6 78.2 69.7

SubQ uses 64.5x less compute than dense attention, and is 56× faster than FlashAttention-2 at 1M-token context.

Products

The full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.

The long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster.

About

Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs. While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level — enabling large-context, multi-modal inference that scales efficiently where transformers can't.

Built by researchers from

Early Access

Join the private preview.

── more in #large-language-models 4 stories · sorted by recency
── more on @subquadratic 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/subq-a-sub-quadratic…] indexed:0 read:2min 2026-06-18 ·