SubQ – a sub-quadratic LLM built for multi-million token reasoning

wpnews.pro

cd /news/large-language-models/subq-a-sub-quadratic-llm-built-for-m… · home › topics › large-language-models › article

[ARTICLE · art-33272] src=subq.ai ↗ pub=2026-06-18T21:52Z topic=large-language-models verified=true sentiment=↑ positive

SubQ – a sub-quadratic LLM built for multi-million token reasoning

Subquadratic launched SubQ, a sub-quadratic sparse-attention LLM with a 12-million-token context window, enabling multi-million token reasoning at linear cost. The model reduces attention compute by nearly 1,000× at 12M tokens and outperforms GPT-5.5 and Opus 4.8 on benchmarks like LiveCodeBench v6 and GPQA Diamond.

read2 min views31 publishedJun 18, 2026

API

For developers and teamsThe full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.

→ 12M token context window
→ Streaming + tool use
→ OpenAI-compatible endpoints SubQ is a sub-quadratic LLM built for multi-million token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss.

Use Cases Reason across millions of tokens in one prompt: entire repos, whole artifacts, and long-running agent state, with room to spare at a fraction of the cost.

~ Approximate token counts.

Architecture

SubQ is the first model built on a fully sub-quadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter.

SubQ finds and focuses only on those, ensuring compute is used where it matters most. At 12M tokens, this reduces attention compute almost 1,000×, changing the way LLMs scale.

Benchmarks

SubQ has near-perfect performance on single-fact retrieval and multi-task retrieval, both at scale.

SubQ balances long-context retrieval without compromising on reasoning and knowledge.

Benchmark	SubQ 1.1 Small	GPT-5.5	Opus 4.8	Sonnet 4.6	GPT-5.4-mini	GPT-5.4-nano	Haiku 4.5
Graduate-level science GPQA Diamond · pass@1	85.4	93.2	92	87.5	87.5	81.7	67.2
Agentic finance AutomationBench	13%	18%	16%	8%	0%	n/r	3%
Competitive programming LiveCodeBench v6 · pass@4	89.7	92	92.2	88.9	78.6	78.2	69.7

SubQ uses 64.5x less compute than dense attention, and is 56× faster than FlashAttention-2 at 1M-token context.

Products

The full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.

The long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster.

About

Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs. While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level — enabling large-context, multi-modal inference that scales efficiently where transformers can't.

Built by researchers from

Early Access

Join the private preview.

source & further reading

subq.ai — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/subq-a-sub-quadratic-llm…

Read original on subq.ai → subq.ai/

mentioned entities

Subquadratic

SubQ

GPT-5.5

Opus 4.8

Sonnet 4.6

GPT-5.4-mini

GPT-5.4-nano

Haiku 4.5

metadata

slugsubq-a-sub-quadratic-llm-built-for-multi-million-token-reasoning

topic#large-language-models

secondary3 topics

sentimentpositive

canonicalsubq.ai

navigation

← prev$102 million offer emerges for O…

next →Stanford, UC Berkeley rank among…

── more in #large-language-models 4 stories · sorted by recency

cryptobriefing.com · 3 Aug · #large-language-models

DeepSeek’s AI model identified as cheapest among leading models, and the ripple effects are hitting markets

marktechpost.com · 3 Aug · #large-language-models

Onton Releases Ontology 1: A Neurosymbolic Search Model That is 2.7x More Accurate than the World’s Best E-commerce Search Engines

byteiota.com · 2 Aug · #large-language-models

GLM-5.2 Beats GPT-5.5 on SWE-bench — And You Can Self-Host It

thomsonreuters.com · 31 Jul · #large-language-models

Thomson Reuters built its own AI model that now ranks among the best

── more on @subquadratic 3 stories trending now

wpnews · 2 Aug · #artificial-intelligence

I Ran 8 AI APIs Through the Same 50 Prompts — Here's the Real Cost Breakdown

wpnews · 2 Aug · #developer-tools

Agent-Browser – Browser Automation for AI

wpnews · 2 Aug · #artificial-intelligence

Payment Rail vs. Settlement Layer: What AEON's Coinbase x402 Partnership Actually Validates

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required