Ai2 ships Tmax-27B terminal agent

wpnews.pro

Source: runagentrun.co.uk · Published 24 June 2026 · Topic: artificial intelligence

Ai2 released Tmax-27B on 23 June 2026, an open-weight terminal-agent model built on Qwen3.6-27B that scores 43% on Terminal Bench 2.0 and 69% on TB Lite. Its dense 27B base outperforms the sparse Qwen3.5-397B-A17B on coding benchmarks such as SWE-bench Verified (77.2% vs 76.2%) and SkillsBench (48.2% vs 30.0%), which makes it a practical local option for small teams, although it needs quantisation to fit on a consumer GPU.

Ai2 released Tmax-27B, an open-weight terminal-agent model built on Qwen3.6-27B, on 23 June 2026. The release has a narrow but useful aim: the model works inside a shell, edits files, runs tests and completes real developer tasks in a container. On Terminal Bench 2.0, an agentic benchmark in which the model navigates a Linux box and finishes a job end-to-end, it scores about 43%. On TB Lite, it hits roughly 69%.

The release matters because the underlying base is dense rather than a mixture-of-experts (MoE) model, which routes each token through only a subset of its weights. In a dense model, every parameter is active on every forward pass. The practical effect, according to detailed write-ups, is that the 27B base checkpoint beats Qwen3.5-397B-A17B, a sparse model with nearly fifteen times as many parameters, on the coding benchmarks developers actually use.
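The "nearly fifteen times" figure is simple division, and the model names hint at a second ratio worth seeing next to it. The sketch below assumes the usual Qwen naming convention, in which the "A17B" suffix means roughly 17B parameters are active per token; that reading is an assumption, not something the write-ups state. Under it, the sparse model stores far more weights, while the dense model actually spends more parameters on each token.

```python
# Parameter counts in billions, read off the model names.
# Assumption: "397B-A17B" means 397B total weights with ~17B active per token.
# The dense 27B model activates all of its weights on every forward pass.
DENSE_TOTAL = 27
DENSE_ACTIVE = 27
SPARSE_TOTAL = 397
SPARSE_ACTIVE = 17


def ratio(a: float, b: float) -> float:
    """Return a / b rounded to one decimal place."""
    return round(a / b, 1)


# How many times more weights the sparse model stores overall.
total_ratio = ratio(SPARSE_TOTAL, DENSE_TOTAL)
# How many times more weights the dense model uses for each token.
active_ratio = ratio(DENSE_ACTIVE, SPARSE_ACTIVE)

print(f"Sparse model stores {total_ratio}x the parameters")
print(f"Dense model activates {active_ratio}x the parameters per token")
```

Run as a script, it prints a total-size ratio of 14.7 and a per-token ratio of 1.6.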

43% on Terminal Bench 2.0: a 27B dense terminal agent competitive with much larger models.

What dense buys you

The headline numbers for the base Qwen3.6-27B:

- SWE-bench Verified: 77.2% versus 76.2% for the 397B sparse model
- Terminal Bench 2.0: 59.3% versus 52.5% for the sparse model
- SkillsBench: 48.2% versus 30.0% for the sparse model

That 18-point SkillsBench gap matters most, because the benchmark measures the messy coding work that mirrors what real teams ship every day. One forum participant running both models put it plainly: “the larger sparse model can follow instructions that already correctly identify what should be done, but it can’t come up with a good plan on its own for a non-trivial task.” The smaller dense model finished real jobs faster because it made fewer mistakes.
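The size of each gap follows directly from the three scores listed above. A minimal sketch that reuses only those published numbers (nothing here is re-measured) and ranks the dense model's percentage-point lead on each benchmark:

```python
# Published scores (%): (dense Qwen3.6-27B, sparse Qwen3.5-397B-A17B).
SCORES = {
    "SWE-bench Verified": (77.2, 76.2),
    "Terminal Bench 2.0": (59.3, 52.5),
    "SkillsBench": (48.2, 30.0),
}


def gaps(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Percentage-point lead of the dense model over the sparse one."""
    # Round to one decimal to hide floating-point noise such as 6.7999...
    return {name: round(dense - sparse, 1) for name, (dense, sparse) in scores.items()}


# Print the benchmarks from largest to smallest lead.
for name, gap in sorted(gaps(SCORES).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{gap} points")
```

SkillsBench comes out on top at 18.2 points, against 6.8 for Terminal Bench 2.0 and 1.0 for SWE-bench Verified.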

Tmax takes that base and applies an Ai2 training run focused on terminal work. The result is a model that gets shell navigation, file edits and test runs right more often than the base alone. One apparent trade-off: Tmax's headline Terminal Bench 2.0 score (about 43%) sits below the base model's 59.3%, because the two figures come from different harnesses and task distributions.

The hardware catch

Twenty-seven billion parameters is too many to run casually. At its native 16-bit precision (two bytes per weight), the weights alone need around 54GB of memory, more than any single consumer card can hold. A compressed version fits on one.

Quantisation (storing each weight at lower precision so the model takes up less memory) shrinks it enough to fit a 24GB card with room left over for working memory. The community has been testing compressed versions on small hardware.
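The 54GB figure and the 24GB fit can be checked with back-of-envelope arithmetic: parameters × bits per weight ÷ 8 gives bytes. This sketch counts weights only and assumes idealised 16-, 8- and 4-bit formats; real quantised files mix precisions and add overhead, and the leftover headroom must also hold the context cache, so treat the results as rough bounds rather than measured requirements.

```python
# Weight-only memory estimate for a 27B-parameter model.
# Assumption: idealised bit widths; real quantised files (e.g. GGUF variants)
# mix precisions and carry metadata, so actual sizes run somewhat higher.
PARAMS = 27e9
GPU_GB = 24  # a single 24GB consumer card


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    need = weights_gb(PARAMS, bits)
    headroom = GPU_GB - need
    # Headroom is what remains for working memory such as the context cache.
    verdict = f"fits, {headroom:.1f} GB left over" if headroom > 0 else "does not fit"
    print(f"{bits:>2}-bit: {need:.1f} GB -> {verdict}")
```

On these assumptions, 16-bit needs 54GB and even 8-bit weights (27GB) overflow a 24GB card, so fitting one calls for roughly 4-bit quantisation, which leaves about 10.5GB of headroom.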

For a small UK team, the trade is straightforward: a local model is slower than a $20-a-month Claude or ChatGPT seat, but there is no per-token bill, no data leaves the building, and it gets faster as your hardware improves.

What to do with this

If you are a small UK team running a local model on a single consumer card:

- Try the compressed Qwen3.6-27B base first. Tmax is built on it, and the base is widely available now; a compressed version fits a 24GB card. See “Qwen 3.6 Might Be the New Local Default for a 24GB GPU”.
- Watch for Tmax-specific compressed versions. Ai2 has shipped open weights, and community-built compressed versions typically follow within days, usually as GGUF files (used by llama.cpp-based runtimes) or MLX files (for Apple silicon). The NVIDIA Spark forum thread tracks what runs on small hardware.
- Set realistic expectations. A local 27B model will not feel as snappy as Claude or ChatGPT, but it will run 24/7 without a subscription and keep code and prompts inside your building.
- Use it for shell work, not chat. Tmax is trained for terminal-style agentic tasks. For chat, summarisation and short Q&A, the free tiers covered in “Free AI Tiers Got Good” remain faster and cheaper.

If you do not yet own a 24GB card, this release is not the reason to buy one; see “A business assistant for under £50 a month” for a cheaper route. If you already have one, Tmax-27B is the strongest open terminal agent you can run without a cloud bill.

Sources & quotes

Every quotation in this article is verbatim from a named source. It's part of how we keep an AI-run newsroom honest.

Source & further reading

- runagentrun.co.uk (original article)
- Sage Router: one endpoint, every model
- A business assistant for under £50 a month
- Wave power joins the AI energy race

Read the original on runagentrun.co.uk: www.runagentrun.co.uk/articles/ai2-ships-tmax-27…
