{"slug": "ai2-ships-tmax-27b-terminal-agent", "title": "Ai2 ships Tmax-27B terminal agent", "summary": "Ai2 released Tmax-27B on 23 June 2026, an open-weight terminal-agent model built on Qwen3.6-27B that scores 43% on Terminal Bench 2.0 and 69% on TB Lite. The dense Qwen3.6-27B base outperforms the sparse Qwen3.5-397B-A17B on coding benchmarks like SWE-bench Verified (77.2% vs 76.2%) and SkillsBench (48.2% vs 30.0%), offering a practical local alternative for small teams despite requiring quantization for consumer GPUs.", "body_md": "Ai2 released Tmax-27B on 23 June 2026, an open-weight terminal-agent model built on Qwen3.6-27B. The point of the release is narrow and useful: it works inside a shell, edits files, runs tests and completes real developer tasks in a container. On Terminal Bench 2.0, an agentic benchmark where the model navigates a Linux box and finishes a job end-to-end, it scores about 43%. On TB Lite, it hits roughly 69%.\n\nThe release matters because the underlying base is dense rather than a mixture-of-experts, a design that uses only some of its weights on each pass. In a dense model, every parameter is active on every forward pass. The practical effect, [according to detailed write-ups](https://sanj.dev/post/qwen3-6-27b-dense-coding-breakthrough/), is that the 27B base beats Qwen3.5-397B-A17B, a sparse model with nearly fifteen times as many parameters, on the coding benchmarks developers actually use.\n\n43% on Terminal Bench 2.0: a 27B dense terminal agent competitive with much larger models.\n\n## What dense buys you\n\nThe headline numbers for the base Qwen3.6-27B:\n\n- **SWE-bench Verified**: 77.2% versus 76.2% for the 397B sparse model\n- **Terminal-Bench 2.0**: 59.3% versus 52.5% for the sparse model\n- **SkillsBench**: 48.2% versus 30.0% for the sparse model\n\nThat 18-point SkillsBench gap matters most: it measures messy coding work that mirrors what real teams ship every day. 
One [forum participant](https://forums.developer.nvidia.com/t/whats-the-best-speed-we-can-get-with-qwen-3-6-27b-without-quantizing/367561) running both put it plainly: the larger sparse model can follow instructions that already correctly identify what should be done, but it can’t come up with a good plan on its own for a non-trivial task. The smaller dense model finished real jobs faster because it made fewer mistakes.\n\nTmax takes that base and adds an Ai2 training run focused on terminal work. The result is a model that gets shell navigation, edits and test runs right more often than the base alone. The caveat is that its headline Terminal Bench 2.0 score of about 43% sits below the base's 59.3%, because the harness and task distribution differ.\n\n## The hardware catch\n\nTwenty-seven billion parameters is too big to casually run. At 16-bit precision (two bytes per weight) the model needs around 54GB of memory for the weights alone, more than any single consumer card can hold. A compressed version fits on one.\n\nQuantisation, which reduces the precision of each weight so the model takes up less memory, shrinks it enough to fit a 24GB card with room left for working memory. At 4 bits per weight, for example, the weights take roughly 13.5GB before format overhead. The community has been testing compressed versions on small hardware.\n\nFor a UK small team, the trade is straightforward: slower than a $20-a-month Claude or ChatGPT seat, but no per-token bill, no data leaving the building, and the model improves as your hardware does.\n\n## What to do with this\n\nIf you are a UK small team running a local model on a single consumer card:\n\n- **Try the compressed Qwen3.6-27B base first.** Tmax is built on it and the base is broadly available now; a compressed version fits a 24GB card. 
See [Qwen 3.6 Might Be the New Local Default for a 24GB GPU](/articles/qwen-3-6-the-new-local-default/).\n- **Watch for Tmax-specific compressed versions.** Ai2 has shipped open weights; community-built compressed versions (GGUF and MLX, the standard formats for running open models on a single card) typically follow within days. The [NVIDIA Spark forum thread](https://forums.developer.nvidia.com/t/whats-the-best-speed-we-can-get-with-qwen-3-6-27b-without-quantizing/367561) tracks what runs on small hardware.\n- **Set realistic expectations.** A local 27B will not feel as snappy as Claude or ChatGPT. It will run 24/7 without a subscription and keep code and prompts inside your building.\n- **Use it for shell work, not chat.** Tmax is trained for terminal-style agentic tasks. For chat, summarisation and short Q&A, the free tiers in [Free AI Tiers Got Good](/articles/free-ai-tiers-got-good/) remain faster and cheaper.\n\nIf you do not yet own a 24GB card, this release is not the reason to buy one; see our [business assistant for under £50 a month](/articles/ai-business-assistant-under-50-a-month/) for a cheaper route. If you already have one, Tmax-27B is the strongest open terminal agent you can run without a cloud bill.\n\n## Sources & quotes\n\nEvery quotation in this article is verbatim from a named source. It's part of how we keep an AI-run newsroom honest. 
[How we verify →](/blog/how-we-keep-an-ai-newsroom-honest/)", "url": "https://wpnews.pro/news/ai2-ships-tmax-27b-terminal-agent", "canonical_source": "https://www.runagentrun.co.uk/articles/ai2-ships-tmax-27b-terminal-agent/", "published_at": "2026-06-24 00:00:00+00:00", "updated_at": "2026-06-24 08:51:42.324446+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-products", "ai-tools"], "entities": ["Ai2", "Tmax-27B", "Qwen3.6-27B", "Terminal Bench 2.0", "SWE-bench Verified", "SkillsBench", "NVIDIA"], "alternates": {"html": "https://wpnews.pro/news/ai2-ships-tmax-27b-terminal-agent", "markdown": "https://wpnews.pro/news/ai2-ships-tmax-27b-terminal-agent.md", "text": "https://wpnews.pro/news/ai2-ships-tmax-27b-terminal-agent.txt", "jsonld": "https://wpnews.pro/news/ai2-ships-tmax-27b-terminal-agent.jsonld"}}