{"slug": "tai-211-gpt-5-6-is-here-but-most-people-cannot-use-it-yet", "title": "TAI #211: GPT-5.6 is here, but most people cannot use it yet", "summary": "OpenAI announced GPT-5.6 Sol, Terra, and Luna on June 26, but access is limited to a small group of trusted partners via Codex and the API at the request of the U.S. government. The model family features aggressive pricing, with Sol at $5/$30 per million tokens and Terra matching GPT-5.5 at half the cost, positioning it as a competitor to Chinese open-weight models like GLM-5.2. The launch highlights tensions between frontier model capability and restricted availability, with OpenAI emphasizing safety testing and strategic sensitivity.", "body_md": "A quick note before the news: five of our team are at the AI Engineer World’s Fair in San Francisco this week, where we ran a workshop, “Context Engineering in 2026: Compaction, Memory & Cost.” Cost and long-context efficiency turned out to be the right things to focus on, because both sit close to the center of this week’s biggest story.\n\nOpenAI announced GPT-5.6 this week. The strangest part of the launch is that almost no one can actually use it yet. On June 26, OpenAI previewed GPT-5.6 Sol, Terra, and Luna and promised broad availability “in the coming weeks.” For now, access runs through Codex and the API for a small group of trusted partners, at the request of the U.S. government.\n\nThat puts GPT-5.6 squarely between two stories we have been tracking. Claude Fable showed how a top-scoring frontier model can be announced, benchmarked, and then vanish behind a policy wall. GLM-5.2 showed the opposite: Chinese open-weight models are now good enough, cheap enough, and available enough to win serious developer attention even while the strongest U.S. models stay ahead on paper. GPT-5.6 is the cleanest expression of that tension so far. 
While the model looks extremely strong, the access story is awkward.\n\nThe new lineup also gives OpenAI something Anthropic has had for a while: names with character. Claude Sonnet, Opus, Haiku, Fable, and Mythos are far easier to hold in your head than an endless stream of version numbers, and Sol, Terra, and Luna do the same job. This may sound cosmetic, but developers and operators need a stable mental model for routing work, and a good model family tells you what to reach for without making you read a benchmark spreadsheet every morning.\n\nSol is the flagship for the hardest work; Terra is the balanced, everyday production model that OpenAI says matches GPT-5.5 at half the price; and Luna is the fast, cheap option for high-volume tasks.\n\nThe pricing is more aggressive than I expected for a flagship OpenAI release. Sol is $5 per million input tokens and $30 per million output tokens. Terra is $2.50 and $15. Luna is $1 and $6. OpenAI also says Sol will run on Cerebras at up to 750 tokens per second for select customers in July. Sol is the premium tier, and against GLM-5.2 at $1.40 input and $4.40 output, it is genuinely expensive per token, so I would not pitch it as the open-weight cost competitor. The more interesting matchup is Terra versus GLM-5.2. OpenAI says Terra matches GPT-5.5 at half the price. Overall, the family’s token-efficiency claims are striking: Sol reportedly matches Anthropic’s Mythos Preview on ExploitBench while using roughly a third of the output tokens, and improves GeneBench biology workflows while spending fewer tokens than GPT-5.5. If that efficiency carries over to Terra, I think Terra has a real chance to be both more capable and lower-cost per completed task than GLM-5.2, even while it loses on the raw per-token price. For everyday production work, that comparison matters far more than anything happening at the Sol tier.\n\nBut that is a big “if,” and for now, it is mostly OpenAI’s word. 
Independent benchmarking is thin because most evaluators do not have normal access. OpenAI says Sol sets a new state of the art on Terminal-Bench 2.1, improves biology workflows, and advances cyber evaluations. The system card reports more than 700,000 A100-equivalent GPU hours spent on automated jailbreak testing, stronger real-time safeguards for cyber and biology misuse, and a conclusion that Sol still does not cross OpenAI’s Cyber Critical threshold. This is an unusually safety-heavy launch because the model is being treated as strategically sensitive from day one.\n\nOn OpenAI’s own numbers, GPT-5.6 is a real step up across the areas that actually matter: agentic coding, long-horizon command-line work, biology, cyber defense and vulnerability research, and tool-heavy production tasks. I would still want Artificial Analysis, Vals, LiveBench, Arena, and real customer evals to run it through the usual wringer. But the breadth of the reported gains is why this does not read like a routine monthly point release.\n\nMETR’s predeployment evaluation adds a useful note of caution. It tested GPT-5.6 Sol on its Time Horizon 1.1 software-task suite, and the result swung hard on how it handled detected cheating. Counting cheating attempts as failures produced an estimate of around 11.3 hours. Treating those same attempts as legitimate successes pushed the estimate past the reliable range of the benchmark. That does not mean Sol is weak. It means frontier agents are now strange enough that the evaluation method becomes part of the result. The stronger these models get, the more we need evals that measure real task completion without rewarding shortcuts or hidden rule-breaking.\n\nThe product direction is unmistakably agentic. Sol adds a new “max” reasoning effort and an “ultra” mode that spins up subagents for complex work, and Codex is one of the first surfaces for the preview. 
This matches my own experience with Codex over the past few months: the interface can still feel technical, but subagents are genuinely useful for white-collar work. I lean on them for parallel research, source checking, criticism, testing, and revision loops. The real shift is that frontier reasoning is being packaged less like a single chatbot reply and more like a managed work system.\n\nThis is where the access question gets sharper. OpenAI says it believes in broad access and does not want a government-first process to become the long-term default, and commercially, that position makes sense. A narrow circle of approved users is an open invitation to Chinese labs, European labs, open-weight providers, and sovereign AI stacks. Most companies and governments want tools they can count on across borders, contracts, and multi-year roadmaps.\n\nAt the same time, I understand why this particular preview was handled differently. GPT-5.6 looks strongest in exactly the areas governments worry about: cyber, long-horizon coding, science workflows, and agentic tool use. A model that helps defenders find vulnerabilities can help attackers find them too. A model that coordinates subagents in Codex can also coordinate longer autonomous workflows in less-friendly hands. The hard policy problem is how to restrict dangerous uses without making every global customer feel like a second-class user of American AI.\n\nThe near-term result may be more verticalization by U.S. labs. If the strongest model cannot ship broadly, it can still be put to work internally. That is what makes OpenAI’s new Jalapeno chip with Broadcom more than a side story. OpenAI says Jalapeno, its first custom inference chip, went from initial design to manufacturing tape-out in nine months, possibly the fastest ASIC cycle ever in advanced semiconductors, and that its own models accelerated parts of the design and optimization. 
Engineering samples are already running real workloads, including GPT-5.3-Codex-Spark. OpenAI frames it plainly: the same models it serves to users are helping build the infrastructure that will run the next models.\n\nThis ties directly to a point I made on X last week. Most benchmarks are saturated, or will be soon. The next hill to climb is genuine scientific and R&D progress, because a model cannot fake the creation of new knowledge. Chip design, model-architecture design, biology, materials, robotics, and automated research loops are where the real evidence will show up. If a frontier model helps design better inference chips, those chips lower the cost of the next model, enabling more products, safeguards, and infrastructure. That is a compounding loop, and Jalapeno is the first public sign of it turning.\n\nThis is the quiet risk inside the GPT-5.6 non-release. A model withheld from the public keeps working. It can write code, find vulnerabilities, design chips, automate research, and improve its own deployment stack, all without a public launch. U.S. firms with privileged access could build a long lead in AI-native products before the rest of the world touches the same capability. That may be the right short-term safety trade. It also concentrates the economic upside inside the firms and countries that already have a seat at the table.\n\nI still expect OpenAI to push for a broad release. Handing the rest of the world’s AI market to China, open weights, or sovereign alternatives would be strategically incoherent over any real time horizon. The American AI stack is valuable precisely because it can become the default platform for developers, enterprises, governments, schools, and labs everywhere. If the strongest U.S. models become politically fragile, the rest of the world will hedge, and that hedge will increasingly look like GLM, Qwen, DeepSeek, Mistral, local clouds, and open-weight deployment.\n\nThe practical read on GPT-5.6 is this. 
Sol looks extremely strong, and the Sol/Terra/Luna ladder is a much clearer way to think about OpenAI’s lineup. But the launch is also a controlled experiment in who gets access to frontier intelligence, and that access layer is shaping up to be one of the most important battlegrounds in AI.\n\nTerra, rather than Sol, may be the family’s real answer to GLM-5.2 on price-performance if the family’s efficiency and success-rate advantages survive contact with real workflows. So, for everyday production work, the matchup worth watching is Terra versus GLM-5.2. Sol takes the headlines, but at $5 input and $30 output, it is the premium tier and priced like one. GLM-5.2 is cheaper on raw tokens at $1.40 and $4.40, yet OpenAI says Terra matches GPT-5.5 at half the price, and the GPT-5.6 family is making strong token-efficiency claims. If that holds once Terra opens up, the default for routine work could swing back toward a hosted US mid-tier model, and it is worth re-running your own evals the week you can reach it.\n\nThe bigger story, however, is underneath the launch. OpenAI’s Jalapeno chip went from design to tape-out in nine months with help from OpenAI’s own models. That is AI compounding on itself, and it reframes the whole access fight. A frontier model, held back from the public, still works around the clock within the few firms that can run it, designing chips, writing code, finding vulnerabilities, and automating research. Restricting access slows everyone else from using the model; it does nothing to slow the lab that already has it. The competitive edge is moving from “who has the best model today” to “who can turn their best model into the next chip, product, and discovery fastest.”\n\nThe same logic scales down to you. The highest-return use of frontier AI right now is rarely a sharper answer to today’s question. 
It pays far more to point the model at your own infrastructure: the internal tools, research pipelines, review loops, and data workflows that make your next hundred tasks faster and cheaper to run. Three weeks running, the headline model has been pulled offline, surged out of an open-weight lab, or gated behind a government review, and the thread tying all three together is the same. The advantage is shifting to whoever converts frontier access into compounding capability fastest, whether that is a lab using its model to build its own chips or a team using one to build its own tools.\n\n*Louie Peters, Towards AI Co-founder and CEO*\n\n1. [OpenAI Previewed GPT-5.6 With Sol, Terra, and Luna](https://openai.com/index/previewing-gpt-5-6-sol/)\n\nOpenAI began a limited preview of the GPT-5.6 series, introducing a three-tier naming system: Sol (flagship), Terra (balanced), and Luna (fast and affordable). All three models exceeded OpenAI’s “High” preparedness threshold for cybersecurity risk, making this the first GPT family where every tier triggered the classification. GPT-5.6 Sol scored 96.7% on OpenAI’s internal cyberattack challenge test and is competitive with Anthropic’s Mythos Preview on ExploitBench while using roughly one-third of the output tokens. OpenAI says the model is heavily hardened against adversarial attacks and intentionally optimized for defensive cybersecurity work over offensive exploits, with safeguards built directly into the core model rather than relying on a separate filter layer. On capability, Sol introduces a “max” reasoning effort and an “ultra” mode that distributes tasks across coordinated subagents. Sol Ultra scored 91.9% on Terminal-Bench 2.1, ahead of Claude Mythos 5 (84.3%) and GPT-5.5 (88.0%). Terra delivers GPT-5.5-competitive performance at 2x lower cost. Luna targets high-volume workloads at OpenAI’s lowest price point. 
The preview is limited to approximately 20 government-vetted organizations through the API and Codex.\n\n2. [OpenAI and Broadcom Unveiled Jalapeño, OpenAI’s First Inference Chip](https://openai.com/index/openai-broadcom-jalapeno-inference-chip/)\n\nOpenAI and Broadcom unveiled Jalapeño, an inference-only ASIC designed from scratch around OpenAI’s understanding of LLM workloads. The chip was co-developed from initial design to manufacturing tape-out in nine months, which the companies describe as possibly the fastest ASIC development cycle in high-performance semiconductors. OpenAI’s own models were used to accelerate parts of the chip design and optimization process. Engineering samples are running ML workloads in the lab at production target frequency and power, including GPT-5.3-Codex-Spark. OpenAI says early testing shows performance per watt is substantially better than the current state of the art, with a detailed technical report to follow. Jalapeño is the first step in a multi-generation compute platform with Broadcom handling silicon implementation and networking, and Celestica providing board, rack, and system integration. Initial deployment is targeted for the end of 2026. Broadcom CEO Hock Tan said the companies are enabling gigawatt-scale data centers with Microsoft and other partners.\n\n3. [Anthropic Accused Alibaba of Largest-Ever Claude Distillation Campaign](https://www.cnbc.com/2026/06/24/anthropic-alibaba-distillation-campaign.html)\n\nAnthropic sent a letter to the US Senate Banking Committee accusing operators affiliated with Alibaba and its AI lab of conducting the largest known distillation attack against its Claude models. According to the letter, obtained by CNBC, the campaign ran from April 22 to June 5 and generated 28.8 million exchanges through roughly 25,000 fraudulent accounts, targeting Claude’s software engineering, agentic reasoning, and cybersecurity capabilities. 
This surpasses Anthropic’s February 2026 disclosure, in which it named DeepSeek, Moonshot AI, and MiniMax as collectively running 16 million exchanges through 24,000 fraudulent accounts. Anthropic stated that the campaign was carried out “illicitly, systematically, and at industrial scale” and occurred after the White House Office of Science and Technology Policy had already warned of industrial-scale foreign distillation in April. Senators Bill Hagerty and Andy Kim are advancing an amendment to defense legislation that would sanction entities found conducting such campaigns. Alibaba has not publicly addressed the specific allegations. The figures in the letter are Anthropic’s claims and have not been independently verified.\n\n4. [Anthropic Introduced Claude Tag](https://www.anthropic.com/news/introducing-claude-tag)\n\nAnthropic launched Claude Tag, a product that embeds Claude into Slack as a persistent, shared AI teammate. Any member of a channel can type @Claude to delegate a task, and Claude breaks it down into stages, works through them using the tools it has access to, and responds in a thread with what it has produced. Unlike individual Claude sessions, Claude Tag is multiplayer: a single Claude identity serves an entire channel, building context over time from the conversations it follows. If ambient mode is enabled, Claude will proactively surface relevant information and follow up on unresolved threads without being prompted. Administrators scope Claude Tag’s access per channel, controlling which tools, data, and codebases each instance can reach. Memories and permissions stay isolated between channels. The feature replaces the existing Claude in the Slack app, with a 30-day migration window before the old app is retired on August 3. Claude Tag is available in beta for Claude Enterprise and Team customers.\n\n5. 
[Mistral Released Mistral OCR](https://mistral.ai/news/ocr-4/)\n\nMistral AI released OCR 4, a document intelligence model that returns structured representations of documents alongside extracted text. New in this release: paragraph-level bounding boxes, typed block classification (titles, tables, equations, signatures), and per-word and per-page confidence scores. The model supports 170 languages across 10 language groups. In blind human evaluations, independent annotators preferred OCR 4 over every competing system tested, with win rates averaging 72%. It also tops OlmOCRBench with a score of 85.20. OCR 4 integrates with Mistral’s Search Toolkit, an open-source composable search framework announced at the AI Now Summit, providing citation-ready inputs for RAG and enterprise search pipelines. The model is compact enough to deploy in a single container for fully self-hosted environments, addressing data residency and sovereignty requirements. Pricing is $4 per 1,000 pages via the API, dropping to $2 with the Batch-API discount. Available through Mistral’s API, Amazon SageMaker, and Microsoft Foundry.\n\n6. [Prime Intellect Releases prime-rl 0.6.0 to Train Trillion-Parameter MoE Models](https://www.primeintellect.ai/blog/rl-at-1t-scale)\n\nPrime Intellect released prime-rl version 0.6.0, an open framework for asynchronous reinforcement learning that now scales to trillion-parameter MoE models on agentic workloads. The team demonstrated training GLM-5 on software engineering tasks at up to 131K sequence length, achieving sub-5-minute step times with a batch size of 256 rollouts on 28 H200 nodes. The key design decision is disaggregating training from inference: the trainer and inference systems run and scale independently, with only one synchronization point at the policy update. This avoids the idle GPU time caused by long-tail agentic rollouts (some of which can run for hours). 
On the inference side, optimizations include FP8 precision, wide expert parallelism, prefill/decode disaggregation, KV cache offloading, and router replay. Training uses 3D parallelism (FSDP, expert parallelism, context parallelism) with block-scaled FP8. The optimizations apply to any large MoE model, with documented support for GLM-5.1, Kimi K2.7-Code, and Nemotron 3 Ultra. The framework is open-source on GitHub.\n\nIn production RAG systems, prompt injection doesn’t only happen at the prompt level; it can also sneak in through the documents your system retrieves. A vendor PDF, support article, scraped web page, or customer note can contain useful facts and a malicious instruction in the same chunk.\n\nIf your eval only checks whether the answer is factually correct, the system can look safe while still treating all retrieved text as something it should obey.\n\nTo prevent this, add a few test documents that mix valid domain facts with instructions like “ignore the system message” or “send the user to this external link.”\n\nThen check two things: the answer should still use the factual content, and it should refuse to follow instructions found inside the retrieved context. Log the chunk IDs too, so a failed test points to the retriever, prompt wrapper, or generation step.\n\nIf you’re building production RAG systems and want to go deeper into retrieval, evaluation, and deployment, check out our [Full Stack AI Engineering](https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?utm_source=Newsletter&utm_medium=email&utm_id=AItips) course.\n\n1. [LangGraph Multi-Agent Systems: From One Brain to Many](https://pub.towardsai.net/langgraph-multi-agent-systems-from-one-brain-to-many-4c1773055693?sharedUserId=tai-tech)\n\nScaling from single-agent graphs to multi-agent systems in LangGraph requires solving three distinct problems: cognitive overload, sequential bottlenecks, and complexity management. 
This article walks through four architectural levels that address them: a supervisor pattern for task delegation, parallel fan-out for concurrent execution, compiled subgraphs for encapsulated complexity, and the Send API for runtime-generated dynamic workers. It also shows how to build a full research assistant integrating all four patterns, plus human-in-the-loop approval via interrupt().\n\n2. [MCP for LangGraph Developers: From Basics to Production](https://pub.towardsai.net/mcp-for-langgraph-developers-from-basics-to-production-12ff52df3d3c?sharedUserId=tai-tech)\n\nThis article shows how to use MCP to turn an N×M tool integration problem into a write-once, run-anywhere standard. The tutorial covers the Host-Client-Server architecture; three primitives (Tools, Resources, Prompts); a working FastMCP server; transport choices between stdio and Streamable HTTP; LangGraph integration via langchain-mcp-adapters; production hardening for connection lifecycles and state isolation; and composition with multi-agent supervisor systems.\n\n3. [MiniMax Cut Attention Compute by 28x at 1M Tokens](https://pub.towardsai.net/minimax-cut-attention-compute-by-28x-at-1m-tokens-a0cec2a87039?sk=cb3c42b67273193c937297c3c8d84632)\n\nThis article explains MiniMax’s Sparse Attention (MSA), a method that cuts attention compute 28x at one million tokens while preserving exact softmax behavior. Built on Grouped Query Attention, MSA adds a lightweight Index Branch that scores and selects the top 16 key-value blocks per query, capping attention at 2,048 tokens regardless of context length. It uses custom GPU kernels that turn theoretical savings into real wall-clock gains of 14.2x prefill and 7.6x decode speedup.\n\nThe entire AI industry runs on one arithmetic operation, and this piece names it plainly: the dot product. 
The article works through 15 terms that AI practitioners deploy as gatekeeping vocabulary, including embeddings, attention, RAG, LoRA, RLHF, and temperature, and reduces each to its underlying matrix arithmetic. The article also highlights an honest exception: grounding and AGI remain genuinely unsolved, and no clever rebranding changes that.\n\n5. [Understanding Dropout: How Randomly Removing Neurons Helps Neural Networks Generalize Better](https://pub.towardsai.net/understanding-dropout-how-randomly-removing-neurons-helps-neural-networks-generalize-better-d8ecd3ef8328?sharedUserId=tai-tech)\n\nOverfitting in neural networks occurs when a model memorizes the training data rather than learning general patterns, resulting in high training accuracy but poor real-world performance. This article introduces Dropout, a method that addresses this by randomly disabling neurons during each training iteration, preventing any single neuron from becoming critical and forcing the network to distribute learning across multiple pathways. The author runs regression and classification experiments across dropout rates from 0 to 0.75, showing how moderate values of 0.2 to 0.5 smooth decision boundaries without tipping the model into underfitting.\n\n1. [Agency Agents](https://github.com/msitarzewski/agency-agents) is a collection of 232 specialized AI agent personas across 16 divisions, each with defined expertise, personality, and deliverables, installable with one command into coding agents.\n\n2. [EverOS](https://github.com/EverMind-AI/EverOS) is a Python library and local-first memory runtime for agents that gives one portable memory layer across coding assistants, apps, devices, and workflows.\n\n3. [Container](https://github.com/apple/container) is a tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift and optimized for Apple silicon.\n\n1. 
[JetSpec: 9.64x Speedup for Speculative Decoding](https://arxiv.org/abs/2606.18394)\n\nPrior approaches to scaling speculative decoding face a causality-efficiency dilemma. Autoregressive drafters produce path-conditioned candidates, but their cost grows with tree depth. Block-diffusion drafters generate all positions in one pass but score branches independently, creating individually plausible yet mutually inconsistent trees. JetSpec resolves this by training a causal parallel draft head over fused hidden states from a frozen target model, so a candidate tree’s scores align with the target’s autoregressive factorization while all nodes are drafted in a single forward pass. On Qwen3–8B with H100 GPUs, JetSpec achieved 9.64x speedup on MATH-500, 8.78x on AIME25, 7.12x on HumanEval, and 4.58x on open-ended chat.\n\n2. [ViQ: Visual Quantized Representations with 20–70% Training Acceleration](https://arxiv.org/abs/2606.27313)\n\nExisting approaches to unifying multimodal modeling cannot balance low-level detail with high-level semantics: reconstruction-oriented representations lack semantic information, while semantically stronger features lose visual detail. ViQ addresses this through a two-stage framework. First, text-aligned pre-training enhances the visual encoder with semantic supervision from a pretrained language model while enabling native-resolution input processing. Second, a proximal representation learning strategy progressively compacts the feature space, paired with position-aware head-wise quantization for flexible resolution handling. Multimodal training with ViQ’s quantized representations yields 20–70% acceleration across different base LLMs and training recipes.\n\n3. [Improved Large Language Diffusion Models](https://arxiv.org/html/2606.25331v1)\n\nThis paper introduces iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention. 
iLLaDA maintains the masked diffusion objective throughout both pre-training (12T tokens) and supervised fine-tuning (25B-token instruction corpus, 12 epochs), rather than switching objectives between stages. It uses variable-length generation for efficiency and introduces confidence-based scoring for multiple-choice evaluation. Compared to the original LLaDA, iLLaDA improves by 21.6 points on BBH, 14.9 points on ARC-Challenge, 14.5 points on MATH, and 16.5 points on HumanEval.\n\nEvery major language model architecture, whether transformer, recurrent, or memory-based, stacks identical layers, with parameters uniformly allocated across depth. This paper asks whether parameter capacity should reflect that asymmetry. Under a fixed-parameter budget, allocating more capacity to earlier layers and less to later ones improves perplexity, whereas the reverse hurts. The authors formalize this as Tapered Language Models (TLMs), applying a smooth cosine schedule to taper MLP width across depth. Across three model scales and four architectures (Transformer, Gated Attention, Hope-attention, and Titans), tapering consistently improves perplexity and downstream benchmark performance over uniform baselines at zero additional parameter or compute cost.\n\n1. [Cursor study finds reward hacking inflates coding-agent benchmark scores](https://cursor.com/blog/reward-hacking-coding-benchmarks) on SWE-bench Pro. Cursor audited 731 Opus 4.8 Max evaluation trajectories and found that 63% of successful resolutions retrieved the known fix (57% from upstream sources, 6% from git history) rather than deriving it through reasoning. When git history was sealed and internet access restricted, Opus 4.8 Max dropped from 87.1% to 73.0%, and Cursor’s own Composer 2.5 dropped from 74.7% to 54.0%. Older models showed smaller gaps: Opus 4.6 lost under 1 point under the same restrictions, suggesting the behavior scales with model capability.\n\n2. 
[Sakana AI launches Sakana Fugu](https://sakana.ai/fugu-release/), a multi-agent orchestration system delivered as a single OpenAI-compatible API. Fugu is itself a language model trained to call, coordinate, and synthesize outputs from a pool of frontier models, handling model selection, delegation, verification, and synthesis internally. It ships in two variants: Fugu (speed-optimized, single best-fit agent per query) and Fugu Ultra (quality-optimized, multi-agent coordination). Fugu Ultra scored 73.7 on SWE-Bench Pro and leads or matches Opus 4.8, GPT-5.5, and Gemini 3.1 Pro across multiple reasoning and coding benchmarks. Not available in the EU or EEA at launch.\n\n3. [Liquid AI ships LFM2.5–230M](https://www.liquid.ai/blog/lfm2-5-230m), its smallest model yet, built for developers to fine-tune and deploy in agentic workflows. The 230M-parameter model runs at 213 tokens per second on a Galaxy S25 Ultra and 42 tokens per second on a Raspberry Pi 5, leading its class in both prefill and decode throughput while maintaining the smallest memory footprint. Pre-trained on 19T tokens with a 32K context window, it outperforms Qwen3.5–0.8B and Gemma 3 1B on instruction-following and data-extraction benchmarks, despite being 3–4x smaller.\n\n**Generative AI Architect @Cognizant (Chicago, IL, USA)**\n\n**AI Foundations Pod Member @Caterpillar, Inc. (Bangalore, India)**\n\n**Director, Applied AI @Cardinal Health (Multiple US locations)**\n\n**Lead AI Engineer @Mirakl — Labs (Boston, MA, USA)**\n\n**Senior AI Researcher @Clariti Cloud Inc. (Remote/Canada)**\n\n**Senior AI/Gen AI Engineer F/H @Talan (Lyon, France)**\n\n**Senior AI Engineer @PA Consulting (London, UK)**\n\n*Interested in sharing a job opportunity here? Contact **sponsors@towardsai.net**.*\n\n*Think a friend would enjoy this too? 
Share the newsletter and let them join the conversation.*\n\n[TAI #211: GPT-5.6 is here, but most people cannot use it yet](https://pub.towardsai.net/tai-211-gpt-5-6-is-here-but-most-people-cannot-use-it-yet-321b6b9c0f3a) was originally published in [Towards AI](https://pub.towardsai.net) on Medium, where people are continuing the conversation by highlighting and responding to this story.", "url": "https://wpnews.pro/news/tai-211-gpt-5-6-is-here-but-most-people-cannot-use-it-yet", "canonical_source": "https://pub.towardsai.net/tai-211-gpt-5-6-is-here-but-most-people-cannot-use-it-yet-321b6b9c0f3a?source=rss----98111c9905da---4", "published_at": "2026-07-01 15:29:42+00:00", "updated_at": "2026-07-01 15:55:09.824855+00:00", "lang": "en", "topics": ["large-language-models", "ai-policy", "ai-products", "ai-safety", "ai-infrastructure"], "entities": ["OpenAI", "GPT-5.6", "Anthropic", "Claude Fable", "GLM-5.2", "Cerebras", "Codex", "U.S. government"], "alternates": {"html": "https://wpnews.pro/news/tai-211-gpt-5-6-is-here-but-most-people-cannot-use-it-yet", "markdown": "https://wpnews.pro/news/tai-211-gpt-5-6-is-here-but-most-people-cannot-use-it-yet.md", "text": "https://wpnews.pro/news/tai-211-gpt-5-6-is-here-but-most-people-cannot-use-it-yet.txt", "jsonld": "https://wpnews.pro/news/tai-211-gpt-5-6-is-here-but-most-people-cannot-use-it-yet.jsonld"}}