Jensen scores a huge win.
Today’s podcast guest was the lead on NVIDIA Cosmos over a year ago, discussing training videogen and world models. Fittingly, Cosmos 3 launched today, unifying language, image, video, audio and action in a Mixture-of-Transformers architecture that pairs an autoregressive reasoner with a diffusion generator in:
base Nano(16B: 8B reasoner tower + 8B generator tower)** Super**(64B: 32B reasoner tower + 32B generator tower) models, andSuper finetunes for
Text2Image andImage2Video, which are now thenew SOTA open weights imagegen and videogen models, justbelow Nano Banana 2
At Computex in Taiwan, Jensen also brought the heat with Nemotron 3 Ultra, their 550B-A55B, remarkably efficient/fast open weights LLM that is the new US SoTA:
Finally, the RTX Spark personal computer 1 petaflop superchip, was previewed with Microsoft and OpenClaw and Hermes Agent as a launch partner (good analysis here)
AI News for 5/30/2026-6/1/2026. We checked 12 subreddits,
[544 Twitters]and no further Discords.[AINews’ website]lets you search all past issues. As a reminder,[AINews is now a section of Latent Space]. You can[opt in/out]of email frequencies!
AI Twitter Recap
NVIDIA’s Cosmos 3, Nemotron 3 Ultra, and the Push for Open Physical AI
NVIDIA’s open-source week: NVIDIA dominated the open-model conversation with** Cosmos 3**, an open family of** omnimodal world models for physical AI**, plus the announcement of** Nemotron 3 Ultra**, a** 550Bopen-weight model that several posters called the strongest U.S. open model so far. Cosmos 3 was framed as a full-stack release—weights, code, datasets, and fine-tuning recipes—with NVIDIA also launching the Cosmos Coalitionalongside partners including Runwayto build an open ecosystem for world models@NVIDIAAI ecosystem context,@runwayml coalition announcement,@kimmonismus Cosmos thread,@ClementDelangue on NVIDIA’s HF footprint.Why Cosmos 3 mattered technically: Beyond robotics rhetoric, the more concrete details were that Cosmos 3 unifies language, image, video, audio, and actionin a single Mixture-of-Transformersdesign pairing an autoregressive reasonerwith a diffusion generator**.Artificial Analysissaid Cosmos 3 reached**#1 among open-weight models** on both theirText-to-Image andImage-to-Video leaderboards, noting the generator usesstructured JSON prompts and can be driven either by an external prompt-upsampling harness or its own reasoner branch. Separately, NVIDIA’s hardware + software push extended to adoption of theOpenMDW framework and partner ecosystem integrations on platforms like fal@ArtificialAnlys,@fal.Nemotron 3 Ultra reception: Community reaction to** Nemotron 3 Ultrawas unusually strong for a fresh open release. Posters highlighted both capability and serving characteristics, including claims that it is already topping some open evals and may be serving at300+ tok/s** in some setups—far faster than large DeepSeek/Kimi-class models@scaling01,@ctnzr,@caspar_br. There was also some technical discussion that Nemotron appearsless sparse than peers like Kimi K2 / DeepSeek V4—roughly**~10% active** vs**~3%**—which could affect both economics and behavior@eliebakouch.
MiniMax M3, Qwen3.7-Plus, and JetBrains Mellum2 Expand the Open Agent Model Field
MiniMax M3’s launch was the day’s biggest model release: M3 was presented as an open-weight multimodal agent/coding model with** 1M context**,** native multimodality**, and competitive agent benchmarks. The headline figures repeated across launch partners were** 59.0% SWE-Bench Pro**,** 66.0% Terminal Bench 2.1**, and** 74.2% MCP Atlas**@MiniMax_AI,@PBDTokenRouter,@kimmonismus. Multiple infra vendors shipped day-0 support—Novita,** Vercel AI Gateway**,** Cloudflare AI Gateway**,** OpenClaude**,** Flowith**, and others—suggesting unusually fast ecosystem adoption@MiniMax_AI on Novita,@rauchg,@gitlawb.Benchmarks vs practical experience were mixed: M3 earned praise for frontend generation, visual/game tasks, and price-performance, with side-by-side demos showing strong one-shot UI/game outputs and notable benchmark placement for Next.js agent evals@notjazii,@lostinlatencyX,@rauchg. But several evaluators also reportedhigh token consumption,** verbose self-check loops**, and occasional** requirement drifton long tasks, making M3 look more like a “quality first, efficiency later” model@ZhihuFrontier review,@teortaxesTex skepticism.Qwen3.7-Plus: Alibaba launched Qwen3.7-Plusas a multimodal interactive hybrid agentthat unifies GUI and CLI operation**, visual reasoning, coding, and search-augmented QA. It is** API-availablevia Alibaba Cloud Model Studio and was quickly added to tools like Cline**@Alibaba_Qwen launch,@cline. The launch reinforces the trend that open-ish Asian labs are no longer releasing “just chat models,” but fullagent-capable multimodal systems.** JetBrains Mellum2**: JetBrains released** Mellum2**, a** 12B MoEmodel with 2.5B active parameters**, trained on roughly** 11T tokensand post-trained with RLVR**, shipping** base / SFT / RL checkpointsand a technical report@nv_pavlichenko,@jetbrains. The intended niche is especially interesting:ultra-low-latency inference forrouting, RAG, sub-agents, and IDE use**, and it landed in** vLLM**immediately@vllm_project. This looks like a serious “small fast open model for developer workflows” play rather than a benchmark-chasing frontier release.
Agents, Sandboxes, Memory, and Search Are Becoming the Real Product Surface
The stack is shifting from model calls to agent runtimes: Several launches converged on the idea that the main engineering leverage is now in the** harnessrather than the model. Perplexity’s “Search as Code”is the clearest example: instead of iterative search tool calls, the model writes Pythonagainst a search SDK, enabling custom ranking pipelines, map-reduce over indexes, batching, aggregation, and lower token overhead. Perplexity reports a jump on its internalWANDR** benchmark from0.152 to0.386 with this architecture@perplexity_ai,@AravSrinivas.Managed agents + sandboxes are becoming standard: Google detailed** Managed Agents in the Gemini API**, where a single API call can spin up an agent that reasons, writes/runs code, manages files, and operates inside a hostedLinux sandbox@_philschmid,@GoogleAIStudio. LangChain pushed similar ideas aroundDeep Agents,** Context Hub**, and** LangSmith Sandboxes/Engine**, emphasizing persistent context, agent lifecycle tooling, and automated failure triage@LangChain,@hwchase17.Memory remains a missing primitive: One recurring complaint was that enormous context windows still don’t solve** cross-session memory**. A thread on** HydraDBargued that “RAG + manual context injection” has been misnamed as memory, while actual persistent session knowledge remains underserved@kimmonismus. Related research threads pointed to reusable context management policies likeAdaCoM**, which trains a separate LLM via RL to prune/preserve context for frozen agents@dair_ai.Security remains the gating issue for enterprise agents: There was a notable warning from Microsoft Security Intelligence about a major** npm supply chain compromiseaffecting 90+ redhat-cloud-services packages**, including a self-propagating worm stealing npm/GitHub/AWS/SSH credentials@MsftSecIntel. At the same time, enterprise agent vendors highlightedsandboxing,** runtime isolation**, and** security stack integrationas prerequisites for deployment, including discussion of NVIDIA OpenShell**and LangChain’s sandbox keynote@shannholmberg,@LangChain.
Codex, Claude Code, and the Competitive Coding-Agent Race
OpenAI extended Codex into more places: OpenAI announced that** frontier models and Codex are now generally available on AWS / Amazon Bedrock**, aimed squarely at enterprises that want OpenAI capabilities inside existing AWS security/compliance workflows@OpenAI,@OpenAIDevs. OpenAI also shipped aCodex Python SDK supporting threads, turns, streaming, resume, images, and sandbox control@reach_vb, plus support for Bedrock-backed Codex workflows@reach_vb on Bedrock config.Claude Code had a real ops incident: Anthropic reset** 5-hour and weekly rate limitsfor Pro and Max users after fixing a bug where some Opus 4.8sessions spawned too many parallel subagents/tool calls**, burning usage unexpectedly@ClaudeDevs,follow-up. That’s a notable reminder that coding-agent product quality is increasingly determined by orchestration behavior, not just raw model IQ.Behavioral differences across coding models remain material: Developers highlighted large qualitative differences between GPT, Claude, and other models on benchmarks like** ProgramBenchand WeirdML**, with Opus sometimes preferring exploration over score-maximization or showing benchmark-specific quirks@OfirPress,@htihle. A separate long thread argued newerClaude Opus 4.6–4.8 variants can fabricate plausible but fictional concepts in non-coding domains, suggesting possible truthfulness/alignment regressions rather than ordinary hallucinations@distributionat.
Infra, Hardware, and Local AI Systems
NVIDIA is coming for the PC: The most-discussed hardware launch was** RTX Spark**, an NVIDIA/Microsoft “personal AI computer” built around** Grace + Blackwell**, with up to** 128GB unified memoryand claimed 1 PFLOP FP4**. The key strategic read: NVIDIA is no longer just selling accelerators, but an end-to-end local AI system that competes withApple Silicon, x86 PCs, and Qualcomm simultaneously@kimmonismus,@swyx.Cluster/networking updates: On the datacenter side,** Lambdasaid it is first to adopt NVIDIA Quantum-X InfiniBand Photonics Q3450-LDswitches, pushing co-packaged optics to reduce network power and failures in large AI clusters@LambdaAPI.OpenAI also announcedStargate Michigan**, a planned** 1GWdata center using closed-loop cooling and paired with workforce/education commitments@OpenAINewsroom.Local open-model tooling is improving fast: The MLX-VLM v0.6.0**release was one of the more substantive local inference/tooling updates, adding speculative decoding, Anthropic-style and responses-style APIs, tool calls, support for many new multimodal models, and image/audio features with the explicit pitch of turning Apple devices into “real local agent machines”@Prince_Canuma. That pairs well with growing DGX Spark +vLLM experimentation for local NVFP4 MoE serving@vllm_project.
Top Tweets (by engagement, filtered for technical relevance)
Anthropic’s IPO path: Anthropic said it has** confidentially submitted a draft S-1to the SEC, opening the door to an IPO pending review@AnthropicAI.Claude Code usage incident: Anthropic reset user rate limits after an Opus 4.8 parallel subagent/tool-call bugcaused excessive quota burn@ClaudeDevs. Qwen3.7-Plus**: Alibaba launched a** multimodal agent modelspanning GUI/CLI operation, coding, and visual tasks@Alibaba_Qwen.OpenAI on Bedrock: OpenAI models and Codexare now available through Amazon Bedrockfor enterprise workflows@OpenAI. ARC-AGI-3 movement**:** Claude Opus 4.8posted a new SOTA on ARC-AGI-3at 1.5%**, still tiny in absolute terms but a meaningful jump on that benchmark@arcprize.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. New Frontier Model Releases and Early Tests
(Activity: 1090):MiniMax M3 - Coding & Agentic Frontier, 1M Context, MultimodalMiniMax M3 is announced as anopen-weightfrontier model with coding/agentic focus, native multimodality/vision, and MiniMax Sparse Attention for up to1M
tokens of context with a guaranteed512K
minimum (MiniMax M3). Claimed long-horizon agentic results include 12-hour ICLR paper reproduction, Hopper FP8 GEMM CUDA/Triton optimization reaching9.4×
speedup after147
**iterations, and PostTrainBench ranking third behind Opus 4.7 and GPT-5.5; access is currently via API/MiniMax Code, with HuggingFace/GitHub weights/local deployment planned.*Commenters are cautiously interested in the combination of cheap/efficient vision plus long-context agentic coding, but skeptical because the announcement calls it“open-weight”*while not yet exposing weights or even parameter count. One technical debate is whether the results imply a much larger-than-~250B
model, extreme benchmark optimization, or a genuine open-weight breakthrough.Commenters focused on the missing release details: despite the claim of being
“the first open-weight model with three frontier capabilities”, users could not find actual weights, parameter count, or sizing information forMiniMax M3. One commenter linked a preview image from the announcement (Reddit image), but the thread still lacked confirmation of model scale or downloadable artifacts.A technically substantive concern was that the advertised capability level implies one of three possibilities:
a much larger-than-expected model, unusually strong benchmark optimization, or a major open-weights breakthrough. The speculation centered on whether MiniMax M3 is actually around~250B
parameters or significantly larger, and whether its coding/agentic/multimodal claims will hold once weights and independent benchmarks are available.
(Activity: 621):NVIDIA announces Nemotron 3 UltraTheimageis a technical announcement slide for NVIDIA Nemotron 3 Ultra, described in comments as a MoE550B-A55
**model. The slide positions Nemotron 3 Ultra against open/open-weight competitors including GLM 5.1, Kimi K2.6, and Qwen3.5 across “Frontier Smart” benchmark categories such as agent productivity, coding, instruction following, knowledge work, and long-context capability.**Commenters viewed the comparison against other open-source/open-weight models positively, while one noted an “artificial analysis score” of48
, placing it just below frontier-tier models and around the MiniMax 2.7 range, with the expectation that it could be the strongest U.S. open-weight model.NVIDIA Nemotron 3 Ultra is identified as a
MoE550B-A55
model, implying roughly550B
total parameters with about55B
active parameters per token. This architecture detail is the most concrete technical spec mentioned in the thread.A commenter cites an
Artificial Analysis score of48
, placing Nemotron 3 Ultra “one notch less than frontier” and roughly in theMiniMax 2.7 range, while suggesting it may be the strongestUS open-weight model by that metric.Technical references shared include NVIDIA’s official Nemotron 3 Ultra Base usage cookbook on GitHub:
NVIDIA-NeMo/Nemotron, plus the LifeArchitect model comparison table:lifearchitect.ai/models-table. One commenter argues the comparison againstQwen3.5 is notable because Nemotron may be NVIDIA’s best open-weight model while still trailing several non-US/open models.
(Activity: 473):Stepfun 3.7 Flash is very goodTheGIFis a technical visual demo, not a meme: it shows the output of Stepfun 3.7 Flash for the promptcreate a beautiful, relaxing flight simulator in a single html page
, rendering a low-poly 3D flight scene with HUD-style speed/altitude indicators. The OP says this was the officialQ4_X_S
quant and claims the model feels near GLM 5.1 in aesthetics and about80%
of its 3D world understanding, while using only roughly25%
of GLM 5.1’s parameters and including built-in vision. Commenters mostly reacted with comparisons and nostalgia rather than deep benchmarks: one referenced the old Excel flight simulator, while another compared interest inQwen 3.7 Max / 27B and asked whether it beatsQwen3.6 27B.A commenter draws a model-comparison angle by referencing
Qwen 3.7 Max and hoping for a futureQwen 3.7 27B release, while another asks whether Stepfun 3.7 Flash is better thanQwen3.6-27B. The thread includes screenshot evidence for the Qwen3.6-27B reference (image), but no quantitative benchmark scores or reproducible eval details are provided.
Keep reading with a 7-day free trial #
Subscribe to Latent.Space to keep reading this post and get 7 days of free access to the full post archives.