# [AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

> Source: <https://www.latent.space/p/ainews-nvidia-cosmos-3-nemotron-3>
> Published: 2026-06-02 03:28:10+00:00


### Jensen scores a huge win.

[Today’s podcast guest](https://www.latent.space/p/video-agents) was the lead on NVIDIA Cosmos over a year ago, discussing training videogen and world models. Fittingly, Cosmos 3 launched today, unifying language, image, video, audio and action in a [Mixture-of-Transformers architecture](https://x.com/victormustar/status/2061354267546427595?s=20) that pairs an autoregressive reasoner with a diffusion generator. It ships as base **Nano** (16B: 8B reasoner tower + 8B generator tower) and **Super** (64B: 32B reasoner tower + 32B generator tower) models, plus Super finetunes for **Text2Image** and **Image2Video**, which are now the [new SOTA open weights imagegen and videogen models](https://x.com/ArtificialAnlys/status/2061494719998546206?s=20), just [below Nano Banana 2](https://x.com/victormustar/status/2061354267546427595?s=20).

At Computex in Taiwan, Jensen also brought the heat with [Nemotron 3 Ultra](https://x.com/NVIDIAAI/status/2061495149872771568/photo/1), their 550B-A55B, remarkably efficient/[fast](https://x.com/ArtificialAnlys/status/2061304911565144230?s=20) open weights LLM that is the new US SoTA.

Finally, RTX Spark, a 1 petaflop personal computer superchip, was previewed with [Microsoft](https://x.com/satyanadella/status/2061315017589600699), [OpenClaw](https://x.com/openclaw/status/2061331260279054801?s=20), and [Hermes Agent](https://x.com/NousResearch/status/2061323987804713083?s=20) as launch partners (good analysis [here](https://x.com/PatrickMoorhead/status/2061452151944274167)).

AI News for 5/30/2026-6/1/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

**AI Twitter Recap**

**NVIDIA’s Cosmos 3, Nemotron 3 Ultra, and the Push for Open Physical AI**

**NVIDIA’s open-source week**: NVIDIA dominated the open-model conversation with **Cosmos 3**, an open family of **omnimodal world models for physical AI**, plus the announcement of **Nemotron 3 Ultra**, a **550B** open-weight model that several posters called the strongest U.S. open model so far. Cosmos 3 was framed as a full-stack release (**weights, code, datasets, and fine-tuning recipes**), with NVIDIA also launching the **Cosmos Coalition** alongside partners including **Runway** to build an open ecosystem for world models: [@NVIDIAAI ecosystem context](https://x.com/NVIDIAAI/status/2061498958283968735), [@runwayml coalition announcement](https://x.com/runwayml/status/2061315089869721682), [@kimmonismus Cosmos thread](https://x.com/kimmonismus/status/2061432501223162241), [@ClementDelangue on NVIDIA’s HF footprint](https://x.com/ClementDelangue/status/2061487081315094906).

**Why Cosmos 3 mattered technically**: Beyond robotics rhetoric, the more concrete details were that Cosmos 3 unifies **language, image, video, audio, and action** in a single **Mixture-of-Transformers** design pairing an **autoregressive reasoner** with a **diffusion generator**. [Artificial Analysis](https://x.com/ArtificialAnlys/status/2061494719998546206) said Cosmos 3 reached **#1 among open-weight models** on both their **Text-to-Image** and **Image-to-Video** leaderboards, noting the generator uses **structured JSON prompts** and can be driven either by an external prompt-upsampling harness or its own reasoner branch. Separately, NVIDIA’s hardware + software push extended to adoption of the **OpenMDW** framework and partner ecosystem integrations on platforms like fal: [@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2061494719998546206), [@fal](https://x.com/fal/status/2061604121786876307).

**Nemotron 3 Ultra reception**: Community reaction to **Nemotron 3 Ultra** was unusually strong for a fresh open release. Posters highlighted both capability and serving characteristics, including claims that it is already topping some open evals and may be serving at **300+ tok/s** in some setups, far faster than large DeepSeek/Kimi-class models: [@scaling01](https://x.com/scaling01/status/2061379856433107135), [@ctnzr](https://x.com/ctnzr/status/2061483152741175757), [@caspar_br](https://x.com/caspar_br/status/2061505720907182280). There was also some technical discussion that Nemotron appears **less sparse** than peers like Kimi K2 / DeepSeek V4 (roughly **~10% active** vs **~3%**), which could affect both economics and behavior: [@eliebakouch](https://x.com/eliebakouch/status/2061607195268038777).
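
The sparsity comparison above comes down to one ratio: active parameters divided by total parameters. A minimal sketch of that arithmetic follows, using only sizes quoted in this issue (Nemotron 3 Ultra at 550B-A55B, Mellum2 at 12B with 2.5B active). The helper name `active_fraction` is an illustrative assumption, not part of any vendor tooling.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token.

    Both arguments are in billions of parameters.
    """
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


# Sizes as quoted in this issue (billions of parameters: total, active).
models = {
    "Nemotron 3 Ultra": (550, 55),
    "JetBrains Mellum2": (12, 2.5),
}

for name, (total_b, active_b) in models.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} active per token")
```

A higher active fraction means more compute per token for a given total size, which is one reason the ~10% vs ~3% gap could matter for serving economics.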

**MiniMax M3, Qwen3.7-Plus, and JetBrains Mellum2 Expand the Open Agent Model Field**

**MiniMax M3’s launch was the day’s biggest model release**: M3 was presented as an open-weight multimodal agent/coding model with **1M context**, **native multimodality**, and competitive agent benchmarks. The headline figures repeated across launch partners were **59.0% SWE-Bench Pro**, **66.0% Terminal Bench 2.1**, and **74.2% MCP Atlas**: [@MiniMax_AI](https://x.com/MiniMax_AI/status/2061425142795034794), [@PBDTokenRouter](https://x.com/PBDTokenRouter/status/2061463048485838935), [@kimmonismus](https://x.com/kimmonismus/status/2061473350766170420). Multiple infra vendors shipped day-0 support (**Novita**, **Vercel AI Gateway**, **Cloudflare AI Gateway**, **OpenClaude**, **Flowith**, and others), suggesting unusually fast ecosystem adoption: [@MiniMax_AI on Novita](https://x.com/MiniMax_AI/status/2061398427121201648), [@rauchg](https://x.com/rauchg/status/2061593874498531707), [@gitlawb](https://x.com/gitlawb/status/2061581678871806083).

**Benchmarks vs practical experience were mixed**: M3 earned praise for frontend generation, visual/game tasks, and price-performance, with side-by-side demos showing strong one-shot UI/game outputs and notable benchmark placement for Next.js agent evals: [@notjazii](https://x.com/notjazii/status/2061407087293313210), [@lostinlatencyX](https://x.com/lostinlatencyX/status/2061409696649548165), [@rauchg](https://x.com/rauchg/status/2061593874498531707). But several evaluators also reported **high token consumption**, **verbose self-check loops**, and occasional **requirement drift** on long tasks, making M3 look more like a “quality first, efficiency later” model: [@ZhihuFrontier review](https://x.com/ZhihuFrontier/status/2061493401019957337), [@teortaxesTex skepticism](https://x.com/teortaxesTex/status/2061432151183171702).

**Qwen3.7-Plus**: Alibaba launched **Qwen3.7-Plus** as a **multimodal interactive hybrid agent** that unifies **GUI and CLI operation**, visual reasoning, coding, and search-augmented QA. It is **API-available** via Alibaba Cloud Model Studio and was quickly added to tools like **Cline**: [@Alibaba_Qwen launch](https://x.com/Alibaba_Qwen/status/2061506641120641494), [@cline](https://x.com/cline/status/2061580233778790439). The launch reinforces the trend that open-ish Asian labs are no longer releasing “just chat models,” but full **agent-capable multimodal systems**.

**JetBrains Mellum2**: JetBrains released **Mellum2**, a **12B MoE** model with **2.5B active parameters**, trained on roughly **11T tokens** and post-trained with **RLVR**, shipping **base / SFT / RL checkpoints** and a technical report: [@nv_pavlichenko](https://x.com/nv_pavlichenko/status/2061438808290172935), [@jetbrains](https://x.com/jetbrains/status/2061444430884675791). The intended niche is especially interesting: **ultra-low-latency inference** for **routing, RAG, sub-agents, and IDE use**, and it landed in **vLLM** immediately: [@vllm_project](https://x.com/vllm_project/status/2061621691995005301#m). This looks like a serious “small fast open model for developer workflows” play rather than a benchmark-chasing frontier release.

**Agents, Sandboxes, Memory, and Search Are Becoming the Real Product Surface**

**The stack is shifting from model calls to agent runtimes**: Several launches converged on the idea that the main engineering leverage is now in the **harness** rather than the model. **Perplexity’s “Search as Code”** is the clearest example: instead of iterative search tool calls, the model writes **Python** against a search SDK, enabling custom ranking pipelines, map-reduce over indexes, batching, aggregation, and lower token overhead. Perplexity reports a jump on its internal **WANDR** benchmark from **0.152** to **0.386** with this architecture: [@perplexity_ai](https://x.com/perplexity_ai/status/2061506359326384319), [@AravSrinivas](https://x.com/AravSrinivas/status/2061575845056278971).

**Managed agents + sandboxes are becoming standard**: Google detailed **Managed Agents in the Gemini API**, where a single API call can spin up an agent that reasons, writes/runs code, manages files, and operates inside a hosted **Linux sandbox**: [@_philschmid](https://x.com/_philschmid/status/2061457703210197273), [@GoogleAIStudio](https://x.com/GoogleAIStudio/status/2061452967530701090). LangChain pushed similar ideas around **Deep Agents**, **Context Hub**, and **LangSmith Sandboxes/Engine**, emphasizing persistent context, agent lifecycle tooling, and automated failure triage: [@LangChain](https://x.com/LangChain/status/2061432934993674267), [@hwchase17](https://x.com/hwchase17/status/2061496556608504043).

**Memory remains a missing primitive**: One recurring complaint was that enormous context windows still don’t solve **cross-session memory**. A thread on **HydraDB** argued that “RAG + manual context injection” has been misnamed as memory, while actual persistent session knowledge remains underserved: [@kimmonismus](https://x.com/kimmonismus/status/2061454202883432501). Related research threads pointed to reusable context management policies like **AdaCoM**, which trains a separate LLM via RL to prune/preserve context for frozen agents: [@dair_ai](https://x.com/dair_ai/status/2061455253325971789).

**Security remains the gating issue for enterprise agents**: There was a notable warning from Microsoft Security Intelligence about a major **npm supply chain compromise** affecting **90+ redhat-cloud-services packages**, including a self-propagating worm stealing npm/GitHub/AWS/SSH credentials: [@MsftSecIntel](https://x.com/MsftSecIntel/status/2061485730958848188). At the same time, enterprise agent vendors highlighted **sandboxing**, **runtime isolation**, and **security stack integration** as prerequisites for deployment, including discussion of **NVIDIA OpenShell** and LangChain’s sandbox keynote: [@shannholmberg](https://x.com/shannholmberg/status/2061368566256189656), [@LangChain](https://x.com/LangChain/status/2061448130806116827).
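
To make the “Search as Code” pattern in this section concrete, here is a toy sketch of the kind of program a model might write in place of a chain of individual search tool calls. Everything in it is hypothetical: `CORPUS`, `search`, and `search_all` are in-memory stand-ins invented for illustration, not Perplexity’s SDK, whose actual interface is not described in the source.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a set of search indexes.
CORPUS = {
    "news": [
        "Cosmos 3 unifies video and action",
        "Nemotron 3 Ultra is a 550B open-weight model",
    ],
    "forums": [
        "Nemotron 3 Ultra serving speed discussion",
        "MiniMax M3 has 1M context",
    ],
}


def search(index: str, query: str) -> list[str]:
    """Toy search: return documents in `index` containing every query term."""
    terms = query.lower().split()
    return [doc for doc in CORPUS[index] if all(t in doc.lower() for t in terms)]


def search_all(queries: list[str]) -> list[str]:
    """Batch every (index, query) pair, then merge and rank the hits."""
    jobs = [(index, query) for index in CORPUS for query in queries]
    # Map step: run all searches concurrently instead of one call per turn.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda job: search(*job), jobs))
    # Reduce step: count how many searches returned each document.
    hits = Counter(doc for batch in batches for doc in batch)
    # Custom ranking: most-matched documents first, ties broken alphabetically.
    return [doc for doc, _ in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))]


results = search_all(["nemotron", "ultra", "550b"])
for doc in results:
    print(doc)
```

The point of the design is that batching, aggregation, and ranking happen inside one program run, so only the final ranked list has to pass back through the model’s context.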

**Codex, Claude Code, and the Competitive Coding-Agent Race**

**OpenAI extended Codex into more places**: OpenAI announced that **frontier models and Codex are now generally available on AWS / Amazon Bedrock**, aimed squarely at enterprises that want OpenAI capabilities inside existing AWS security/compliance workflows: [@OpenAI](https://x.com/OpenAI/status/2061564502160892138), [@OpenAIDevs](https://x.com/OpenAIDevs/status/2061564710173224985). OpenAI also shipped a **Codex Python SDK** supporting threads, turns, streaming, resume, images, and sandbox control ([@reach_vb](https://x.com/reach_vb/status/2061569472792572163)), plus support for Bedrock-backed Codex workflows ([@reach_vb on Bedrock config](https://x.com/reach_vb/status/2061572961451094191)).

**Claude Code had a real ops incident**: Anthropic reset **5-hour and weekly rate limits** for Pro and Max users after fixing a bug where some **Opus 4.8** sessions spawned too many **parallel subagents/tool calls**, burning usage unexpectedly: [@ClaudeDevs](https://x.com/ClaudeDevs/status/2061501787769893055), [follow-up](https://x.com/ClaudeDevs/status/2061501790131265803). That’s a notable reminder that coding-agent product quality is increasingly determined by orchestration behavior, not just raw model IQ.

**Behavioral differences across coding models remain material**: Developers highlighted large qualitative differences between GPT, Claude, and other models on benchmarks like **ProgramBench** and **WeirdML**, with Opus sometimes preferring exploration over score-maximization or showing benchmark-specific quirks: [@OfirPress](https://x.com/OfirPress/status/2061458258821251081), [@htihle](https://x.com/htihle/status/2061412097720774679). A separate long thread argued newer **Claude Opus 4.6–4.8** variants can fabricate plausible but fictional concepts in non-coding domains, suggesting possible truthfulness/alignment regressions rather than ordinary hallucinations: [@distributionat](https://x.com/distributionat/status/2061362406971060244).

**Infra, Hardware, and Local AI Systems**

**NVIDIA is coming for the PC**: The most-discussed hardware launch was **RTX Spark**, an NVIDIA/Microsoft “personal AI computer” built around **Grace + Blackwell**, with up to **128GB unified memory** and claimed **1 PFLOP FP4**. The key strategic read: NVIDIA is no longer just selling accelerators, but an end-to-end local AI system that competes with **Apple Silicon**, x86 PCs, and Qualcomm simultaneously: [@kimmonismus](https://x.com/kimmonismus/status/2061484174088007739), [@swyx](https://x.com/swyx/status/2061567877879369953).

**Cluster/networking updates**: On the datacenter side, **Lambda** said it is first to adopt **NVIDIA Quantum-X InfiniBand Photonics Q3450-LD** switches, pushing co-packaged optics to reduce network power and failures in large AI clusters: [@LambdaAPI](https://x.com/LambdaAPI/status/2061319330433032658). **OpenAI** also announced **Stargate Michigan**, a planned **1GW** data center using closed-loop cooling and paired with workforce/education commitments: [@OpenAINewsroom](https://x.com/OpenAINewsroom/status/2061533639138316314).

**Local open-model tooling is improving fast**: The **MLX-VLM v0.6.0** release was one of the more substantive local inference/tooling updates, adding speculative decoding, Anthropic-style and responses-style APIs, tool calls, support for many new multimodal models, and image/audio features with the explicit pitch of turning Apple devices into “real local agent machines”: [@Prince_Canuma](https://x.com/Prince_Canuma/status/2061541992790683726). That pairs well with growing DGX Spark + **vLLM** experimentation for local NVFP4 MoE serving: [@vllm_project](https://x.com/vllm_project/status/2061530659160838549).
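
As a rough way to read the 128GB unified memory figure against the model sizes quoted in this issue, the sketch below estimates weight storage at 4 bits per parameter. It is back-of-the-envelope arithmetic under stated assumptions: it counts weights only and ignores quantization scale factors, KV cache, activations, and whatever the operating system uses, so real headroom is smaller. The helper `weight_gb` is an illustrative name, not a vendor tool.

```python
UNIFIED_MEMORY_GB = 128  # top RTX Spark configuration quoted in this issue


def weight_gb(params_b: float, bits: int = 4) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    params_b is the parameter count in billions; each parameter takes bits/8 bytes.
    """
    return params_b * bits / 8


# Total parameter counts (in billions) as quoted in this issue.
models = {
    "Nemotron 3 Ultra": 550,
    "Cosmos 3 Super": 64,
    "Cosmos 3 Nano": 16,
    "JetBrains Mellum2": 12,
}

for name, params_b in models.items():
    gb = weight_gb(params_b)
    verdict = "fits" if gb <= UNIFIED_MEMORY_GB else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of 4-bit weights, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

Note that for MoE models the total parameter count, not the active count, determines weight memory, since every expert has to be resident even if only a fraction is used per token.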

**Top Tweets (by engagement, filtered for technical relevance)**

**Anthropic’s IPO path**: Anthropic said it has **confidentially submitted a draft S-1** to the SEC, opening the door to an IPO pending review: [@AnthropicAI](https://x.com/AnthropicAI/status/2061478052257841495).

**Claude Code usage incident**: Anthropic reset user rate limits after an **Opus 4.8 parallel subagent/tool-call bug** caused excessive quota burn: [@ClaudeDevs](https://x.com/ClaudeDevs/status/2061501787769893055).

**Qwen3.7-Plus**: Alibaba launched a **multimodal agent model** spanning GUI/CLI operation, coding, and visual tasks: [@Alibaba_Qwen](https://x.com/Alibaba_Qwen/status/2061506641120641494).

**OpenAI on Bedrock**: OpenAI models and **Codex** are now available through **Amazon Bedrock** for enterprise workflows: [@OpenAI](https://x.com/OpenAI/status/2061564502160892138).

**ARC-AGI-3 movement**: **Claude Opus 4.8** posted a new SOTA on **ARC-AGI-3** at **1.5%**, still tiny in absolute terms but a meaningful jump on that benchmark: [@arcprize](https://x.com/arcprize/status/2061512025638121516).

**AI Reddit Recap**

**/r/LocalLlama + /r/localLLM Recap**

**1. New Frontier Model Releases and Early Tests**

[MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal](https://www.reddit.com/r/LocalLLaMA/comments/1ttdiq0/minimax_m3_coding_agentic_frontier_1m_context/) (Activity: 1090): **MiniMax M3 is announced as an open-weight frontier model with coding/agentic focus, native multimodality/vision, and MiniMax Sparse Attention for up to `1M` tokens of context with a guaranteed `512K` minimum ([MiniMax M3](https://www.minimax.io/models/text/m3)). Claimed long-horizon agentic results include 12-hour ICLR paper reproduction, Hopper FP8 GEMM CUDA/Triton optimization reaching `9.4×` speedup after `147` iterations, and PostTrainBench ranking third behind Opus 4.7 and GPT-5.5; access is currently via API/MiniMax Code, with HuggingFace/GitHub weights/local deployment planned.** Commenters are cautiously interested in the combination of cheap/efficient vision plus long-context agentic coding, but skeptical because the announcement calls it *“open-weight”* while not yet exposing weights or even parameter count. One technical debate is whether the results imply a much larger-than-`~250B` model, extreme benchmark optimization, or a genuine open-weight breakthrough.

Commenters focused on the missing release details: despite the claim of being *“the first open-weight model with three frontier capabilities”*, users could not find actual weights, parameter count, or sizing information for **MiniMax M3**. One commenter linked a preview image from the announcement ([Reddit image](https://preview.redd.it/fej3vn94qk4h1.jpeg?width=3808&format=pjpg&auto=webp&s=83ef24ab093520eb3118dd918259adff4f42a569)), but the thread still lacked confirmation of model scale or downloadable artifacts.

A technically substantive concern was that the advertised capability level implies one of three possibilities: **a much larger-than-expected model**, unusually strong benchmark optimization, or a major open-weights breakthrough. The speculation centered on whether MiniMax M3 is actually around `~250B` parameters or significantly larger, and whether its coding/agentic/multimodal claims will hold once weights and independent benchmarks are available.

[NVIDIA announces Nemotron 3 Ultra](https://www.reddit.com/r/LocalLLaMA/comments/1tthkh5/nvidia_announces_nemotron_3_ultra/) (Activity: 621): **The [image](https://i.redd.it/f79wu6dnml4h1.jpeg) is a technical announcement slide for NVIDIA Nemotron 3 Ultra, described in comments as a MoE `550B-A55` model. The slide positions Nemotron 3 Ultra against open/open-weight competitors including GLM 5.1, Kimi K2.6, and Qwen3.5 across “Frontier Smart” benchmark categories such as agent productivity, coding, instruction following, knowledge work, and long-context capability.** Commenters viewed the comparison against other open-source/open-weight models positively, while one noted an “artificial analysis score” of `48`, placing it just below frontier-tier models and around the MiniMax 2.7 range, with the expectation that it could be the strongest U.S. open-weight model.

NVIDIA Nemotron 3 Ultra is identified as a **MoE** `550B-A55` model, implying roughly `550B` total parameters with about `55B` active parameters per token. This architecture detail is the most concrete technical spec mentioned in the thread.

A commenter cites an **Artificial Analysis score of** `48`, placing Nemotron 3 Ultra “one notch less than frontier” and roughly in the **MiniMax 2.7** range, while suggesting it may be the strongest **US open-weight** model by that metric.

Technical references shared include NVIDIA’s official Nemotron 3 Ultra Base usage cookbook on GitHub: [NVIDIA-NeMo/Nemotron](https://github.com/NVIDIA-NeMo/Nemotron/tree/main/usage-cookbook/Nemotron-3-Ultra-Base), plus the LifeArchitect model comparison table: [lifearchitect.ai/models-table](https://lifearchitect.ai/models-table/). One commenter argues the comparison against **Qwen3.5** is notable because Nemotron may be NVIDIA’s best open-weight model while still trailing several non-US/open models.

[Stepfun 3.7 Flash is very good](https://www.reddit.com/r/LocalLLaMA/comments/1tss9nq/stepfun_37_flash_is_very_good/) (Activity: 473): **The [GIF](https://i.redd.it/k37ol07vfg4h1.gif) is a technical visual demo, not a meme: it shows the output of Stepfun 3.7 Flash for the prompt `create a beautiful, relaxing flight simulator in a single html page`, rendering a low-poly 3D flight scene with HUD-style speed/altitude indicators. The OP says this was the official `Q4_X_S` quant and claims the model feels near GLM 5.1 in aesthetics and about `80%` of its 3D world understanding, while using only roughly `25%` of GLM 5.1’s parameters and including built-in vision.** Commenters mostly reacted with comparisons and nostalgia rather than deep benchmarks: one referenced the old Excel flight simulator, while another compared interest in **Qwen 3.7 Max / 27B** and asked whether it beats **Qwen3.6 27B**.

A commenter draws a model-comparison angle by referencing **Qwen 3.7 Max** and hoping for a future **Qwen 3.7 27B** release, while another asks whether Stepfun 3.7 Flash is better than **Qwen3.6-27B**. The thread includes screenshot evidence for the Qwen3.6-27B reference ([image](https://preview.redd.it/h1jbx5tz4j4h1.png?width=1523&format=png&auto=webp&s=c4bd572a0741fcffc65f2b75153efbb603ede82b)), but no quantitative benchmark scores or reproducible eval details are provided.

