Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Source: arxiv.org · Published: 2026-06-16T04:00Z · Topic: large-language-models

NVIDIA released Nemotron 3 Ultra, a Mixture-of-Experts hybrid Mamba-Transformer model with 550B total and 55B active parameters. It reaches up to ~6x higher inference throughput than state-of-the-art publicly available LLMs at on-par accuracy. The model was pre-trained on 20 trillion text tokens and then extended to a 1M-token context. It targets long-running autonomous agentic tasks, and its checkpoints, training data, and recipe are open-sourced on HuggingFace.

arXiv:2606.15007v1. Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context length to 1M tokens, and post-trained using Supervised Fine Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD). Nemotron 3 Ultra is our most capable model yet, employing multiple key technologies: LatentMoE, Multi Token Prediction (MTP), NVFP4 pre-training, multi-environment RLVR, MOPD, and reasoning budget control. Nemotron 3 Ultra achieves up to ~6x higher inference throughput than state-of-the-art publicly available LLMs while attaining on-par accuracy. The state-of-the-art accuracy, high inference throughput, and 1M token context length make Nemotron 3 Ultra ideal for long-running autonomous agentic tasks. We open-source the base, post-trained, and quantized checkpoints, along with the training data and recipe on HuggingFace.
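To make the "550 billion total, 55 billion active" figures concrete, here is a minimal arithmetic sketch. The two parameter counts come from the abstract. Everything else is an assumption for illustration: the function names are hypothetical, and the "about 2 FLOPs per active parameter per token" rule is a common back-of-the-envelope estimate for a forward pass that ignores attention, routing, and memory costs. It does not reproduce or explain the reported ~6x throughput figure, which depends on the full architecture and serving stack.

```python
# Parameter counts stated in the abstract.
TOTAL_PARAMS = 550_000_000_000   # all experts plus shared layers
ACTIVE_PARAMS = 55_000_000_000   # parameters used for any single token


def active_fraction(total: int, active: int) -> float:
    """Share of the model's parameters that participate per token."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active / total


def approx_forward_flops_per_token(params_used: int) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per parameter used,
    one multiply and one add per weight. Ignores attention, routing, and
    memory-bandwidth effects."""
    return 2.0 * params_used


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
sparse_flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = approx_forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction per token: {fraction:.0%}")
print(f"approx. forward FLOPs/token (sparse): {sparse_flops:.2e}")
print(f"hypothetical dense model of equal size: {dense_flops:.2e}")
```

Under these assumptions, each token touches one tenth of the weights, so the per-token compute estimate is one tenth of what a hypothetical dense model with the same total parameter count would need.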

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.15007
