[ARTICLE · art-21561] src=blog.kilo.ai pub= topic=large-language-models verified=true sentiment=↑ positive

NVIDIA Nemotron 3 Ultra is Here – and it’s Free to Use in Kilo

NVIDIA released Nemotron 3 Ultra, a 550-billion-parameter open-weights model, for free use in Kilo Code for a limited time. The model, announced by CEO Jensen Huang at Computex 2026, is the top open model on the PinchBench agentic benchmarking tool and achieves 5x higher throughput than comparable models. The release signals the growing viability of open-weight models for agentic coding tasks.

3 min read · published Jun 4, 2026

The open-weight future is looking bright

We are incredibly excited to announce that **NVIDIA Nemotron 3 Ultra** is now available to use in Kilo Code!

NVIDIA just dropped a game-changer for agentic coding, and you can experience the most powerful open-weights model right now directly in your terminal, VS Code, or JetBrains IDE, powered by Kilo Code.

Even better? **NVIDIA Nemotron 3 Ultra is FREE in Kilo for a limited time.**

Meet the 550B Heavyweight: Nemotron 3 Ultra

Introduced by NVIDIA CEO Jensen Huang during his keynote last weekend at Computex 2026 in Taipei, **Nemotron 3 Ultra** is NVIDIA’s flagship open-weights model. But its size isn’t just for show: it is also highly efficient. On stage, Huang noted the model’s high PinchBench score; it’s currently the top open model on the agentic benchmarking tool. As he put it, the model is “frontier smart” and achieves 5x higher throughput compared to other open models in its class.

Nemotron 3 Super, a 120B-parameter open hybrid MoE model NVIDIA released earlier this year, has become a daily driver for many on Kilo. But it has its limitations around planning and long-horizon tasks. The release of Nemotron 3 Ultra sends a signal to the industry that open-weight models are here to stay.

Built on a hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture, Nemotron 3 Ultra has 550 billion total parameters but activates only 55 billion of them per token during a forward pass. This means you get the reasoning capabilities of a frontier-class model while maintaining fast inference, delivering over 300 tokens per second.
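To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above (550B total parameters, 55B active, 300 tokens per second as a lower bound). The function names are ours for illustration, and the timing ignores prompt processing and network latency, so treat the result as a rough floor rather than a measurement.

```python
# Figures quoted in the article; everything else here is illustrative.
TOTAL_PARAMS = 550_000_000_000   # total parameters
ACTIVE_PARAMS = 55_000_000_000   # parameters activated per token
TOKENS_PER_SECOND = 300          # "over 300 tokens per second", used as a lower bound


def active_fraction(total_params: int, active_params: int) -> float:
    """Share of the model's parameters used for each token."""
    return active_params / total_params


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Rough generation time, ignoring prompt processing and latency."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active share per token: {share:.0%}")
    print(f"3,000-token answer: ~{seconds_to_generate(3000, TOKENS_PER_SECOND):.0f} s")
```

In other words, each token touches about a tenth of the model's weights, which is where the speed comes from.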

Benchmarks & Why it Shines in Kilo Code

Built with contributions from the Nemotron Coalition, Nemotron 3 Ultra was explicitly engineered for tool use, agentic reasoning, and complex coding environments. Here is why it pairs perfectly with Kilo:

- **The Open-Weights Champion:** Nemotron 3 Ultra currently holds the “Best Open-Weights” title on the PinchBench Agentic leaderboard with an impressive 90% median success rate.
- **High Intelligence and Speed:** It scores a 48 on the Artificial Analysis Intelligence Index, making it the smartest open model from the US to date and placing it in the optimal quadrant for both high capability and fast output speed.
- **A Strong Qwen Competitor:** In KiloBench, our internal evals, 3 Ultra performed very similarly to Qwen 3.7 Plus. It will be interesting to see who wins for agentic tasks like coding and planning on the Kilo Leaderboard.
- **1 Million Token Context Window:** Nemotron 3 Ultra natively supports up to 1,000,000 tokens of context. You can load entire codebases, deep API documentation, and massive error logs into Kilo without worrying about forced truncations or the model losing the plot mid-session.
- **Built for Agentic Workflows:** The Nemotron 3 family is heavily optimized for multi-environment reinforcement learning (including SWE-RL). It excels at the exact operations coding agents run in their inner loops: multi-step planning, codebase navigation, tool calling, and structured code generation.
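If you are wondering whether your own project would fit in a 1,000,000-token window, a rough estimate is easy to sketch. The snippet below assumes about 4 characters per token, which is a common rule of thumb and not Nemotron's actual tokenizer, so real counts will differ. The helper names are hypothetical and not part of Kilo or any NVIDIA tooling.

```python
from typing import Iterable

CONTEXT_LIMIT_TOKENS = 1_000_000  # context window quoted in the article
CHARS_PER_TOKEN = 4               # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of a string, rounding up."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_in_context(
    documents: Iterable[str],
    reserved_for_output: int = 0,
    limit: int = CONTEXT_LIMIT_TOKENS,
) -> bool:
    """Check whether the documents plus a reserved output budget fit in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= limit


if __name__ == "__main__":
    # Stand-ins for source files and logs read from disk.
    files = ["def main():\n    pass\n" * 1000, "ERROR: timeout\n" * 5000]
    print(fits_in_context(files, reserved_for_output=8_000))
```

Reserving part of the window for the model's reply, as `reserved_for_output` does here, keeps the estimate honest: the prompt and the response share the same budget.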

Open and Customizable, Deployable Anywhere

At Kilo, we believe in the power of open source. That is why Nemotron 3 Ultra is such a natural fit for our platform. Just as Kilo provides an open-source foundation for your coding workflows, Nemotron 3 Ultra is fully open and customizable, and deployable anywhere.

The Nemotron models are released with open weights, datasets, and recipes, giving organizations total transparency and control to customize models for domain-specific workflows and deploy them exactly where their applications and data reside. Developers can leverage tools like NVIDIA NeMo to customize, evaluate, and optimize the model for their specific use cases. Because the Nemotron family of models is open, organizations can deploy them in entirely self-hosted environments that meet strict regulatory, sovereignty, or data localization requirements—putting you firmly in the driver’s seat.

Give Nemotron 3 Ultra a spin today wherever you use Kilo. It’s totally free in all of our products and features for a limited time!
