# NVIDIA Nemotron 3 Ultra is Here – and it’s Free to Use in Kilo

> Source: <https://blog.kilo.ai/p/nvidia-nemotron-3-ultra>
> Published: 2026-06-04 14:05:58+00:00

### The open-weight future is looking bright

We are incredibly excited to announce that **NVIDIA Nemotron 3 Ultra** is now available to use in Kilo Code!

NVIDIA just dropped a game-changer for agentic coding, and you can experience the most powerful open-weights model right now directly in your [terminal, VS Code, or JetBrains IDE](https://kilo.ai/code), powered by Kilo Code.

Even better? **NVIDIA Nemotron 3 Ultra is FREE in Kilo for a limited time.**

**Meet the 550B Heavyweight: Nemotron 3 Ultra**

Introduced by NVIDIA CEO Jensen Huang during his keynote last weekend at Computex 2026 in Taipei, **Nemotron 3 Ultra** is NVIDIA’s flagship open-weights model. But its size isn’t just for show: it is incredibly efficient. On stage, Huang noted the model’s high [PinchBench score](https://pinchbench.com/); it’s currently the top open model on the agentic benchmarking tool. As he put it, the model is “frontier smart” and achieves [5x higher throughput](https://developer.nvidia.com/blog/?p=117924) compared to other open models in its class.

[Nemotron 3 Super](https://kilo.ai/models/nvidia-nemotron-3-super-120b-a12b-free), a 120B-parameter open hybrid MoE model NVIDIA released earlier this year, has become a daily driver for many on Kilo. But it has its limitations around planning and long-horizon tasks. **The release of Nemotron 3 Ultra sends a signal to the industry that open-weight models are here to stay.**

Built on a hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture, Nemotron 3 Ultra boasts a massive **550 billion total parameters**, but only activates **55 billion parameters per token** during a forward pass. This means you get the reasoning capabilities of a frontier-class model while maintaining blazing-fast inference speeds, delivering over 300 tokens per second.
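To put those figures in perspective, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted above (550B total parameters, 55B active, 300 tokens per second). The function names are ours and purely illustrative, and it assumes a flat generation rate, which real-world throughput won't match exactly.

```python
# Figures quoted in this post.
TOTAL_PARAMS = 550e9        # total parameters
ACTIVE_PARAMS = 55e9        # parameters activated per token
TOKENS_PER_SECOND = 300     # quoted as a lower bound ("over 300")


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def seconds_to_generate(num_tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Time to emit num_tokens, ASSUMING a constant generation rate."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active share per token: {share:.0%}")
    print(f"3,000-token response: {seconds_to_generate(3000):.1f} s")
```

In other words, each token touches roughly a tenth of the model, and a 3,000-token response would take about ten seconds at the quoted rate.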

**Benchmarks & Why it Shines in Kilo Code**

Built with contributions from the [Nemotron Coalition](https://nvidianews.nvidia.com/news/nvidia-launches-nemotron-coalition-of-leading-global-ai-labs-to-advance-open-frontier-models), Nemotron 3 Ultra was explicitly engineered for tool use, agentic reasoning, and complex coding environments. Here is why it pairs perfectly with Kilo:

- **The Open-Weights Champion:** Nemotron 3 Ultra currently holds the “Best Open-Weights” title on the [PinchBench Agentic leaderboard](https://pinchbench.com/) with an impressive **90% median success rate**.
- **High Intelligence and Speed:** It scores a 48 on the Artificial Analysis Intelligence Index, making it the smartest open model from the US to date and placing it in the optimal quadrant for both high capability and fast output speed.
- **A Strong Qwen Competitor:** In KiloBench, our internal evals, Nemotron 3 Ultra performed very similarly to Qwen 3.7 Plus. It will be interesting to see who wins for agentic tasks like coding and planning on the [Kilo Leaderboard](https://kilo.ai/leaderboard).
- **1 Million Token Context Window:** Nemotron 3 Ultra natively supports up to 1,000,000 tokens of context. You can load entire codebases, deep API documentation, and massive error logs into Kilo without worrying about forced truncations or the model losing the plot mid-session.
- **Built for Agentic Workflows:** The Nemotron 3 family is heavily optimized for multi-environment reinforcement learning (including SWE-RL). It excels at the exact operations coding agents run in their inner loops: multi-step planning, codebase navigation, tool calling, and structured code generation.
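Curious whether your own project would fit in that 1M-token window? The sketch below gives a rough estimate. It is not how Kilo or Nemotron counts tokens: it assumes about four characters per token, a common rule of thumb rather than the model's real tokenizer, and the function names, file extensions, and 50,000-token reserve for the model's reply are all hypothetical choices for illustration.

```python
import os

CONTEXT_WINDOW_TOKENS = 1_000_000  # figure quoted in this post
CHARS_PER_TOKEN = 4                # ASSUMPTION: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_repo_tokens(root: str, extensions=(".py", ".ts", ".md")) -> int:
    """Walk a directory tree and sum estimated tokens for matching files."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                # Ignore undecodable bytes so one odd file doesn't abort the scan.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += estimate_tokens(handle.read())
    return total


def fits_in_context(token_count: int, reserve: int = 50_000) -> bool:
    """True if the estimate plus a reserve for the reply fits in the window."""
    return token_count + reserve <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"Estimated tokens: {tokens:,}")
    print("Fits in context" if fits_in_context(tokens) else "Too large for one context window")
```

Treat the result as a ballpark only; actual token counts depend on the tokenizer and on how much of the repository the agent chooses to load.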

**Open and Customizable, Deployable Anywhere**

At Kilo, we believe in the power of open source. That is why Nemotron 3 Ultra is such a natural fit for our platform. Just as Kilo provides an open-source foundation for your coding workflows, Nemotron 3 Ultra is fully open, customizable, and deployable anywhere.

The Nemotron models are released with open weights, datasets, and recipes, giving organizations total transparency and control to customize models for domain-specific workflows and deploy them exactly where their applications and data reside. Developers can leverage tools like NVIDIA NeMo to customize, evaluate, and optimize the model for their specific use cases. Because the Nemotron family of models is open, organizations can deploy them in entirely self-hosted environments that meet strict regulatory, sovereignty, or data localization requirements—putting you firmly in the driver’s seat.

Give Nemotron 3 Ultra a spin today [wherever you use Kilo](https://kilo.ai/code). It’s totally **free in all of our products and features** for a limited time!