{"slug": "nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark", "title": "NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark", "summary": "NVIDIA achieved leading agentic coding performance on the first agentic AI benchmark, AA-AgentPerf, delivering up to 20x better performance than previous generations. The benchmark, created by Artificial Analysis, measures concurrent AI agents an inference system can support while meeting service level objectives.", "body_md": "AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how inference systems perform under these conditions. Artificial Analysis [AgentPerf](https://artificialanalysis.ai/benchmarks/hardware) (AA-AgentPerf) offers the industry’s first multi-vendor open benchmarks profiling trajectories that are representative of real-world [AI agent](https://www.nvidia.com/en-us/glossary/ai-agents/) coding tasks.\n\nThis post explains how AA-AgentPerf sets a new standard for measuring agentic workload performance, and how NVIDIA extreme co-design helps deliver up to 20x better agentic coding performance than previous generations.\n\n## What is AA-AgentPerf?\n\nAA-AgentPerf is a hardware benchmark created by [Artificial Analysis](https://artificialanalysis.ai/) that measures the number of concurrent AI agents an inference system can support while meeting predefined, model-specific performance service level objective (SLO) tiers. An SLO is defined as a specific threshold of output token speed and time-to-first-token (TTFT). The benchmark results are normalized per accelerator and per megawatt to enable comparison across hardware configurations.\n\n## Measuring representative agentic coding performance\n\nAgentic workloads are unique because LLM-driven decisions often produce non-deterministic sequences of requests and tool calls. 
The hardest part of measuring agent performance is accurately capturing this non-determinism in a representative agent trajectory: the complete sequence of actions, decisions, and observations an agent makes as it works through a task from beginning to end (Figure 2).\n\nAA-AgentPerf captures this by measuring GPU performance across prerecorded agentic coding trajectories with interleaved reasoning and tool use, while simulating interturn latency with a representative baseline for CPU tool-call performance. These trajectories are built around solving issues in public code repositories across several use cases, 12+ programming languages, and responses from frontier models. In addition to rigorously defining the trajectories, the Artificial Analysis team also:\n\n- Leveraged representative cached, input, and output sequence lengths for requests, ranging from 5K to 131K tokens with a mean of approximately 27K.\n- Mapped tool calls to representative CPU-side tasks in agentic coding workflows and simulated tool calls across a distribution with a one-second median delay time. The same CPU tool-call baseline was then applied across all systems tested.\n- Kept the test set private to prevent benchmark-targeted optimization.\n\n## AA-AgentPerf testing and measurement methodology\n\nThe AA-AgentPerf harness measures the number of concurrent agents an inference system can support while meeting SLO requirements (Figure 3). At launch, this benchmark focuses on testing DeepSeek-V4-Pro across multiple SLO tiers derived from Artificial Analysis serverless API benchmarking data. This ensures that the benchmarks reflect quality-of-service levels observed in production providers today.\n\nDuring a benchmarking run, AA-AgentPerf sends GPUs thousands of concurrent requests drawn from its prerecorded agent trajectory dataset. To ensure independent results for each run, dynamic prefixes are added at the start of every trajectory phase. 
Strict SLO thresholds are enforced throughout the trajectory, and the highest concurrency level that satisfies those requirements is recorded as the official benchmark result for a given SLO (Figure 3). This process is then repeated across multiple SLO tiers to capture different user experience targets (Table 1).\n\n| Model | SLO tier | P25 output speed (tokens/second) | P95 TTFT (seconds) |\n| --- | --- | --- | --- |\n| DeepSeek-V4-Pro | SLO #1 | 30 | 10 |\n| DeepSeek-V4-Pro | SLO #2 | 100 | 5 |\n| DeepSeek-V4-Pro | SLO #3 | 300 | 3 |\n\n*Table 1. SLO tiers with output speed and TTFT requirements for AA-AgentPerf DeepSeek-V4-Pro tests*\n\n## How to interpret AA-AgentPerf results\n\nThe core AA-AgentPerf metric is concurrent agents per megawatt, a practical normalization for representing data center scale performance. Table 2 outlines how to use the reported performance to estimate how many agentic sessions could be supported for a given power budget.\n\n| Benchmark | Value of metric | NVIDIA GB300 NVL72 | NVIDIA H200 |\n| --- | --- | --- | --- |\n| Concurrent agents per MW | Energy efficiency: how many active agents a system can support for a given power budget | 61.4K | 2.6K |\n| Concurrent agents per GPU | Hardware efficiency: how much serving capacity is achieved per GPU | 57.5 | 1.4 |\n\n*Table 2. How to use the metrics reported by AA-AgentPerf for capacity planning in data centers aiming to support agentic applications at scale. 
Numbers reflect AA-AgentPerf results for SLO=30 configurations*\n\nOn launch day, [NVIDIA GB300 NVL72](https://www.nvidia.com/en-us/data-center/gb300-nvl72/) delivers up to 20x more concurrent agents per megawatt than the previous generation, [NVIDIA H200](https://www.nvidia.com/en-us/data-center/h200/) (Figure 4).\n\nThis performance highlights how GB300 NVL72 delivers across large-scale agentic coding workloads, from routing long-lived sessions efficiently to keeping [mixture of experts (MoEs)](https://www.nvidia.com/en-us/glossary/mixture-of-experts/) and GPUs fully utilized across many concurrent agent sessions.\n\n- **SGLang, TensorRT LLM, or vLLM:** Agent runtimes apply optimizations such as WideEP and DeepEP to spread MoE expert execution across the full NVL72 domain, maximizing effective batch sizes and scaling effectively to thousands of agents.\n- **DeepGEMM and Mega MoE optimizations:** MXFP4/MXFP8 kernels and fused MoE overlap NVLink communication with tensor core compute to boost token throughput for reasoning and code generation.\n- **NVIDIA NVLink scale-up domain:** GB300 NVL72 links 72 GPUs into a single high-bandwidth NVLink fabric, so every GPU can rapidly share parameters, KV cache, and intermediate results, which is critical for fast, coordinated execution of agentic coding systems.\n\n## Looking forward: NVIDIA Vera Rubin platform\n\nAA-AgentPerf establishes the standard for evaluating agentic inference, and the results highlight how tightly integrated hardware and software can unlock step-function gains in concurrency and efficiency. 
NVIDIA GB300 NVL72 demonstrates up to 20x higher agentic coding performance than the previous-generation NVIDIA H200.\n\nThe [NVIDIA Vera Rubin platform](https://www.nvidia.com/en-us/data-center/technologies/rubin/) is expected to extend these gains by leveraging 50 PFLOPs of [NVFP4](https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/) compute and using the Vera CPU to accelerate LLM tool calls and improve end-to-end performance, economics, and efficiency for agentic workflows.\n\nTo learn more about why agentic workloads place unique demands on inference infrastructure and how the [NVIDIA Vera Rubin platform](https://www.nvidia.com/en-us/data-center/technologies/rubin/) optimizes performance, see [Building for the Rising Complexity of Agentic Systems with Extreme Co-Design](https://developer.nvidia.com/blog/building-for-the-rising-complexity-of-agentic-systems-with-extreme-co-design/).\n\n### Acknowledgments\n\n*This work was made possible through the expertise and engineering contributions of Jatin Gangani, Iman Tabrizian, Xiaoming Chen, Peiheng Hu, Taizhong Wu, Shichen Li, Manu Maheswari, and many other talented NVIDIA engineers.*", "url": "https://wpnews.pro/news/nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark", "canonical_source": "https://developer.nvidia.com/blog/nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark/", "published_at": "2026-06-12 21:12:40+00:00", "updated_at": "2026-06-12 21:25:33.349870+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "ai-infrastructure", "ai-chips", "large-language-models"], "entities": ["NVIDIA", "Artificial Analysis", "AA-AgentPerf", "DeepSeek-V4-Pro"], "alternates": {"html": "https://wpnews.pro/news/nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark", "markdown": "https://wpnews.pro/news/nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark.md", "text": 
"https://wpnews.pro/news/nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark.txt", "jsonld": "https://wpnews.pro/news/nvidia-achieves-leading-agentic-coding-performance-on-first-agentic-ai-benchmark.jsonld"}}