The Friday before a long weekend, I asked an agent to migrate a legacy webhook handler while I closed my laptop. It came back with a diff that compiled. It had run the tests, and it left a note about a fixture it did not want to change without me. That is the shape of the work these agents are pitched for now, and it is the shape GitLab is aiming at with the arrival of Anthropic's Claude Sonnet 5 on the Duo Agent Platform.
GitLab has added Claude Sonnet 5 to Duo Agent Platform across all tiers and every deployment model the platform supports, routed through GitLab's AI Gateway. GitLab positions the model for the kind of work agents already carry inside a CI/CD loop: multi-step tasks, code that holds up under review, and workflows the vendor is willing to call affordable at scale.
The number GitLab wants you to notice is a benchmark one. Sonnet 5 is the first model in GitLab's own evaluation suite to complete all of its benchmark tasks. Its predecessor, Sonnet 4.6, completed 93.8% of them. Read that carefully, because it is GitLab's benchmark, not yours, and benchmarks are a floor, not a ceiling.
If you already use Duo, the delivery detail matters as much as the model change. The AI Gateway is the single hop your requests take before they reach whichever Anthropic endpoint fulfils them, and having that hop means a few things a developer on a normal Tuesday actually feels. It means one place decides which model version you are on. When the vendor ships a point release, the gateway can be pointed at it without every consumer rewriting a config. It means one place handles logging, quota, and (in self-managed shops) authentication. It also means the platform team can pin a project to a specific model when governance requires it, without asking every team to change editor settings.
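To make the "one place decides" point concrete, here is a minimal sketch of the routing idea, not GitLab's implementation. The `ModelGateway` class, the project paths, and the `model-v1` / `model-v2` names are all hypothetical, chosen only to show a default that moves with a point release while a pinned project stays where it is.

```python
from dataclasses import dataclass, field


@dataclass
class ModelGateway:
    """Toy stand-in for a model gateway (hypothetical, not GitLab's AI Gateway API)."""

    default_model: str  # what every unpinned project is routed to
    pins: dict[str, str] = field(default_factory=dict)  # project path -> pinned model

    def pin(self, project: str, model: str) -> None:
        """Pin one project to a model, e.g. because governance requires it."""
        self.pins[project] = model

    def resolve(self, project: str) -> str:
        """Return the model a request from this project would be routed to."""
        return self.pins.get(project, self.default_model)


gateway = ModelGateway(default_model="model-v1")
gateway.pin("payments/ledger", "model-v1")  # stays on the audited version

# A point release ships: only the gateway's default changes,
# no consumer rewrites a config.
gateway.default_model = "model-v2"

print(gateway.resolve("web/storefront"))   # model-v2 (follows the default)
print(gateway.resolve("payments/ledger"))  # model-v1 (pinned)
```

The design choice worth noticing is that consumers only ever call `resolve`. Which version they get is a decision made once, in one place, which is the property the paragraph above describes.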
Nothing about that pattern is unique to GitLab. Every serious platform that wraps a third-party model is running a gateway of some kind now, whether it is a model picker inside an editor plugin, a vendor-run inference proxy, or the growing pile of self-hosted OSS gateways people run to keep prompts out of provider logs. The interesting shift is that the gateway pattern is now a default assumption instead of a preview feature.
What a DX-minded person cares about is the boring middle of the loop. A multi-step task is where you either close the laptop with confidence or check back every ten minutes to make sure the agent has not silently invented a function name. If GitLab's evaluation number holds up in the wild, the checking back gets rarer.
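The "invented a function name" failure is one you can at least partly check by machine instead of by tabbing back in. Below is a rough sketch, assuming the agent's output is Python and using only the standard-library `ast` module. The `unknown_calls` helper, the `hooks` module, and the sample patch are hypothetical. It only looks at bare function calls, so method calls, star imports, and dynamically created names are out of its reach.

```python
import ast
import builtins


def unknown_calls(source: str, known: set[str]) -> set[str]:
    """Return bare function names that are called but defined nowhere we can see."""
    tree = ast.parse(source)
    defined = set(known) | set(dir(builtins))

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)  # functions and classes defined in the patch
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:  # imported names, honouring "as" aliases
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.arg):
            defined.add(node.arg)  # parameters may legitimately be called
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)  # locally assigned names

    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return called - defined


# Hypothetical agent output for a webhook handler.
agent_patch = '''
from hooks import verify_signature

def handle(event):
    verify_signature(event)
    payload = normalize_payload(event)
    return len(payload)
'''

# `known` is whatever your codebase already exports; here a made-up helper.
print(unknown_calls(agent_patch, known={"parse_event"}))  # {'normalize_payload'}
```

A check like this does not replace review. It only turns one class of silent failure into a loud one before a human looks at the diff.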
Two habits worth carrying over regardless of which model you land on:
The 93.8% figure GitLab quotes for Sonnet 4.6 is honest reporting, and it is also a reminder that a full pass on a vendor's own suite does not translate directly to your monorepo. Nothing about a hosted model change fixes the classic sharp edges. A flaky test suite still flakes. An under-documented service still confuses a fresh agent. A merge queue that is already saturated will not suddenly get faster because the model behind the PR review got smarter.
There is also the plain fact of platform lock-in. Once a team writes agent workflows against Duo's model routing, moving the same workflow onto a different platform means rewriting plumbing, not just prompts. That is not new, and it is not a reason to sit out, but it is worth naming so nobody is surprised eighteen months from now.
Two things are worth watching. First, whether GitLab publishes any real-world numbers, not benchmark ones, from teams running Sonnet 5 on their own Duo pipelines over the coming weeks. That is the data an engineer can act on. Second, whether the "all deployment models" line holds cleanly for self-managed customers, because the AI Gateway is the surface where self-managed usually diverges from SaaS in ways that hurt on a Tuesday morning.
If you have already moved your Duo agents onto the new model, I would love to hear which of your everyday tasks got quieter, and which are still slow enough that you tab out to wait.