Sort providers by cost, latency, or throughput on AI Gateway

wpnews.pro

cd /news/ai-products/sort-providers-by-cost-latency-or-th… · home › topics › ai-products › article

[ARTICLE · art-13483] src=vercel.com ↗ pub=2026-05-15T00:00Z topic=ai-products verified=true sentiment=↑ positive

Sort providers by cost, latency, or throughput on AI Gateway

Vercel has added a new sorting feature to AI Gateway that allows developers to rank model providers by cost, latency (time to first token), or throughput (tokens per second). The sorting is computed at request time, automatically incorporating new providers, price changes, and performance shifts without code changes. This enables developers to optimize for cost-sensitive workloads, latency-sensitive applications, or long-output generation scenarios.

read2 min views9 publishedMay 15, 2026

You can now sort the providers behind a model by cost, time to first token (TTFT), or throughput (TPS) in AI Gateway. The default provider order blends provider reliability, quality of model output, cost, and speed of response. You can now use sort

for explicit control over ranking criteria.

For models with many providers and noticeable cost or speed variation, you can use `sort`

to optimize on your dimension of choice. Ranking is computed at request time, so newly added providers, price changes, and shifts in observed latency or throughput flow through automatically without any code changes.

Set sort

on providerOptions.gateway

to one of the three values:

Use sort to ensure optimizing for your metric of choice.

In this example, AI Gateway has over five providers for GPT OSS 120B with different prices, so sorting by cost is a useful option for requests that want to route through the lowest price provider.

Providers are tried in sort order. Fallback to the next provider only happens when the higher-ranked one is unavailable.

sort

is compatible with other gateway routing options like Zero Data Retention (ZDR).

The example below uses deepseek/deepseek-v4-pro

for an interactive request where latency and data retention matter: AI Gateway filters to only providers for Deepseek V4 Pro that have zero data retention, and then sorts the remaining providers by time to first token (TTFT). sort

also composes with order

: providers listed in order

are promoted to the front, and the remaining providers follow the requested sort criterion.

See exactly why each request landed where it did. Every response includes a sort

block in the routing metadata showing which providers were considered, the metric values used to rank them, the order they were attempted, and any that were deprioritized due to degraded health.

For more information on sorting via AI Gateway, read the documentation.

source & further reading

vercel.com — original article Grok 4.5 now available on AI Gateway Chat SDK adds Dial support Use any Chat SDK adapter with eve

~/api · this article 200

$curl api.wpnews.pro/v1/news/sort-providers-by-cost-l…

Read original on vercel.com → vercel.com/changelog/sort-providers-by-cost-late…

mentioned entities

AI Gateway

Vercel

GPT OSS 120B

metadata

slugsort-providers-by-cost-latency-or-throughput-on-ai-gateway

topic#ai-products

secondary4 topics

sentimentpositive

canonicalvercel.com

navigation

← prevWhile Court Was In Session

next →[AINews] Everything is Conductor

── more in #ai-products 4 stories · sorted by recency

cryptobriefing.com · 8 Jul · #ai-products

OpenAI’s GPT-5.6 achieves inference breakthrough powered by Cerebras wafer-scale compute

dev.to · 8 Jul · #ai-products

AI Gateway Fees Compared: Who Marks Up Your Tokens?

dev.to · 8 Jul · #ai-products

8 Best AI Gateways in 2026 (Compared)

vercel.com · 8 Jul · #ai-products

Grok 4.5 now available on AI Gateway

── more on @ai gateway 3 stories trending now

wpnews · 25 May · #large-language-models

A drop-in replacement chat template for Qwen/Qwen3.6-27B tuned for open-source agentic coding harnesses.

wpnews · 8 Jul · #artificial-intelligence

Anthropic's "J-lens" reveals workspace in Claude mirrors theory of consciousness

wpnews · 8 Jul · #large-language-models

OpenAI clears government hurdle to launch GPT-5.6 for everyone Thursday

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required