{"slug": "sort-providers-by-cost-latency-or-throughput-on-ai-gateway", "title": "Sort providers by cost, latency, or throughput on AI Gateway", "summary": "Vercel has added a new sorting feature to AI Gateway that allows developers to rank model providers by cost, latency (time to first token), or throughput (tokens per second). The sorting is computed at request time, automatically incorporating new providers, price changes, and performance shifts without code changes. This enables developers to optimize for cost-sensitive workloads, latency-sensitive applications, or long-output generation scenarios.", "body_md": "You can now sort the providers behind a model by cost, time to first token (TTFT), or throughput (TPS) in [AI Gateway](https://vercel.com/ai-gateway).\n\nBy default, the provider order blends provider reliability, quality of model output, cost, and speed of response. The new `sort` option gives you explicit control over the ranking criterion instead.\n\nFor models with many providers and noticeable variation in cost or speed, use `sort` to optimize for the dimension that matters most to your workload. 
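\n\nTo make the ranking directions concrete (lowest cost and lowest TTFT rank first, highest throughput ranks first), here is a minimal TypeScript sketch that models the ranking locally. It is not AI Gateway's implementation. The criterion identifiers `'cost'`, `'ttft'`, and `'tps'`, the `ProviderStats` fields, and the provider names and numbers are all hypothetical. Only the nesting of `sort` under `providerOptions.gateway` comes from this post.

```typescript
// Minimal local model of provider ranking; not AI Gateway's implementation.
// ASSUMPTION: the criterion identifiers below are hypothetical stand-ins;
// check the AI Gateway documentation for the exact `sort` values.
type SortCriterion = 'cost' | 'ttft' | 'tps';

// Hypothetical per-provider statistics for the three metrics in the post.
interface ProviderStats {
  name: string;
  inputPricePerMillion: number; // listed input price per million tokens
  medianTtftMs: number; // median time to first token, in ms
  medianTps: number; // median throughput, in tokens per second
}

// Sort key per criterion: lower keys rank first. Throughput is negated so
// that the provider with the highest tokens per second comes first.
const sortKey: Record<SortCriterion, (p: ProviderStats) => number> = {
  cost: (p) => p.inputPricePerMillion,
  ttft: (p) => p.medianTtftMs,
  tps: (p) => -p.medianTps,
};

// Returns a ranked copy and leaves the input array untouched.
// Array.prototype.sort is stable, so ties keep their input order.
function rankProviders(providers: ProviderStats[], sort: SortCriterion): ProviderStats[] {
  const key = sortKey[sort];
  return [...providers].sort((a, b) => key(a) - key(b));
}

// The post places `sort` on providerOptions.gateway; the value is assumed.
const providerOptions: { gateway: { sort: SortCriterion } } = {
  gateway: { sort: 'cost' },
};

// Made-up numbers for illustration only.
const providers: ProviderStats[] = [
  { name: 'provider-a', inputPricePerMillion: 0.15, medianTtftMs: 420, medianTps: 180 },
  { name: 'provider-b', inputPricePerMillion: 0.1, medianTtftMs: 650, medianTps: 95 },
  { name: 'provider-c', inputPricePerMillion: 0.25, medianTtftMs: 210, medianTps: 310 },
];

const ranked = rankProviders(providers, providerOptions.gateway.sort);
console.log(ranked.map((p) => p.name).join(' > ')); // provider-b > provider-a > provider-c
```

Because the sketch ranks whatever statistics it is given at call time, updated prices or latencies reorder the providers with no other change. This is the local analogue of the request-time ranking described in this post.\n\n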
Ranking is computed at request time, so newly added providers, price changes, and shifts in observed latency or throughput flow through automatically, without code changes.\n\nSet `sort` on `providerOptions.gateway` to one of three values, one per ranking criterion:\n\n| Criterion | Ranks providers by | Order | Best for |\n| --- | --- | --- | --- |\n| Cost | Listed input price per million tokens | Lowest price first | High-volume, cost-sensitive work |\n| Time to first token (TTFT) | Median time to first token, in ms | Lowest latency first | Latency-sensitive workloads where response speed matters |\n| Throughput (TPS) | Median throughput, in tokens per second | Highest throughput first | Long-output generation where total response time matters most |\n\nUse `sort` whenever a request should be routed by one specific metric rather than by the blended default.\n\nFor example, AI Gateway has more than five providers for [GPT OSS 120B](https://vercel.com/ai-gateway/models/gpt-oss-120b), each with different prices, so sorting by cost is useful for requests that should route through the lowest-price provider.\n\nProviders are tried in sort order. AI Gateway falls back to the next provider only when a higher-ranked one is unavailable.\n\n`sort` is compatible with other gateway routing options, such as Zero Data Retention (ZDR). Consider an interactive request to `deepseek/deepseek-v4-pro`, where both latency and data retention matter: AI Gateway first filters the providers for [Deepseek V4 Pro](https://vercel.com/ai-gateway/models/deepseek-v4-pro) down to those with zero data retention, then sorts the remaining providers by time to first token (TTFT).\n\n`sort` also composes with `order`: providers listed in `order` are promoted to the front, and the remaining providers follow the requested sort criterion.\n\nSee exactly why each request landed where it did. 
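\n\nThe composition rules above (filter by ZDR, promote `order` entries, sort the rest, and fall back only when a provider is unavailable) can be modeled locally to reason about where a request should land before sending it. The TypeScript below is a minimal sketch, not AI Gateway's implementation. The field names, the criterion identifiers `'cost'`, `'ttft'`, and `'tps'`, the provider names `p1` to `p4`, and their numbers are hypothetical. The post does not say how `order` interacts with a ZDR filter, so the sketch assumes that a pinned provider without zero data retention is dropped rather than promoted.

```typescript
// Minimal local model of how the routing rules in this post compose.
// ASSUMPTIONS: field names, criterion identifiers, and dropping `order`
// entries that fail the ZDR filter are illustrative, not documented.
type SortCriterion = 'cost' | 'ttft' | 'tps';

interface Provider {
  name: string;
  zeroDataRetention: boolean; // hypothetical ZDR flag per provider
  inputPricePerMillion: number; // listed input price per million tokens
  medianTtftMs: number; // median time to first token, in ms
  medianTps: number; // median tokens per second
}

interface RoutingOptions {
  sort: SortCriterion;
  order?: string[]; // provider names promoted to the front, in this order
  zdrOnly?: boolean; // keep only zero-data-retention providers
}

// Lower keys rank first; throughput is negated so the highest TPS leads.
const sortKey: Record<SortCriterion, (p: Provider) => number> = {
  cost: (p) => p.inputPricePerMillion,
  ttft: (p) => p.medianTtftMs,
  tps: (p) => -p.medianTps,
};

// 1. filter (ZDR), 2. promote eligible `order` entries, 3. sort the rest.
function attemptOrder(providers: Provider[], opts: RoutingOptions): string[] {
  const eligible = opts.zdrOnly ? providers.filter((p) => p.zeroDataRetention) : providers;
  const pinned = opts.order ?? [];
  const promoted = pinned.filter((name) => eligible.some((p) => p.name === name));
  const key = sortKey[opts.sort];
  const rest = eligible
    .filter((p) => !pinned.includes(p.name)) // new array, so sort does not mutate input
    .sort((a, b) => key(a) - key(b))
    .map((p) => p.name);
  return [...promoted, ...rest];
}

// Tries providers in attempt order; moves on only when one is unavailable.
function pickProvider(order: string[], isAvailable: (name: string) => boolean): string | undefined {
  return order.find((name) => isAvailable(name));
}

// Made-up providers for illustration only.
const providers: Provider[] = [
  { name: 'p1', zeroDataRetention: true, inputPricePerMillion: 0.6, medianTtftMs: 900, medianTps: 60 },
  { name: 'p2', zeroDataRetention: false, inputPricePerMillion: 0.4, medianTtftMs: 300, medianTps: 120 },
  { name: 'p3', zeroDataRetention: true, inputPricePerMillion: 0.8, medianTtftMs: 450, medianTps: 90 },
  { name: 'p4', zeroDataRetention: true, inputPricePerMillion: 0.5, medianTtftMs: 700, medianTps: 150 },
];

// ZDR filter plus TTFT sort, as in the interactive-request scenario.
const plan = attemptOrder(providers, { sort: 'ttft', zdrOnly: true });
console.log(plan.join(' > ')); // p3 > p4 > p1

// If p3 is unavailable, routing falls back to the next provider in the plan.
console.log(pickProvider(plan, (name) => name !== 'p3')); // p4
```

In this sketch, ranking and fallback are kept separate: `attemptOrder` fixes the ranking, and `pickProvider` only decides how far down that list a request travels.\n\n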
Every response includes a `sort` block in the routing metadata showing which providers were considered, the metric values used to rank them, the order in which they were attempted, and any that were deprioritized because of degraded health.\n\nFor more on provider sorting in AI Gateway, read the [documentation](https://vercel.com/docs/ai-gateway/models-and-providers/provider-filtering-and-ordering#provider-sorting).", "url": "https://wpnews.pro/news/sort-providers-by-cost-latency-or-throughput-on-ai-gateway", "canonical_source": "https://vercel.com/changelog/sort-providers-by-cost-latency-or-throughput-on-ai-gateway", "published_at": "2026-05-15 00:00:00+00:00", "updated_at": "2026-05-25 00:26:32.570864+00:00", "lang": "en", "topics": ["ai-products", "ai-infrastructure", "ai-tools", "large-language-models", "artificial-intelligence"], "entities": ["AI Gateway", "Vercel", "GPT OSS 120B"], "alternates": {"html": "https://wpnews.pro/news/sort-providers-by-cost-latency-or-throughput-on-ai-gateway", "markdown": "https://wpnews.pro/news/sort-providers-by-cost-latency-or-throughput-on-ai-gateway.md", "text": "https://wpnews.pro/news/sort-providers-by-cost-latency-or-throughput-on-ai-gateway.txt", "jsonld": "https://wpnews.pro/news/sort-providers-by-cost-latency-or-throughput-on-ai-gateway.jsonld"}}