{"slug": "i-benchmarked-lynkr-against-litellm-on-the-same-backends", "title": "I Benchmarked Lynkr Against LiteLLM on the Same Backends.", "summary": "In a nine-scenario benchmark run by its founder on the same backends (Ollama, Moonshot, and Azure OpenAI), Lynkr, an AI gateway built for coding workflows, was cheaper than LiteLLM on the scenarios that resemble agentic coding work. Lynkr achieved a 53% reduction in input tokens through smart tool selection, an 87.6% token reduction via TOON compression for large JSON payloads, and 171ms semantic cache hits that avoided repeat model calls, cutting cost by roughly half in the tool-heavy scenarios. The gateway also used multi-dimensional request scoring for tier routing, so hard prompts escalated to stronger models rather than defaulting to the cheapest path.", "body_md": "*Founder disclosure: I built Lynkr, so take this as a technical benchmark write-up, not a neutral industry report. The numbers below come from the same backend providers on both gateways.*\n\nIf you're routing AI coding traffic through a gateway, just switching providers is not enough. The real savings come from reducing the tokens that ever reach the model in the first place.\n\nI ran Lynkr and LiteLLM against the same backends (Ollama locally, Moonshot, and Azure OpenAI) across 9 scenarios. 
On the scenarios that actually look like agentic coding work, Lynkr was cheaper because it does three things before forwarding the request upstream: smart tool selection, TOON compression, and semantic caching.\n\nLynkr was measurably better on the cost-sensitive parts of the workload:\n\n| Area | Lynkr result | Why it mattered |\n|---|---|---|\n| Tool selection | 53% fewer input tokens | Removes irrelevant tool schemas |\n| TOON compression | 87.6% fewer tokens | Shrinks large JSON tool outputs |\n| Semantic cache | 171ms cache hit (down from 1.9s) | Avoids repeat model calls |\n| Tier routing | Escalates hard prompts | Doesn’t over-optimize for the cheapest path |\n\nThis matters if you're running Claude Code, Codex, Cursor, or similar agent workflows where tools, file reads, grep output, and repeated context dominate your token bill.\n\nSame benchmark inputs, same providers, same request shape.\n\nEach scenario sent the same HTTP request to both gateways at `POST /v1/messages`.\n\nA lot of coding requests are read-only, but the model still gets handed the full tool universe: write, edit, bash, git, file ops, everything.\n\nLynkr classifies the request first and strips irrelevant tool schemas before forwarding upstream, so a read-only question does not pay to carry write-capable tools.\n\n**Benchmark setup:** 14 tool definitions attached to every request, which is fairly realistic for a Claude Code- or Cursor-style session.\n\n**Result:** 53% fewer input tokens and 52% lower cost on the same model and prompt.\n\nThis is the kind of optimization that compounds, because it happens before every downstream model call.\n\nTool-heavy workflows often blow up because of structured JSON, not because the user wrote a long prompt.\n\nLynkr's TOON path compresses large JSON payloads before they hit the provider. Plain text goes through unchanged. 
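As an illustration only, here is a minimal sketch of TOON-style encoding, assuming TOON refers to Token-Oriented Object Notation, whose tabular form declares the field names of a uniform array once and then emits one comma-separated row per object. The function names (`toToon`, `isPrimitive`, `isPlainObject`) and the sample grep results are hypothetical, this is not Lynkr's implementation, and the sketch returns `null` whenever the data is not a safe fit, so a caller would forward the original JSON unchanged. String length is printed only as a rough proxy for token count.

```javascript
// Hypothetical sketch of TOON-style tabular encoding (not Lynkr's code).
const NEWLINE = String.fromCharCode(10);

function isPrimitive(v) {
  return v === null || ['string', 'number', 'boolean'].includes(typeof v);
}

function isPlainObject(v) {
  return v !== null && typeof v === 'object' && !Array.isArray(v);
}

// Encode a uniform array of flat objects as a header plus one row per object.
// Returns null when the data is not a safe fit, so the caller can forward
// the original JSON unchanged.
function toToon(key, rows) {
  if (!Array.isArray(rows) || rows.length === 0 || !isPlainObject(rows[0])) return null;
  const fields = Object.keys(rows[0]);
  const lines = [key + '[' + rows.length + ']{' + fields.join(',') + '}:'];
  for (const row of rows) {
    if (!isPlainObject(row)) return null;
    const keys = Object.keys(row);
    // Every object must have exactly the same keys in the same order.
    if (keys.length !== fields.length || keys.some((k, i) => k !== fields[i])) return null;
    const values = fields.map((f) => row[f]);
    if (!values.every(isPrimitive)) return null;
    const cells = values.map((v) => String(v));
    // A fuller encoder would quote such values; this sketch just declines.
    if (cells.some((c) => c.includes(',') || c.includes(NEWLINE))) return null;
    lines.push('  ' + cells.join(','));
  }
  return lines.join(NEWLINE);
}

// Usage: a small, made-up grep-style tool result.
const grepResults = [
  { file: 'src/retry.js', line: 12, text: 'retry(request)' },
  { file: 'src/queue.js', line: 40, text: 'retry(job)' },
];
const asJson = JSON.stringify({ results: grepResults });
const asToon = toToon('results', grepResults);
console.log(asToon);
console.log('chars as JSON:', asJson.length, 'chars as TOON:', asToon.length);
```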
The useful effect is that file reads, grep arrays, tool traces, and other structured outputs stop dominating the request.\n\n**Benchmark setup:** a Bash tool returning 60 grep results as a JSON array, roughly 3,400 tokens unoptimized.\n\n**Result:** 87.6% token reduction and 50% lower cost at the same latency.\n\nThat last part matters. This was not a tradeoff where cost improved because the request got slower. Compression happened in-process and the wall-clock result stayed flat.\n\nThe cheapest request is the one that never reaches the model.\n\nLynkr computes embeddings for the incoming prompt and returns a cached response when a semantically similar request shows up again. In the benchmark, the second prompt was just a paraphrase of the first.\n\nThe important part is not just token avoidance. The response time dropped from 1.9s to 171ms, about **11x faster**.\n\nFor interactive tooling, that difference is felt immediately.\n\nLiteLLM has routing, but in this benchmark configuration it was using `cost-based-routing`, which means the gateway optimizes for the cheapest option first.\n\nThat works for simple questions. It breaks when the prompt genuinely needs a stronger model.\n\nLynkr scores each request across 15 dimensions, including token size, reasoning markers, code complexity, risk signals, and agentic traits, and then routes it automatically.\n\nIn the benchmark, the models involved in routing were `minimax-m2.5` and `moonshot-v1-auto`.\n\nThat is the difference between \"cheap by default\" and \"cheap when appropriate.\"\n\nA lot of gateway comparisons collapse into \"who can talk to more providers.\" That is table stakes now.\n\nThe more important question is:\n\n**What does the gateway do to reduce spend before the request hits the model?**\n\nThat is where Lynkr is different in practice.\n\nIt stacks three cost levers:\n\n- smart tool selection\n- TOON compression\n- semantic caching\n\nThen it adds **tier routing** on top, so the remaining requests go to the right model for the job.\n\nThat stack is why the benchmark result is interesting. 
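The stack described above can be sketched roughly as a request path that checks a semantic cache, filters tools for read-only requests, and scores the prompt to pick a tier. Everything in this sketch is an assumption for illustration, not Lynkr's code: the step order, the keyword lists, the bag-of-words vector standing in for a real embedding model, the 0.8 cosine-similarity threshold, the three toy scoring signals (far fewer than the 15 dimensions mentioned above), and the `cheap`/`strong` tier labels. Keyword heuristics like these will misclassify many real prompts; the point is only the shape of each step.

```javascript
// Hypothetical request-path sketch; names, keywords, and thresholds are
// illustrative assumptions, not Lynkr's implementation.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// 1. Tool selection: a request with no write-like verbs is treated as
//    read-only, so write-capable tool schemas are dropped.
const WRITE_HINTS = ['edit', 'write', 'fix', 'refactor', 'commit', 'delete'];

function selectTools(prompt, tools) {
  const readOnly = !tokenize(prompt).some((w) => WRITE_HINTS.includes(w));
  return readOnly ? tools.filter((t) => t.readOnly) : tools;
}

// 2. Semantic cache: a bag-of-words count vector stands in for a real
//    embedding model; lookups compare vectors by cosine similarity.
function embed(text) {
  const vec = new Map();
  for (const w of tokenize(text)) vec.set(w, (vec.get(w) || 0) + 1);
  return vec;
}

function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [w, x] of a) {
    normA += x * x;
    dot += x * (b.get(w) || 0);
  }
  for (const x of b.values()) normB += x * x;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

class SemanticCache {
  constructor(threshold = 0.8) {
    this.threshold = threshold;
    this.entries = [];
  }

  // Return the response of the most similar stored prompt, if similar enough.
  get(prompt) {
    const query = embed(prompt);
    let best = null;
    let bestScore = 0;
    for (const entry of this.entries) {
      const score = cosine(query, entry.vec);
      if (score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best && bestScore >= this.threshold ? best.response : null;
  }

  set(prompt, response) {
    this.entries.push({ vec: embed(prompt), response });
  }
}

// 3. Tier routing: a toy score from size, reasoning markers, and code-like text.
const REASONING_MARKERS = ['why', 'prove', 'design', 'tradeoff', 'architecture', 'debug'];

function scoreRequest(prompt) {
  const words = tokenize(prompt);
  let score = Math.min(words.length / 200, 1);
  score += 0.5 * words.filter((w) => REASONING_MARKERS.includes(w)).length;
  if (prompt.includes('{') || prompt.includes('=>')) score += 0.5;
  return score;
}

function pickTier(prompt) {
  return scoreRequest(prompt) >= 1 ? 'strong' : 'cheap';
}

// The cache is checked first here, so a hit skips the model call entirely.
function prepareRequest(prompt, tools, cache) {
  const cached = cache.get(prompt);
  if (cached !== null) return { cached };
  return { tools: selectTools(prompt, tools), tier: pickTier(prompt) };
}
```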
It is not just \"Lynkr can route too.\" It is that Lynkr changes the size and shape of the request before routing even happens.\n\nIn the large JSON tool-result test, used here as a representative tool-heavy scenario, Lynkr came out roughly **50% cheaper** on equal footing: same backend, same model class.\n\nThat is the distinction I'd care about if I were evaluating an LLM gateway for coding agents: not whether the gateway has another provider adapter, but whether it reduces the number of tokens my provider ever sees.\n\nPortkey, another LLM gateway, is good at a different layer of the stack.\n\nIt is stronger on managed observability, prompt management, and governance. But this benchmark was not measuring dashboarding or policy UX. It was measuring request-path optimization.\n\nOn that axis, Lynkr is doing something Portkey does not really center on.\n\nSo I would not frame this as \"Portkey but cheaper.\" They solve different primary problems.\n\nTo keep this honest, there are a few things worth stating clearly.\n\nI built Lynkr, so the burden is on me to be explicit about methodology and where the numbers come from.\n\nIf LiteLLM routes everything to a free local model, the raw total can look lower. But that is not the useful comparison.\n\nThe fair comparison is **same backend, same prompt, same model class**. On those apples-to-apples paths, Lynkr was cheaper because it sent fewer tokens upstream.\n\nIn this benchmark, Lynkr injected a system prompt with memory and agent instructions, which added about 2,800 tokens of overhead in some scenarios. 
That is why comparing estimated raw request size to billed tokens can be misleading.\n\nThe correct comparison is billed tokens for Lynkr versus LiteLLM on the same scenario.\n\nLynkr is for teams running coding agents such as Claude Code, Codex, or Cursor.\n\nIf your real problem is reducing spend on coding workflows without rewriting client-side integrations, the benchmark result is pretty simple:\n\n**Lynkr wins when the workload includes tools, structured outputs, repeated prompts, and mixed-complexity requests.**\n\nThat is exactly what real coding-agent traffic looks like.\n\nThe benchmark is reproducible by running this script from the Lynkr repo root:\n\n```\nnode benchmark-tier-routing.js\n```\n\nVersions used in this run:\n\nIf all you want is a gateway that forwards requests, Lynkr is not interesting.\n\nIf you want a gateway that makes coding traffic cheaper **before** it reaches the model, that is where Lynkr starts to separate.\n\nThe three levers that mattered in this benchmark were:\n\n- smart tool selection\n- TOON compression\n- semantic caching\n\nAnd on top of that, tier routing kept hard prompts from being sent to the wrong model just because it was cheaper.\n\nIf you want to dig into it, the repo is here:\n\n**GitHub:** [https://github.com/Fast-Editor/Lynkr](https://github.com/Fast-Editor/Lynkr)\n\nIf you test it against your own coding workload, I would genuinely like to know where it holds up and where it doesn't.", "url": "https://wpnews.pro/news/i-benchmarked-lynkr-against-litellm-on-the-same-backends", "canonical_source": "https://dev.to/lynkr/i-benchmarked-lynkr-against-litellm-on-the-same-backends-lynkr-was-cheaper-for-tool-heavy-workloads-2onf", "published_at": "2026-06-06 00:14:18+00:00", "updated_at": "2026-06-06 00:41:29.755360+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure", "ai-products", "large-language-models", "ai-startups"], "entities": ["Lynkr", "LiteLLM", "Ollama", "Moonshot", "Azure OpenAI", "Claude Code", "Codex", "Cursor"], "alternates": {"html": 
"https://wpnews.pro/news/i-benchmarked-lynkr-against-litellm-on-the-same-backends", "markdown": "https://wpnews.pro/news/i-benchmarked-lynkr-against-litellm-on-the-same-backends.md", "text": "https://wpnews.pro/news/i-benchmarked-lynkr-against-litellm-on-the-same-backends.txt", "jsonld": "https://wpnews.pro/news/i-benchmarked-lynkr-against-litellm-on-the-same-backends.jsonld"}}