I Benchmarked Lynkr Against LiteLLM on the Same Backends.

Founder disclosure: I built Lynkr, so take this as a technical benchmark write-up, not a neutral industry report. The numbers below come from the same backend providers on both gateways.

If you're routing AI coding traffic through a gateway, just switching providers is not enough. The real savings come from reducing the tokens that ever reach the model in the first place.

I ran Lynkr and LiteLLM against the same backends — Ollama locally, Moonshot, and Azure OpenAI — across 9 scenarios. On the scenarios that actually look like agentic coding work, Lynkr was cheaper because it does three things before forwarding the request upstream: smart tool selection, TOON compression, and semantic caching.

Lynkr was measurably better on the cost-sensitive parts of the workload:

Area	Lynkr result	Why it mattered
Tool selection	53% fewer tokens	Removes irrelevant tool schemas
TOON compression	87.6% fewer tokens	Shrinks large JSON tool outputs
Semantic cache	171ms cache hit	Avoids repeat model calls
Tier routing	Escalates hard prompts	Doesn’t over-optimize for cheapest path

This matters if you're running Claude Code, Codex, Cursor, or similar agent workflows where tools, file reads, grep output, and repeated context dominate your token bill.

Same benchmark inputs, same providers, same request shape.

Each scenario sent the same HTTP request to both gateways at POST /v1/messages.
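As a rough illustration of "same request shape", the sketch below serializes one body once and reuses it for both gateways. The base URLs, the model name, and the `buildRequests` helper are placeholders for illustration, not values taken from the benchmark script.

```javascript
// Build one request body and reuse it for both gateways, so any difference
// in billed tokens comes from the gateway and not from the input.
// Assumption: the base URLs and model name below are placeholders.
const GATEWAYS = {
  lynkr: "http://localhost:8081",
  litellm: "http://localhost:4000",
};

function buildRequests(body) {
  const payload = JSON.stringify(body); // serialized once, shared by both
  return Object.entries(GATEWAYS).map(([name, baseUrl]) => ({
    name,
    url: `${baseUrl}/v1/messages`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: payload,
    },
  }));
}

const requests = buildRequests({
  model: "placeholder-model",
  max_tokens: 256,
  messages: [{ role: "user", content: "Where is the retry logic defined?" }],
});

for (const request of requests) {
  console.log(request.name, request.init.method, request.url);
}
```

Each entry can be passed to `fetch(request.url, request.init)` when the gateways are actually running.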

A lot of coding requests are read-only, but the model still gets handed the full tool universe: write, edit, bash, git, file ops, everything.

Lynkr classifies the request first and strips irrelevant tool schemas before forwarding upstream. So a read-only question does not pay to carry write-capable tools.
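A minimal sketch of the idea, assuming a keyword heuristic stands in for the classifier. The tool names, the `readOnly` flag, and the regular expression are hypothetical and are not Lynkr's actual implementation.

```javascript
// Hypothetical sketch: drop write-capable tool schemas for read-only requests.
// Assumption: the `readOnly` flag and keyword list are illustrative only.
const WRITE_INTENT =
  /\b(write|edit|create|delete|rename|commit|run|install|fix|refactor)\b/i;

function isReadOnlyRequest(prompt) {
  return !WRITE_INTENT.test(prompt);
}

function selectTools(prompt, tools) {
  // When the prompt may need to change something, keep the full tool set.
  if (!isReadOnlyRequest(prompt)) return tools;
  return tools.filter((tool) => tool.readOnly);
}

const tools = [
  { name: "read_file", readOnly: true },
  { name: "grep", readOnly: true },
  { name: "write_file", readOnly: false },
  { name: "bash", readOnly: false },
];

const forwarded = selectTools("Where is the retry logic defined?", tools);
console.log(forwarded.map((tool) => tool.name)); // only the read-only tools
```

The fallback matters: a filter like this should err toward forwarding every tool whenever the request might need to modify something.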

Benchmark setup: 14 tool definitions attached to every request, which is pretty realistic for a Claude Code or Cursor style session.

Result: 53% fewer input tokens and 52% lower cost on the same model and prompt.

This is the kind of optimization that compounds because it happens before every downstream model call.

Tool-heavy workflows often blow up because of structured JSON, not because the user wrote a long prompt.

Lynkr's TOON path compresses large JSON payloads before they hit the provider. Plain text goes through unchanged. The useful effect is that file reads, grep arrays, tool traces, and other structured outputs stop dominating the request.
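To show why a tabular encoding shrinks structured tool output, here is a simplified encoder in the spirit of TOON. It is an assumption-laden sketch: it is not the full TOON format or Lynkr's encoder, the `encodeTabular` name is made up, and it skips the quoting that values containing commas or newlines would need.

```javascript
// Simplified tabular encoding: a uniform array of flat objects becomes one
// header line plus one comma-separated row per item, so keys appear once.
// Assumption: no quoting or escaping is handled in this sketch.
function isFlatObject(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    !Array.isArray(value) &&
    Object.values(value).every((v) => v === null || typeof v !== "object")
  );
}

function encodeTabular(name, rows) {
  if (!Array.isArray(rows) || rows.length === 0 || !rows.every(isFlatObject)) {
    return JSON.stringify(rows); // anything irregular stays plain JSON
  }
  const keys = Object.keys(rows[0]);
  const sameKeys = rows.every(
    (row) => Object.keys(row).length === keys.length && keys.every((k) => k in row)
  );
  if (!sameKeys) return JSON.stringify(rows);

  const header = `${name}[${rows.length}]{${keys.join(",")}}:`;
  const lines = rows.map((row) => "  " + keys.map((k) => String(row[k])).join(","));
  return [header, ...lines].join("\n");
}

const matches = [
  { file: "src/router.js", line: 12, text: "retry(attempt)" },
  { file: "src/client.js", line: 88, text: "retry(request)" },
];

const compact = encodeTabular("matches", matches);
console.log(compact);
// Character counts are only a rough proxy for tokens.
console.log(JSON.stringify(matches).length, "->", compact.length, "characters");
```

The saving grows with the number of rows, because every repeated key and quote in the JSON array collapses into a single header line.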

Benchmark setup: a Bash tool returning 60 grep results as a JSON array, roughly 3,400 tokens unoptimized.

Result: 87.6% token reduction and 50% lower cost at the same latency.

That last part matters. This was not a tradeoff where cost improved because the request got slower. Compression happened in-process and the wall-clock result stayed flat.

The easiest cheap request is the one that never reaches the model.

Lynkr computes embeddings for the incoming prompt and returns a cached response when a semantically similar request shows up again. In the benchmark, the second prompt was just a paraphrase of the first.

The important part is not just token avoidance. The response time dropped from 1.9s to 171ms, about 11x faster.
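A semantic cache lookup can be sketched with cosine similarity over embeddings. In this sketch the embeddings are hand-written vectors supplied by the caller (a real gateway would call an embedding model), and the `SemanticCache` class and the 0.9 threshold are assumptions for illustration rather than Lynkr's actual values.

```javascript
// Sketch of a semantic cache: return a stored response when a new prompt's
// embedding is close enough to a previously seen one.
// Assumption: the threshold and the tiny 3-dimensional vectors are examples.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  constructor(threshold = 0.9) {
    this.threshold = threshold;
    this.entries = []; // each entry: { embedding, response }
  }

  get(embedding) {
    let best = null;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best ? best.response : null; // null means: call the model
  }

  set(embedding, response) {
    this.entries.push({ embedding, response });
  }
}

const cache = new SemanticCache();
cache.set([0.9, 0.1, 0.0], "cached answer");
console.log(cache.get([0.88, 0.12, 0.01])); // paraphrase-like vector: cache hit
console.log(cache.get([0.0, 0.1, 0.95])); // unrelated vector: cache miss
```

The threshold is the main tuning knob: set it too low and different questions share an answer, too high and paraphrases miss the cache.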

For interactive tooling, that difference is felt immediately.

LiteLLM has routing. But in this benchmark configuration it was using cost-based-routing, which means the gateway optimizes for cheap first.

That works for simple questions. It breaks when the prompt genuinely needs a stronger model.

Lynkr scores requests across 15 dimensions — token size, reasoning markers, code complexity, risk signals, and agentic traits — then routes automatically.
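To make the scoring idea concrete, here is a toy scorer with five signals loosely named after the dimensions above. The regular expressions, weights, thresholds, and tier names are all invented for illustration. The real 15-dimension scorer is not shown in this write-up.

```javascript
// Toy complexity scorer: sum the weights of the signals a prompt triggers,
// then map the total to a tier.
// Assumption: signals, weights, thresholds, and tier names are hypothetical.
const SIGNALS = [
  { name: "token size", weight: 2, test: (p) => p.length > 2000 },
  {
    name: "reasoning markers",
    weight: 2,
    test: (p) => /\b(why|prove|trade-?offs?|design|architecture)\b/i.test(p),
  },
  {
    name: "code complexity",
    weight: 2,
    test: (p) => /\b(refactor|concurrency|race condition|migration)\b/i.test(p),
  },
  {
    name: "risk signals",
    weight: 3,
    test: (p) => /\b(production|delete|drop table|credentials)\b/i.test(p),
  },
  {
    name: "agentic traits",
    weight: 1,
    test: (p) => /\b(then|step by step|multi-step|plan)\b/i.test(p),
  },
];

function scorePrompt(prompt) {
  return SIGNALS.reduce(
    (total, signal) => total + (signal.test(prompt) ? signal.weight : 0),
    0
  );
}

function pickTier(prompt) {
  const score = scorePrompt(prompt);
  if (score >= 4) return "strong";
  if (score >= 2) return "standard";
  return "cheap";
}

console.log(pickTier("What does this regex match?"));
console.log(pickTier("Refactor the production migration and explain the trade-offs"));
```

The point of the design is that cost enters only after the tier is chosen: the router picks the cheapest model within the tier the prompt earned, not the cheapest model overall.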

In the benchmark:

minimax-m2.5

moonshot-v1-auto

That is the difference between "cheap by default" and "cheap when appropriate."

A lot of gateway comparisons collapse into "who can talk to more providers." That is table stakes now.

The more important question is:

What does the gateway do to reduce spend before the request hits the model?

That is where Lynkr is different in practice.

It stacks three cost levers: smart tool selection, TOON compression, and semantic caching.

Then it adds tier routing on top, so the remaining requests go to the right model for the job.

That stack is why the benchmark result is interesting. It is not just "Lynkr can route too." It is that Lynkr changes the size and shape of the request before routing even happens.

Using the large JSON tool-result test as a representative tool-heavy scenario:

So on equal footing, same backend, same model class, Lynkr came out roughly 50% cheaper.

That is the distinction I'd care about if I were evaluating an LLM gateway for coding agents. Not whether the gateway has another provider adapter, but whether it reduces the number of tokens my provider ever sees.

Portkey is good at a different layer of the stack.

It is stronger on managed observability, prompt management, and governance. But this benchmark was not measuring dashboarding or policy UX. It was measuring request-path optimization.

On that axis, Lynkr is doing something Portkey does not really center on:

So I would not frame this as "Portkey but cheaper." They solve different primary problems.

To keep this honest, there are a few things worth stating clearly.

I built Lynkr. So the burden is on me to be explicit about methodology and where the numbers come from.

If LiteLLM routes everything to a free local model, the raw total can look lower. But that is not the useful comparison.

The fair comparison is same backend, same prompt, same model class. On those apples-to-apples paths, Lynkr was cheaper because it sent fewer tokens upstream.

In this benchmark, Lynkr injected a system prompt with memory and agent instructions, which added about 2,800 tokens of overhead in some scenarios. That is why comparing estimated raw request size to billed tokens can be misleading.

The correct comparison is billed tokens between Lynkr and LiteLLM on the same scenario.
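A small sketch of that comparison, assuming each gateway response carries an Anthropic-style `usage` object with `input_tokens` and `output_tokens`. The helper names are hypothetical, and the usage numbers are placeholders for illustration, not benchmark results.

```javascript
// Compare gateways on billed tokens for the same scenario, not on the
// estimated size of the raw request.
// Assumption: `usage` has `input_tokens` and `output_tokens`; the numbers
// below are placeholders, not measurements.
function billedTokens(usage) {
  return usage.input_tokens + usage.output_tokens;
}

function percentSaved(baselineUsage, candidateUsage) {
  const baseline = billedTokens(baselineUsage);
  if (baseline <= 0) {
    throw new RangeError("baseline must bill at least one token");
  }
  return ((baseline - billedTokens(candidateUsage)) / baseline) * 100;
}

const litellmUsage = { input_tokens: 800, output_tokens: 200 }; // placeholder
const lynkrUsage = { input_tokens: 550, output_tokens: 200 }; // placeholder

const saved = percentSaved(litellmUsage, lynkrUsage);
console.log(saved.toFixed(1) + "% fewer billed tokens");
```

Because billed tokens already include any injected system prompt, this comparison accounts for gateway overhead automatically.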

Lynkr is for teams running things like Claude Code, Codex, Cursor, or similar agent workflows.

If your real problem is reducing spend on coding workflows without rewriting client-side integrations, the benchmark result is pretty simple:

Lynkr wins when the workload includes tools, structured outputs, repeated prompts, and mixed-complexity requests.

That is exactly what real coding-agent traffic looks like.

The benchmark script is reproducible from the Lynkr repo root:

node benchmark-tier-routing.js

Versions used in this run:

If all you want is a gateway that forwards requests, Lynkr is not interesting.

If you want a gateway that makes coding traffic cheaper before it reaches the model, that is where Lynkr starts to separate.

The three levers that mattered in this benchmark were smart tool selection, TOON compression, and semantic caching.

And on top of that, tier routing kept the hard prompts from being sent to the wrong model just because it was cheaper.

If you want to dig into it, the repo is here:

GitHub: https://github.com/Fast-Editor/Lynkr

If you test it against your own coding workload, I would genuinely like to know where it holds up and where it doesn't.

Source: dev.to (original article)
