{"slug": "benchmarking-ai-gateways-gomodel-vs-litellm-vs-portkey-vs-bifrost", "title": "Benchmarking AI Gateways: GoModel vs LiteLLM vs Portkey vs Bifrost", "summary": "A developer benchmarked four AI gateways (GoModel, LiteLLM, Portkey, and Bifrost) on runtime and deployment overhead. GoModel, a small open-source gateway written in Go, showed significantly lower memory usage, faster cold starts, and smaller image size compared to LiteLLM, which had a 372 MB compressed image and 25-second cold start. The benchmark measured latency, throughput, memory, CPU, and cold start on an AWS c7i.large instance with a mock backend to isolate gateway performance.", "body_md": "In October 2025 I tried to build my startup on top of LiteLLM.\n\nAt first it looked like the obvious choice. It supported many providers, it had an OpenAI-compatible API, and it was already used by a lot of people. I did not want to write an AI gateway. I wanted to build the product behind it.\n\nThen I started running it on the hot path.\n\nMy opinion changed there.\n\nA gateway is not a dashboard or integration glue you call once in a while. It sits on every request, every retry, every stream, every tool call, every fallback, every timeout.\n\nA heavy gateway charges rent forever.\n\nMost AI gateway comparisons miss that part. They talk about provider count, dashboards, tracing, and \"support for 1000+ models\". Those things matter, but they are not free. Before the gateway calls OpenAI, Anthropic, Gemini, vLLM, or anything else, it has already spent your CPU, memory, cold-start time, and operational budget.\n\nI am not comparing full product maturity here. 
I am comparing how these gateways behave on the hot path.\n\nSo I started writing [GoModel](https://github.com/ENTERPILOT/GoModel): a small open-source AI gateway and AI control plane in Go, with an OpenAI-compatible API and explicit provider adapters.\n\nWhen I [launched GoModel on Hacker News](https://news.ycombinator.com/item?id=47861333), I promised a real, reproducible benchmark. This article is that follow-up.\n\nThe benchmark question is simple:\n\n**How lean is each AI gateway when it sits on the request path?**\n\nThat question runs through the whole benchmark: GoModel vs LiteLLM vs Portkey vs Bifrost, measured by latency, throughput, memory, CPU, cold start, and image size rather than landing pages or feature matrices.\n\nLatency gets the easiest arguments. It rarely tells the whole story.\n\nMost real LLM calls are dominated by inference time. If a model takes `2000 ms` to answer, the difference between `5 ms` and `15 ms` of proxy overhead is not the main story.\n\nThe main story is the deployment envelope:\n\nThose numbers decide whether the gateway can run where you want it to run.\n\nA `372 MB` compressed image (`1.2 GB` unpacked) that idles around gigabytes of RAM and takes `25 s` to cold-start is a different operational thing than a `16 MB` image that peaks at `37 MB` of RAM and is serving traffic `0.56 s` after launch.\n\nSo I care about the runtime footprint.\n\nThis benchmark does **not** prove that one gateway is best for every company.\n\nI am not measuring:\n\nThose things matter. Some of them matter a lot.\n\nLiteLLM in particular has more integrated providers and more gateway features than GoModel today. If your first requirement is maximum provider coverage right now, LiteLLM has a real advantage. This benchmark does not erase that. It measures the runtime footprint of putting each gateway on the request path. 
In practice, many smaller or newer providers already expose an OpenAI-compatible API, so provider count is not always the same as practical routing coverage.\n\nThe benchmark measures one narrower thing: **runtime and deployment overhead on the request path**.\n\nThat still matters, because the gateway is on the hot path. If you run high request volume, local models, serverless workloads, edge workloads, or many small model calls, the overhead stops being theoretical.\n\nI tested four AI gateways people actually compare:\n\nEvery gateway talked to the **same instant mock backend**, on purpose. I did not want to benchmark OpenAI, Anthropic, AWS networking, or random internet jitter. I wanted to isolate the gateway itself.\n\nEach gateway ran one at a time, in Docker, on an **AWS c7i.large** with\n\nI first ran this on a free-tier `t2.micro`. That was cheap and easy to reproduce, but unfair to the heavier gateways. A 1 GiB machine cannot hold a gateway that wants gigabytes of memory, so it starts swapping. At that point you are benchmarking the host being too small.\n\nSo I moved to `c7i.large`: still small, but non-burstable and large enough that nothing swaps. It also makes the LiteLLM setup more honest. LiteLLM recommends one worker per vCPU, and this machine has 2 vCPUs, so LiteLLM gets 2 workers. That gives it the multi-core access it is supposed to have instead of pinning it to a single worker on a tiny box.\n\nThe test covered six workloads:\n\nEach workload used `8,000` requests at concurrency `10`, across **two trials with randomized gateway order**. Latency is the\n\nI would not call this a statistically exhaustive study. 
It is a reproducible engineering benchmark, and the harness is public so people can rerun it, change the machine, or add their own workloads.\n\nA few details matter if you want to reproduce or criticize the numbers:\n\n`2` workers.\n\nRepresentative latency is chat completions, non-streaming. All resource figures are measured under load on the same box.\n\n| Metric | GoModel | Bifrost | Portkey | LiteLLM |\n|---|---|---|---|---|\n| Runtime | Go | Go | Node.js | Python |\n| Latency overhead `p50` | `1.8 ms` | `2.5 ms` | `9.7 ms` | `30.6 ms` |\n| Latency `p99` | `6.9 ms` | `18.3 ms` | `30.5 ms` | `39.3 ms` |\n| Throughput (sustained) | `4900 req/s` | `3100 req/s` | `950 req/s` | `324 req/s` |\n| Peak RAM under load | `37 MB` | `143 MB` | `112 MB` | `2.3 GB` |\n| Efficiency (req/s per CPU %) | `52` | `25` | `8.2` | `2.6` |\n| Cold start to first request | `0.56 s` | `7.1 s` | `1.1 s` | `25.5 s` |\n| Docker image (compressed pull) | `16 MB` | `77 MB` | `59 MB` | `372 MB` |\n| Workload coverage | `6/6` | `6/6` | `4/6` | `6/6` |\n| Vendor-neutral core | Yes | Partial † | Yes | Yes |\n| Core source available | Yes ‡ | Partial ‡ | Partial ‡ | Yes |\n\nGoModel had the lowest median latency and the tightest tail: `1.8 ms` p50 and `6.9 ms` p99.\n\nBifrost was close on median latency at `2.5 ms`, which is a good result. The gap opened at the tail and in memory: `18.3 ms` p99 and `143 MB` peak RAM under load.\n\nPortkey was heavier than I expected for this narrow proxy benchmark. It served `950 req/s` sustained and used `112 MB` peak RAM under load. In this setup it did not serve the Anthropic `/v1/messages` dialect, so it gets `4/6` workload coverage. Treat that as a setup limitation, not a claim that Portkey cannot support Anthropic in a fuller virtual-key configuration.\n\nLiteLLM was the outlier. 
At its recommended worker count, it used about `2.3 GB` of RAM, cold-started in `25.5 s`, and sustained `324 req/s`.\n\nNot because Python is morally bad. The language matters only when it changes the deployment envelope. Here it does: memory floor, image size, cold-start time, dependency graph, and throughput per core.\n\nThe later [supply-chain incident around LiteLLM](https://futuresearch.ai/blog/litellm-pypi-supply-chain-attack/) also made me more confident in GoModel's design direction. A small Go binary with a standard-library-heavy dependency tree is structurally less exposed to that class of problem than a large Python dependency graph.\n\nForwarding JSON is not the hard part.\n\nThe hard part is provider drift.\n\nOpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Groq, xAI, Cerebras, vLLM, and local servers all disagree in small ways. Then they change those ways. Tool calling changes. Streaming changes. Reasoning parameters change. Image inputs change. Error formats change. Rate-limit semantics change.\n\nAn AI gateway or AI control plane has to absorb that without becoming magic.\n\nGoModel's bet is not \"support every model name on the internet\".\n\nThe bet is:\n\nFor the same reason, GoModel starts as a small OpenAI-compatible gateway, not as a dashboard with a proxy attached.\n\nIf all your traffic goes to a cloud model that takes several seconds to answer, gateway overhead can look academic.\n\nLocal models change the math.\n\nIf you are routing through an AI gateway to vLLM, Ollama, LM Studio, llama.cpp, or small specialized models on your own network, the model call can be much faster. Then gateway overhead, cold starts, memory, and sidecar size matter more.\n\nOne reason I want GoModel to stay small: a gateway should be cheap enough to put near the workload.\n\nBifrost is built by Maxim AI, an LLM evaluation and observability platform. 
It routes to many model providers, but the gateway also sits close to Maxim's eval and observability ecosystem. If you want to choose your own eval platform, or stay independent from any eval platform, ask whether Bifrost is the right match for you. Good software can still have incentives attached. \"Vendor-neutral\" needs an asterisk here.\n\n\"Open-source\" also needs care.\n\nPortkey keeps observability storage, dashboard, multi-team RBAC, and at-scale semantic caching in a closed managed tier. Bifrost's core gateway is Apache-2.0, but its Enterprise edition adds closed or managed features. LiteLLM's proxy core is MIT, but enterprise features like SSO, audit logs, and fine-grained access control sit behind a proprietary commercial license.\n\nGoModel is open-source today. Some enterprise-grade AI control plane features may stay private. The core gateway is intended to remain useful without those private features.\n\nThe benchmark is built to be self-verifiable. It provisions the AWS instance, runs every gateway against the same backend, prints the tables, and destroys the infrastructure.\n\n```\n./run.sh\n```\n\nOne caveat: it runs on **paid** AWS infrastructure, not the free tier. A `c7i.large` is about `$0.09`/hour and the run self-destructs within an hour or two, so budget **under $1** per run to be safe.\n\nIf you pass `KEEP=1` or teardown fails, you keep paying until you destroy the box, so double-check the teardown.\n\nI did not start GoModel because I wanted another AI gateway in the world.\n\nI started it because the gateway I wanted to use became part of the problem. 
It sat on the hot path, but did not feel like hot-path software: too heavy, too slow to start, too expensive to keep around, too large for the job.\n\nThis benchmark is the result of turning that frustration into numbers.\n\nThe numbers say GoModel is small in the places I care about: `16 MB` image, `37 MB` peak RAM, `0.56 s` cold start, `1.8 ms` p50, `6.9 ms` p99, and `4900 req/s` sustained throughput on a small AWS box.\n\nLiteLLM still has more providers and more features today. Portkey and Bifrost have their own strengths. But if the gateway is going to sit between your users and every model call, I think it should first be cheap, predictable, and boring to run.\n\nGoModel is my attempt to build that kind of gateway.", "url": "https://wpnews.pro/news/benchmarking-ai-gateways-gomodel-vs-litellm-vs-portkey-vs-bifrost", "canonical_source": "https://dev.to/s-bandy/benchmarking-ai-gateways-gomodel-vs-litellm-vs-portkey-vs-bifrost-5d98", "published_at": "2026-06-26 17:51:26+00:00", "updated_at": "2026-06-26 18:33:55.727169+00:00", "lang": "en", "topics": ["ai-infrastructure", "developer-tools", "machine-learning", "large-language-models", "ai-products"], "entities": ["GoModel", "LiteLLM", "Portkey", "Bifrost", "AWS", "OpenAI", "Anthropic", "Gemini"], "alternates": {"html": "https://wpnews.pro/news/benchmarking-ai-gateways-gomodel-vs-litellm-vs-portkey-vs-bifrost", "markdown": "https://wpnews.pro/news/benchmarking-ai-gateways-gomodel-vs-litellm-vs-portkey-vs-bifrost.md", "text": "https://wpnews.pro/news/benchmarking-ai-gateways-gomodel-vs-litellm-vs-portkey-vs-bifrost.txt", "jsonld": "https://wpnews.pro/news/benchmarking-ai-gateways-gomodel-vs-litellm-vs-portkey-vs-bifrost.jsonld"}}