When building reliable AI agents, there are two dominant approaches.
Approach A: Proxy Gateway (LiteLLM, Braintrust, etc.). The app sends each request to a gateway proxy, which forwards it to the LLM provider. Requires Docker, a database, and an operations team.
Approach B: Embedded SDK (NeuralBridge). The app, with the SDK running in-process, sends requests directly to the LLM provider. One dependency, installed with pip.
Every proxy gateway adds 30-200 ms of network latency per call. For an agent that makes 10 sequential LLM calls, that is 300-2000 ms of unnecessary overhead.
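The arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `added_latency_ms` is made up for illustration, and it assumes the calls run one after another so the per-call overhead simply adds up.

```python
from typing import Tuple


def added_latency_ms(calls: int, per_call_ms: Tuple[int, int]) -> Tuple[int, int]:
    """Return the (low, high) total overhead for `calls` sequential LLM calls.

    Assumption: calls are sequential, so gateway overhead accumulates linearly.
    """
    low, high = per_call_ms
    return calls * low, calls * high


if __name__ == "__main__":
    # 10 calls at 30-200 ms of gateway overhead each
    print(added_latency_ms(10, (30, 200)))  # (300, 2000)
```

If an agent issues some of its calls in parallel, the wall-clock penalty is smaller than this upper bound.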
Embedding the reliability logic in the application process eliminates that extra network hop. The two approaches compare as follows:
| Factor | Gateway | Embedded SDK |
|---|---|---|
| Added latency | 30-200ms | ~0ms |
| Dependencies | Docker, DB, Redis | 1 (httpx) |
| Install size | 500MB+ | 375 KB |
| Single point of failure | Yes (proxy) | No |
| Ops cost | High | Zero |
Gateways serve a purpose for centralized logging, auth, and rate limiting. But for latency-sensitive AI agents, embedding reliability directly in the process is the better fit: no extra hop, no extra service to run.
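To make "embedding reliability in the process" concrete, here is a minimal sketch of retries with exponential backoff and provider fallback, using only the standard library. It is not NeuralBridge's API; `call_with_retries`, its parameters, and the provider callables are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_retries(
    providers: Sequence[Callable[[], T]],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order, retrying with exponential backoff.

    Hypothetical helper: `providers` are zero-argument callables that perform
    one LLM request each (for example, a primary and a fallback model).
    """
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider()
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                if attempt < max_attempts - 1:
                    # Backoff doubles each attempt; jitter avoids synchronized retries.
                    delay = base_delay_s * (2 ** attempt)
                    sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("all providers failed") from last_error


if __name__ == "__main__":
    def primary() -> str:
        raise ConnectionError("primary provider unavailable")

    def fallback() -> str:
        return "response from fallback provider"

    # max_attempts=1 means no waiting: one try per provider, then move on.
    print(call_with_retries([primary, fallback], max_attempts=1))
```

Everything here runs inside the application process, so a failed attempt costs only the retry delay itself, not an additional hop through a proxy.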
The ideal stack: an embedded SDK for reliability, plus a lightweight observability layer on top.
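One way such a lightweight observability layer could look is a decorator that times each call and logs the outcome through the standard `logging` module. This is a sketch under assumptions: the `observed` decorator and the `llm_calls` logger name are invented for the example and are not part of any SDK mentioned here.

```python
import functools
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("llm_calls")  # hypothetical logger name


def observed(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a function so each call logs its name, status, and elapsed time."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # observability must not swallow failures
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info(
                    "call=%s status=%s elapsed_ms=%.1f", name, status, elapsed_ms
                )

        return wrapper

    return decorator


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    @observed("summarize")
    def summarize(text: str) -> str:
        # Stand-in for a real LLM request.
        return text[:20]

    summarize("An example prompt for the model")
```

Because the decorator only writes a log record, it stays in-process; shipping those records to a central store can happen asynchronously, off the request path.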
NeuralBridge (Apache 2.0, 1 dependency, 375 KB): https://github.com/hhhfs9s7y9-code/neuralbridge-sdk