Six AI agent SDKs for enterprise Kubernetes, compared

wpnews.pro

There’s a question we hear constantly from platform and engineering leaders right now, “which agent SDK should we standardize on for our Kubernetes clusters?”

The honest answer is that the question is slightly wrong, and the rest of this post explains why. But it’s a fair question, so let’s compare the contenders first.

If you’re an enterprise running on-premise or in your own VPC, the SDK you pick has to do two things most of the * “build an agent in 20 lines”* tutorials skip over. It has to run in a container you control, and it has to talk to a model you can host yourself. That second one rules out a surprising amount.

The six SDKs most people are actually using #

These are the ones with the most mindshare in mid-2026. There are others, but these are the names that come up in every conversation. They sit on a rough spectrum of model freedom: most will happily run against a model you host yourself, the OpenAI SDK will too but treats that as a side path, and one of them (Anthropic’s) is tied to a single vendor’s models. I’ve ordered them with the most flexible first.

LangGraph

LangChain’s lower-level library. You model your agent as a directed graph: nodes do work, edges decide what happens next, and the whole thing checkpoints its state so a long-running agent can , resume, and even rewind. If your problem is * “this workflow is genuinely complicated and has to survive restarts,”* LangGraph is the one built for that.

For on-prem it’s reasonable. There’s a Helm chart for self-hosting, the core is MIT-licensed, and it’s model-agnostic so you can point it at a local model. The catch is operational weight: a production self-hosted deployment wants Postgres for state and Redis for streaming, so you’re running real infrastructure, not just a pod. The platform layer on top is commercial.

CrewAI

The one your team will get running fastest. You describe a “crew” of agents with roles (“researcher”, “writer”) and let them collaborate. The learning curve is the lowest of the six, the core is open source and MIT-licensed, built from scratch without a LangChain dependency, and it’s genuinely model-agnostic. People wire it up to Ollama or a self-hosted vLLM endpoint without much fuss. There’s a Helm chart for the enterprise platform, and if you want it to feel native to Kubernetes you can wrap crews as custom resources with something like Kagent.

The tradeoff for that simplicity is control. When you want fine-grained say over exactly what happens at each step, the role-based abstraction can feel like it’s deciding things for you.

Google ADK

Google’s ADK. The model here is a hierarchy: a root agent delegates to sub-agents, and it speaks the A2A (agent-to-agent) protocol natively, so agents built in different frameworks can talk to each other. It’s Apache 2.0 and ships in Python, TypeScript, Go, Java, and Kotlin, with the Python implementation the oldest and most complete. Its own docs say it “can work with almost any generative AI model,” with documented support for Claude, Ollama, vLLM, and others through LiteLLM, so it’s genuinely model-agnostic despite the Gemini-first defaults.

It looks Google-Cloud-coupled, and it does have a one-command adk deploy gke path, but that’s a convenience, not a requirement. Underneath it’s a container. You can run it on any on-prem cluster with hand-written manifests, and Google has published a real reference for running ADK against a self-hosted Llama model on vLLM. It’s Gemini-first by default, but you can bring other models through LiteLLM. Less locked-in than the branding suggests.

Microsoft Agent Framework

The grown-up merger of two earlier projects: AutoGen, which is where a lot of the multi-agent research came from, and Semantic Kernel, which is where the enterprise plumbing lived. It runs on Python and .NET, which is the real reason it’s on this list. If you’re a Microsoft and .NET shop, this is the one that speaks your language, literally.

It does two kinds of orchestration: the loose, LLM-driven kind where agents reason their way through a problem, and the deterministic, business-logic kind where you want a workflow to run the same way every time. For on-prem it’s a good citizen. It’s MIT-licensed and open source, it’s genuinely model-agnostic with first-party connectors that include Ollama, and people are already running it on AKS or plain Kubernetes against local open-weight models like Qwen or Mistral. One thing to keep straight: Microsoft’s hosted agent service is an Azure product, but the framework itself is yours to run wherever.

OpenAI Agents SDK

The cleanest developer experience of the six. It ships in Python and TypeScript, agents hand off control to each other explicitly, the API is small, and if your team already uses the OpenAI API they’ll be productive in an afternoon. For self-hosting, you bring your own container and infrastructure, which is fine.

It’s also more model-flexible than the name suggests, and this is the part worth knowing because it’s easy to miss. The API guide on the OpenAI platform site barely mentions it, but the Agents SDK’s own documentation has a “Models” page that points you to non-OpenAI providers two ways. The clean one is any OpenAI-compatible endpoint: you set a base URL and an API key, which covers local models served through vLLM or Ollama. Beyond that, official LiteLLM and Any-LLM extensions reach 100-plus providers, though the docs flag those as best-effort and beta. So you can run it fully self-hosted against your own model. OpenAI is still the default and best-supported path, but the lock-in is softer than the name implies. The next entry is where the real model lock-in lives.

Anthropic Claude Agent SDK

Anthropic’s harness, and the same engine that powers Claude Code, exposed as a library in Python and TypeScript. It spawns and supervises a CLI subprocess that owns a shell and a working directory, which is a genuinely different model from the others. Every agent is a long-lived process with state on disk, so you think about it more like running a fleet of little workers than calling a stateless API. The SDK code is MIT-licensed, though Anthropic’s docs note that use of it is governed by their Commercial Terms of Service, and Anthropic ships Dockerfiles and Kubernetes manifests for self-hosting it.

The honest caveat is the model. This runs on Claude, full stop, and it’s the only one of the six with no supported way to swap in your own model. You can route through Amazon Bedrock, Google’s Gemini Enterprise Agent Platform (formerly Vertex AI), or Azure to keep traffic inside a cloud account you control, which helps with compliance, but those are all just channels for hosting Claude, not alternative model vendors. There’s no air-gapped, weights-on-your-own-GPU story the way there is with the open-weight crowd. If your on-prem requirement is about latency, data residency, or “our cloud, our keys,” it can work. If it’s about never sending a token off the box, it can’t.

The comparison at a glance #

Picking one (or, realistically, several) #

LangGraph if the workflow is hard and stateful. CrewAI if you want multi-agent collaboration running by Friday. ADK if you’re betting on A2A and a mix of frameworks talking to each other. Microsoft Agent Framework if your stack is already C#/.NET, or you want both creative and deterministic orchestration in one place. OpenAI’s SDK for the cleanest developer experience, with non-OpenAI and local models available through OpenAI-compatible endpoints (or its beta LiteLLM extension) if you need them. Claude’s Agent SDK if you want the Claude Code engine as a library and Bedrock or Gemini Enterprise is close enough to “on-prem” for you.

Five of the six can run against a model you host yourself. Four treat that as a first-class path, the OpenAI SDK does it through OpenAI-compatible endpoints (with LiteLLM as a beta add-on), and only Anthropic’s Claude Agent SDK is locked to a single vendor’s models, though Bedrock or Gemini Enterprise at least keep that traffic in your own cloud. For an on-premise enterprise that model-freedom question is the biggest filter. After that, the choice is mostly about how your team thinks: graphs, crews, hierarchies, or handoffs.

The honest part, though, is that most enterprises don’t pick one. The data team gets something working in CrewAI in a day. A platform engineer builds the stateful pipeline in LangGraph because nothing else handles the checkpointing. The .NET team reaches for Microsoft’s framework. Someone ships a Claude or OpenAI SDK agent before anyone writes a standard down. A year later you’re running several of these at once, plus whatever lands next quarter. That’s not a failure of planning. It’s just what a healthy, fast-moving org looks like, and it’s worth designing for rather than fighting.

Governing the fleet you’ll actually have #

Here’s the catch that sits underneath all six options. Once an agent is a running pod, the SDK that built it no longer matters. From the cluster’s point of view, every agent looks the same: a workload making network calls to a model and to tools, acting on behalf of someone, doing things you didn’t watch happen. The SDK’s view stops at the edge of its own process. Your security and platform teams’ problem doesn’t.

None of the six frameworks govern that. It isn’t their job. They help a developer build an agent; they don’t tell you which agents exist in your cluster, what they’re allowed to reach, or what they actually did with the credentials you handed them. And because you’ll be running more than one framework, anything that only governs agents written a particular way leaves most of your fleet uncovered.

This is the gap Tigera Lynx is built to close. It governs agents at the platform layer instead of inside any single SDK, so the same controls apply whether the agent was written in LangGraph, CrewAI, ADK, or something that doesn’t exist yet. Lynx discovers the agents already running, including the ones nobody registered, using eBPF down at the kernel where the network call happens. An agent that skips your gateway entirely still shows up, because a syscall is a syscall regardless of the framework above it.

From there it puts a single control point in the path of every agent interaction and requires no changes to the agent’s code to do it. If governance depends on every developer importing your library and using it correctly, you don’t have governance, you have a polite request. Lynx works at the level where that assumption can’t break: discovery, policy, and a full audit trail your security team gets handed instead of reconstructing after the first incident. It’s already running in production at large banks, which are not known for a relaxed view of risk. Pick the SDK that fits how your team builds. The decision that actually carries risk is whether anything sits between your agents and the rest of your cluster once they’re live, and that layer has to be SDK-agnostic, because your fleet already is. If your teams are shipping agents faster than you can govern them, see how Lynx governs AI agents on Kubernetes.

See Lynx discover and govern agents in a 3-minute interactive demo →

source & further reading

tigera.io — original article How Lynx Works: A Technical Walkthrough Why We Built Lynx: Bringing Control to the Age of AI Agents Five Principles of an Accountable AI Agent Network: How to Evaluate Any Governance Platform