Someone builds a “router” that classifies incoming requests and sends them to the right specialist agent. Two weeks later, a different team builds a “dispatcher” that does almost the same thing. A month after that, someone else ships a “coordinator”. It’s mostly the same idea with a different name and different bugs. Meanwhile, nobody can version these systems, test them without hitting the real API, or debug what actually happened when a run goes wrong, all within a single ecosystem.
Frameworks like LangGraph, CrewAI, and AutoGen are industry standards, but they give engineers raw primitives and leave everything else to you. You get graph nodes and message passing. You don’t get a shared vocabulary of named patterns, a way to declare your system as code, cost visibility, memory management, or an observability layer that knows what a “Debate” pattern is versus a generic span.
PyAgent fills those gaps. It’s eight packages that form a coherent stack, from YAML specification through to a visual control plane. You can use one package or all eight. This is what the full stack looks like and how each piece fits.
The intended flow is:
Let’s walk through each layer.
The standard way to write multi-agent systems in Python is to wire agents together in code. That works until you need to version them, review them in a PR, run them in CI without hitting real APIs, or hand the system to a team member who has to grep through call sites to understand how things connect.
Blueprint separates what your system does — the YAML spec — from how it runs — the compiler and runtime. The same file that deploys to production can be statically validated in CI, simulated with MockLLM in tests, and diffed against the previous version to see exactly what changed.
Think of it as Kubernetes for agent systems: infrastructure as code, but for LLM workflows.
A sample kick-starter script loads the specified Blueprint and compiles it.
The Blueprint YAML supports seven top-level sections: metadata, providers, agents, workflows, context, contracts, and observability. Beyond runtime execution, the same spec is used by the Blueprint CLI for static validation, Mermaid diagram rendering, semantic diffing between versions, and contract conformance testing — all without making a single LLM call.
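The idea of validating a spec without any LLM call can be sketched in plain Python. This is a minimal illustration, not the Blueprint CLI: the spec is a dict standing in for parsed YAML, and `validate_spec`, the fields inside each section (`provider`, `steps`), and the two cross-reference rules are all assumptions. Only the seven top-level section names come from the text above.

```python
from typing import Any

# The seven top-level sections named in the article.
TOP_LEVEL_SECTIONS = {
    "metadata", "providers", "agents", "workflows",
    "context", "contracts", "observability",
}


def validate_spec(spec: dict[str, Any]) -> list[str]:
    """Return a list of problems found by static checks only (no LLM calls)."""
    problems: list[str] = []
    for key in spec:
        if key not in TOP_LEVEL_SECTIONS:
            problems.append(f"unknown section: {key}")

    agents = spec.get("agents", {})
    providers = spec.get("providers", {})

    # Hypothetical rule: every agent must reference a declared provider.
    for name, agent in agents.items():
        if agent.get("provider") not in providers:
            problems.append(f"agent {name!r} uses an undeclared provider")

    # Hypothetical rule: every workflow step must name a declared agent.
    for wf_name, workflow in spec.get("workflows", {}).items():
        for step in workflow.get("steps", []):
            if step not in agents:
                problems.append(f"workflow {wf_name!r} references unknown agent {step!r}")
    return problems


good_spec = {
    "metadata": {"name": "demo"},
    "providers": {"primary": {}},
    "agents": {"writer": {"provider": "primary"}},
    "workflows": {"main": {"steps": ["writer"]}},
}
bad_spec = {
    "providers": {"primary": {}},
    "agents": {"writer": {"provider": "missing"}},
    "workflows": {"main": {"steps": ["editor"]}},
    "extras": {},
}

good_problems = validate_spec(good_spec)
bad_problems = validate_spec(bad_spec)
```

Because the checks only walk the data structure, they are cheap enough to run on every commit in CI.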
Patterns are the execution layer. Each is a named, tested, composable class that encodes one recurring coordination problem and its trade-offs. The full set is 18 patterns across four tiers.
In a codebase where one team builds a “router,” another builds a “dispatcher,” and a third builds a “coordinator,” you can’t reason about the system at a glance. When the codebase instead uses Supervisor, FanOutFanIn, and Debate, every engineer on the team immediately knows what each piece does, what it costs, and what its failure modes are. The pattern name is the shared vocabulary.
Because every pattern implements the same Pattern base class, any pattern can be used anywhere an Agent is expected. A FanOut of Pipelines, wrapped in SelfReflection, is three classes:
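As a rough illustration of that composition, the sketch below defines toy versions of a shared base class and three patterns. These are not PyAgent's classes: the `run` method, the constructor arguments, and the string-joining behaviour of `FanOut` are assumptions chosen to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Pattern(ABC):
    """Hypothetical common interface: anything with run() can be nested."""

    @abstractmethod
    def run(self, task: str) -> str: ...


class Agent(Pattern):
    def __init__(self, name: str, fn: Callable[[str], str]) -> None:
        self.name = name
        self.fn = fn

    def run(self, task: str) -> str:
        return self.fn(task)


class Pipeline(Pattern):
    """Feeds each stage's output into the next stage."""

    def __init__(self, *stages: Pattern) -> None:
        self.stages = stages

    def run(self, task: str) -> str:
        for stage in self.stages:
            task = stage.run(task)
        return task


class FanOut(Pattern):
    """Sends the same task to every branch and joins the results."""

    def __init__(self, *branches: Pattern) -> None:
        self.branches = branches

    def run(self, task: str) -> str:
        return " | ".join(branch.run(task) for branch in self.branches)


class SelfReflection(Pattern):
    """Re-runs the inner pattern until accept() passes or rounds run out."""

    def __init__(self, inner: Pattern, accept: Callable[[str], bool], max_rounds: int = 2) -> None:
        self.inner = inner
        self.accept = accept
        self.max_rounds = max_rounds

    def run(self, task: str) -> str:
        result = self.inner.run(task)
        for _ in range(self.max_rounds):
            if self.accept(result):
                break
            result = self.inner.run(task)
        return result


upper = Agent("upper", str.upper)
reverse = Agent("reverse", lambda s: s[::-1])
exclaim = Agent("exclaim", lambda s: s + "!")

# A FanOut of Pipelines, wrapped in SelfReflection.
system = SelfReflection(
    FanOut(Pipeline(upper, exclaim), Pipeline(reverse, exclaim)),
    accept=lambda out: "!" in out,
)
result = system.run("abc")
```

The nesting works only because every class accepts and exposes the same `Pattern` interface, which is the property the paragraph above describes.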
When you’re not sure which pattern to reach for, the PatternAdvisor can suggest one based on your task description and constraints.
pyagent-providers is a unified registry over Anthropic, OpenAI, Gemini, and any LiteLLM-compatible backend. Rather than wiring LLM clients directly into agents, you define a provider registry in your Blueprint (or in code) and reference providers by name. The registry handles fallback chains, capability negotiation, and cost-aware routing automatically.
If the primary provider is unavailable or over rate limit, the registry falls back to the next in the chain — without the agent needing to know. The provider section in Blueprint also wires into the Router for difficulty-aware model selection (covered below).
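The fallback behaviour can be sketched as follows. `ToyRegistry`, `ProviderUnavailable`, and the `register`/`complete` methods are hypothetical names for illustration, and the providers are plain callables rather than real LLM clients; the actual pyagent-providers API may look quite different.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a provider that is down or rate limited (hypothetical)."""


class ToyRegistry:
    def __init__(self) -> None:
        self._providers: dict[str, Callable[[str], str]] = {}
        self._chains: dict[str, list[str]] = {}

    def register(self, name: str, call: Callable[[str], str], fallbacks: Sequence[str] = ()) -> None:
        self._providers[name] = call
        self._chains[name] = [name, *fallbacks]

    def complete(self, name: str, prompt: str) -> tuple[str, str]:
        """Try the named provider, then each fallback in order.

        Returns (provider_used, response) so the caller can log which one answered.
        """
        errors: list[str] = []
        for candidate in self._chains[name]:
            try:
                return candidate, self._providers[candidate](prompt)
            except ProviderUnavailable as exc:
                errors.append(f"{candidate}: {exc}")
        raise ProviderUnavailable("; ".join(errors))


def rate_limited(prompt: str) -> str:
    raise ProviderUnavailable("rate limited")


def echo(prompt: str) -> str:
    return f"echo: {prompt}"


registry = ToyRegistry()
registry.register("backup", echo)
registry.register("primary", rate_limited, fallbacks=["backup"])

used, text = registry.complete("primary", "hello")
```

The agent only ever asks for `"primary"`; the registry decides which backend actually served the call.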
pyagent-context gives agents memory that persists within and across runs. The ContextLedger manages three tiers:
Working memory is the in-flight context for the current task — what’s been said so far in this run.
Session memory persists across turns in a conversation, but not across separate sessions. Useful for stateful chat applications where the agent needs to remember what was discussed earlier in the conversation.
Semantic memory is a long-term store for facts, preferences, and learned context that should persist across sessions. Agents that read from semantic memory can recall things like user preferences, past decisions, or domain knowledge that was accumulated in prior runs.
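The three tiers differ mainly in how long their contents live. The sketch below makes those lifetimes concrete with a toy class; `ToyLedger` and its method names are assumptions for illustration and are not the ContextLedger API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyLedger:
    """Hypothetical three-tier memory; not PyAgent's ContextLedger."""

    working: list[str] = field(default_factory=list)             # current run only
    session: dict[str, list[str]] = field(default_factory=dict)  # one conversation
    semantic: dict[str, str] = field(default_factory=dict)       # long-term facts

    def note(self, message: str) -> None:
        self.working.append(message)

    def end_run(self, session_id: str) -> None:
        # Fold the finished run into session history, then reset working memory.
        self.session.setdefault(session_id, []).extend(self.working)
        self.working.clear()

    def end_session(self, session_id: str) -> None:
        # Session memory does not outlive the conversation.
        self.session.pop(session_id, None)

    def remember(self, key: str, value: str) -> None:
        self.semantic[key] = value

    def recall(self, key: str) -> Optional[str]:
        return self.semantic.get(key)


ledger = ToyLedger()
ledger.note("user asked for a summary")
ledger.remember("preferred_format", "bullet points")
ledger.end_run("chat-1")
history_during_session = list(ledger.session["chat-1"])
ledger.end_session("chat-1")
```

After the session ends, only the semantic entry survives, which matches the tier descriptions above.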
In Blueprint, context configuration lives in its own section:
The context layer also handles redaction of PII before it reaches external providers — the same PIIGuard available in the Guardrails API but applied at the memory level.
In a five-stage Pipeline, if each agent outputs 2,000 tokens, the fifth agent receives 8,000 tokens of prior context before it even starts its own work. pyagent-compress addresses this by trimming inter-agent message transfer while preserving key information.
The package also supports token budget enforcement (hard cutoffs that trigger compression before a call would exceed the model’s context window) and agent pruning (removing older agents’ contributions from the shared context when they’re no longer relevant to the current stage).
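The arithmetic behind the five-stage example, and the effect of a hard token budget, can be worked through in a few lines. The function names are hypothetical, and oldest-first pruning is used here as a simple stand-in for real compression, which would typically summarize rather than drop messages.

```python
def prior_context_tokens(stage: int, tokens_per_stage: int) -> int:
    """Tokens of upstream output seen by a 1-indexed stage in a linear pipeline."""
    return (stage - 1) * tokens_per_stage


def enforce_budget(message_tokens: list[int], budget: int) -> list[int]:
    """Drop the oldest messages until the remaining total fits the budget."""
    kept = list(message_tokens)
    while kept and sum(kept) > budget:
        kept.pop(0)
    return kept


def savings_pct(before: int, after: int) -> float:
    """Percentage of tokens removed; 0.0 when there was nothing to remove."""
    return 0.0 if before == 0 else 100.0 * (before - after) / before


# Fifth stage of a pipeline where each agent emits 2,000 tokens.
incoming = prior_context_tokens(stage=5, tokens_per_stage=2000)

# Four upstream messages of 2,000 tokens each, with an assumed 5,000-token budget.
kept = enforce_budget([2000, 2000, 2000, 2000], budget=5000)
saved = savings_pct(before=incoming, after=sum(kept))
```

With these illustrative numbers the fifth agent would see 4,000 tokens instead of 8,000. A figure like `saved` is the kind of value a savings attribute on a span could carry.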
In long pipelines where compression is always-on, the OTel trace will include a pyagent.compress.savings_pct attribute on every span, so you can see exactly how much the compressor is saving at each step.
pyagent-trace wraps the whole stack with structured observability. Every agent call, pattern execution, and LLM invocation emits events through a TraceEventBus. Consumers such as exporters, cost trackers, and Studio subscribe to the event bus and receive events in real time.
The key design is that the trace is pattern-aware. Spans carry attributes like pyagent.pattern.type, pyagent.agent.name, pyagent.router.difficulty, and pyagent.compress.savings_pct. This means your tracing backend doesn't just see generic HTTP calls — it knows a Debate ran for 3 rounds, the judge was claude-sonnet, and the whole thing cost $0.01032.
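A publish/subscribe bus with pattern-aware attributes might look roughly like this. `ToyEventBus` and `TraceEvent` are hypothetical stand-ins, not the TraceEventBus API; only the attribute keys `pyagent.pattern.type` and `pyagent.agent.name` are taken from the text.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TraceEvent:
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)


class ToyEventBus:
    """Delivers every emitted event to every subscriber, in subscription order."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[TraceEvent], None]] = []

    def subscribe(self, handler: Callable[[TraceEvent], None]) -> None:
        self._subscribers.append(handler)

    def emit(self, event: TraceEvent) -> None:
        for handler in self._subscribers:
            handler(event)


bus = ToyEventBus()
calls_per_pattern: Counter[str] = Counter()
agents_seen: list[str] = []


def count_by_pattern(event: TraceEvent) -> None:
    # Because the event carries the pattern type, a consumer can group by it.
    calls_per_pattern[event.attributes.get("pyagent.pattern.type", "unknown")] += 1


def collect_agents(event: TraceEvent) -> None:
    agents_seen.append(event.attributes.get("pyagent.agent.name", "?"))


bus.subscribe(count_by_pattern)
bus.subscribe(collect_agents)

bus.emit(TraceEvent("agent.call", {"pyagent.pattern.type": "debate", "pyagent.agent.name": "bull"}))
bus.emit(TraceEvent("agent.call", {"pyagent.pattern.type": "debate", "pyagent.agent.name": "bear"}))
bus.emit(TraceEvent("agent.call", {"pyagent.pattern.type": "pipeline", "pyagent.agent.name": "writer"}))
```

Each consumer sees the same stream independently, so adding an exporter does not require touching the cost tracker or the UI.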
The simplest integration is the decorator approach — inherit from a traced variant of any pattern class:
CostTracker aggregates costs by pattern, model, and agent across a session:
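A minimal sketch of that aggregation, assuming each LLM call reports its pattern, model, agent, and dollar cost. `ToyCostTracker` and `CallCost` are hypothetical names, and the model names and prices are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCost:
    pattern: str
    model: str
    agent: str
    usd: float


class ToyCostTracker:
    """Keeps running totals along three dimensions for one session."""

    def __init__(self) -> None:
        self.by_pattern: dict[str, float] = defaultdict(float)
        self.by_model: dict[str, float] = defaultdict(float)
        self.by_agent: dict[str, float] = defaultdict(float)

    def record(self, call: CallCost) -> None:
        self.by_pattern[call.pattern] += call.usd
        self.by_model[call.model] += call.usd
        self.by_agent[call.agent] += call.usd

    def total(self) -> float:
        return sum(self.by_pattern.values())


tracker = ToyCostTracker()
# Illustrative values only; not real pricing.
tracker.record(CallCost("debate", "small-model", "bull", 0.001))
tracker.record(CallCost("debate", "small-model", "bear", 0.002))
tracker.record(CallCost("debate", "large-model", "judge", 0.008))
tracker.record(CallCost("pipeline", "small-model", "writer", 0.004))
```

Grouping by more than one key is what lets a question like "which pattern is expensive, and is it the judge's model that drives it?" be answered from the same records.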
Recorder captures every LLM call to JSONL for deterministic replay — useful for debugging a specific run without re-incurring API costs:
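Record-and-replay over JSONL can be sketched with the standard library alone. `ToyRecorder` and `ToyReplayer` are hypothetical, the record schema (`prompt`, `response`) is an assumption, and replay is keyed by the exact prompt text, which is a simplification: if the same prompt is recorded twice, the last response wins.

```python
import io
import json
from typing import Callable, TextIO


class ToyRecorder:
    """Wraps an LLM callable and writes one JSON object per call to a stream."""

    def __init__(self, llm: Callable[[str], str], sink: TextIO) -> None:
        self.llm = llm
        self.sink = sink

    def __call__(self, prompt: str) -> str:
        response = self.llm(prompt)
        self.sink.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
        return response


class ToyReplayer:
    """Answers prompts from a recorded JSONL stream instead of a live model."""

    def __init__(self, source: TextIO) -> None:
        self._responses: dict[str, str] = {}
        for line in source:
            if line.strip():
                record = json.loads(line)
                self._responses[record["prompt"]] = record["response"]

    def __call__(self, prompt: str) -> str:
        return self._responses[prompt]


live_calls = 0


def fake_llm(prompt: str) -> str:
    # Stand-in for a paid API call.
    global live_calls
    live_calls += 1
    return prompt.upper()


log = io.StringIO()  # a real run would use a file opened in append mode
recorded = ToyRecorder(fake_llm, log)
first = recorded("summarize the report")

log.seek(0)
replayed = ToyReplayer(log)
second = replayed("summarize the report")
```

The replayed call returns the same text without invoking `fake_llm` again, which is what makes a failing run reproducible at no extra API cost.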
OTel backends are supported out of the box — Jaeger for local development, Langfuse for production LLM observability, Grafana Tempo + Prometheus for infrastructure teams, and any OTLP endpoint. A recorded trace looks like:
Trace: investment-analysis / debate (9.8s total, $0.01032)
├── pyagent.agent.bull   2.4s  gemini-2.5-flash  450→380 tok   $0.00094
├── pyagent.agent.bear   2.7s  gemini-2.5-flash  450→410 tok   $0.00098
├── pyagent.agent.bull   2.1s  (round 2)         410→360 tok   $0.00086
├── pyagent.agent.bear   1.9s  (round 2)         410→380 tok   $0.00084
└── pyagent.agent.judge  0.7s  claude-sonnet-4   1200→480 tok  $0.00840
pyagent-studio is a kubectl-style CLI and web control plane for designing, simulating, debugging, and governing multi-agent systems. The analogy from the docs is direct: Studio is to PyAgent what kubectl + the Kubernetes Dashboard are to Kubernetes.
The web dashboard has nine sections, including:
/overview gives a system summary — loaded blueprints, provider health, recent run stats.
/workflows and /agents let you inspect how agents are wired together and what their configurations are.
/simulate runs a workflow against MockLLM (no API cost) and shows the full execution path.
/traces is the trace explorer, powered by the JSONL files from pyagent-trace. You can filter by agent, pattern type, cost range, or duration, and drill into any span for the full input/output.
/governance validates loaded blueprints against their contracts and flags any violations.
/providers shows provider health, latency percentiles, and per-model cost breakdowns.
/diff renders semantic diffs between two Blueprint versions side by side.
The CLI and the web dashboard share the same underlying service layer, so anything you can do in the UI you can also script with pyagent commands in CI.
pyagent-router scores incoming queries for difficulty and selects the most cost-effective model that's capable of handling them. It knows pricing for the full range of current models — GPT-4.1-nano through o3, Haiku through Sonnet, Gemini Flash through Pro — and applies that knowledge at call time.
The router integrates with the Blueprint’s providers section through routing strategies. cost_optimized minimises spend; latency minimises wall-clock time; capability ensures the model can handle the inferred difficulty. In practice, most teams start with cost_optimized and adjust after seeing their trace cost breakdowns in Studio.
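The selection logic can be sketched as "filter by capability, then optimize for the chosen strategy". Everything below is an assumption for illustration: the keyword-based `score_difficulty` heuristic, the 1 to 3 capability scale, the model names, the prices, and the reading of the `capability` strategy as "pick the strongest capable model". It is not the pyagent-router implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int    # 1 (simple) to 3 (hard); illustrative scale
    usd_per_1k: float  # illustrative price, not real pricing
    latency_s: float


def score_difficulty(query: str) -> int:
    """Crude heuristic stand-in for a real difficulty scorer."""
    hard_markers = ("prove", "derive", "multi-step", "analyze")
    if any(marker in query.lower() for marker in hard_markers):
        return 3
    return 2 if len(query.split()) > 20 else 1


def select_model(query: str, models: list[Model], strategy: str = "cost_optimized") -> Model:
    difficulty = score_difficulty(query)
    # Never route to a model that cannot handle the inferred difficulty.
    capable = [m for m in models if m.capability >= difficulty]
    if not capable:
        raise ValueError("no registered model can handle this query")
    if strategy == "cost_optimized":
        return min(capable, key=lambda m: m.usd_per_1k)
    if strategy == "latency":
        return min(capable, key=lambda m: m.latency_s)
    if strategy == "capability":
        return max(capable, key=lambda m: m.capability)
    raise ValueError(f"unknown strategy: {strategy}")


models = [
    Model("small", capability=1, usd_per_1k=0.1, latency_s=0.3),
    Model("medium", capability=2, usd_per_1k=0.5, latency_s=0.8),
    Model("large", capability=3, usd_per_1k=2.0, latency_s=2.5),
]

easy_choice = select_model("What is 2+2?", models)
hard_choice = select_model("Prove that the series converges", models)
strongest = select_model("What is 2+2?", models, strategy="capability")
```

Applying the capability filter before any optimization is the design choice that keeps a cost-minimizing strategy from sending a hard query to a model that is cheap but too weak.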
This is also how Talker-Reasoner works under the hood — the router is the mechanism that decides whether the talker or the reasoner handles a given query.
If you’re new to multi-agent systems, start with pyagent-patterns and the Pipeline and Supervisor patterns. Get something running, then layer in routing and context once you've established the basic shape of your system.
If you’re adding PyAgent to an existing codebase, the hooks guide is worth reading first — it shows how to attach trace, context, compression, and cost tracking to agents you’ve already built, without rewiring everything.
If you’re building for production, start with Blueprint. The discipline of declaring your system in YAML pays off quickly in CI validation, versioned diffs, and simulation — especially once the system gets complex enough that you can’t hold it all in your head.
PyAgent is MIT licensed, requires Python 3.11+, and has zero mandatory dependencies in the core packages.
PyAgent: A Design Pattern Orchestrator for Multi-Agent LLM Systems was originally published in Towards AI on Medium.