{"slug": "pydantic-ai-typed-testable-agents-for-engineers-who-like-guarantees", "title": "Pydantic AI: Typed, Testable Agents for Engineers Who Like Guarantees", "summary": "Pydantic AI introduces a typed agent framework that enforces output validation, preventing hallucinated data from reaching production APIs. The framework uses generic parameters for dependency and output types, and automatically retries on validation errors, ensuring agents return validated objects or exceptions.", "body_md": "You ship an agent that resolves billing disputes. It works in the demo. Two weeks later a support ticket lands: the agent tried to refund $4,000 on a $19 charge. You read the trace. The model returned a JSON blob, your code did `json.loads`, pulled `amount`, and passed it straight to the payments API. No cap. No type. No check. The model hallucinated a number and your code trusted it.\n\nThe model is stochastic. Your code does not have to be. The gap between those two facts is where most production agent bugs live, and it is exactly the gap [Pydantic AI](https://ai.pydantic.dev) is built to close.\n\nMost agent frameworks hand you an `Agent` object and a bag of strings. Pydantic AI hands you `Agent[Deps, Output]`, a generic parameterized by its dependency type and its output type. The IDE and your type checker read those parameters. 
So does the runtime.\n\nThe install pulls in the framework plus an optional tracing extra:\n\n```\npip install \"pydantic-ai[logfire]\"\n```\n\nThe smallest program that earns its keep:\n\n``` python\nfrom dataclasses import dataclass\nfrom pydantic import BaseModel\nfrom pydantic_ai import Agent, RunContext\n\n@dataclass\nclass Deps:\n    customer_name: str\n\nclass SupportReply(BaseModel):\n    reply: str\n    escalate: bool\n\nagent = Agent(\n    \"anthropic:claude-opus-4-8\",\n    deps_type=Deps,\n    output_type=SupportReply,\n    system_prompt=\"You are a support agent.\",\n)\n```\n\nA tool is a plain function whose type hints become the schema the model sees, and the run returns the validated `SupportReply`:\n\n``` python\n@agent.tool\ndef customer_name(ctx: RunContext[Deps]) -> str:\n    \"\"\"Return the current customer's name.\"\"\"\n    return ctx.deps.customer_name\n\nresult = agent.run_sync(\n    \"What is my name?\",\n    deps=Deps(customer_name=\"Ana\"),\n)\nprint(result.output.reply)\nprint(result.output.escalate)\n```\n\nThree things are load-bearing there. `deps_type` declares what the agent needs from you. `output_type` declares what it must return. `@agent.tool` wraps a plain Python function and reads its type hints to build the tool schema the model sees.\n\nPydantic AI ships no implicit default model, so you always pass a model string. This post reaches for Anthropic's Claude for a reason: it follows tool schemas closely and returns well-formed structured output, which is precisely what the validation layer below leans on.\n\nWhen the model returns something that does not parse into `SupportReply`, Pydantic AI does not hand you a broken object. It catches the `ValidationError`, formats it, and sends it back to the model as a correction request. You get a validated object or a clean exception, never a string with a JSON fence stuck to it.\n\nPush that idea onto the billing agent and the types stop being documentation. 
They become rails.\n\n``` python\nfrom dataclasses import dataclass\nfrom typing import Literal\nfrom pydantic import BaseModel, Field\nfrom pydantic_ai import Agent, RunContext\n\n@dataclass\nclass BillingDeps:\n    customer_id: str\n    api_key: str\n\nclass BillingAction(BaseModel):\n    action: Literal[\"refund\", \"retry\", \"escalate\"]\n    amount_cents: int = Field(ge=0, le=20_000)\n    reason: str\n\nagent = Agent(\n    \"anthropic:claude-opus-4-8\",\n    deps_type=BillingDeps,\n    output_type=BillingAction,\n    system_prompt=(\n        \"You resolve billing disputes. Refund under \"\n        \"$200, retry on transient failures, escalate \"\n        \"everything else.\"\n    ),\n)\n```\n\nThe tools read from `ctx.deps`, and their type hints and docstrings feed the tool definitions the model reads:\n\n``` python\n@agent.tool\ndef last_charge(ctx: RunContext[BillingDeps]) -> int:\n    \"\"\"Return the last charge in cents.\"\"\"\n    # real impl: call billing API with ctx.deps\n    return 1899\n\n@agent.tool\ndef charge_status(\n    ctx: RunContext[BillingDeps],\n) -> Literal[\"ok\", \"failed\", \"pending\"]:\n    \"\"\"Return the status of the last charge.\"\"\"\n    return \"failed\"\n```\n\nRun it, and the output is a validated `BillingAction` or an exception, never a raw string:\n\n``` python\nresult = agent.run_sync(\n    \"My card was charged but the order never \"\n    \"shipped. Fix it.\",\n    deps=BillingDeps(customer_id=\"cus_123\", api_key=\"...\"),\n)\nassert isinstance(result.output, BillingAction)\nprint(result.output.action, result.output.amount_cents)\n```\n\nEvery annotation is doing work. `output_type=BillingAction` guarantees the return is a `BillingAction` or an exception. `Literal[\"refund\", \"retry\", \"escalate\"]` closes the action set so the model cannot invent a fourth. `Field(ge=0, le=20_000)` caps the refund at two hundred dollars in the output model itself, not in a post-hoc check you will forget to write. 
And the tool signatures are typed too: `charge_status` declares `\"ok\"`, `\"failed\"`, and `\"pending\"` as its only legal answers, which your type checker holds the implementation to, while the docstring and parameter hints are what the model reads at call time.\n\nThe $4,000 refund from the opening cannot happen here. It fails validation before it reaches your payments code, and by default the model gets one retry to correct itself.\n\nThat correction loop is worth respecting before you depend on it. On a validation failure, the framework formats the error and posts it back to the model as a tool-call-style correction. The number of retries is configurable: set `retries` on the `Agent`.\n\nMost of the time this is what you want. Sometimes it is not. On a mistyped field name the model can burn three retries guessing at the schema, returning the same wrong shape each time and running up your token bill. Watch the retry count in your traces. If you see the same validation error repeating, the prompt is the bug, not the retry ceiling.\n\nHere is where the type discipline turns into something you feel every day. An agent is a function from input to output with a network call and a nondeterministic model in the middle. That normally makes it miserable to test. Pydantic AI ships two tools that make it ordinary.\n\n`TestModel` runs the agent end to end without any network call. It inspects your output schema, generates data that satisfies it, and calls every tool once. It is the \"does this wire up at all\" test.\n\n``` python\nfrom pydantic_ai.models.test import TestModel\n\ndef test_billing_agent_wiring():\n    with agent.override(model=TestModel()):\n        result = agent.run_sync(\n            \"charged twice\",\n            deps=BillingDeps(customer_id=\"c\", api_key=\"k\"),\n        )\n    assert isinstance(result.output, BillingAction)\n    assert 0 <= result.output.amount_cents <= 20_000\n```\n\nNo API key. No latency. No token spend. 
The test asserts the contract holds (a `BillingAction` with an amount inside the capped range) and runs in milliseconds in CI.\n\nWhen you need to pin exact behavior, `FunctionModel` lets you script what the model returns for a given set of messages:\n\n``` python\nfrom pydantic_ai.models.function import FunctionModel, AgentInfo\nfrom pydantic_ai.messages import (\n    ModelMessage,\n    ModelResponse,\n    ToolCallPart,\n)\n\ndef always_escalate(\n    messages: list[ModelMessage], info: AgentInfo\n) -> ModelResponse:\n    args = {\n        \"action\": \"escalate\",\n        \"amount_cents\": 0,\n        \"reason\": \"policy\",\n    }\n    return ModelResponse(\n        parts=[ToolCallPart(\"final_result\", args)]\n    )\n\ndef test_escalation_path():\n    with agent.override(model=FunctionModel(always_escalate)):\n        result = agent.run_sync(\n            \"refund $5000 now\",\n            deps=BillingDeps(customer_id=\"c\", api_key=\"k\"),\n        )\n    assert result.output.action == \"escalate\"\n```\n\nYou are testing your own logic: the tool wiring, the dependency injection, the validation. The model is held fixed. The stochastic part is mocked out, so the test is deterministic and fast. This is the same discipline you already apply to a database or an HTTP client: swap the real dependency for a fake at the boundary. `agent.override` is that boundary.\n\nStatic types cannot make a language model deterministic. What they can do is bound its output before that output touches anything that costs money or mutates state. The model proposes; the type system disposes. `Literal` closes a set. `Field` clamps a range. `output_type` refuses a malformed shape. Everything the model returns passes through a gate you defined in Python, checked by Pyright before you ship and by Pydantic at runtime.\n\nFor a shop already living in Pydantic (most FastAPI backends in 2026), the payoff is that agents start to feel like routes. 
Same type hints, same IDE support, same validation contract, same test ergonomics. The agent is no longer a special, scary thing bolted onto the side of the system. It is another typed function you can reason about.\n\nStart with one agent. Give it an `output_type` with a `Field` constraint on the one value that could hurt you if the model got it wrong. Write a `TestModel` test for it. Ship that. You will have closed the exact gap that produced the $4,000 refund, and you will have a test that proves it stays closed.\n\nIf you want the wider map (how typed agents sit next to the other frameworks, how to trace them once they are running, and how to keep their cost honest), that is what *The AI Engineer's Library* covers. *Agents in Production* walks the framework landscape and the patterns for building and shipping multi-step agents; *Observability for LLM Applications* is the companion on tracing, evals, and cost. Both aim at the same thing this post does: agents you can trust because you can see and constrain what they do.", "url": "https://wpnews.pro/news/pydantic-ai-typed-testable-agents-for-engineers-who-like-guarantees", "canonical_source": "https://dev.to/gabrielanhaia/pydantic-ai-typed-testable-agents-for-engineers-who-like-guarantees-2cim", "published_at": "2026-07-04 09:35:13+00:00", "updated_at": "2026-07-04 09:48:43.709292+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "developer-tools"], "entities": ["Pydantic AI", "Anthropic", "Claude"], "alternates": {"html": "https://wpnews.pro/news/pydantic-ai-typed-testable-agents-for-engineers-who-like-guarantees", "markdown": "https://wpnews.pro/news/pydantic-ai-typed-testable-agents-for-engineers-who-like-guarantees.md", "text": "https://wpnews.pro/news/pydantic-ai-typed-testable-agents-for-engineers-who-like-guarantees.txt", "jsonld": "https://wpnews.pro/news/pydantic-ai-typed-testable-agents-for-engineers-who-like-guarantees.jsonld"}}