Agent Harness Explained: Build Production-Ready AI Agents with Microsoft Agent Framework Microsoft's Agent Framework introduces the `create_harness_agent` function, a production-ready pattern that pre-assembles critical infrastructure components—including tool-calling loops, conversation history management, context-window compaction, planning mechanisms, durable memory, skill extensibility, and observability—into a single, configurable pipeline. This approach eliminates the "hidden complexity tax" developers face when wiring these components manually, where agents commonly fail due to context window limits, lost tool results, untracked parallel tasks, or production crashes with no debugging trace. By separating agent instructions (what to do) from execution infrastructure (how to reliably run), the harness pattern enables developers to focus on agent logic rather than scaffolding, similar to how Kubernetes automates container deployment versus manual `docker run` commands. Meta Description:Learn what an agent harness is, why it matters for production AI systems, and how to implement one step-by-step using Microsoft Agent Framework's create harness agent -- with real Python code, architecture diagrams, and deep technical walkthroughs. You have a capable LLM. You have a clear use case. You write a chat loop in 20 lines of Python and it works -- until the context window fills up and the agent loses the thread. Or it calls a tool but forgets the result two turns later. Or it starts three things at once and has no way to track which are done. Or it crashes in production with no trace of what went wrong. This is the hidden complexity tax of AI agents . Every production-grade agent needs: a tool-calling loop, conversation history management, context-window compaction, a planning mechanism, durable memory, skill extensibility, and observability. If you wire each of these yourself, you spend more time on infrastructure than on the actual intelligence. And when one breaks, the entire agent fails in ways that are nearly impossible to debug. The agent harness pattern solves this by pre-assembling all of those components into a single, tested, configurable pipeline -- so you can focus on what your agent does, not how to keep it running. In this deep dive, you'll learn: create harness agent Let's build something that lasts. The term "harness" comes from two established software patterns. In test engineering , a test harness is the scaffolding that configures, instruments, and tears down a system under test -- so test authors write test logic , not setup code. In dependency injection , a DI container is a wiring harness -- it resolves and connects components so application code never calls new directly. An agent harness applies the same thinking to AI agents. It is: A factory or container that constructs a fully wired, ready-to-run agent by assembling all required infrastructure components -- history, tools, memory, observability, planning -- from a single, declarative configuration point. The key insight is separation of concerns : your agent instructions define what the agent should do; the harness defines how that intent is reliably executed, persisted, and observed. Architecture Note: Think of the harness like a production-grade Kubernetes deployment vs. running a container manually with docker run . Both run your code, but one handles restarts, resource limits, networking, logging, and scaling automatically. To appreciate what the harness eliminates, consider what you would have to build manually for a real agent: | Component | Manual Responsibility | |---|---| Tool Calling Loop | Detect tool calls, dispatch, collect results, re-invoke model, set termination condition | History Management | Serialize/deserialize conversation history, decide storage backend, handle multi-turn sessions | Context Compaction | Monitor token count, decide eviction strategy, preserve semantic context | Planning / Todo | Design a task-tracking schema, prompt the model to use it, parse structured outputs | Mode Management | Track agent state planning vs. executing , implement approval gates | Persistent Memory | Define memory schema, write/read from durable store, inject into context at the right time | Skills Loading | Design skill discovery protocol, filter by relevance, progressively inject into context | Telemetry | Instrument every model call, tool dispatch, and context mutation with spans and metrics | Writing all of this correctly, testing it, and keeping it working as your LLM provider updates its API is a substantial engineering effort. The harness pattern packages this effort once, so you -- and every developer on your team -- never has to repeat it. The difference is immediately visible in code. Here is a minimal but realistic "manual" agent that handles just the tool-calling loop: ? DIY approach -- and this is just the tool loop, not memory, compaction, or telemetry import json async def run agent manually client, tools, messages : while True: response = await client.chat messages=messages, tools=tools if response.finish reason == "stop": return response.content if response.finish reason == "tool calls": messages.append response.message add assistant turn for tool call in response.tool calls: Manually dispatch each tool tool fn = tool registry.get tool call.function.name if not tool fn: raise ValueError f"Unknown tool: {tool call.function.name}" result = await tool fn json.loads tool call.function.arguments messages.append { "role": "tool", "tool call id": tool call.id, "content": str result } Loop back to model -- now you ALSO need token counting, history management, todo tracking, telemetry... Now here is the harness equivalent that does all of the above plus history, compaction, planning, memory, and telemetry: ? Harness approach -- full production pipeline in 4 lines from agent framework import create harness agent from agent framework.foundry import FoundryChatClient from azure.identity import AzureCliCredential agent = create harness agent client=FoundryChatClient credential=AzureCliCredential , max context window tokens=128 000, max output tokens=16 384, Same outcome. Orders of magnitude less surface area for bugs. Microsoft Agent Framework MAF is Microsoft's open-source, production-grade framework for building AI agents and multi-agent workflows. It is the direct successor to both AutoGen and Semantic Kernel -- combining AutoGen's simple, composable agent abstractions with Semantic Kernel's enterprise features: session-based state management, type safety, middleware pipelines, and comprehensive telemetry. MAF supports: At the center of MAF is the Agent base class. All agent types derive from this common base, which defines a consistent interface for multi-agent orchestration. The default agent runtime execution model follows a deterministic loop: User Message ? ? ??????????????????????????????????????????????????? ? Agent Runtime ? ? ? ? 1. Context Assembly ? ? ?? HistoryProvider + ContextProviders ? ? Memory, Skills, Mode, Todos ? ? ? ? 2. Middleware Pre-Processing ? ? ?? Telemetry, Compaction checks ? ? ? ? 3. Model Inference ? ? ?? FoundryChatClient / OpenAI / Anthropic ? ? ? ? 4. Tool Dispatch Loop ? ? ?? FunctionInvocationLayer ? ? Tool call - execute - result - loop ? ? ? ? 5. History Persistence ? ? ?? Saved after every service call ? ? ? ? 6. Middleware Post-Processing ? ? ?? Telemetry spans closed ? ??????????????????????????????????????????????????? ? ? Agent Response streaming or complete This loop runs transparently on every agent.run call. The harness ensures all six phases are populated and correctly ordered. MAF's Python implementation is split into composable packages. The top-level agent-framework meta-package installs all of them: pip install agent-framework | Package | Purpose | |---|---| core | Agent base, session management, middleware, context providers | foundry | FoundryChatClient , Azure AI Foundry integration | tools | get web search tool and other built-in tool factories | orchestrations | Multi-agent workflow graphs sequential, concurrent, handoff | devui | Interactive browser-based DevUI for local agent debugging | a2a | Agent-to-Agent A2A protocol support for cross-platform agents | lab | Experimental features: benchmarking, RL, research | create harness agent create harness agent is a factory function -- a single call that constructs and wires a fully operational agent. Let's look at its complete signature and then break down every component it assembles: python from agent framework import create harness agent agent = create harness agent Required: the LLM backend client=client, Required: token budget for the agent's context window max context window tokens=128 000, max output tokens=16 384, Optional: agent identity name="MyAgent", description="What this agent does.", agent instructions="System prompt / instructions for the agent.", Optional: feature toggles all enabled by default disable todo=False, TodoProvider disable mode=False, AgentModeProvider disable compaction=False, CompactionProvider memory store=None, MemoryContextProvider None = disabled skills directory=None, SkillsProvider None = disabled extra tools= , Additional tools to inject This single call assembles 8 sub-systems automatically. Here is each one in depth. This is the agent's agentic loop engine -- the component that makes the agent actually do things rather than just respond. When a model returns a response containing tool calls, the FunctionInvocationLayer middleware: Tool functions are registered as plain Python callables with type-annotated signatures. The framework automatically generates JSON schema from the annotations: php async def search database query: str, limit: int = 10 - list dict : """Search the product database for matching items. Args: query: The search query string. limit: Maximum number of results to return. Returns: List of matching product records. """ Your implementation here return results The framework auto-generates the JSON schema and handles dispatch agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, extra tools= search database , The InMemoryHistoryProvider manages the conversation history for the agent's session. What makes this different from a plain message list is when persistence occurs: after every individual model service call , not just at the end of the agent's turn. This matters for reliability. Consider an agent making three tool calls in a single turn. Without per-call persistence, a crash on the third call loses work from calls one and two. With per-call persistence, the agent can resume from the last successful state. Sessions are created per-conversation to isolate history session = agent.create session All turns in this session share and accumulate history result 1 = await agent.run "Research quantum computing trends", session=session result 2 = await agent.run "Now summarize your findings in bullet points", session=session Turn 2 has full memory of everything from turn 1 For production deployments that require history to survive process restarts, MAF supports swapping InMemoryHistoryProvider with durable backends -- Azure Cosmos DB, Redis, or any custom implementation via the IHistoryProvider interface. Context-window overflow is one of the most common production failures for long-running agents. The CompactionProvider addresses this with a two-pronged strategy: Sliding Window: When total token count approaches max context window tokens , older messages are pruned. The system prompt and most recent N messages are always preserved. Tool Result Compaction: Tool results are often verbose -- a web search might return thousands of tokens. The compaction layer intelligently summarizes tool results when under token pressure, keeping essential information while reducing usage. Token budget math: available context = max context window tokens - max output tokens - system prompt tokens Compaction fires BEFORE the hard limit is hit agent = create harness agent client=client, max context window tokens=128 000, GPT-4o's total window max output tokens=16 384, Reserved for model's response ~112K tokens available for history + context providers Compaction fires automatically to stay within budget Or disable it if you manage token budget yourself: agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, disable compaction=True, The TodoProvider gives the agent a structured task-tracking system within its own context. Rather than relying on the agent to implicitly remember its progress, TodoProvider exposes tool functions the agent can call to manage an explicit todo list: create todo title, description -- Creates a new work item complete todo id -- Marks an item done list todos -- Gets all pending items update todo id, status -- Updates item statusThis is deceptively powerful. The agent can decompose a complex task into explicit work items, track completion, and resist skipping steps -- all driven by its own reasoning. User: "Research the top 5 AI agent frameworks and compare them." Agent internal reasoning with TodoProvider : - create todo "Research AutoGen" - create todo "Research LangGraph" - create todo "Research CrewAI" - create todo "Research Microsoft Agent Framework" - create todo "Write comparison table" Agent proceeds autonomously - complete todo "Research AutoGen" - complete todo "Research LangGraph" - ... and so on until all todos are done The AgentModeProvider implements a two-phase workflow that mirrors how skilled professionals approach complex tasks: Phase 1 -- Plan Mode Interactive In plan mode the agent is collaborative : This is the human-in-the-loop gate -- users can modify the plan or redirect entirely before any autonomous work begins. Phase 2 -- Execute Mode Autonomous Once approved, the agent: This pattern is what makes agents trustworthy in production -- the human sees and approves the plan, and the agent executes it faithfully. The MemoryContextProvider provides file-based durable memory -- a persistent store the agent can read from and write to across sessions and compaction events. This solves a critical gap in pure in-context memory: when compaction fires, older information is dropped. If the agent wrote a research finding 50 messages ago, it may no longer be in context. But if the agent saved it to memory storage, the MemoryContextProvider re-injects it at the start of every context assembly cycle. python import tempfile from pathlib import Path memory dir = Path tempfile.mkdtemp / "agent memory" memory dir.mkdir parents=True, exist ok=True agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, memory store=str memory dir , Enable file-based persistent memory The agent can now use internal memory tools: memory write key, content -- Persist information to disk memory read key -- Retrieve persisted information Future sessions re-inject saved memory automatically into context The SkillsProvider enables runtime skill discovery and progressive loading . Skills are domain-specific capability packages -- sets of tools, instructions, and knowledge -- that the agent can discover and load on demand. Rather than loading all possible tools upfront wasting context tokens , SkillsProvider implements progressive loading : it first presents skill summaries, and the agent selectively loads full skill definitions it actually needs. A general-purpose agent can become a specialized SQL expert, code reviewer, or API integration specialist -- dynamically, based on what the task requires. Skills can be authored in three ways as of the latest MAF release : agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, skills directory="./skills", Load skills from this directory Example skills/ directory: skills/ ??? web research.yaml <- Web search + summarization capabilities ??? data analysis.yaml <- Pandas/SQL data analysis capabilities ??? code review.yaml <- Code quality review capabilities ??? report writing.yaml <- Document formatting capabilities The AgentTelemetryLayer instruments every observable event in the agent's lifecycle: | Telemetry Event | What is Captured | |---|---| agent.run.start | Session ID, user input, agent name, timestamp | agent.model.call | Model name, token counts prompt/completion , latency | agent.tool.call | Tool name, arguments, result size, latency | agent.compaction.triggered | Token counts before/after, messages evicted | agent.mode.switch | From/to mode, trigger reason | agent.run.complete | Total tokens, tool calls, total latency, final status | python Configure an OTLP exporter before creating the agent from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.exporter.otlp.proto.grpc.trace exporter import OTLPSpanExporter from opentelemetry.sdk.trace.export import BatchSpanProcessor Point at Azure Monitor / Jaeger / Grafana Tempo / Datadog, etc. exporter = OTLPSpanExporter endpoint="http://localhost:4317" provider = TracerProvider provider.add span processor BatchSpanProcessor exporter trace.set tracer provider provider The harness agent automatically uses the configured tracer -- no extra code needed agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, Now let's build the complete harness-based Research Agent from the official MAF repository https://github.com/microsoft/agent-framework/tree/main/python/samples/02-agents/harness -- line by line, with full commentary. Step 1: Install Microsoft Agent Framework pip install agent-framework Step 2: Authenticate with Azure az login Step 3: Create your .env file cat .env << EOF FOUNDRY PROJECT ENDPOINT=https://your-project.services.ai.azure.com/api/projects/your-project-name FOUNDRY MODEL=gpt-4o EOF Finding Your Foundry Endpoint:Navigate to your Azure AI Foundry project in the Azure Portal - Project Overview - copy the "Project endpoint" URL. Here is the absolute minimum viable harness agent -- 4 lines of setup that give you a fully operational agent with history, compaction, todos, planning, and telemetry: python minimal harness.py import asyncio from agent framework import create harness agent from agent framework.foundry import FoundryChatClient from azure.identity import AzureCliCredential from dotenv import load dotenv async def main : MAF does NOT auto-load .env -- must be explicit load dotenv 1. LLM backend using your az login session no secrets in code client = FoundryChatClient credential=AzureCliCredential 2. Harness agent -- all 8 sub-systems active by default agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, 3. Create a session isolates conversation history session = agent.create session 4. Run the agent response = await agent.run "What are the latest trends in AI agent frameworks?", session=session, print response if name == " main ": asyncio.run main Now the complete research agent from the official repository with detailed inline commentary: harness research.py Source: https://github.com/microsoft/agent-framework/tree/main/python/samples/02-agents/harness import asyncio from agent framework import create harness agent from agent framework.foundry import FoundryChatClient from azure.identity import AzureCliCredential from dotenv import load dotenv ?????????????????????????????????????????????????????????????????????????? SECTION 1: Agent Instructions The system prompt is injected at every context assembly cycle. It is ALWAYS present -- even after compaction fires -- because the harness preserves the system prompt unconditionally. Key best practices here: - Clear role: "You are a research assistant" - Quality criteria: multiple sources, cross-referencing - Output format: Markdown, headings, inline citations - Durable memory: save final report to file memory so it survives compaction and persists across sessions ?????????????????????????????????????????????????????????????????????????? RESEARCH INSTRUCTIONS = """\ Research Assistant Instructions You are a research assistant. When given a research topic, research it thoroughly using web search and web browsing. Use your knowledge to form good search queries and hypotheses, but always verify claims with the tools available to you rather than relying on memory alone. Research quality Consult multiple sources when possible and cross-reference key claims. When sources disagree, note the discrepancy and explain which source you consider more reliable and why. If a web page fails to load or a search returns irrelevant results, try alternative search queries or sources before moving on. Track your sources -- you will need them when presenting results. Presenting results When presenting your final findings: - Use Markdown formatting for clarity. - Use clear sections with headings for each major topic or sub-question. - Cite your sources inline e.g., "According to source name URL , ..." . - End with a brief summary of key takeaways. - Save the final research report to file memory so it survives compaction and can be referenced in future sessions. """ async def main - None: ????????????????????????????????????????????????????????????????????? SECTION 2: Environment & Client Setup ????????????????????????????????????????????????????????????????????? load dotenv Reads FOUNDRY PROJECT ENDPOINT and FOUNDRY MODEL AzureCliCredential uses your az login session -- ideal for dev. For production: replace with ManagedIdentityCredential client = FoundryChatClient credential=AzureCliCredential ????????????????????????????????????????????????????????????????????? SECTION 3: Harness Agent Assembly ????????????????????????????????????????????????????????????????????? agent = create harness agent client=client, max context window tokens=128 000, Total model context window max output tokens=16 384, Reserved for model's response name="ResearchAgent", description="A research assistant that plans and executes research tasks.", agent instructions=RESEARCH INSTRUCTIONS, All features active by default: - TodoProvider: tracks research tasks as explicit work items - AgentModeProvider: plan/execute two-phase workflow - CompactionProvider: sliding window + tool result compaction - SkillsProvider: progressive skill discovery - MemoryContextProvider: file-based durable memory - AgentTelemetryLayer: full OpenTelemetry instrumentation ????????????????????????????????????????????????????????????????????? SECTION 4: Session -- isolates this conversation's history and state ????????????????????????????????????????????????????????????????????? session = agent.create session print "Research Assistant powered by create harness agent " print "=" 50 print "Enter a research topic to get started." print "Type /exit to end the session.\n" ????????????????????????????????????????????????????????????????????? SECTION 5: Interactive streaming chat loop ????????????????????????????????????????????????????????????????????? while True: user input = input "You: " .strip if not user input: continue if user input.lower == "/exit": print "\nGoodbye " break print "\nAssistant: ", end="", flush=True agent.run ..., stream=True returns AsyncGenerator AgentUpdate, None Each AgentUpdate has: update.text -- streaming text fragment from the model update.contents -- list of structured content items tool calls, etc. async for update in agent.run user input, session=session, stream=True : if update.contents: for content in update.contents: if content.type == "function call": Tool is being invoked -- show users what's happening print f"\n calling tool: {content.name} ", flush=True print " ", end="", flush=True Handle web search events from the built-in search tool elif content.type in "search tool call", "search tool result" and \ getattr content, "tool name", None == "web search": action = None if content.type == "search tool result" and isinstance content.result, dict : action = content.result.get "action", {} elif content.type == "search tool call": action = content.arguments if isinstance content.arguments, dict else None if action: action type = action.get "type", "search" if action type == "search": queries = action.get "queries" or query str = ", ".join f'"{q}"' for q in queries \ if queries else action.get "query", "" print f"\n ? Web search: {query str}", flush=True print " ", end="", flush=True elif action type == "open page": url = action.get "url", " unknown " print f"\n ? Opening: {url}", flush=True print " ", end="", flush=True elif action type == "find in page": pattern = action.get "pattern", "" print f'\n ? Find in page: "{pattern}"', flush=True print " ", end="", flush=True else: print f"\n ? Web search: {action type}", flush=True print " ", end="", flush=True Stream text fragments as they arrive from the model if update.text: print update.text, end="", flush=True print "\n" if name == " main ": asyncio.run main Here is a clean, reusable streaming handler you can adapt for your own applications: streaming handler.py -- Reusable streaming output handler async def stream agent response agent, user input: str, session - str: """ Stream an agent response, printing text as it arrives and logging tool calls with their arguments and results. Returns: The complete assembled response text. """ full response parts: list str = print "Assistant: ", end="", flush=True async for update in agent.run user input, session=session, stream=True : Stream text as it arrives if update.text: print update.text, end="", flush=True full response parts.append update.text Handle structured content items if update.contents: for content in update.contents: if content.type == "function call": args summary = str content.arguments :100 print f"\n ? Tool call: {content.name} {args summary}... ", flush=True elif content.type == "function result": result preview = str content.result :80 print f"\n ? Result: {result preview}...", flush=True elif content.type == "search tool call": if hasattr content, "arguments" and isinstance content.arguments, dict : query = content.arguments.get "query", "" print f"\n ? Searching: '{query}'", flush=True elif content.type == "search tool result": if isinstance content.result, dict : url = content.result.get "url", "" if url: print f"\n ? Retrieved: {url}", flush=True print "\n" return "".join full response parts ?? Pattern 1: Lean agent no planning, no todo management ???????????? Use for: Simple Q&A, single-turn tasks, low-latency scenarios lean agent = create harness agent client=client, max context window tokens=32 000, max output tokens=4 096, name="QuickAnswerAgent", agent instructions="You are a concise Q&A assistant. Answer briefly and directly.", disable todo=True, No task tracking for Q&A disable mode=True, No plan/execute modes needed disable compaction=False, Keep compaction for long conversations ?? Pattern 2: Full research agent with persistent memory ??????????????? Use for: Long-running research, multi-session workflows from pathlib import Path memory path = Path "./research memory" memory path.mkdir exist ok=True research agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, name="ResearchAgent", agent instructions=RESEARCH INSTRUCTIONS, memory store=str memory path , Enable file-based persistent memory ?? Pattern 3: Agent with custom enterprise tools ?????????????????????? Use for: Domain-specific agents with proprietary data access from agent framework.tools import get web search tool async def query internal db query: str, department: str = "all" - list dict : """Query the internal company database. Args: query: Search query for the database. department: Filter by department name, or 'all' for global search. Returns: List of matching records. """ Your internal DB implementation return async def get slack messages channel: str, days back: int = 7 - list dict : """Retrieve recent Slack messages from a channel. Args: channel: Slack channel name without . days back: Number of days of history to retrieve. Returns: List of message objects with sender, timestamp, and text. """ Your Slack API implementation return custom agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, name="InternalResearchAgent", agent instructions="You are an internal research assistant with access to company data.", extra tools= get web search tool , Built-in web search query internal db, Custom: internal database get slack messages, Custom: Slack integration , Set environment variables export FOUNDRY PROJECT ENDPOINT="https://your-project.services.ai.azure.com/api/projects/your-project-name" export FOUNDRY MODEL="gpt-4o" Authenticate with Azure az login Run the research agent python harness research.py Expected terminal output when you ask a research question: Research Assistant powered by create harness agent ================================================== Enter a research topic to get started. Type /exit to end the session. You: Research the current state of AI agent frameworks in 2026 Assistant: calling tool: switch to plan mode calling tool: create todo calling tool: create todo calling tool: create todo calling tool: switch to execute mode Here is my research plan. I will: 1. Survey the major frameworks MAF, AutoGen, LangGraph, CrewAI, LlamaIndex 2. Look up recent benchmarks and community activity 3. Compare feature sets in a table 4. Summarize key takeaways Shall I proceed? You: Yes, go ahead. Assistant: ? Web search: "Microsoft Agent Framework 2026 features" ? Opening: https://github.com/microsoft/agent-framework ? Web search: "LangGraph vs AutoGen comparison 2026" calling tool: complete todo ? Web search: "CrewAI production readiness 2026" ... AI Agent Frameworks -- State of the Ecosystem 2026 Microsoft Agent Framework MAF According to Microsoft DevBlogs https://devblogs.microsoft.com/agent-framework/ , MAF 1.0 ... Pro Tip:Launch the built-in DevUI for a visual debugging experience during development: install agent-framework devui and run agent devui in your project directory. The harness pattern dramatically reduces the operational burden of running agents in production, but there are still architectural decisions to make: The right max context window tokens depends on your model and workload. A rule of thumb: For multi-user production deployments, InMemoryHistoryProvider is insufficient -- process restarts lose all history. MAF supports pluggable history backends: Example: using a custom durable history provider pattern from agent framework.core import Agent Implement IHistoryProvider backed by your chosen store CosmosDB, Redis, PostgreSQL, etc. and pass it to the agent builder See MAF documentation for the full IHistoryProvider interface As of May 2026, MAF ships FIDES Flow Integrity Deterministic Enforcement System as a middleware -- the 1 defense against prompt injection OWASP LLM Top 10 risk 1 . FIDES assigns integrity labels trusted/untrusted and confidentiality labels public/private to every piece of content flowing through the agent. Labels propagate automatically, and policy enforcement is deterministic -- not heuristic. Enable FIDES middleware for production agents from agent framework.security import FidesMiddleware agent = create harness agent client=client, max context window tokens=128 000, max output tokens=16 384, Additional middleware can be composed with the harness Consult MAF docs for middleware registration API When you're ready to move from local development to production, Foundry Hosted Agents provides containerized Micro VM hosting with built-in identity, autoscaling, session state management, and versioning. The migration from a local harness agent is minimal: The agent code is identical -- only the hosting changes Add foundry hosting package and declare your agent as a hosted endpoint See: https://github.com/microsoft/agent-framework/tree/main/python/samples/04-hosting With OpenTelemetry already wired in by the harness, you need only configure your exporter to get production-grade visibility: Key dashboards to build: token consumption per session, tool call latency distribution, compaction frequency, and agent error rates. The agent harness pattern is the difference between an AI demo and a production AI system. It acknowledges a fundamental truth: building the intelligence of an agent is the easy part. Keeping that intelligence reliable, observable, durable, and safe under production load is the hard part -- and the harness handles that hard part for you. Microsoft Agent Framework's create harness agent is the most complete open-source implementation of this pattern available today. In a single factory call it wires together eight battle-tested subsystems -- function invocation, history persistence, context compaction, todo-based planning, plan/execute mode management, durable file memory, progressive skill loading, and OpenTelemetry instrumentation -- all individually configurable, all working in concert. Here is what to take away from this deep dive: disable todo=True , disable mode=True than to add them later harness research.py Ready to build? pip install agent-framework az login python harness research.py Explore the full framework at github.com/microsoft/agent-framework https://github.com/microsoft/agent-framework , join the community on Discord https://discord.gg/b5zjErwbQM , and check the latest patterns on the official blog https://devblogs.microsoft.com/agent-framework/ . The infrastructure is handled. Go build the intelligence. ? All code samples in this article are sourced from or based on the official Microsoft Agent Framework repository MIT License . Verify all API signatures against the latest release before deploying to production.