{"slug": "agent-harness-explained-build-production-ready-ai-agents-with-microsoft-agent", "title": "Agent Harness Explained: Build Production-Ready AI Agents with Microsoft Agent Framework", "summary": "Microsoft's Agent Framework introduces the `create_harness_agent` function, a production-ready pattern that pre-assembles critical infrastructure components -- including tool-calling loops, conversation history management, context-window compaction, planning mechanisms, durable memory, skill extensibility, and observability -- into a single, configurable pipeline. This approach eliminates the \"hidden complexity tax\" developers face when wiring these components manually, where agents commonly fail due to context window limits, lost tool results, untracked parallel tasks, or production crashes with no debugging trace. By separating agent instructions (what to do) from execution infrastructure (how to reliably run), the harness pattern enables developers to focus on agent logic rather than scaffolding, similar to how Kubernetes automates container deployment versus manual `docker run` commands.", "body_md": "Meta Description: Learn what an agent harness is, why it matters for production AI systems, and how to implement one step-by-step using Microsoft Agent Framework's `create_harness_agent` -- with real Python code, architecture diagrams, and deep technical walkthroughs.\n\nYou have a capable LLM. You have a clear use case. You write a chat loop in 20 lines of Python and it works -- until the context window fills up and the agent loses the thread. Or it calls a tool but forgets the result two turns later. Or it starts three things at once and has no way to track which are done. Or it crashes in production with no trace of what went wrong.\n\nThis is the **hidden complexity tax of AI agents**. 
Every production-grade agent needs: a tool-calling loop, conversation history management, context-window compaction, a planning mechanism, durable memory, skill extensibility, and observability. If you wire each of these yourself, you spend more time on infrastructure than on the actual intelligence. And when one breaks, the entire agent fails in ways that are nearly impossible to debug.\n\nThe **agent harness** pattern solves this by pre-assembling all of those components into a single, tested, configurable pipeline -- so you can focus on *what* your agent does, not *how* to keep it running.\n\nIn this deep dive, you'll learn:\n\n`create_harness_agent`\n\nLet's build something that lasts.\n\nThe term \"harness\" comes from two established software patterns. In **test engineering**, a test harness is the scaffolding that configures, instruments, and tears down a system under test -- so test authors write *test logic*, not setup code. In **dependency injection**, a DI container is a wiring harness -- it resolves and connects components so application code never calls `new` directly.\n\nAn **agent harness** applies the same thinking to AI agents. It is:\n\nA factory or container that constructs a fully wired, ready-to-run agent by assembling all required infrastructure components -- history, tools, memory, observability, planning -- from a single, declarative configuration point.\n\nThe key insight is **separation of concerns**: your agent instructions define *what* the agent should do; the harness defines *how* that intent is reliably executed, persisted, and observed.\n\n**Architecture Note:** Think of the harness like a production-grade Kubernetes deployment vs. running a container manually with `docker run`. 
Both run your code, but one handles restarts, resource limits, networking, logging, and scaling automatically.\n\nTo appreciate what the harness eliminates, consider what you would have to build manually for a real agent:\n\n| Component | Manual Responsibility |\n|---|---|\n| Tool Calling Loop | Detect tool calls, dispatch, collect results, re-invoke model, set termination condition |\n| History Management | Serialize/deserialize conversation history, decide storage backend, handle multi-turn sessions |\n| Context Compaction | Monitor token count, decide eviction strategy, preserve semantic context |\n| Planning / Todo | Design a task-tracking schema, prompt the model to use it, parse structured outputs |\n| Mode Management | Track agent state (planning vs. executing), implement approval gates |\n| Persistent Memory | Define memory schema, write/read from durable store, inject into context at the right time |\n| Skills Loading | Design skill discovery protocol, filter by relevance, progressively inject into context |\n| Telemetry | Instrument every model call, tool dispatch, and context mutation with spans and metrics |\n\nWriting all of this correctly, testing it, and keeping it working as your LLM provider updates its API is a substantial engineering effort. The harness pattern packages this effort once, so you -- and every developer on your team -- never have to repeat it.\n\nThe difference is immediately visible in code. Here is a minimal but realistic \"manual\" agent that handles just the tool-calling loop:\n\n```\n# 
DIY approach -- and this is just the tool loop, not memory, compaction, or telemetry\nimport json\n\nasync def run_agent_manually(client, tools, messages, tool_registry):\n    # tool_registry maps each tool name to the async callable that implements it\n    while True:\n        response = await client.chat(messages=messages, tools=tools)\n\n        if response.finish_reason == \"stop\":\n            return response.content\n\n        if response.finish_reason == \"tool_calls\":\n            messages.append(response.message)  # add assistant turn\n\n            for tool_call in response.tool_calls:\n                # Manually dispatch each tool\n                tool_fn = tool_registry.get(tool_call.function.name)\n                if not tool_fn:\n                    raise ValueError(f\"Unknown tool: {tool_call.function.name}\")\n\n                result = await tool_fn(**json.loads(tool_call.function.arguments))\n\n                messages.append({\n                    \"role\": \"tool\",\n                    \"tool_call_id\": tool_call.id,\n                    \"content\": str(result)\n                })\n            # Loop back to model -- now you ALSO need token counting,\n            # history management, todo tracking, telemetry...\n```\n\nNow here is the harness equivalent that does all of the above *plus* history, compaction, planning, memory, and telemetry:\n\n```\n# Harness approach -- full production pipeline in 4 lines\nfrom agent_framework import create_harness_agent\nfrom agent_framework.foundry import FoundryChatClient\nfrom azure.identity import AzureCliCredential\n\nagent = create_harness_agent(\n    client=FoundryChatClient(credential=AzureCliCredential()),\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n)\n```\n\nSame outcome. Orders of magnitude less surface area for bugs.\n\n**Microsoft Agent Framework (MAF)** is Microsoft's open-source, production-grade framework for building AI agents and multi-agent workflows. 
It is the **direct successor to both AutoGen and Semantic Kernel** -- combining AutoGen's simple, composable agent abstractions with Semantic Kernel's enterprise features: session-based state management, type safety, middleware pipelines, and comprehensive telemetry.\n\nMAF supports:\n\nAt the center of MAF is the `Agent` base class. All agent types derive from this common base, which defines a consistent interface for multi-agent orchestration.\n\nThe **default agent runtime execution model** follows a deterministic loop:\n\n```\nUser Message\n    |\n    v\n+-------------------------------------------------+\n|                  Agent Runtime                  |\n|                                                 |\n|  1. Context Assembly                            |\n|     |_ HistoryProvider + ContextProviders       |\n|         (Memory, Skills, Mode, Todos)           |\n|                                                 |\n|  2. Middleware Pre-Processing                   |\n|     |_ Telemetry, Compaction checks             |\n|                                                 |\n|  3. Model Inference                             |\n|     |_ FoundryChatClient / OpenAI / Anthropic   |\n|                                                 |\n|  4. Tool Dispatch Loop                          |\n|     |_ FunctionInvocationLayer                  |\n|         (Tool call -> execute -> result -> loop)|\n|                                                 |\n|  5. History Persistence                         |\n|     |_ Saved after every service call           |\n|                                                 |\n|  6. Middleware Post-Processing                  |\n|     |_ Telemetry spans closed                   |\n+-------------------------------------------------+\n    |\n    v\nAgent Response (streaming or complete)\n```\n\nThis loop runs transparently on every `agent.run()` call. 
The harness ensures all six phases are populated and correctly ordered.\n\nMAF's Python implementation is split into composable packages. The top-level `agent-framework` meta-package installs all of them:\n\n```\npip install agent-framework\n```\n\n| Package | Purpose |\n|---|---|\n| `core` | `Agent` base, session management, middleware, context providers |\n| `foundry` | `FoundryChatClient`, Azure AI Foundry integration |\n| `tools` | `get_web_search_tool()` and other built-in tool factories |\n| `orchestrations` | Multi-agent workflow graphs (sequential, concurrent, handoff) |\n| `devui` | Interactive browser-based DevUI for local agent debugging |\n| `a2a` | Agent-to-Agent (A2A) protocol support for cross-platform agents |\n| `lab` | Experimental features: benchmarking, RL, research |\n\n`create_harness_agent`\n\n`create_harness_agent` is a factory function -- a single call that constructs and wires a fully operational agent. Let's look at its complete signature and then break down every component it assembles:\n\n``` python\nfrom agent_framework import create_harness_agent\n\nagent = create_harness_agent(\n    # Required: the LLM backend\n    client=client,\n\n    # Required: token budget for the agent's context window\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n\n    # Optional: agent identity\n    name=\"MyAgent\",\n    description=\"What this agent does.\",\n    agent_instructions=\"System prompt / instructions for the agent.\",\n\n    # Optional: feature toggles (todo, mode, and compaction are enabled by default;\n    # memory and skills stay disabled until you pass a store or directory)\n    disable_todo=False,           # TodoProvider\n    disable_mode=False,           # AgentModeProvider\n    disable_compaction=False,     # CompactionProvider\n    memory_store=None,            # MemoryContextProvider (None = disabled)\n    skills_directory=None,        # SkillsProvider (None = disabled)\n    extra_tools=[],               # Additional tools to inject\n)\n```\n\nThis single call assembles **8 sub-systems** automatically. 
Here is each one in depth.\n\nThis is the agent's **agentic loop engine** -- the component that makes the agent actually *do things* rather than just respond.\n\nWhen a model returns a response containing tool calls, the `FunctionInvocationLayer` middleware:\n\nTool functions are registered as plain Python callables with type-annotated signatures. The framework automatically generates JSON schema from the annotations:\n\n``` python\nasync def search_database(query: str, limit: int = 10) -> list[dict]:\n    \"\"\"Search the product database for matching items.\n\n    Args:\n        query: The search query string.\n        limit: Maximum number of results to return.\n\n    Returns:\n        List of matching product records.\n    \"\"\"\n    # Your implementation here; this placeholder returns no matches\n    results: list[dict] = []\n    return results\n\n# The framework auto-generates the JSON schema and handles dispatch\nagent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n    extra_tools=[search_database],\n)\n```\n\nThe `InMemoryHistoryProvider` manages the **conversation history** for the agent's session. What makes this different from a plain message list is *when* persistence occurs: **after every individual model service call**, not just at the end of the agent's turn.\n\nThis matters for reliability. Consider an agent making three tool calls in a single turn. Without per-call persistence, a crash on the third call loses work from calls one and two. 
With per-call persistence, the agent can resume from the last successful state.\n\n```\n# Sessions are created per-conversation to isolate history\nsession = agent.create_session()\n\n# All turns in this session share and accumulate history\nresult_1 = await agent.run(\"Research quantum computing trends\", session=session)\nresult_2 = await agent.run(\"Now summarize your findings in bullet points\", session=session)\n# Turn 2 has full memory of everything from turn 1\n```\n\nFor production deployments that require history to survive process restarts, MAF supports swapping `InMemoryHistoryProvider` with durable backends -- Azure Cosmos DB, Redis, or any custom implementation via the `IHistoryProvider` interface.\n\nContext-window overflow is one of the most common production failures for long-running agents. The `CompactionProvider` addresses this with a two-pronged strategy:\n\n**Sliding Window:** When total token count approaches `max_context_window_tokens`, older messages are pruned. The system prompt and most recent N messages are always preserved.\n\n**Tool Result Compaction:** Tool results are often verbose -- a web search might return thousands of tokens. 
The compaction layer intelligently summarizes tool results when under token pressure, keeping essential information while reducing usage.\n\n```\n# Token budget math:\n# available_context = max_context_window_tokens - max_output_tokens - system_prompt_tokens\n# Compaction fires BEFORE the hard limit is hit\n\nagent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,   # GPT-4o's total window\n    max_output_tokens=16_384,            # Reserved for model's response\n    # ~112K tokens available for history + context providers\n    # Compaction fires automatically to stay within budget\n)\n\n# Or disable it if you manage token budget yourself:\nagent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n    disable_compaction=True,\n)\n```\n\nThe `TodoProvider` gives the agent a **structured task-tracking system** within its own context. Rather than relying on the agent to implicitly remember its progress, `TodoProvider` exposes tool functions the agent can call to manage an explicit todo list:\n\n- `create_todo(title, description)` -- Creates a new work item\n- `complete_todo(id)` -- Marks an item done\n- `list_todos()` -- Gets all pending items\n- `update_todo(id, status)` -- Updates item status\n\nThis is deceptively powerful. The agent can decompose a complex task into explicit work items, track completion, and resist skipping steps -- all driven by its own reasoning.\n\n```\nUser: \"Research the top 5 AI agent frameworks and compare them.\"\n\nAgent (internal reasoning with TodoProvider):\n  -> create_todo(\"Research AutoGen\")\n  -> create_todo(\"Research LangGraph\")\n  -> create_todo(\"Research CrewAI\")\n  -> create_todo(\"Research Microsoft Agent Framework\")\n  -> create_todo(\"Write comparison table\")\n\n  [Agent proceeds autonomously]\n\n  -> complete_todo(\"Research AutoGen\")\n  -> complete_todo(\"Research LangGraph\")\n  -> ... 
and so on until all todos are done\n```\n\nThe `AgentModeProvider` implements a **two-phase workflow** that mirrors how skilled professionals approach complex tasks:\n\n**Phase 1 -- Plan Mode (Interactive)**\n\nIn plan mode the agent is *collaborative*:\n\nThis is the human-in-the-loop gate -- users can modify the plan or redirect entirely before any autonomous work begins.\n\n**Phase 2 -- Execute Mode (Autonomous)**\n\nOnce approved, the agent:\n\nThis pattern is what makes agents *trustworthy* in production -- the human sees and approves the plan, and the agent executes it faithfully.\n\nThe `MemoryContextProvider` provides **file-based durable memory** -- a persistent store the agent can read from and write to across sessions and compaction events.\n\nThis solves a critical gap in pure in-context memory: when compaction fires, older information is dropped. If the agent wrote a research finding 50 messages ago, it may no longer be in context. But if the agent saved it to memory storage, the `MemoryContextProvider` re-injects it at the start of every context assembly cycle.\n\n``` python\nimport tempfile\nfrom pathlib import Path\n\nmemory_dir = Path(tempfile.mkdtemp()) / \"agent_memory\"\nmemory_dir.mkdir(parents=True, exist_ok=True)\n\nagent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n    memory_store=str(memory_dir),  # Enable file-based persistent memory\n)\n\n# The agent can now use internal memory tools:\n# memory_write(key, content) -- Persist information to disk\n# memory_read(key)           -- Retrieve persisted information\n# Future sessions re-inject saved memory automatically into context\n```\n\nThe `SkillsProvider` enables **runtime skill discovery and progressive loading**. 
Skills are domain-specific capability packages -- sets of tools, instructions, and knowledge -- that the agent can discover and load on demand.\n\nRather than loading all possible tools upfront (wasting context tokens), `SkillsProvider` implements **progressive loading**: it first presents skill summaries, and the agent selectively loads full skill definitions it actually needs. A general-purpose agent can become a specialized SQL expert, code reviewer, or API integration specialist -- dynamically, based on what the task requires.\n\nSkills can be authored in three ways (as of the latest MAF release):\n\n```\nagent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n    skills_directory=\"./skills\",  # Load skills from this directory\n)\n\n# Example skills/ directory:\n# skills/\n# |-- web_research.yaml     <- Web search + summarization capabilities\n# |-- data_analysis.yaml    <- Pandas/SQL data analysis capabilities\n# |-- code_review.yaml      <- Code quality review capabilities\n# `-- 
report_writing.yaml   <- Document formatting capabilities\n```\n\nThe `AgentTelemetryLayer` instruments **every observable event** in the agent's lifecycle:\n\n| Telemetry Event | What is Captured |\n|---|---|\n| `agent.run.start` | Session ID, user input, agent name, timestamp |\n| `agent.model.call` | Model name, token counts (prompt/completion), latency |\n| `agent.tool.call` | Tool name, arguments, result size, latency |\n| `agent.compaction.triggered` | Token counts before/after, messages evicted |\n| `agent.mode.switch` | From/to mode, trigger reason |\n| `agent.run.complete` | Total tokens, tool calls, total latency, final status |\n\n``` python\n# Configure an OTLP exporter before creating the agent\nfrom opentelemetry import trace\nfrom opentelemetry.sdk.trace import TracerProvider\nfrom opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter\nfrom opentelemetry.sdk.trace.export import BatchSpanProcessor\n\n# Point at Azure Monitor / Jaeger / Grafana Tempo / Datadog, etc.\nexporter = OTLPSpanExporter(endpoint=\"http://localhost:4317\")\nprovider = TracerProvider()\nprovider.add_span_processor(BatchSpanProcessor(exporter))\ntrace.set_tracer_provider(provider)\n\n# The harness agent automatically uses the configured tracer -- no extra code needed\nagent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n)\n```\n\nNow let's build the complete harness-based Research Agent from the [official MAF repository](https://github.com/microsoft/agent-framework/tree/main/python/samples/02-agents/harness) -- line by line, with full commentary.\n\n```\n# Step 1: Install Microsoft Agent Framework\npip install agent-framework\n\n# Step 2: Authenticate with Azure\naz login\n\n# Step 3: Create your .env file\ncat > .env << EOF\nFOUNDRY_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com/api/projects/your-project-name\nFOUNDRY_MODEL=gpt-4o\nEOF\n```\n\nFinding Your Foundry 
Endpoint: Navigate to your Azure AI Foundry project in the Azure Portal -> Project Overview -> copy the \"Project endpoint\" URL.\n\nHere is the absolute minimum viable harness agent -- 4 lines of setup that give you a fully operational agent with history, compaction, todos, planning, and telemetry:\n\n``` python\n# minimal_harness.py\nimport asyncio\nfrom agent_framework import create_harness_agent\nfrom agent_framework.foundry import FoundryChatClient\nfrom azure.identity import AzureCliCredential\nfrom dotenv import load_dotenv\n\nasync def main():\n    # MAF does NOT auto-load .env -- must be explicit\n    load_dotenv()\n\n    # 1. LLM backend using your az login session (no secrets in code)\n    client = FoundryChatClient(credential=AzureCliCredential())\n\n    # 2. Harness agent -- default sub-systems active (memory and skills are opt-in)\n    agent = create_harness_agent(\n        client=client,\n        max_context_window_tokens=128_000,\n        max_output_tokens=16_384,\n    )\n\n    # 3. Create a session (isolates conversation history)\n    session = agent.create_session()\n\n    # 4. 
Run the agent\n    response = await agent.run(\n        \"What are the latest trends in AI agent frameworks?\",\n        session=session,\n    )\n    print(response)\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\nNow the complete research agent from the official repository with detailed inline commentary:\n\n```\n# harness_research.py\n# Source: https://github.com/microsoft/agent-framework/tree/main/python/samples/02-agents/harness\n\nimport asyncio\nfrom agent_framework import create_harness_agent\nfrom agent_framework.foundry import FoundryChatClient\nfrom azure.identity import AzureCliCredential\nfrom dotenv import load_dotenv\n\n# --------------------------------------------------------------------------\n# SECTION 1: Agent Instructions\n#\n# The system prompt is injected at every context assembly cycle.\n# It is ALWAYS present -- even after compaction fires -- because the harness\n# preserves the system prompt unconditionally.\n#\n# Key best practices here:\n#   - Clear role: \"You are a research assistant\"\n#   - Quality criteria: multiple sources, cross-referencing\n#   - Output format: Markdown, headings, inline citations\n#   - Durable memory: save final report to file memory so it survives\n#     compaction and persists across sessions\n# --------------------------------------------------------------------------\n\nRESEARCH_INSTRUCTIONS = \"\"\"\\\n## Research Assistant Instructions\n\nYou are a research assistant. 
When given a research topic, research it\nthoroughly using web search and web browsing.\nUse your knowledge to form good search queries and hypotheses, but always\nverify claims with the tools available to you rather than relying on memory alone.\n\n### Research quality\n\nConsult multiple sources when possible and cross-reference key claims.\nWhen sources disagree, note the discrepancy and explain which source you\nconsider more reliable and why.\nIf a web page fails to load or a search returns irrelevant results, try\nalternative search queries or sources before moving on.\nTrack your sources -- you will need them when presenting results.\n\n### Presenting results\n\nWhen presenting your final findings:\n- Use Markdown formatting for clarity.\n- Use clear sections with headings for each major topic or sub-question.\n- Cite your sources inline (e.g., \"According to [source name](URL), ...\").\n- End with a brief summary of key takeaways.\n- Save the final research report to file memory so it survives compaction\n  and can be referenced in future sessions.\n\"\"\"\n\nasync def main() -> None:\n    # ---------------------------------------------------------------------\n    # SECTION 2: Environment & Client Setup\n    # ---------------------------------------------------------------------\n\n    load_dotenv()  # Reads FOUNDRY_PROJECT_ENDPOINT and FOUNDRY_MODEL\n\n    # AzureCliCredential uses your `az login` session -- ideal for dev.\n    # For production: replace with ManagedIdentityCredential\n    client = FoundryChatClient(credential=AzureCliCredential())\n\n    # ---------------------------------------------------------------------\n    # SECTION 3: Harness Agent Assembly\n    # ---------------------------------------------------------------------\n\n    agent = create_harness_agent(\n        client=client,\n        max_context_window_tokens=128_000,   # Total model context window\n        max_output_tokens=16_384,            # Reserved for model's response\n    
    name=\"ResearchAgent\",\n        description=\"A research assistant that plans and executes research tasks.\",\n        agent_instructions=RESEARCH_INSTRUCTIONS,\n        # Active by default:\n        # - TodoProvider: tracks research tasks as explicit work items\n        # - AgentModeProvider: plan/execute two-phase workflow\n        # - CompactionProvider: sliding window + tool result compaction\n        # - AgentTelemetryLayer: full OpenTelemetry instrumentation\n        # Opt-in, per the signature shown earlier (None = disabled):\n        # - SkillsProvider: pass skills_directory=... for progressive skill discovery\n        # - MemoryContextProvider: pass memory_store=... for file-based durable memory\n    )\n\n    # ---------------------------------------------------------------------\n    # SECTION 4: Session -- isolates this conversation's history and state\n    # ---------------------------------------------------------------------\n\n    session = agent.create_session()\n\n    print(\"Research Assistant (powered by create_harness_agent)\")\n    print(\"=\" * 50)\n    print(\"Enter a research topic to get started.\")\n    print(\"Type /exit to end the session.\\n\")\n\n    # ---------------------------------------------------------------------\n    # SECTION 5: Interactive streaming chat loop\n    # ---------------------------------------------------------------------\n\n    while True:\n        user_input = input(\"You: \").strip()\n        if not user_input:\n            continue\n        if user_input.lower() == \"/exit\":\n            print(\"\\nGoodbye!\")\n            break\n\n        print(\"\\nAssistant: \", end=\"\", flush=True)\n\n        # agent.run(..., stream=True) returns AsyncGenerator[AgentUpdate, None]\n        # Each AgentUpdate has:\n        #   update.text     -- streaming text fragment from the model\n        #   update.contents -- list of structured content items (tool calls, etc.)\n        async for update in agent.run(user_input, session=session, stream=True):\n            if update.contents:\n                for content in 
update.contents:\n                    if content.type == \"function_call\":\n                        # Tool is being invoked -- show users what's happening\n                        print(f\"\\n  [calling tool: {content.name}]\", flush=True)\n                        print(\"  \", end=\"\", flush=True)\n\n                    # Handle web search events from the built-in search tool\n                    elif content.type in (\"search_tool_call\", \"search_tool_result\") and \\\n                         getattr(content, \"tool_name\", None) == \"web_search\":\n                        action = None\n                        if content.type == \"search_tool_result\" and isinstance(content.result, dict):\n                            action = content.result.get(\"action\", {})\n                        elif content.type == \"search_tool_call\":\n                            action = content.arguments if isinstance(content.arguments, dict) else None\n\n                        if action:\n                            action_type = action.get(\"type\", \"search\")\n                            if action_type == \"search\":\n                                queries = action.get(\"queries\") or []\n                                query_str = \", \".join(f'\"{q}\"' for q in queries) \\\n                                            if queries else action.get(\"query\", \"\")\n                                print(f\"\\n  ? Web search: {query_str}\", flush=True)\n                                print(\"  \", end=\"\", flush=True)\n                            elif action_type == \"open_page\":\n                                url = action.get(\"url\", \"(unknown)\")\n                                print(f\"\\n  ? 
Opening: {url}\", flush=True)\n                                print(\"  \", end=\"\", flush=True)\n                            elif action_type == \"find_in_page\":\n                                pattern = action.get(\"pattern\", \"\")\n                                print(f'\\n  ? Find in page: \"{pattern}\"', flush=True)\n                                print(\"  \", end=\"\", flush=True)\n                            else:\n                                print(f\"\\n  ? Web search: {action_type}\", flush=True)\n                                print(\"  \", end=\"\", flush=True)\n\n            # Stream text fragments as they arrive from the model\n            if update.text:\n                print(update.text, end=\"\", flush=True)\n\n        print(\"\\n\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\nHere is a clean, reusable streaming handler you can adapt for your own applications:\n\n```\n# streaming_handler.py -- Reusable streaming output handler\n\nasync def stream_agent_response(agent, user_input: str, session) -> str:\n    \"\"\"\n    Stream an agent response, printing text as it arrives\n    and logging tool calls with their arguments and results.\n\n    Returns:\n        The complete assembled response text.\n    \"\"\"\n    full_response_parts: list[str] = []\n\n    print(\"Assistant: \", end=\"\", flush=True)\n\n    async for update in agent.run(user_input, session=session, stream=True):\n\n        # Stream text as it arrives\n        if update.text:\n            print(update.text, end=\"\", flush=True)\n            full_response_parts.append(update.text)\n\n        # Handle structured content items\n        if update.contents:\n            for content in update.contents:\n\n                if content.type == \"function_call\":\n                    args_summary = str(content.arguments)[:100]\n                    print(f\"\\n  ? 
Tool call: {content.name}({args_summary}...)\", flush=True)\n\n                elif content.type == \"function_result\":\n                    result_preview = str(content.result)[:80]\n                    print(f\"\\n  ? Result: {result_preview}...\", flush=True)\n\n                elif content.type == \"search_tool_call\":\n                    if hasattr(content, \"arguments\") and isinstance(content.arguments, dict):\n                        query = content.arguments.get(\"query\", \"\")\n                        print(f\"\\n  ? Searching: '{query}'\", flush=True)\n\n                elif content.type == \"search_tool_result\":\n                    if isinstance(content.result, dict):\n                        url = content.result.get(\"url\", \"\")\n                        if url:\n                            print(f\"\\n  ? Retrieved: {url}\", flush=True)\n\n    print(\"\\n\")\n    return \"\".join(full_response_parts)\n```\n\n```\n# -- Pattern 1: Lean agent (no planning, no todo management) ------------\n# Use for: Simple Q&A, single-turn tasks, low-latency scenarios\n\nlean_agent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=32_000,\n    max_output_tokens=4_096,\n    name=\"QuickAnswerAgent\",\n    agent_instructions=\"You are a concise Q&A assistant. Answer briefly and directly.\",\n    disable_todo=True,       # No task tracking for Q&A\n    disable_mode=True,       # No plan/execute modes needed\n    disable_compaction=False, # Keep compaction for long conversations\n)\n\n# -- 
Pattern 2: Full research agent with persistent memory ???????????????\n# Use for: Long-running research, multi-session workflows\n\nfrom pathlib import Path\n\nmemory_path = Path(\"./research_memory\")\nmemory_path.mkdir(exist_ok=True)\n\nresearch_agent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n    name=\"ResearchAgent\",\n    agent_instructions=RESEARCH_INSTRUCTIONS,\n    memory_store=str(memory_path),  # Enable file-based persistent memory\n)\n\n# ?? Pattern 3: Agent with custom enterprise tools ??????????????????????\n# Use for: Domain-specific agents with proprietary data access\n\nfrom agent_framework.tools import get_web_search_tool\n\nasync def query_internal_db(query: str, department: str = \"all\") -> list[dict]:\n    \"\"\"Query the internal company database.\n\n    Args:\n        query: Search query for the database.\n        department: Filter by department name, or 'all' for global search.\n\n    Returns:\n        List of matching records.\n    \"\"\"\n    # Your internal DB implementation\n    return []\n\nasync def get_slack_messages(channel: str, days_back: int = 7) -> list[dict]:\n    \"\"\"Retrieve recent Slack messages from a channel.\n\n    Args:\n        channel: Slack channel name (without #).\n        days_back: Number of days of history to retrieve.\n\n    Returns:\n        List of message objects with sender, timestamp, and text.\n    \"\"\"\n    # Your Slack API implementation\n    return []\n\ncustom_agent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n    name=\"InternalResearchAgent\",\n    agent_instructions=\"You are an internal research assistant with access to company data.\",\n    extra_tools=[\n        get_web_search_tool(),   # Built-in web search\n        query_internal_db,       # Custom: internal database\n        get_slack_messages,      # Custom: Slack integration\n    ],\n)\n# Set 
environment variables\nexport FOUNDRY_PROJECT_ENDPOINT=\"https://your-project.services.ai.azure.com/api/projects/your-project-name\"\nexport FOUNDRY_MODEL=\"gpt-4o\"\n\n# Authenticate with Azure\naz login\n\n# Run the research agent\npython harness_research.py\n```\n\n**Expected terminal output when you ask a research question:**\n\n```\nResearch Assistant (powered by create_harness_agent)\n==================================================\nEnter a research topic to get started.\nType /exit to end the session.\n\nYou: Research the current state of AI agent frameworks in 2026\n\nAssistant:\n  [calling tool: switch_to_plan_mode]\n  [calling tool: create_todo]\n  [calling tool: create_todo]\n  [calling tool: create_todo]\n  [calling tool: switch_to_execute_mode]\n\nHere is my research plan. I will:\n1. Survey the major frameworks (MAF, AutoGen, LangGraph, CrewAI, LlamaIndex)\n2. Look up recent benchmarks and community activity\n3. Compare feature sets in a table\n4. Summarize key takeaways\n\nShall I proceed?\n\nYou: Yes, go ahead.\n\nAssistant:\n  ? Web search: \"Microsoft Agent Framework 2026 features\"\n  ? Opening: https://github.com/microsoft/agent-framework\n  ? Web search: \"LangGraph vs AutoGen comparison 2026\"\n  [calling tool: complete_todo]\n  ? Web search: \"CrewAI production readiness 2026\"\n  ...\n\n## AI Agent Frameworks -- State of the Ecosystem (2026)\n\n### Microsoft Agent Framework (MAF)\nAccording to [Microsoft DevBlogs](https://devblogs.microsoft.com/agent-framework/), MAF 1.0 ...\n```\n\nPro Tip: Launch the built-in DevUI for a visual debugging experience during development: install `agent-framework[devui]` and run `agent devui` in your project directory.\n\nThe harness pattern dramatically reduces the operational burden of running agents in production, but there are still architectural decisions to make:\n\nThe right `max_context_window_tokens` depends on your model and workload. 
A rule of thumb:\n\nFor multi-user production deployments, `InMemoryHistoryProvider` is insufficient -- process restarts lose all history. MAF supports pluggable history backends:\n\n```\n# Example: using a custom durable history provider (pattern)\nfrom agent_framework.core import Agent\n\n# Implement IHistoryProvider backed by your chosen store\n# (CosmosDB, Redis, PostgreSQL, etc.) and pass it to the agent builder\n# See MAF documentation for the full IHistoryProvider interface\n```\n\nAs of May 2026, MAF ships **FIDES (Flow Integrity Deterministic Enforcement System)** as a middleware -- a defense against prompt injection, which ranks first in the OWASP Top 10 for LLM Applications. FIDES assigns integrity labels (trusted/untrusted) and confidentiality labels (public/private) to every piece of content flowing through the agent. Labels propagate automatically, and policy enforcement is deterministic -- not heuristic.\n\n```\n# Enable FIDES middleware for production agents\nfrom agent_framework.security import FidesMiddleware\n\nagent = create_harness_agent(\n    client=client,\n    max_context_window_tokens=128_000,\n    max_output_tokens=16_384,\n    # Additional middleware can be composed with the harness\n    # Consult MAF docs for middleware registration API\n)\n```\n\nWhen you're ready to move from local development to production, Foundry Hosted Agents provides containerized Micro VM hosting with built-in identity, autoscaling, session state management, and versioning. 
The migration from a local harness agent is minimal:\n\n```\n# The agent code is identical -- only the hosting changes\n# Add foundry_hosting package and declare your agent as a hosted endpoint\n# See: https://github.com/microsoft/agent-framework/tree/main/python/samples/04-hosting\n```\n\nWith OpenTelemetry already wired in by the harness, you need only configure your exporter to get production-grade visibility.\n\nKey dashboards to build: token consumption per session, tool call latency distribution, compaction frequency, and agent error rates.\n\nThe **agent harness pattern** is the difference between an AI demo and a production AI system. It acknowledges a fundamental truth: building the *intelligence* of an agent is the easy part. Keeping that intelligence reliable, observable, durable, and safe under production load is the hard part -- and the harness handles that hard part for you.\n\n**Microsoft Agent Framework's create_harness_agent** is one of the most complete open-source implementations of this pattern available today. In a single factory call it wires together eight battle-tested subsystems -- function invocation, history persistence, context compaction, todo-based planning, plan/execute mode management, durable file memory, progressive skill loading, and OpenTelemetry instrumentation -- all individually configurable, all working in concert.\n\nHere is what to take away from this deep dive:\n\n`disable_todo=True`\n\n, `disable_mode=True`\n\n) than to add them later`harness_research.py`\n\n**Ready to build?**\n\n```\npip install agent-framework\naz login\npython harness_research.py\n```\n\nExplore the full framework at [github.com/microsoft/agent-framework](https://github.com/microsoft/agent-framework), join the community on [Discord](https://discord.gg/b5zjErwbQM), and check the latest patterns on the [official blog](https://devblogs.microsoft.com/agent-framework/).\n\nThe infrastructure is handled. Go build the intelligence. 
\n\n*All code samples in this article are sourced from or based on the official Microsoft Agent Framework repository (MIT License). Verify all API signatures against the latest release before deploying to production.*", "url": "https://wpnews.pro/news/agent-harness-explained-build-production-ready-ai-agents-with-microsoft-agent", "canonical_source": "https://dev.to/monuminu/agent-harness-explained-build-production-ready-ai-agents-with-microsoft-agent-framework-666", "published_at": "2026-05-30 13:48:41+00:00", "updated_at": "2026-05-30 13:51:58.978810+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "ai-tools", "ai-products", "large-language-models"], "entities": ["Microsoft Agent Framework", "create_harness_agent"], "alternates": {"html": "https://wpnews.pro/news/agent-harness-explained-build-production-ready-ai-agents-with-microsoft-agent", "markdown": "https://wpnews.pro/news/agent-harness-explained-build-production-ready-ai-agents-with-microsoft-agent.md", "text": "https://wpnews.pro/news/agent-harness-explained-build-production-ready-ai-agents-with-microsoft-agent.txt", "jsonld": "https://wpnews.pro/news/agent-harness-explained-build-production-ready-ai-agents-with-microsoft-agent.jsonld"}}