cd /news/ai-agents/agent-harness-explained-build-produc… · home topics ai-agents article
[ARTICLE · art-18543] src=dev.to pub= topic=ai-agents verified=true sentiment=↑ positive

Agent Harness Explained: Build Production-Ready AI Agents with Microsoft Agent Framework

Microsoft's Agent Framework introduces the `create_harness_agent` function, a production-ready pattern that pre-assembles critical infrastructure components—including tool-calling loops, conversation history management, context-window compaction, planning mechanisms, durable memory, skill extensibility, and observability—into a single, configurable pipeline. This approach eliminates the "hidden complexity tax" developers face when wiring these components manually, where agents commonly fail due to context window limits, lost tool results, untracked parallel tasks, or production crashes with no debugging trace. By separating agent instructions (what to do) from execution infrastructure (how to reliably run), the harness pattern enables developers to focus on agent logic rather than scaffolding, similar to how Kubernetes automates container deployment versus manual `docker run` commands.

read22 min publishedMay 30, 2026

Meta Description:Learn what an agent harness is, why it matters for production AI systems, and how to implement one step-by-step using Microsoft Agent Framework'screate_harness_agent

-- with real Python code, architecture diagrams, and deep technical walkthroughs.

You have a capable LLM. You have a clear use case. You write a chat loop in 20 lines of Python and it works -- until the context window fills up and the agent loses the thread. Or it calls a tool but forgets the result two turns later. Or it starts three things at once and has no way to track which are done. Or it crashes in production with no trace of what went wrong.

This is the hidden complexity tax of AI agents. Every production-grade agent needs: a tool-calling loop, conversation history management, context-window compaction, a planning mechanism, durable memory, skill extensibility, and observability. If you wire each of these yourself, you spend more time on infrastructure than on the actual intelligence. And when one breaks, the entire agent fails in ways that are nearly impossible to debug.

The agent harness pattern solves this by pre-assembling all of those components into a single, tested, configurable pipeline -- so you can focus on what your agent does, not how to keep it running.

In this deep dive, you'll learn:

create_harness_agent

Let's build something that lasts.

The term "harness" comes from two established software patterns. In test engineering, a test harness is the scaffolding that configures, instruments, and tears down a system under test -- so test authors write test logic, not setup code. In dependency injection, a DI container is a wiring harness -- it resolves and connects components so application code never calls new

directly.

An agent harness applies the same thinking to AI agents. It is:

A factory or container that constructs a fully wired, ready-to-run agent by assembling all required infrastructure components -- history, tools, memory, observability, planning -- from a single, declarative configuration point.

The key insight is separation of concerns: your agent instructions define what the agent should do; the harness defines how that intent is reliably executed, persisted, and observed.

Architecture Note: Think of the harness like a production-grade Kubernetes deployment vs. running a container manually with docker run

. Both run your code, but one handles restarts, resource limits, networking, logging, and scaling automatically.

To appreciate what the harness eliminates, consider what you would have to build manually for a real agent:

Component Manual Responsibility
Tool Calling Loop
Detect tool calls, dispatch, collect results, re-invoke model, set termination condition
History Management
Serialize/deserialize conversation history, decide storage backend, handle multi-turn sessions
Context Compaction
Monitor token count, decide eviction strategy, preserve semantic context
Planning / Todo
Design a task-tracking schema, prompt the model to use it, parse structured outputs
Mode Management
Track agent state (planning vs. executing), implement approval gates
Persistent Memory
Define memory schema, write/read from durable store, inject into context at the right time
Skills
Design skill discovery protocol, filter by relevance, progressively inject into context
Telemetry
Instrument every model call, tool dispatch, and context mutation with spans and metrics

Writing all of this correctly, testing it, and keeping it working as your LLM provider updates its API is a substantial engineering effort. The harness pattern packages this effort once, so you -- and every developer on your team -- never has to repeat it.

The difference is immediately visible in code. Here is a minimal but realistic "manual" agent that handles just the tool-calling loop:

import json

async def run_agent_manually(client, tools, messages):
    while True:
        response = await client.chat(messages=messages, tools=tools)

        if response.finish_reason == "stop":
            return response.content

        if response.finish_reason == "tool_calls":
            messages.append(response.message)  # add assistant turn

            for tool_call in response.tool_calls:
                tool_fn = tool_registry.get(tool_call.function.name)
                if not tool_fn:
                    raise ValueError(f"Unknown tool: {tool_call.function.name}")

                result = await tool_fn(**json.loads(tool_call.function.arguments))

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result)
                })

Now here is the harness equivalent that does all of the above plus history, compaction, planning, memory, and telemetry:

from agent_framework import create_harness_agent
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential

agent = create_harness_agent(
    client=FoundryChatClient(credential=AzureCliCredential()),
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
)

Same outcome. Orders of magnitude less surface area for bugs.

Microsoft Agent Framework (MAF) is Microsoft's open-source, production-grade framework for building AI agents and multi-agent workflows. It is the direct successor to both AutoGen and Semantic Kernel -- combining AutoGen's simple, composable agent abstractions with Semantic Kernel's enterprise features: session-based state management, type safety, middleware pipelines, and comprehensive telemetry.

MAF supports:

At the center of MAF is the Agent

base class. All agent types derive from this common base, which defines a consistent interface for multi-agent orchestration.

The default agent runtime execution model follows a deterministic loop:

User Message
    ?
    ?
???????????????????????????????????????????????????
?                  Agent Runtime                  ?
?                                                 ?
?  1. Context Assembly                            ?
?     ?? HistoryProvider + ContextProviders       ?
?         (Memory, Skills, Mode, Todos)           ?
?                                                 ?
?  2. Middleware Pre-Processing                   ?
?     ?? Telemetry, Compaction checks             ?
?                                                 ?
?  3. Model Inference                             ?
?     ?? FoundryChatClient / OpenAI / Anthropic   ?
?                                                 ?
?  4. Tool Dispatch Loop                          ?
?     ?? FunctionInvocationLayer                  ?
?         (Tool call -> execute -> result -> loop)  ?
?                                                 ?
?  5. History Persistence                         ?
?     ?? Saved after every service call           ?
?                                                 ?
?  6. Middleware Post-Processing                  ?
?     ?? Telemetry spans closed                   ?
???????????????????????????????????????????????????
    ?
    ?
Agent Response (streaming or complete)

This loop runs transparently on every agent.run()

call. The harness ensures all six phases are populated and correctly ordered.

MAF's Python implementation is split into composable packages. The top-level agent-framework

meta-package installs all of them:

pip install agent-framework
Package Purpose
core
Agent base, session management, middleware, context providers
foundry
FoundryChatClient , Azure AI Foundry integration
tools
get_web_search_tool() and other built-in tool factories
orchestrations
Multi-agent workflow graphs (sequential, concurrent, handoff)
devui
Interactive browser-based DevUI for local agent debugging
a2a
Agent-to-Agent (A2A) protocol support for cross-platform agents
lab
Experimental features: benchmarking, RL, research

create_harness_agent

create_harness_agent

is a factory function -- a single call that constructs and wires a fully operational agent. Let's look at its complete signature and then break down every component it assembles:

from agent_framework import create_harness_agent

agent = create_harness_agent(
    client=client,

    max_context_window_tokens=128_000,
    max_output_tokens=16_384,

    name="MyAgent",
    description="What this agent does.",
    agent_instructions="System prompt / instructions for the agent.",

    disable_todo=False,           # TodoProvider
    disable_mode=False,           # AgentModeProvider
    disable_compaction=False,     # CompactionProvider
    memory_store=None,            # MemoryContextProvider (None = disabled)
    skills_directory=None,        # SkillsProvider (None = disabled)
    extra_tools=[],               # Additional tools to inject
)

This single call assembles 8 sub-systems automatically. Here is each one in depth.

This is the agent's agentic loop engine -- the component that makes the agent actually do things rather than just respond.

When a model returns a response containing tool calls, the FunctionInvocationLayer

middleware:

Tool functions are registered as plain Python callables with type-annotated signatures. The framework automatically generates JSON schema from the annotations:

async def search_database(query: str, limit: int = 10) -> list[dict]:
    """Search the product database for matching items.

    Args:
        query: The search query string.
        limit: Maximum number of results to return.

    Returns:
        List of matching product records.
    """
    return results

agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
    extra_tools=[search_database],
)

The InMemoryHistoryProvider

manages the conversation history for the agent's session. What makes this different from a plain message list is when persistence occurs: after every individual model service call, not just at the end of the agent's turn.

This matters for reliability. Consider an agent making three tool calls in a single turn. Without per-call persistence, a crash on the third call loses work from calls one and two. With per-call persistence, the agent can resume from the last successful state.

session = agent.create_session()

result_1 = await agent.run("Research quantum computing trends", session=session)
result_2 = await agent.run("Now summarize your findings in bullet points", session=session)

For production deployments that require history to survive process restarts, MAF supports swapping InMemoryHistoryProvider

with durable backends -- Azure Cosmos DB, Redis, or any custom implementation via the IHistoryProvider

interface.

Context-window overflow is one of the most common production failures for long-running agents. The CompactionProvider

addresses this with a two-pronged strategy:

Sliding Window: When total token count approaches max_context_window_tokens

, older messages are pruned. The system prompt and most recent N messages are always preserved.

Tool Result Compaction: Tool results are often verbose -- a web search might return thousands of tokens. The compaction layer intelligently summarizes tool results when under token pressure, keeping essential information while reducing usage.


agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,   # GPT-4o's total window
    max_output_tokens=16_384,            # Reserved for model's response
)

agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
    disable_compaction=True,
)

The TodoProvider

gives the agent a structured task-tracking system within its own context. Rather than relying on the agent to implicitly remember its progress, TodoProvider

exposes tool functions the agent can call to manage an explicit todo list:

create_todo(title, description)

-- Creates a new work itemcomplete_todo(id)

-- Marks an item donelist_todos()

-- Gets all pending itemsupdate_todo(id, status)

-- Updates item statusThis is deceptively powerful. The agent can decompose a complex task into explicit work items, track completion, and resist skipping steps -- all driven by its own reasoning.

User: "Research the top 5 AI agent frameworks and compare them."

Agent (internal reasoning with TodoProvider):
  -> create_todo("Research AutoGen")
  -> create_todo("Research LangGraph")
  -> create_todo("Research CrewAI")
  -> create_todo("Research Microsoft Agent Framework")
  -> create_todo("Write comparison table")

  [Agent proceeds autonomously]

  -> complete_todo("Research AutoGen")
  -> complete_todo("Research LangGraph")
  -> ... and so on until all todos are done

The AgentModeProvider

implements a two-phase workflow that mirrors how skilled professionals approach complex tasks:

Phase 1 -- Plan Mode (Interactive)

In plan mode the agent is collaborative:

This is the human-in-the-loop gate -- users can modify the plan or redirect entirely before any autonomous work begins.

Phase 2 -- Execute Mode (Autonomous)

Once approved, the agent:

This pattern is what makes agents trustworthy in production -- the human sees and approves the plan, and the agent executes it faithfully.

The MemoryContextProvider

provides file-based durable memory -- a persistent store the agent can read from and write to across sessions and compaction events.

This solves a critical gap in pure in-context memory: when compaction fires, older information is dropped. If the agent wrote a research finding 50 messages ago, it may no longer be in context. But if the agent saved it to memory storage, the MemoryContextProvider

re-injects it at the start of every context assembly cycle.

import tempfile
from pathlib import Path

memory_dir = Path(tempfile.mkdtemp()) / "agent_memory"
memory_dir.mkdir(parents=True, exist_ok=True)

agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
    memory_store=str(memory_dir),  # Enable file-based persistent memory
)

The SkillsProvider

enables **runtime skill discovery and progressive **. Skills are domain-specific capability packages -- sets of tools, instructions, and knowledge -- that the agent can discover and load on demand.

Rather than all possible tools upfront (wasting context tokens), SkillsProvider

implements **progressive **: it first presents skill summaries, and the agent selectively loads full skill definitions it actually needs. A general-purpose agent can become a specialized SQL expert, code reviewer, or API integration specialist -- dynamically, based on what the task requires.

Skills can be authored in three ways (as of the latest MAF release):

agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
    skills_directory="./skills",  # Load skills from this directory
)

The AgentTelemetryLayer

instruments every observable event in the agent's lifecycle:

Telemetry Event What is Captured
agent.run.start
Session ID, user input, agent name, timestamp
agent.model.call
Model name, token counts (prompt/completion), latency
agent.tool.call
Tool name, arguments, result size, latency
agent.compaction.triggered
Token counts before/after, messages evicted
agent.mode.switch
From/to mode, trigger reason
agent.run.complete
Total tokens, tool calls, total latency, final status
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
)

Now let's build the complete harness-based Research Agent from the official MAF repository -- line by line, with full commentary.

pip install agent-framework

az login

cat > .env << EOF
FOUNDRY_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com/api/projects/your-project-name
FOUNDRY_MODEL=gpt-4o
EOF

Finding Your Foundry Endpoint:Navigate to your Azure AI Foundry project in the Azure Portal -> Project Overview -> copy the "Project endpoint" URL.

Here is the absolute minimum viable harness agent -- 4 lines of setup that give you a fully operational agent with history, compaction, todos, planning, and telemetry:

import asyncio
from agent_framework import create_harness_agent
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential
from dotenv import load_dotenv

async def main():
    load_dotenv()

    client = FoundryChatClient(credential=AzureCliCredential())

    agent = create_harness_agent(
        client=client,
        max_context_window_tokens=128_000,
        max_output_tokens=16_384,
    )

    session = agent.create_session()

    response = await agent.run(
        "What are the latest trends in AI agent frameworks?",
        session=session,
    )
    print(response)

if __name__ == "__main__":
    asyncio.run(main())

Now the complete research agent from the official repository with detailed inline commentary:


import asyncio
from agent_framework import create_harness_agent
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential
from dotenv import load_dotenv

#
#

RESEARCH_INSTRUCTIONS = """\
## Research Assistant Instructions

You are a research assistant. When given a research topic, research it
thoroughly using web search and web browsing.
Use your knowledge to form good search queries and hypotheses, but always
verify claims with the tools available to you rather than relying on memory alone.

### Research quality

Consult multiple sources when possible and cross-reference key claims.
When sources disagree, note the discrepancy and explain which source you
consider more reliable and why.
If a web page fails to load or a search returns irrelevant results, try
alternative search queries or sources before moving on.
Track your sources -- you will need them when presenting results.

### Presenting results

When presenting your final findings:
- Use Markdown formatting for clarity.
- Use clear sections with headings for each major topic or sub-question.
- Cite your sources inline (e.g., "According to [source name](URL), ...").
- End with a brief summary of key takeaways.
- Save the final research report to file memory so it survives compaction
  and can be referenced in future sessions.
"""

async def main() -> None:

    load_dotenv()  # Reads FOUNDRY_PROJECT_ENDPOINT and FOUNDRY_MODEL

    client = FoundryChatClient(credential=AzureCliCredential())


    agent = create_harness_agent(
        client=client,
        max_context_window_tokens=128_000,   # Total model context window
        max_output_tokens=16_384,            # Reserved for model's response
        name="ResearchAgent",
        description="A research assistant that plans and executes research tasks.",
        agent_instructions=RESEARCH_INSTRUCTIONS,
    )


    session = agent.create_session()

    print("Research Assistant (powered by create_harness_agent)")
    print("=" * 50)
    print("Enter a research topic to get started.")
    print("Type /exit to end the session.\n")


    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() == "/exit":
            print("\nGoodbye!")
            break

        print("\nAssistant: ", end="", flush=True)

        async for update in agent.run(user_input, session=session, stream=True):
            if update.contents:
                for content in update.contents:
                    if content.type == "function_call":
                        print(f"\n  [calling tool: {content.name}]", flush=True)
                        print("  ", end="", flush=True)

                    elif content.type in ("search_tool_call", "search_tool_result") and \
                         getattr(content, "tool_name", None) == "web_search":
                        action = None
                        if content.type == "search_tool_result" and isinstance(content.result, dict):
                            action = content.result.get("action", {})
                        elif content.type == "search_tool_call":
                            action = content.arguments if isinstance(content.arguments, dict) else None

                        if action:
                            action_type = action.get("type", "search")
                            if action_type == "search":
                                queries = action.get("queries") or []
                                query_str = ", ".join(f'"{q}"' for q in queries) \
                                            if queries else action.get("query", "")
                                print(f"\n  ? Web search: {query_str}", flush=True)
                                print("  ", end="", flush=True)
                            elif action_type == "open_page":
                                url = action.get("url", "(unknown)")
                                print(f"\n  ? Opening: {url}", flush=True)
                                print("  ", end="", flush=True)
                            elif action_type == "find_in_page":
                                pattern = action.get("pattern", "")
                                print(f'\n  ? Find in page: "{pattern}"', flush=True)
                                print("  ", end="", flush=True)
                            else:
                                print(f"\n  ? Web search: {action_type}", flush=True)
                                print("  ", end="", flush=True)

            if update.text:
                print(update.text, end="", flush=True)

        print("\n")

if __name__ == "__main__":
    asyncio.run(main())

Here is a clean, reusable streaming handler you can adapt for your own applications:


async def stream_agent_response(agent, user_input: str, session) -> str:
    """
    Stream an agent response, printing text as it arrives
    and logging tool calls with their arguments and results.

    Returns:
        The complete assembled response text.
    """
    full_response_parts: list[str] = []

    print("Assistant: ", end="", flush=True)

    async for update in agent.run(user_input, session=session, stream=True):

        if update.text:
            print(update.text, end="", flush=True)
            full_response_parts.append(update.text)

        if update.contents:
            for content in update.contents:

                if content.type == "function_call":
                    args_summary = str(content.arguments)[:100]
                    print(f"\n  ? Tool call: {content.name}({args_summary}...)", flush=True)

                elif content.type == "function_result":
                    result_preview = str(content.result)[:80]
                    print(f"\n  ? Result: {result_preview}...", flush=True)

                elif content.type == "search_tool_call":
                    if hasattr(content, "arguments") and isinstance(content.arguments, dict):
                        query = content.arguments.get("query", "")
                        print(f"\n  ? Searching: '{query}'", flush=True)

                elif content.type == "search_tool_result":
                    if isinstance(content.result, dict):
                        url = content.result.get("url", "")
                        if url:
                            print(f"\n  ? Retrieved: {url}", flush=True)

    print("\n")
    return "".join(full_response_parts)

lean_agent = create_harness_agent(
    client=client,
    max_context_window_tokens=32_000,
    max_output_tokens=4_096,
    name="QuickAnswerAgent",
    agent_instructions="You are a concise Q&A assistant. Answer briefly and directly.",
    disable_todo=True,       # No task tracking for Q&A
    disable_mode=True,       # No plan/execute modes needed
    disable_compaction=False, # Keep compaction for long conversations
)


from pathlib import Path

memory_path = Path("./research_memory")
memory_path.mkdir(exist_ok=True)

research_agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
    name="ResearchAgent",
    agent_instructions=RESEARCH_INSTRUCTIONS,
    memory_store=str(memory_path),  # Enable file-based persistent memory
)


from agent_framework.tools import get_web_search_tool

async def query_internal_db(query: str, department: str = "all") -> list[dict]:
    """Query the internal company database.

    Args:
        query: Search query for the database.
        department: Filter by department name, or 'all' for global search.

    Returns:
        List of matching records.
    """
    return []

async def get_slack_messages(channel: str, days_back: int = 7) -> list[dict]:
    """Retrieve recent Slack messages from a channel.

    Args:
        channel: Slack channel name (without #).
        days_back: Number of days of history to retrieve.

    Returns:
        List of message objects with sender, timestamp, and text.
    """
    return []

custom_agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
    name="InternalResearchAgent",
    agent_instructions="You are an internal research assistant with access to company data.",
    extra_tools=[
        get_web_search_tool(),   # Built-in web search
        query_internal_db,       # Custom: internal database
        get_slack_messages,      # Custom: Slack integration
    ],
)
export FOUNDRY_PROJECT_ENDPOINT="https://your-project.services.ai.azure.com/api/projects/your-project-name"
export FOUNDRY_MODEL="gpt-4o"

az login

python harness_research.py

Expected terminal output when you ask a research question:

Enter a research topic to get started.
Type /exit to end the session.

You: Research the current state of AI agent frameworks in 2026

Assistant:
  [calling tool: switch_to_plan_mode]
  [calling tool: create_todo]
  [calling tool: create_todo]
  [calling tool: create_todo]
  [calling tool: switch_to_execute_mode]

Here is my research plan. I will:
1. Survey the major frameworks (MAF, AutoGen, LangGraph, CrewAI, LlamaIndex)
2. Look up recent benchmarks and community activity
3. Compare feature sets in a table
4. Summarize key takeaways

Shall I proceed?

You: Yes, go ahead.

Assistant:
  ? Web search: "Microsoft Agent Framework 2026 features"
  ? Opening: https://github.com/microsoft/agent-framework
  ? Web search: "LangGraph vs AutoGen comparison 2026"
  [calling tool: complete_todo]
  ? Web search: "CrewAI production readiness 2026"
  ...

## AI Agent Frameworks -- State of the Ecosystem (2026)

### Microsoft Agent Framework (MAF)
According to [Microsoft DevBlogs](https://devblogs.microsoft.com/agent-framework/), MAF 1.0 ...

Pro Tip:Launch the built-in DevUI for a visual debugging experience during development: installagent-framework[devui]

and runagent devui

in your project directory.

The harness pattern dramatically reduces the operational burden of running agents in production, but there are still architectural decisions to make:

The right max_context_window_tokens

depends on your model and workload. A rule of thumb:

For multi-user production deployments, InMemoryHistoryProvider

is insufficient -- process restarts lose all history. MAF supports pluggable history backends:

from agent_framework.core import Agent

As of May 2026, MAF ships FIDES (Flow Integrity Deterministic Enforcement System) as a middleware -- the #1 defense against prompt injection (OWASP LLM Top 10 risk #1). FIDES assigns integrity labels (trusted/untrusted) and confidentiality labels (public/private) to every piece of content flowing through the agent. Labels propagate automatically, and policy enforcement is deterministic -- not heuristic.

from agent_framework.security import FidesMiddleware

agent = create_harness_agent(
    client=client,
    max_context_window_tokens=128_000,
    max_output_tokens=16_384,
)

When you're ready to move from local development to production, Foundry Hosted Agents provides containerized Micro VM hosting with built-in identity, autoscaling, session state management, and versioning. The migration from a local harness agent is minimal:

With OpenTelemetry already wired in by the harness, you need only configure your exporter to get production-grade visibility:

Key dashboards to build: token consumption per session, tool call latency distribution, compaction frequency, and agent error rates.

The agent harness pattern is the difference between an AI demo and a production AI system. It acknowledges a fundamental truth: building the intelligence of an agent is the easy part. Keeping that intelligence reliable, observable, durable, and safe under production load is the hard part -- and the harness handles that hard part for you.

Microsoft Agent Framework's create_harness_agent is the most complete open-source implementation of this pattern available today. In a single factory call it wires together eight battle-tested subsystems -- function invocation, history persistence, context compaction, todo-based planning, plan/execute mode management, durable file memory, progressive skill , and OpenTelemetry instrumentation -- all individually configurable, all working in concert.

Here is what to take away from this deep dive:

disable_todo=True

, disable_mode=True

) than to add them laterharness_research.py

Ready to build?

pip install agent-framework
az login
python harness_research.py

Explore the full framework at github.com/microsoft/agent-framework, join the community on Discord, and check the latest patterns on the official blog.

The infrastructure is handled. Go build the intelligence. ?

All code samples in this article are sourced from or based on the official Microsoft Agent Framework repository (MIT License). Verify all API signatures against the latest release before deploying to production.

── more in #ai-agents 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/agent-harness-explai…] indexed:0 read:22min 2026-05-30 ·