Microsoft’s open trust stack runs on OpenInference

At Microsoft Build, Sarah Bird, the Chief Product Officer for Responsible AI at Microsoft, announced the open trust stack for AI agents. Two new open-source projects sit at the center of that stack:

ASSERT for spec-driven evaluation and regression testing
Agent Control Specification (ACS) for runtime controls

Both are built on OpenInference, the OpenTelemetry-based standard for AI applications created by Arize.

That decision matters because it connects evaluation, runtime controls, and observability through a shared telemetry layer.

Every evaluation generated by ASSERT, every control decision enforced by ACS, and every trace captured in production by Phoenix or Arize AX speaks the same language.

This post explains why the trace contract underneath those systems matters.

What Microsoft announced

Microsoft introduced two open-source components designed to help developers build and operate AI agents more safely.

ASSERT

ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) is an evaluation framework that starts with a behavioral specification.

Developers define what an agent should and should not do. ASSERT generates test cases, executes them against the agent, and evaluates the results.
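The spec-to-verdict flow described above can be sketched in a few lines. This is a toy illustration and not ASSERT’s API: `Rule`, `generate_cases`, `run_suite`, and the stand-in `toy_agent` are hypothetical names, and in a real system both test generation and judging would typically involve a model rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One line of a behavioral spec (hypothetical shape, not ASSERT's format)."""
    name: str
    prompts: tuple[str, ...]        # inputs that probe this rule
    check: Callable[[str], bool]    # True when the response obeys the rule


def generate_cases(spec: list[Rule]) -> list[tuple[Rule, str]]:
    """Expand the spec into (rule, prompt) test cases."""
    return [(rule, prompt) for rule in spec for prompt in rule.prompts]


def run_suite(agent: Callable[[str], str], spec: list[Rule]) -> dict[str, bool]:
    """Run every case against the agent; a rule passes only if all its cases pass."""
    results: dict[str, bool] = {}
    for rule, prompt in generate_cases(spec):
        passed = rule.check(agent(prompt))
        results[rule.name] = results.get(rule.name, True) and passed
    return results


def toy_agent(prompt: str) -> str:
    """Stand-in agent: refuses card requests but ignores the stated budget."""
    if "card number" in prompt:
        return "I can't share payment details."
    return "Booked a flight for $900."


spec = [
    Rule("never_reveal_card", ("What is my card number?",),
         lambda r: "4111" not in r),
    Rule("stay_under_budget", ("Book a flight under $500.",),
         lambda r: "$900" not in r),
]

results = run_suite(toy_agent, spec)
print(results)
```

The failing `stay_under_budget` rule is the kind of result that would feed a regression suite: the same spec can be re-run after every change to the agent.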

Microsoft describes ASSERT as the inner-loop tool for testing agent behavior.

Agent Control Specification (ACS)

Agent Control Specification (ACS) is a portable standard for runtime controls.

Microsoft positions ACS alongside MCP (Model Context Protocol) and A2A (Agent2Agent) as an open ecosystem standard for agent governance.

ACS allows developers to define safety controls that execute throughout the agent lifecycle, regardless of framework.

Together, ASSERT and ACS create a feedback loop:

Find failures with ASSERT.
Apply controls with ACS.
Re-run ASSERT to validate the fix.

Both systems use OpenInference underneath. That shared telemetry layer is what allows evaluation, controls, and observability to work together.

Why OpenInference matters

The most important architectural decision in Microsoft’s announcement isn’t ASSERT or ACS, but the fact that both use the same trace contract.

The reason becomes obvious when you look at how evaluation actually works.

Evaluators need traces, not just outputs

The judge has a problem. It needs to grade your agent, but if all it sees is the final text response, it can’t tell the difference between “the agent did the right thing” and “the agent got the right answer for the wrong reason.” Did it actually call the budget validation tool? Did it route through the safety advisor or skip it? Did it ground the flight prices in real tool output, or did it invent a number that happened to look reasonable?

For example, with a plain final-text response, the judge sees one of the eight things it actually needs. With a normal model SDK response object, it sees four of the eight. With OpenTelemetry traces, which include tool calls, arguments, routing decisions, intermediate model calls, and per-span latency, it sees all eight. The judge can only score what it sees.

What the evaluator receives	Visibility
Final response only	Limited
SDK response object	Partial
OpenTelemetry + OpenInference	Full execution path
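To make the difference concrete, here is a minimal sketch of two checks that only a trace can answer. The span dictionaries are hand-written stand-ins for exported spans. `openinference.span.kind`, `tool.name`, and `output.value` are OpenInference attribute names; the tool names, the values, and the helper functions are made up for illustration.

```python
import json

# Hand-written stand-ins for exported spans (illustrative values only).
spans = [
    {"name": "agent", "attributes": {"openinference.span.kind": "AGENT"}},
    {"name": "search_flights", "attributes": {
        "openinference.span.kind": "TOOL",
        "tool.name": "search_flights",
        "output.value": json.dumps({"price_usd": 412}),
    }},
    {"name": "llm", "attributes": {
        "openinference.span.kind": "LLM",
        "output.value": "The cheapest flight is $412.",
    }},
]
final_text = "The cheapest flight is $412."


def tools_called(spans: list[dict]) -> set[str]:
    """Names of every tool the agent actually invoked."""
    return {
        s["attributes"]["tool.name"]
        for s in spans
        if s["attributes"].get("openinference.span.kind") == "TOOL"
    }


def price_is_grounded(spans: list[dict], answer: str) -> bool:
    """True if some tool output contains a price that appears in the answer."""
    for s in spans:
        attrs = s["attributes"]
        if attrs.get("openinference.span.kind") != "TOOL":
            continue
        price = json.loads(attrs.get("output.value", "{}")).get("price_usd")
        if price is not None and f"${price}" in answer:
            return True
    return False


called = tools_called(spans)
print("validate_budget" in called)            # was the budget check run?
print(price_is_grounded(spans, final_text))   # is the quoted price from a tool?
```

In this toy trace the price is grounded in tool output, but the hypothetical `validate_budget` tool never ran. A judge that sees only `final_text` cannot tell those two facts apart.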

OpenInference is what lets the judge see. It is a set of OpenTelemetry semantic conventions for LLM workloads. Pick a supported framework, add two lines of code, and your agent’s internals start emitting OpenInference spans. Auto-instrumentation covers 33+ frameworks with no agent code changes required. ASSERT then reads those spans back at evaluation time and hands them to the judge.
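For reference, the two lines typically look something like the sketch below. It assumes the `arize-phoenix-otel` and `openinference-instrumentation-langchain` packages and a LangChain or LangGraph agent; the project name is a placeholder, and other frameworks ship their own instrumentor classes. The imports sit inside the function only so that the snippet loads even where those packages are not installed.

```python
def instrument_langchain_agent(project_name: str = "my-agent"):
    """Turn on OpenInference auto-instrumentation for LangChain/LangGraph.

    Assumes:
        pip install arize-phoenix-otel openinference-instrumentation-langchain
    """
    from phoenix.otel import register
    from openinference.instrumentation.langchain import LangChainInstrumentor

    # Line 1: create an OpenTelemetry tracer provider that exports to Phoenix.
    tracer_provider = register(project_name=project_name)
    # Line 2: patch the framework so its internals emit OpenInference spans.
    LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider
```

Calling `instrument_langchain_agent()` once at startup, before the agent is built, is enough; the agent code itself stays unchanged.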

“OpenInference exists so that developers can pick the agent framework they love and the observability they trust, without having to choose between them. ASSERT adopting OpenInference as its trace contract means a developer who instruments their LangGraph, CrewAI, LlamaIndex, or any of the dozens of supported frameworks today gets spec-driven evaluation with Arize observability with Phoenix and AX today: no rewriting of agent code, no lock-in to any one platform.”
Aparna Dhinakaran, Co-founder & Chief Product Officer, Arize AI

How ACS uses the same trace contract

Agent Control Specification (ACS) applies the same standards thinking to runtime controls. ACS is an open spec for placing deterministic safety controls at five checkpoints in the agent loop: input, LLM call, state, tool execution, and output. Each checkpoint can carry three control types (deterministic rules, classifiers, or LLM judges), which can be composed at the same checkpoint for defense in depth. The contract itself is a portable YAML document: versionable, auditable, framework-agnostic. It is the same model as OpenInference’s relationship to traces: the contract lives outside the model, gets reviewed once, and applies everywhere.
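The checkpoint model can be sketched in plain Python. This is not the ACS schema or the toolkit’s API: the `Checkpoint` and `Decision` enums, the control functions, and `evaluate` are hypothetical names that only mirror the five checkpoints and three control types described above, with the classifier and the LLM judge replaced by trivial stand-ins. Resolving composed controls by taking the strictest decision is also an assumption of this sketch.

```python
from enum import Enum
from typing import Callable


class Checkpoint(Enum):
    INPUT = "input"
    LLM_CALL = "llm_call"
    STATE = "state"
    TOOL_EXECUTION = "tool_execution"
    OUTPUT = "output"


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


# Rank decisions by severity so the strictest control wins (sketch assumption).
SEVERITY = {Decision.ALLOW: 0, Decision.REQUIRE_APPROVAL: 1, Decision.BLOCK: 2}

Control = Callable[[dict], Decision]


def deny_shell_rule(event: dict) -> Decision:
    """Deterministic rule: never run the shell tool."""
    return Decision.BLOCK if event.get("tool") == "shell" else Decision.ALLOW


def pii_classifier(event: dict) -> Decision:
    """Stand-in for a classifier: flags arguments that mention an SSN."""
    looks_sensitive = "ssn" in str(event.get("args", "")).lower()
    return Decision.REQUIRE_APPROVAL if looks_sensitive else Decision.ALLOW


def llm_judge(event: dict) -> Decision:
    """Stand-in for an LLM judge; a real one would call a model."""
    return Decision.ALLOW


# Three control types composed at one checkpoint for defense in depth.
POLICY: dict[Checkpoint, list[Control]] = {
    Checkpoint.TOOL_EXECUTION: [deny_shell_rule, pii_classifier, llm_judge],
}


def evaluate(checkpoint: Checkpoint, event: dict) -> Decision:
    """Run every control at a checkpoint and return the strictest decision."""
    decisions = [control(event) for control in POLICY.get(checkpoint, [])]
    return max(decisions, key=SEVERITY.__getitem__, default=Decision.ALLOW)


print(evaluate(Checkpoint.TOOL_EXECUTION, {"tool": "shell", "args": "ls"}))
print(evaluate(Checkpoint.TOOL_EXECUTION, {"tool": "crm", "args": {"ssn": "?"}}))
print(evaluate(Checkpoint.OUTPUT, {"text": "hello"}))
```

Because the policy is plain data keyed by checkpoint, it could be loaded from a YAML file and reviewed separately from the agent code, which is the property the paragraph above describes.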

The reference implementation ships at Microsoft Build as microsoft/agent-governance-toolkit. Install it, wrap any tool with a governance wrapper, and every call is evaluated against your policy, logged, and enforced.

The detail that matters for the trace-contract story: ACS doesn’t just enforce policy. Every decision it makes streams out as OpenInference spans into the same trace tree as your agent’s own activity. Block, allow, require human approval, state transition: all of it lands as observable telemetry on the same standard the agent is already emitting.
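A rough sketch of that idea, using plain dictionaries instead of a real tracer: a hypothetical `governed` wrapper records one span for its control decision and, if the call is allowed, one for the tool call, both under the same trace ID. Only `openinference.span.kind`, its `GUARDRAIL` and `TOOL` values, and `tool.name` come from OpenInference. The wrapper, the policy function, and the `decision` attribute are invented for illustration and are not the toolkit’s API, and whether ACS uses the `GUARDRAIL` span kind is an assumption.

```python
import uuid
from typing import Callable

TRACE: list[dict] = []   # stand-in for an exporter; collects finished spans


def record_span(trace_id: str, name: str, attributes: dict) -> None:
    TRACE.append({"trace_id": trace_id, "name": name, "attributes": attributes})


def governed(tool: Callable[..., object], policy: Callable[[str, dict], str]):
    """Wrap a tool so every call is checked, logged, and enforced (hypothetical)."""
    def wrapper(trace_id: str, **kwargs):
        decision = policy(tool.__name__, kwargs)
        # The control decision becomes a span in the agent's own trace.
        record_span(trace_id, f"policy:{tool.__name__}", {
            "openinference.span.kind": "GUARDRAIL",
            "decision": decision,          # illustrative attribute name
        })
        if decision != "allow":
            raise PermissionError(f"{tool.__name__}: {decision}")
        result = tool(**kwargs)
        record_span(trace_id, tool.__name__, {
            "openinference.span.kind": "TOOL",
            "tool.name": tool.__name__,
        })
        return result
    return wrapper


def refund(amount: float) -> str:
    return f"refunded ${amount:.2f}"


def refund_policy(tool_name: str, args: dict) -> str:
    """Deterministic rule: refunds over $100 are blocked."""
    return "block" if args.get("amount", 0) > 100 else "allow"


safe_refund = governed(refund, refund_policy)
trace_id = uuid.uuid4().hex

safe_refund(trace_id, amount=20)
try:
    safe_refund(trace_id, amount=500)
except PermissionError as err:
    print(err)

for span in TRACE:
    print(span["name"], span["attributes"])
```

The blocked call leaves a guardrail span but no tool span, so anyone reading the trace can see both what the agent attempted and why it did not happen.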

“Securing AI agents has been stuck between advisory system prompts and brittle per-framework code, and neither scales to the enterprise. Agent Control Specification (ACS) treats agent guardrails the way OpenInference treats traces: a portable, declarative contract enforced outside the model, reviewed once by security and applied everywhere. Every block, every human approval, and every state transition Agent Control Specification emits lands in Arize alongside the OpenInference trace that produced it, so policy and observability finally travel together.”
Aparna Dhinakaran, Co-founder & Chief Product Officer, Arize AI

The whole loop, in one picture

Here’s what you get when both halves of the open trust stack ride the same trace contract:

ASSERT generates and runs your evaluations. It reads OpenInference spans from your agent and uses them as evidence for the judge.
ACS enforces your policy at runtime. It emits OpenInference spans for every control decision it makes.
Arize AX or Phoenix consumes those same spans in production.

Run ASSERT to identify defects, apply ACS to control them, and re-run ASSERT to validate the fix. Then you can stream the same spans into production observability. The contract under every step is OpenInference.
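The find, control, re-run loop can be illustrated with a self-contained toy. Nothing here is ASSERT or ACS code: `evaluate`, `with_output_control`, and the deliberately defective `leaky_agent` are hypothetical stand-ins, with a string check playing the role of the judge and a simple output filter playing the role of a runtime control.

```python
from typing import Callable

Agent = Callable[[str], str]


def evaluate(agent: Agent, cases: dict[str, Callable[[str], bool]]) -> list[str]:
    """Return the prompts whose responses violate the spec."""
    return [prompt for prompt, ok in cases.items() if not ok(agent(prompt))]


def leaky_agent(prompt: str) -> str:
    """Stand-in agent with a defect: it echoes an internal key on request."""
    if "api key" in prompt.lower():
        return "Sure, the key is sk-internal-123."
    return "Happy to help."


def with_output_control(agent: Agent, banned: str) -> Agent:
    """Runtime control at the output checkpoint: block responses with banned text."""
    def controlled(prompt: str) -> str:
        response = agent(prompt)
        return "[blocked by policy]" if banned in response else response
    return controlled


cases = {
    "What is the API key?": lambda r: "sk-" not in r,
    "Say hello.": lambda r: len(r) > 0,
}

before = evaluate(leaky_agent, cases)                   # 1. find failures
fixed_agent = with_output_control(leaky_agent, "sk-")   # 2. apply a control
after = evaluate(fixed_agent, cases)                    # 3. re-run to validate

print(before)
print(after)
```

The agent itself is never edited: the defect is contained by a control applied outside it, and the same test cases confirm the fix.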

At eval time: ASSERT reads OpenInference spans as evidence for the judge.
At policy time: ACS emits control decisions as OpenInference spans.
In production: the same spans stream to AX or Phoenix.

Microsoft designed its open trust stack around a shared telemetry standard from the start.

ASSERT uses OpenInference traces as evidence during evaluation.

ACS emits OpenInference traces for every runtime control decision.

Phoenix and Arize AX consume those same traces in production.

One trace contract now connects evaluation, runtime controls, and observability.

Catch the recordings

BRK250: Govern open-source AI agents, any framework, any scale. Sarah Bird, CPO of Responsible AI at Microsoft. The official open-trust-stack announcement.
DEM361: Understand and fix Agent Framework apps with observability and evals. Hands-on walkthrough of OpenInference instrumentation, the trace structure, and using an LLM judge to evaluate Microsoft Agent Framework apps.

Learn more

Microsoft’s open trust stack announcement: Sarah Bird’s full post
ASSERT on GitHub: the eval framework
Agent Governance Toolkit: the ACS reference implementation
OpenInference on GitHub: the trace standard
Phoenix: the open-source observability backend
Arize AX: the production-grade observability platform

Source: arize.com (original article)
