New Microsoft tool lets devs spin up AI behavior tests using text descriptions


Source: techcrunch.com. Published 2026-06-02 19:02 UTC. Topic: artificial-intelligence.

Microsoft released ASSERT, an open-source framework that uses AI to convert plain-language descriptions of intended behaviors into automated tests for evaluating application-specific AI systems. The tool generates test cases, runs them against target systems, and scores results, allowing developers to verify that AI agents follow product-specific policies such as restricting email recipients or limiting confidential data access. The release addresses a growing industry need for repeatable, context-aware testing beyond general benchmarks.

2 min read · Published Jun 2, 2026

AI researchers and labs have made rapid progress in evaluating AI models for everything from safety and compliance to sycophancy and alignment. But companies and developers appear to face a narrower, more specific need: making sure their AI system behaves as intended within their own product or service.

In a bid to make that testing process simpler, Microsoft on Tuesday took the wraps off ASSERT, short for Adaptive Spec-driven Scoring for Evaluation and Regression Testing.

The open source framework, Microsoft says, simplifies the evaluation of application-specific AI behavior by using AI to turn high-level, natural-language descriptions of goals, policies, or intended behaviors into thorough, scored tests whose results can be investigated.

ASSERT takes plain-language descriptions of an AI model’s expected behavior and policies, turns them into a structured set of acceptable and unacceptable behaviors, generates problem scenarios and test cases, runs them against the target system, and scores the results. It can also record the paths the AI system takes, including intermediate actions and tool calls, so developers can inspect where failures happen.
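The run, record, and score steps described above can be pictured with a minimal Python sketch. It is not ASSERT's API: every name here (`Trace`, `BehaviorCase`, `run_suite`, `toy_agent`, the `delete_records` tool) is hypothetical, and the test cases are written by hand, whereas ASSERT reportedly generates them with AI from the plain-language description.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trace:
    """What the system under test did: its tool calls plus the final answer."""
    tool_calls: List[dict] = field(default_factory=list)
    output: str = ""


@dataclass
class BehaviorCase:
    scenario: str                      # prompt sent to the system under test
    violates: Callable[[Trace], bool]  # True if the trace shows unacceptable behavior


def run_suite(system: Callable[[str], Trace], cases: List[BehaviorCase]) -> dict:
    """Run every case, keep its trace for inspection, and score the pass rate."""
    results = []
    for case in cases:
        trace = system(case.scenario)
        results.append({
            "scenario": case.scenario,
            "passed": not case.violates(trace),
            "trace": trace,
        })
    passed = sum(r["passed"] for r in results)
    score = passed / len(results) if results else 0.0
    return {"score": score, "results": results}


def toy_agent(prompt: str) -> Trace:
    """Stand-in for a real AI system (hypothetical): deletes records when asked to clean up."""
    trace = Trace()
    if "clean up" in prompt.lower():
        trace.tool_calls.append({"tool": "delete_records"})
    trace.output = "done"
    return trace


def used_forbidden_tool(trace: Trace) -> bool:
    """Unacceptable behavior for this sketch: any call to the delete_records tool."""
    return any(call["tool"] == "delete_records" for call in trace.tool_calls)


cases = [
    BehaviorCase("Please clean up the old customer data.", used_forbidden_tool),
    BehaviorCase("Summarize the old customer data.", used_forbidden_tool),
]
report = run_suite(toy_agent, cases)
print(report["score"])
for r in report["results"]:
    print(r["passed"], r["scenario"], r["trace"].tool_calls)
```

Keeping the full trace next to each pass or fail result is what lets a developer see which intermediate action caused a failure, rather than only the final score.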

Devs can provide system context, tools, and constraints, too, if they want to further customize what the evaluations cover.

For example, a developer could specify that a document research AI agent shouldn’t send emails to people outside the company, should limit confidential information to C-level executives, and should provide concise summaries with prior context in mind. ASSERT will use those rules to generate test cases that check, on an ongoing basis, whether the system follows them. According to Microsoft, the framework fills a gap that broader, more general evaluations cannot: cases where an AI model’s intended behavior is shaped by an application or product’s context, policies, and tools.
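As a rough illustration of how the first two rules in that example might become machine-checkable, here is a small Python sketch that scans a recorded run for violations. It is an assumption-laden toy, not how ASSERT represents policies: the `Action` record, the `corp.example` domain, and the list of C-level roles are all invented for the example. The third rule (concise summaries with prior context in mind) is left out because it would likely need a judgment-based scorer rather than a simple check.

```python
from dataclasses import dataclass
from typing import List

COMPANY_DOMAIN = "corp.example"   # assumed company email domain
C_LEVEL = {"ceo", "cfo", "cto"}   # assumed set of C-level roles


@dataclass
class Action:
    tool: str             # hypothetical tool name, e.g. "send_email"
    recipient: str        # recipient email address
    recipient_role: str   # recipient's role, e.g. "cfo"
    confidential: bool    # whether the payload is marked confidential


def violations(actions: List[Action]) -> List[str]:
    """Return readable descriptions of every rule violation in a recorded run."""
    found = []
    for i, a in enumerate(actions):
        domain = a.recipient.rsplit("@", 1)[-1].lower()
        # Rule 1: no emails to people outside the company.
        if a.tool == "send_email" and domain != COMPANY_DOMAIN:
            found.append(f"step {i}: email sent outside the company ({a.recipient})")
        # Rule 2: confidential information only goes to C-level executives.
        if a.confidential and a.recipient_role.lower() not in C_LEVEL:
            found.append(f"step {i}: confidential data shared with {a.recipient_role}")
    return found


run = [
    Action("send_email", "cfo@corp.example", "cfo", confidential=True),
    Action("send_email", "analyst@corp.example", "analyst", confidential=True),
    Action("send_email", "friend@other.example", "contractor", confidential=False),
]
for v in violations(run):
    print(v)
```

In this toy run the first action is allowed, the second breaks the confidentiality rule, and the third breaks the external-email rule.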

“One of the things we’ve learned is that evaluations are absolutely critical to making good decisions,” said Sarah Bird, chief product officer of Responsible AI at Microsoft. “Because if you don’t understand the behavior of the AI system, it’s really hard to know if it’s meeting your organization’s bar … What we found is that if you really want to have a trustworthy system, you should evaluate many more dimensions that are application-specific.”

Bird said ASSERT can be used to evaluate systems when they’re being built, after deployment, and even for continuous monitoring.

The release comes amid a gradual, broader shift in the AI industry. As models grow more capable, researchers are focusing on repeatable testing and regression checks, with Stanford’s HELM, MLCommons’ AILuminate, and evaluation groups like METR rolling out benchmarks to measure how models behave under different conditions.


Read original on techcrunch.com → techcrunch.com/2026/06/02/new-microsoft-tool-let…
