Source: lesswrong.com

Can public chat data predict real-world AI misalignments?

OpenAI researchers tested whether public chat data from WildChat can predict real-world AI misalignments, finding that deployment simulations using public conversations can estimate rates of undesirable model behavior, offering external evaluators a way to assess frontier models without access to private production data.

1 min read · Published Jun 17, 2026

This is an unofficial automated linkpost.

Frontier AI models are increasingly used in settings with real economic, legal, and societal consequences. As a result, governments, AI safety organizations and independent researchers need ways to evaluate how these systems behave under realistic conditions.

Traditional evaluations use hand-written, synthetic, or adversarial prompts to stress-test known risks and compare models under controlled conditions. But these prompts can be narrow, unrepresentative, or recognizable as tests. A complementary way to evaluate how models behave in the real world is to look at the conversations users actually have with them. LLM developers can do this internally by sampling examples from production data to check whether models responded appropriately and how often different failures occur. Evidence grounded in real usage helps close the gap between benchmark results and deployment behavior [1], and is less vulnerable to models behaving differently simply because they are being tested [2,3,4]. But outside evaluators generally cannot access this evidence: because real user conversations are private, labs usually cannot share them with AI safety organizations, academics, or independent researchers. As a result, the most informative evidence about frontier model behavior often comes from data available only to the labs that built those models.

Today we shared work on Deployment Simulation, which uses recent production data to predict the rates of undesirable model behavior before deployment, including rare and model-specific pathologies [1,5]. In this post, we ask whether external groups can use the same technique to evaluate frontier language models by substituting a publicly available dataset, WildChat [6], for private production data.
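This linkpost does not say how Deployment Simulation turns sampled conversations into rate estimates. As an illustration only, assume each sampled conversation (from production data or from WildChat) has already been labeled by some grader, hypothetical here, as showing the undesirable behavior or not. The sketch below estimates the behavior rate from those labels with a Wilson score interval. The interval method and the function name `estimate_rate` are our assumptions, not OpenAI's method. The interval matters most for rare pathologies: even when no failures are observed in a sample, the upper bound stays above zero.

```python
import math
from typing import Sequence, Tuple


def estimate_rate(flags: Sequence[bool], z: float = 1.96) -> Tuple[float, float, float]:
    """Estimate the fraction of sampled conversations flagged as undesirable.

    `flags` holds one hypothetical grader label per sampled conversation
    (True = undesirable behavior observed). Returns (point estimate, lower, upper),
    where the bounds are a Wilson score interval; z=1.96 gives roughly 95% coverage.
    """
    n = len(flags)
    if n == 0:
        raise ValueError("need at least one sampled conversation")
    k = sum(1 for f in flags if f)  # number of flagged conversations
    p = k / n                       # observed rate

    # Wilson score interval: shifts the center toward 0.5 and keeps the
    # bounds inside [0, 1], which behaves better than p +/- z*SE when p is tiny.
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return p, max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # 1,000 sampled conversations with zero observed failures:
    # the point estimate is 0, but the upper bound is still about 0.38%.
    print(estimate_rate([False] * 1000))
    # 3 failures in 1,000 conversations.
    print(estimate_rate([True] * 3 + [False] * 997))
```

The point of the first usage line is that a clean sample of 1,000 conversations does not show a behavior is absent, only that its rate is probably below a few tenths of a percent. Rare, model-specific failures therefore need either large samples or a sampling scheme that targets them.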

Continue reading at alignment.openai.com →
