LLMs Can Better Capture Human Judgments--With the Right Prompts

wpnews.pro


Source: arxiv.org · Published: 2026-06-12T04:00Z · Topic: large-language-models

Researchers have shown, in a preprint posted to arXiv, that simple prompting strategies improve large language models' ability to capture human judgments. The strategies mitigate two commonly cited limitations: failure to reproduce full response distributions, and instability across wording variations. Prompting models to report standard deviations and response proportions, and using scenarios that human participants find clear, improved AI-human alignment on two datasets: a U.S.-representative set of 144 moral scenarios, and 38 moral beliefs from an International Social Survey Programme module covering 32 countries. The findings suggest that refining how questions are posed to LLMs yields more accurate representations of human variability, although the models' estimates of their own error remain poorly calibrated.

Published Jun 12, 2026 · 1 min read

arXiv:2606.12754v1

Abstract: Are large language models (LLMs) bad at capturing human judgment? Two commonly stated limitations are that LLMs fail to capture full distributions of responses, and that their judgments are unstable across wording variations. We demonstrate simple prompting strategies that mitigate these limitations. Across two datasets--a U.S.-representative set of 144 moral scenarios and 38 moral beliefs from the International Social Survey Programme's Family and Changing Gender Roles module covering 32 countries--we show how simple elicitation techniques help improve AI-human alignment. First, prompting models to report standard deviations and response proportions recovers the full range of human responses better than common strategies. Second, ensuring scenarios are clear to human participants--as reflected in human confusion ratings--boosts model alignment, and LLMs can track human confusion ratings. At the same time, we find that LLMs' estimates of their own error are poorly calibrated, though they can predict human variability relatively well. These results suggest that asking better questions to LLMs can yield better answers.
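As a rough illustration of what "prompting models to report standard deviations and response proportions" could look like in practice, here is a minimal Python sketch. It is not the authors' code. The 5-point scale, the prompt wording, the JSON reply format, the function names, and the example numbers are all assumptions made for illustration, and no real LLM API is called: the model reply is a hand-written string.

```python
import json
import math

# Hypothetical 5-point agreement scale (an assumption, not from the paper).
SCALE = [1, 2, 3, 4, 5]


def build_prompt(scenario: str) -> str:
    """Ask for a distribution of responses instead of a single rating.

    The wording is illustrative only.
    """
    return (
        f"Scenario: {scenario}\n"
        "Imagine a representative sample of people rating this scenario "
        "from 1 (strongly disagree) to 5 (strongly agree).\n"
        "Reply with JSON only, in the form "
        '{"proportions": {"1": p1, "2": p2, "3": p3, "4": p4, "5": p5}, '
        '"sd": s}'
    )


def parse_proportions(reply: str) -> dict:
    """Parse the reply and renormalize the proportions to sum to 1."""
    data = json.loads(reply)
    raw = {int(k): float(v) for k, v in data["proportions"].items()}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    return {k: raw.get(k, 0.0) / total for k in SCALE}


def mean_and_sd(p: dict) -> tuple:
    """Mean and standard deviation implied by a response distribution."""
    mean = sum(k * v for k, v in p.items())
    var = sum(v * (k - mean) ** 2 for k, v in p.items())
    return mean, math.sqrt(var)


def total_variation(p: dict, q: dict) -> float:
    """Total variation distance: 0 means identical, 1 means disjoint."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in SCALE)


# Hand-written stand-in for a model reply (made-up numbers).
model_reply = (
    '{"proportions": {"1": 0.1, "2": 0.2, "3": 0.4, "4": 0.2, "5": 0.1}, '
    '"sd": 1.1}'
)
# Made-up human response distribution for the same scenario.
human = {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}

model = parse_proportions(model_reply)
model_mean, model_sd = mean_and_sd(model)
distance = total_variation(model, human)
print(round(model_mean, 3), round(model_sd, 3), round(distance, 3))
```

Comparing the whole distribution, rather than only the mean rating, is what lets a check like this detect a model that gets the average right but understates how much people disagree. Total variation distance is one simple choice of comparison; the paper may use a different alignment measure.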

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.12754
