Review Arcade: On the Human Alignment and Gameability of LLM Reviews

wpnews.pro

A new study from the University of Hamburg finds that LLM-generated peer reviews of scientific papers align only to a limited degree with human reviews: alignment is reasonable in the best case but varies substantially across prompts and models. The researchers also show that authors can "game" LLM reviews through an iterative draft-revise workflow, which in specific scenarios produced a statistically significant increase in overall scores for up to 35% of papers. The experiments used papers from the 2025 ACL Rolling Review (ARR). The findings raise concerns about the integrity of AI-assisted peer review as major conferences begin piloting LLM-generated reviews.

Published May 29, 2026

arXiv:2605.28897v1

Abstract: LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only reviewers are using LLM-assistance, but also that authors use LLMs to revise their papers before submitting. In this work, we perform empirical experiments on papers from the 2025 ACL Rolling Review (ARR) to evaluate LLM reviews from both the author and the reviewer perspective. First, we identify a limited alignment of LLM reviews with human ones. In the best-case scenario, the alignment is reasonable. However, we also find that LLM-human alignment varies substantially across prompts and models. Finally, we investigate the scenario in which the author uses an iterative draft-revise workflow to improve the submission according to the LLM review. We find that this "gaming" of LLM reviews can be effective in specific scenarios, leading to a statistically significant increase of overall scores for up to 35% of papers. We publish our code: https://github.com/uhh-hcds/reviewarcade.
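To make the notion of "alignment" concrete, here is a minimal sketch of one common way to compare LLM and human review scores: Spearman rank correlation over the same set of papers. This is an assumption for illustration only. The abstract does not say which alignment measure the authors use, the function names are hypothetical, and the scores below are made up rather than taken from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up overall scores for five papers (illustration only, not paper data).
human_scores = [2.0, 3.5, 3.0, 4.0, 2.5]
llm_scores = [3.0, 3.5, 3.5, 4.0, 3.0]

print(round(spearman(human_scores, llm_scores), 3))
```

A value near 1 would mean the LLM ranks papers much as human reviewers do, while a value near 0 would mean little agreement. Repeating the calculation per prompt and per model is one way to see the kind of variation the abstract describes.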

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.28897
