04:00

2026-05-27

arxiv.org

large-language-models

LURE: Live-Usage Replay Evaluations for Reducing Evaluation Awareness

Researchers at an undisclosed institution have developed LURE (Live-Usage Replay Evaluations), a method that constructs more realistic AI safety evaluations by replaying real-world agentic interaction…