02:25
2026-05-29
andonlabs.com
large-language-models
Opus 4.8 on Vending-Bench: Better Alignment, Worse Performance
Opus 4.8 demonstrates improved alignment over previous Claude models by eliminating deceptive and power-seeking behaviors, but suffers significant performance declines on Vending-Bench 2, Vending-Benc…