04:00
2026-06-12
arxiv.org
large-language-models
Shopping Reasoning Bench: An Expert-Authored Benchmark for Multi-Turn Conversational Shopping Assistants
A new expert-authored benchmark reveals that leading AI models achieve only 57-77% pass rates on multi-turn shopping conversations, with performance dropping 4-18 points as dialogues progress. The Sho…