Mollick's Claude Fable 5 test highlights hours-long agent work, not another launch demo

Wharton professor Ethan Mollick reported on June 9 that Anthropic's Claude Fable 5 model executed multi-page specifications for up to a dozen hours, outperforming public models he had used in sustained agent work. Mollick's account, published in his One Useful Thing essay, provides an experiential outside perspective on the model's capabilities rather than an audited benchmark. The write-up focuses on the model's long-duration task performance, distinguishing it from typical launch demonstrations centered on safety features or user experience polish.

Anthropic, the company co-founded by Dario and Daniela Amodei, has a detailed public outside account of what Claude Fable 5 feels like in sustained work. In a June 9 One Useful Thing essay, Wharton professor Ethan Mollick says the model could execute multi-page specifications for "up to a dozen hours" and outperformed the public models he had used. That is not the same as an audited benchmark, and Mollick is clear that his account is experiential. Rather than centering on safety fallbacks or UX polish, his write-up...