21:30
2026-06-14
sparsethought.com
large-language-models
A bitter lesson for medicine, or a benchmark problem?
A Nature Medicine paper claiming general-purpose LLMs outperform specialized clinical tools on medical benchmarks is criticized for flawed methodology. The benchmark, Real Clinical Queries, evaluated …