General-purpose LLMs beat specialized AI tools in Nature Medicine study

General-purpose frontier LLMs outperformed two specialized clinical AI tools across medical benchmarks in a Nature Medicine study published June 12. Researchers compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6, and cautioned that the clinical products' lack of transparency may limit their reliability.

General-purpose frontier LLMs outperformed two specialized clinical AI tools across medical benchmarks in a Nature Medicine brief communication published June 12. Krithik Vishwanath and co-authors compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6. The clinical products are built on LLMs and marketed for medical use, but the researchers wrote that their architectures, base models, and training pipelines are not public, leaving clinicians and health sy...