# General-purpose large language models outperform specialized clinical AI tools on medical benchmarks

> Source: <https://marginalrevolution.com/marginalrevolution/2026/06/general-purpose-large-language-models-outperform-specialized-clinical-ai-tools-on-medical-benchmarks.html?utm_source=rss&utm_medium=rss&utm_campaign=general-purpose-large-language-models-outperform-specialized-clinical-ai-tools-on-medical-benchmarks>
> Published: 2026-06-15 05:16:01+00:00

This result does not surprise me at all. Here is part of the abstract:

> Frontier LLMs outperformed clinical AI tools in all three evaluations. Clinical AI tools performed comparably to auto-enabled Google Search AI Overview on the RCQ. These findings highlight the need for independent, real-world evaluation of AI tools before they enter clinical settings.

[From Krithik Viswanath et al.](https://www.nature.com/articles/s41591-026-04431-5). As a side note, this (and the more general version of the point) is one big reason why a fairly large number of Emergent Ventures proposals are rejected rather quickly.