Enabling a new model for healthcare with AI co-clinician

Google DeepMind's "AI co-clinician" research initiative, which aims to develop AI that functions as a collaborative member of the care team under expert clinical supervision. The initiative is designed to address a global shortage of health workers by augmenting clinicians with trustworthy, evidence-based support and exploring "triadic care" models where AI assists patients under a physician's authority. Early evaluations show that physicians preferred the AI co-clinician's responses over leading evidence synthesis tools, and the system recorded zero critical errors in 97 out of 98 realistic primary care queries.

Enabling a new model for healthcare with AI co-clinician Health systems worldwide are striving for better outcomes, lower costs, and an improved experience for both patients and clinicians. However, progress is constrained by a global shortage of clinical experts, with the World Health Organization predicting a shortfall of more than 10 million health workers by 2030. While AI is often seen as the key to bridging this gap, it has not yet been able to fully meet the needs of clinicians and patients. That's why, today, we are announcing our AI co-clinician research initiative, to explore how AI could better amplify doctors’ expertise and deliver higher quality care to patients. At Google DeepMind, our journey in medical AI has evolved from mastering examination-style tests of medical knowledge with MedPaLM, to matching physician performance in text-based simulated medical consultations with AMIE, including in real-world feasibility trial settings. We also have a long history of studying how clinicians and AI systems might work together. We hypothesize that the next evolution of healthcare delivery will entail “triadic care” where AI agents can help patients in their care journeys under the clinical authority of their physician. Medicine has always been a team sport, and AI agents can bring more teammates onto the field: extending clinicians' reach while ensuring they retain judgment and control. This serves as the foundation of our AI co-clinician research initiative: AI designed to function as a collaborative member of the care team that interacts with patients under expert clinical supervision. We designed and evaluated AI co-clinician in both clinician and patient-facing settings. Addressing both perspectives is key for AI to enhance the quality, cost, availability and experience of care delivery. Augmenting clinicians with AI co-clinician For a physician, a tool is useful only if it is trustworthy and factually grounded. We therefore researched how well AI co-clinician might support clinicians by surfacing high-quality evidence. In collaboration with academic physicians, we adapted the "NOHARM" framework to test our AI for "errors of commission" incorrect information and "errors of omission" failure to surface critical information . In head-to-head blind evaluations, physicians consistently preferred AI co-clinician’s responses to leading evidence synthesis tools. In objective analysis of 98 realistic primary care queries, our system recorded zero critical errors in 97 cases, improving over two AI systems widely used by physicians. Beyond reliable synthesis of clinical evidence, AI systems should answer queries about medications and therapeutic interventions with the precision that doctors demand. This is a difficult task for AI yet remains underexplored. To address this gap, we evaluated AI co-clinician on the OpenFDA set of RxQA questions, a challenging benchmark designed to assess complex medication knowledge and reasoning. We saw significant progress in navigating these tests, surpassing other frontier AI systems especially when questions were posed in the open-ended way they’re asked in real care. The findings underscore the potential for advanced AI to provide helpful assistance as clinicians navigate the increasingly data-intensive requirements of care planning and management. Researching AI co-clinician’s real time multimodal capabilities in telemedical settings Beyond assistive clinician-facing settings, we are also investigating how AI co-clinician performs within patient-facing research contexts. Expert clinical assessment traditionally includes subtle visual and auditory cues, such as observing a patient’s gait, the nuances of respiratory patterns, or the appearance of skin changes. While prior studies including our work with Beth Israel Deaconess Medical Center demonstrated value in AI text-chats before a doctor’s appointment, restricting interactions to text fundamentally constrains the clinical value of AI. Medicine isn’t just text; it requires eyes, ears and a voice. This is why we are exploring the potential for real-time multimodal AI as an assistive component of the care team. Building on the capabilities of Gemini and Project Astra, we tested the capabilities of AI co-clinician to use live audio and video to engage with patients, simulating telemedical calls where capable AI could one day support better diagnosis and management under expert supervision. Further details regarding our methodology and results are available in our technical report: “Towards Conversational Medical AI with Eyes, Ears and a Voice” Working with academic physicians at Harvard and Stanford, we designed a randomized simulation study with 20 synthetic clinical scenarios and 10 physician "patient-actors". The agent demonstrated new capabilities beyond text-only systems, such as guiding patients through complex physical examinations in real time. For example, it successfully corrected a patient's inhaler technique and guided shoulder maneuvers to identify a rotator cuff injury. While there is frequent discussion regarding AI’s potential to match or exceed human clinical performance, these high-fidelity simulations more rigorously evaluate that premise. We assessed over 140 aspects of consultation skill and found that expert physicians performed better than the AI system overall, particularly in identifying "red flags" and guiding critical physical examinations. This finding suggests these systems are currently best used as supportive tools for practitioners rather than replacements for clinical judgment. At the same time, our work highlights the significant progress in AI’s capabilities: AI co-clinician performed at a level comparable to or exceeding primary care physicians PCPs in 68 of the 140 assessed areas. The results underscore extensive promise and flag specific areas where further research can most impactfully advance medical AI. Below you can see the research team role-playing as hypothetical patients in this telemedical setting with the AI co-clinician, highlighting the system’s potential capabilities and limitations. Engineering trust with safeguards for clinical-grade AI The transition and deployment of AI into clinical environments requires uncompromising architectural and operational safeguards. In our research on simulations of patient-facing telemedical conversations, AI co-clinician uses a dual-agent architecture: a "Planner" module continuously monitors the conversation, verifying that the "Talker" agent stays within safe clinical boundaries. Similarly, to meet doctors’ needs AI co-clinician prioritizes clinical-grade evidence, performing verification and citation checking for retrieval. The evaluations we report above were constructed by physicians to mirror a range of their real-world evidence needs, formulating questions from hypothetical scenarios for rigorously evaluating AI’s capabilities. Research collaborations for rigorous real-world evaluation of AI co-clinician To further develop and assess AI co-clinician, we are currently advancing a phased approach with academic and research collaborators across globally diverse healthcare settings including in the US, India, Australia, New Zealand, Singapore and UAE. As we progress through these evaluation phases, we will further our research in more geos including mission-aligned healthcare organizations and academic medical centers. Our goal is to ensure that medical AI is developed and deployed responsibly in line with applicable standards, supporting better health worldwide. Note: Our research collaborations are not, at this stage, intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease, or to provide medical advice. Acknowledgements We are grateful to our research partners at Harvard Medical School and Stanford Medicine and the many medical centers and care organizations engaging in further trusted tester evaluations with our team. This project involved collaborations with many teams at Google DeepMind, Google Research, Google Cloud and Google for Health and we thank our team mates for insightful discussions and contributions. In particular, AI co-clinician would not have been possible without the core research and engineering efforts of Aniruddh Raghu, Arthur Chen, Charlie Taylor, CJ Park, David Stutz, Devora Berlowitz, Doug Fritz, Dylan Slack, Eliseo Papa, Jack Chen, JD Velasquez, Jing Rong Lim, Katya Tregubova, Kelvin Guu, Meet Shah, Richard Green, Ryutaro Tanno, Sukhdeep Singh, Victoria Johnston, Adam Rodman. We thank our many collaborators for their invaluable contributions, including Ali Eslami, Aliya Rysbeck, Andy Song, Anil Palepu, Anna Cupani, Bakul Patel, Bibo Xu, Brett Hatfield, David Wu, Ed Chi, Emma Cooney, Erica Oppenheimer, Erwan Rolland, Euan A. Ashley, Francesca Pietra, Rebeca Santamaria-Fernadez, Gordon Turner, Gregory Wayne, Hannah Gladman, Irene Teinemaa, Jack O'Sullivan, Jacob Koshy, Jan Freyberg, Jason Gusdorf, Joelle Wilson, Katherine Tong, Juraj Gottweis, Michael Howell, Mili Sanwalka, Pavel Dubov, Pete Clardy, Peter Brodeur, Rachelle Sico, SiWai Man, Sumanth Dahathri, Taylan Cemgil, Tim Strother, Uchechi Okereke, Valentin Lievin, Vishnu Ravi, Yana Lunts, Yun Liu, Simon Staffell, Rachel Teo, Adriana Fernandez Lara, Armin Senoner, Danielle Breen, Paula Tesch, Leen Verburgh, Dimple Vijaykumar, Juanita Bawagan, Muinat Abdul, Mariana Montes and Rob Ashley. Feature videos were produced by Christopher Godfree, Matt Mager, Emma Moxhay and Simon Waldron. Thanks to James Manyika and Demis Hassabis for their insightful guidance and support throughout the research process.