04:00
2026-06-24
arxiv.org
large-language-models
T2D-Bench: Evidence-Gated Evaluation of LLM Outputs for Type 2 Diabetes Using a Multi-Layer Clinical-Lifestyle Knowledge Graph
Researchers introduced T2D-Bench, a benchmark for evaluating LLM outputs on type 2 diabetes using a multi-layer clinical-lifestyle knowledge graph. Testing showed GPT-4o-mini and GPT-4o failed evidenc…