
Ensembles of Large Language Models for Identifying EQ-5D Studies in PubMed Based on Their Abstracts

Researchers developed an ensemble of large language models, drawn from Google's Gemini and Gemma families, to automatically identify EQ-5D health-related quality-of-life studies in PubMed from their abstracts. The weighted ensemble achieved a weighted F1-score of 0.74 and an accuracy of 0.74, outperforming the individual models and offering a scalable approach to screening for systematic literature reviews.

Published Jun 19, 2026

arXiv:2606.19345v1. Abstract: The rapid growth in scientific publications makes manual study screening in systematic literature reviews (SLRs) increasingly resource-intensive, inefficient, and inconsistent. Classifying studies that clearly report health-related quality-of-life results, such as EQ-5D data, requires a high level of clinical interpretation and is challenging for human reviewers. This study investigates the use of Google's Gemini and Gemma large language models (LLMs) to automate EQ-5D detection in the PubMed biomedical database based only on published abstracts. A multi-phase framework is proposed that integrates few-shot prompting, weighted ensemble aggregation, and a soft stacking meta-classifier. Nine LLMs are evaluated on a dataset of PubMed studies manually labeled by two experts for EQ-5D reporting. The weighted ensemble of gemini-2.5-pro, gemma-3-12b, and gemma-3-27b obtained a weighted F1-score of 0.74 and an accuracy of 0.74, exceeding the results attained by the individual models. Ensembling the top-performing models improved the balance between precision and recall compared to individual models, while the soft stacking approach provided greater reliability and interpretability. Feature analysis shows that the probability outputs of the models are important in guiding the final predictions. The findings suggest that an ensemble-based LLM setup is a reliable and scalable approach for automating screening in biomedical research.
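
To make the weighted-ensemble step and the weighted F1 metric concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the weights were chosen or how the probabilities were obtained, so the weights, the 0.5 threshold, the probability values, and the labels below are all hypothetical, chosen only for illustration. The sketch assumes each model returns a probability that an abstract reports EQ-5D data.

```python
from typing import Dict, List


def weighted_ensemble(
    probs: Dict[str, List[float]],
    weights: Dict[str, float],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> List[int]:
    """Weighted soft vote: average each model's positive-class probability
    using normalized weights, then threshold the result to get a 0/1 label."""
    total_weight = sum(weights[m] for m in probs)
    n_items = len(next(iter(probs.values())))
    preds = []
    for i in range(n_items):
        score = sum(weights[m] * probs[m][i] for m in probs) / total_weight
        preds.append(1 if score >= threshold else 0)
    return preds


def weighted_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Per-class F1, averaged with each class weighted by its support
    (the number of true examples of that class)."""
    total = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * (tp + fn)  # tp + fn is the support of class c
    return total / len(y_true)


# Hypothetical probabilities for four abstracts (illustrative values only).
example_probs = {
    "gemini-2.5-pro": [0.9, 0.2, 0.6, 0.4],
    "gemma-3-12b": [0.7, 0.4, 0.3, 0.6],
    "gemma-3-27b": [0.8, 0.1, 0.5, 0.3],
}
# Hypothetical weights; the paper's actual weights are not given in the abstract.
example_weights = {"gemini-2.5-pro": 0.5, "gemma-3-12b": 0.2, "gemma-3-27b": 0.3}
example_labels = [1, 0, 0, 0]  # hypothetical expert labels

example_preds = weighted_ensemble(example_probs, example_weights)
print(example_preds, round(weighted_f1(example_labels, example_preds), 3))
```

A soft stacking meta-classifier, as named in the abstract, would replace the fixed weights with a model trained on the same per-model probabilities; that step is omitted here because the abstract gives no details about it.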
