GCSE

mentions: 1 · type: Organization

// recent coverage (1 mention)

2026-06-25 04:00 · arxiv.org · large-language-models

LLM Performance on a Real, Double-Marked GCSE Benchmark

Researchers introduced a dataset of 32,534 double-marked GCSE student responses and tested large language models (LLMs) against examiner consensus. Top-performing LLMs agreed more closely with examine…

// co-occurs with (top 1 entity)

arXiv 1