NarraDolma

mentions: 1 | type: Organization

// recent coverage (1 mention)

2026-06-19 04:00 | arxiv.org | large-language-models

Characterizing Narrative Content in Web-scale LLM Pretraining Data

Researchers at the University of Washington and Allen Institute for AI conducted the first fine-grained study of narrative features in Dolma, a 3-trillion-token open LLM pretraining corpus. They devel…

// co-occurs with top 5 entities

University of Washington (1), Allen Institute for AI (1), Dolma (1), NarraBERT (1), RoBERTa (1)