# [Dataset] Efficient LLM papers (quantization, LoRA, MoE, FlashAttention) from arXiv + Semantic Scholar — 1,734 records, quality-scored, JSONL

> Source: <https://discuss.huggingface.co/t/dataset-efficient-llm-papers-quantization-lora-moe-flashattention-from-arxiv-semantic-scholar-1-734-records-quality-scored-jsonl/176811#post_1>
> Published: 2026-06-15 09:16:09+00:00

Most of us aren’t training frontier models; we’re trying to fit a good one onto the hardware we actually have. The research that makes that possible (quantization, LoRA/PEFT, mixture-of-experts, FlashAttention, KV-cache tricks, Mamba/SSMs) is scattered across hundreds of arXiv papers, and it’s some of the fastest-moving work in ML right now.

So I assembled it into one dataset: fineset-io/efficient-llm-papers

I find it useful as a “what’s the current state of the art for making this cheaper” reference, and as a clean corpus if you’re fine-tuning a model to reason about efficiency techniques.
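If you want to build a fine-tuning corpus from the JSONL, one likely first step is keeping only the higher-scored records. Here is a minimal sketch of that using only the standard library. The field names `title`, `topic`, and `quality_score`, the 0.8 threshold, and the sample records are all placeholders for illustration, not the dataset's confirmed schema or contents, so check the dataset card for the real column names before reusing it.

```python
import io
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def filter_by_quality(records, min_score, score_key="quality_score"):
    """Keep records whose numeric `score_key` is >= min_score.

    `score_key` is an assumed field name; records without it are dropped.
    """
    kept = []
    for rec in records:
        score = rec.get(score_key)
        if isinstance(score, (int, float)) and score >= min_score:
            kept.append(rec)
    return kept


# Placeholder records standing in for the real file (hypothetical schema).
SAMPLE = io.StringIO(
    '{"title": "Placeholder paper A", "topic": "quantization", "quality_score": 0.91}\n'
    '{"title": "Placeholder paper B", "topic": "lora", "quality_score": 0.42}\n'
    "\n"
    '{"title": "Placeholder paper C", "topic": "moe"}\n'
)

records = list(iter_jsonl(SAMPLE))
high_quality = filter_by_quality(records, min_score=0.8)
print([r["title"] for r in high_quality])
```

To run it against the real data, replace `SAMPLE` with an open file handle on the downloaded JSONL and pass the actual score field name as `score_key`.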

Happy to take suggestions on gaps or answer questions about how the pipeline works.