[Dataset] Efficient LLM papers (quantization, LoRA, MoE, FlashAttention) from arXiv + Semantic Scholar — 1,734 records, quality-scored, JSONL

A new dataset, fineset-io/efficient-llm-papers, compiles 1,734 records of arXiv and Semantic Scholar papers on efficient LLM techniques such as quantization, LoRA, MoE, and FlashAttention. Each record is quality-scored, and the dataset is distributed as JSONL. It aims to serve as a reference for state-of-the-art efficiency methods and as a clean corpus for fine-tuning models to reason about these techniques.

Most of us aren’t training frontier models; we’re trying to fit a good one onto the hardware we actually have. The research that makes that possible (quantization, LoRA/PEFT, mixture-of-experts, FlashAttention, KV-cache tricks, Mamba/SSMs) is scattered across hundreds of arXiv papers, and it’s some of the fastest-moving work in ML right now. So I assembled it into one dataset: fineset-io/efficient-llm-papers. I find it useful as a “what’s the current state of the art for making this cheaper” reference, and as a clean corpus if you’re fine-tuning a model to reason about efficiency techniques. Happy to take suggestions on gaps or answer questions about how the pipeline works.
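
For anyone who wants to work with the JSONL locally, here is a minimal sketch of reading records and keeping only the higher-scored ones. It assumes each line is one JSON object; the field names `title` and `quality_score`, the 0.5 threshold, and the sample rows are hypothetical stand-ins, not the dataset's confirmed schema, so check the real keys before relying on it.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one dict per non-empty line of JSONL text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_by_score(records, min_score, key="quality_score"):
    """Keep records whose score field is present and >= min_score.

    NOTE: "quality_score" is an assumed field name, not the confirmed schema.
    """
    return [r for r in records if r.get(key) is not None and r[key] >= min_score]


# Made-up stand-in rows so the sketch runs without downloading anything.
sample = io.StringIO(
    '{"title": "Paper A", "quality_score": 0.9}\n'
    '{"title": "Paper B", "quality_score": 0.4}\n'
    "\n"
    '{"title": "Paper C"}\n'
)

kept = filter_by_score(iter_jsonl(sample), min_score=0.5)
print([r["title"] for r in kept])
```

With the real file, an open file handle can be passed to `iter_jsonl` in place of `sample`, since it only needs an iterable of lines. Records missing the score field are dropped rather than treated as zero, which seems the safer default when the schema is not yet known.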