Source: huggingface.co · Topic: machine-learning

ML research datasets from arXiv and Semantic Scholar (JSONL, quality-scored)

FineSet has released four quality-scored ML research datasets on Hugging Face, built from arXiv and Semantic Scholar papers on synthetic data, efficient LLMs, LLM agents, and mechanistic interpretability. The datasets are continuously updated and intended for fine-tuning. Each currently holds between 738 and about 1,730 records.

1 min read · Published Jun 16, 2026

FineSet (fineset-io on Hugging Face; https://fineset.io) describes itself as offering "export-ready, continuously-updated training datasets from arXiv, GitHub & more": you describe what you want to fine-tune on and get a dataset. The organization has no public models yet and lists four datasets:

fineset-io/synthetic-data-papers · 738 rows · 13 downloads · updated 1 day ago
fineset-io/efficient-llm-papers · 1.73k rows · 13 downloads · updated 1 day ago
fineset-io/llm-agent-papers · 1.66k rows · 49 downloads · updated 4 days ago
fineset-io/mechanistic-interpretability-papers · 748 rows · 68 downloads · 1 like · updated 4 days ago
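The datasets are described as JSONL (one JSON object per line) and quality-scored, so a typical first step before fine-tuning is to drop low-scoring records. The sketch below shows that step using only the Python standard library. It assumes a numeric field named `quality_score` and uses made-up placeholder records; the real field names and score scale in FineSet's files are not stated in the source, so check each dataset card before adapting it.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (the JSONL convention)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            yield json.loads(line)


def filter_by_quality(records: Iterable[dict], min_score: float,
                      score_field: str = "quality_score") -> list[dict]:
    """Keep records whose score field is numeric and >= min_score.

    `score_field` defaults to a hypothetical name; the actual field in
    FineSet's JSONL may differ. Records missing the field are dropped.
    """
    return [r for r in records
            if isinstance(r.get(score_field), (int, float))
            and r[score_field] >= min_score]


# Hypothetical placeholder records, not real FineSet rows.
SAMPLE = """\
{"title": "Paper A", "quality_score": 0.91}
{"title": "Paper B", "quality_score": 0.42}

{"title": "Paper C"}
{"title": "Paper D", "quality_score": 0.75}
"""

if __name__ == "__main__":
    kept = filter_by_quality(read_jsonl(SAMPLE.splitlines()), min_score=0.7)
    print([r["title"] for r in kept])  # → ['Paper A', 'Paper D']
```

To work with a published dataset rather than a local file, the Hugging Face `datasets` library's `load_dataset("fineset-io/llm-agent-papers")` is the usual entry point, and its `Dataset.filter` method can apply the same predicate; which splits and columns each repository exposes is an assumption to verify on its card.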
