ML research datasets from arXiv and Semantic Scholar (JSONL, quality-scored)

wpnews.pro

Source: huggingface.co · Published 2026-06-16 09:31 UTC · Topic: machine-learning

FineSet has released four quality-scored datasets of ML research papers on Hugging Face, drawn from arXiv and Semantic Scholar and covering synthetic data, efficient LLMs, LLM agents, and mechanistic interpretability. The datasets are continuously updated and intended for fine-tuning; each holds roughly 740 to 1,730 entries.

FineSet (fineset-io on Hugging Face) · https://fineset.io

AI & ML interests: export-ready, continuously updated training datasets from arXiv, GitHub & more. "Describe what you want to fine-tune on → get a dataset."

Datasets (4, most recently updated first):

fineset-io/synthetic-data-papers · updated 1 day ago · 738 rows · 13 downloads

fineset-io/efficient-llm-papers · updated 1 day ago · 1.73k rows · 13 downloads

fineset-io/llm-agent-papers · updated 4 days ago · 1.66k rows · 49 downloads

fineset-io/mechanistic-interpretability-papers · updated 4 days ago · 748 rows · 68 downloads · 1 like
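Because the datasets are distributed as quality-scored JSONL, a common first step before fine-tuning is to drop low-scoring records. The sketch below does this with only the Python standard library. It is illustrative, not FineSet's tooling: the field name `quality_score`, the example fields `title` and `abstract`, the 0-to-1 score scale, and the 0.7 default threshold are all hypothetical, since the listing does not document the actual schema.

```python
import json
from typing import Iterable, Iterator

# Hypothetical schema: each JSONL line is one paper record with a numeric
# "quality_score" field. FineSet's real field names may differ.
SCORE_FIELD = "quality_score"


def filter_by_quality(lines: Iterable[str], threshold: float = 0.7) -> Iterator[dict]:
    """Yield parsed records whose score is at or above `threshold`.

    Blank lines are skipped; records without a numeric score are dropped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        score = record.get(SCORE_FIELD)
        if isinstance(score, (int, float)) and score >= threshold:
            yield record


if __name__ == "__main__":
    sample = [
        '{"title": "Paper A", "abstract": "...", "quality_score": 0.91}',
        '{"title": "Paper B", "abstract": "...", "quality_score": 0.42}',
        "",
        '{"title": "Paper C", "abstract": "..."}',
    ]
    kept = list(filter_by_quality(sample))
    print([r["title"] for r in kept])  # ['Paper A']
```

Because the function accepts any iterable of strings, a downloaded file can be streamed line by line, e.g. `with open("papers.jsonl", encoding="utf-8") as f: kept = list(filter_by_quality(f))`, where the filename is a placeholder.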

Read original on huggingface.co → huggingface.co/fineset-io
