Source: discuss.huggingface.co

How to publish my research in HuggingFace?

A guide explains how to publish research artifacts on Hugging Face, recommending splitting work into separate components: code on GitHub, model weights in HF Model repos, evaluation data in HF Dataset repos, demos in Spaces, and related pieces in Collections, with papers submitted to arXiv first.

Published Jul 3, 2026 · 11 min read

I’m not a researcher, but I know this area can be confusing:

I would split this by artifact first, rather than trying to publish the whole thing as one object on Hugging Face.

For your current stage, I would think of it like this:

  • Code / library / experiments → GitHub as the canonical code home.
  • Weights / checkpoints / adapters → HF Model repo, if you have model artifacts.
  • Eval data / LITM buckets / benchmark traces → HF Dataset repo or archived release.
  • Interactive demo → HF Space, if a small demo helps people understand it.
  • Several related pieces → HF Collection as the project landing page.
  • Method / evaluation / framing question → Research forum.
  • Runnable project / repo / model / Space showcase → Show and Tell.
  • Paper-quality write-up later → arXiv first, then link the paper from the HF READMEs/cards so HF Paper Pages can connect the artifacts.

HF is usually the home for the ML artifacts around the work — models, datasets, demos, cards, eval material, project pages, and links — not necessarily the place where the paper itself starts. The Hub’s main object types are Models, Datasets, and Spaces.

Practical route I would use

  • Put the canonical code and docs in GitHub.
  • Add one minimal runnable example.
  • Add one baseline command.
  • If there are weights/checkpoints/adapters, publish them in a HF Model repo with a useful model card.
  • If there are eval examples, LITM buckets, traces, or public test cases, publish them in a HF Dataset repo or archived release.
  • If the behavior is easy to show interactively, create a small Space.
  • If there are multiple pieces, group them in a Collection.
  • Then post in Research or Show and Tell, depending on what kind of feedback you want.
  • If the write-up later becomes paper-quality, go through arXiv and then connect the paper to the HF artifacts.

The main goal is to let people inspect one piece at a time instead of asking them to evaluate one large “research package” with unclear boundaries.

Decision tree

If this is mostly an idea or method sketch:

  • Post in Research.
  • Ask for feedback on framing, evaluation design, and related work.
  • Keep claims modest until the evaluation protocol is easy to inspect.

If this is mostly runnable code:

  • Put the canonical source in GitHub.
  • Include install steps, one minimal example, one baseline, a license, and a README.

If this includes weights/checkpoints/adapters:

  • Use a HF Model repo.
  • Write a model card that explains what the artifact is, how to run it, intended use, limitations, training/eval details, metrics, and links to code/eval material.

If this includes evaluation artifacts:

  • Use a HF Dataset repo or archived release.
  • Document schema, construction method, metric, limitations, and license.

If this includes a demo:

  • Use a Space.
  • Be careful with source visibility and secrets.

If this has several related artifacts:

  • Use a Collection as the project index.

If this becomes a paper later:

  • Use arXiv or another normal paper/preprint route first.
  • Then link the paper from the relevant HF README/model card/dataset card/Space README.


More detail on where each piece fits

| Piece | Good home | Why |
| --- | --- | --- |
| Method overview / project docs | GitHub README, plus maybe HF README | Gives people the shortest path to understanding the project |
| Source code / library | GitHub | Better for issues, PRs, releases, tests, contribution workflow |
| Weights / checkpoints / adapters | HF Model repo | Natural Hub object for model artifacts |
| Eval data / LITM buckets / benchmark traces | HF Dataset repo or archived release | Makes validation material reusable and documented |
| Interactive demo | HF Space | Lets people try the behavior directly |
| Whole project index | HF Collection | One page linking model/data/demo/paper/code |
| Method/eval discussion | Research forum | Good for research questions and project coordination |
| Concrete artifact showcase | Show and Tell | Good for runnable projects, Spaces, Models, Datasets |
| Paper/preprint | arXiv later, if appropriate | Paper route first; HF can link artifacts afterward |

Code / library

If TIS is mainly a method, library, or experimental codebase, I would make GitHub the canonical code home. That is not because HF cannot store code. HF repos are Git repositories too; see HF repositories. It is just that GitHub is usually clearer for:

  • issues
  • pull requests
  • releases
  • tests
  • examples
  • CI
  • contributor workflow
  • citation files
  • release notes

A good README would answer:

  • What is TIS?
  • What problem is it trying to address?
  • What is the current status?
  • What is public now?
  • How do I run one minimal example?
  • How do I reproduce one result?
  • What is the baseline?
  • What are the limitations?
  • What kind of feedback do you want?

GitHub’s README guidance is here: About READMEs.

Model artifacts

If there are actual weights, checkpoints, adapters, configs, or tokenizer/model files, then a HF Model repo makes sense. A model card should include enough context that someone can decide whether the artifact is relevant and safe to try:

  • model summary
  • quickstart
  • intended use
  • non-goals
  • limitations
  • training/eval setup
  • datasets
  • metrics
  • hardware/runtime notes if relevant
  • link to canonical code
  • link to eval data/scripts
  • license
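
To make the list above concrete, here is a minimal sketch that renders a model card README.md as a YAML metadata block followed by one heading per section. The repo names, URL, and section text are hypothetical placeholders. The metadata keys (license, datasets, metrics) follow the Hub's model card metadata format as I understand it, so check the current model card docs before relying on them.

```python
def render_model_card(name, license_id, code_url, eval_dataset, metric, sections):
    """Return README.md text: a YAML metadata block, then one heading per section."""
    metadata = [
        "---",
        f"license: {license_id}",
        "datasets:",
        f"  - {eval_dataset}",
        "metrics:",
        f"  - {metric}",
        "---",
    ]
    body = [f"# {name}", ""]
    for title, text in sections.items():
        body += [f"## {title}", "", text.strip(), ""]
    body += ["## Code", "", f"Canonical code: {code_url}"]
    return "\n".join(metadata + [""] + body) + "\n"


# Every value here is a hypothetical placeholder, not a real repo or result.
card = render_model_card(
    name="TIS demo checkpoint",
    license_id="apache-2.0",
    code_url="https://github.com/your-name/tis",
    eval_dataset="your-name/tis-eval",
    metric="accuracy",
    sections={
        "Model summary": "What the artifact is, in two or three sentences.",
        "Quickstart": "The shortest command or snippet that runs it.",
        "Intended use and non-goals": "What it is for, and what it is not for.",
        "Limitations": "Known failure cases and untested settings.",
        "Training and evaluation": "Setup, data, metric, hardware notes.",
    },
)
print(card)
```

Generating the card from a small function keeps the metadata and the prose in one place, which helps when the same project later needs a dataset card or Space README with matching links.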


Evaluation data / benchmark artifacts

If the reusable part is not a model but evaluation material, a Dataset repo may be more useful than a Model repo. Examples:

  • LITM-style examples
  • context-position buckets
  • benchmark traces
  • small public eval split
  • input/output pairs
  • scoring metadata
  • failure cases
  • scripts or data needed to reproduce a plot/table

The dataset card should explain:

  • schema
  • how examples were created
  • model/version used, if relevant
  • metric
  • limitations
  • license
  • whether the data is synthetic, derived, filtered, or hand-authored
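
One way to make the schema explicit is to define the record once in code and store the examples as JSON Lines. The sketch below is only an illustration: every field name is an assumption of mine, not something taken from TIS or from a Hub requirement.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EvalExample:
    example_id: str
    context_length: int    # unit (tokens or documents) should be stated in the card
    answer_position: int   # 0-based index of the relevant item in the context
    position_bucket: str   # for example "start", "middle", or "end"
    prompt: str
    expected: str
    source: str            # "synthetic", "derived", "filtered", or "hand-authored"


def to_jsonl(examples):
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in examples)


def from_jsonl(text):
    """Parse JSON Lines back into EvalExample records, skipping blank lines."""
    return [EvalExample(**json.loads(line)) for line in text.splitlines() if line.strip()]


# A single made-up record, only to show the round trip.
examples = [
    EvalExample("ex-0001", 20, 9, "middle", "placeholder prompt", "placeholder answer", "synthetic")
]
text = to_jsonl(examples)
print(text)
```

The dataset card can then document exactly these fields, and anyone loading the file gets an error instead of silent drift if a field is missing or renamed.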


Demo

If a short interactive example makes the idea easier to understand, a Space can help more than a long description. But public demos are public artifacts. If source visibility matters, check the Space visibility options before publishing. Also do not put secrets directly in code; use Space secrets or environment variables.
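
As a small sketch of the secrets point: as far as I know, Space secrets are exposed to the running app as environment variables, so the app code can read them at startup instead of containing them. The variable name HF_TOKEN and the helper load_token are assumptions for illustration.

```python
import os


def load_token(env=os.environ, name="HF_TOKEN"):
    """Return the secret from the environment, or fail loudly if it is missing."""
    token = env.get(name)
    if not token:
        raise RuntimeError(f"Set {name} as a Space secret or environment variable")
    return token


if __name__ == "__main__":
    try:
        load_token()
        print("token found")  # never print the token itself
    except RuntimeError as err:
        print(err)
```

Failing at startup with a clear message is usually easier to debug than a demo that starts and then breaks on the first request.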


Project index

If the project becomes:

  • GitHub repo
  • Model repo
  • Dataset repo
  • Space
  • paper
  • discussion thread

then a Collection is probably the cleanest public landing page.

See Collections.

What I would include before asking for validation

I would not ask people to validate the whole idea in the abstract. I would give them one small reproducible path.

Minimum useful bundle:

  • exact model name and revision
  • install command
  • one small input/example
  • one command that runs TIS
  • one command that runs a baseline
  • one small table or plot
  • dataset/eval split
  • metric
  • seed, if relevant
  • hardware/runtime notes
  • known failure cases
  • link to code
  • link to data/model/demo, if public

That turns the feedback request from “please validate my research” into something people can actually inspect.
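
A rough sketch of what "one command for the method, one command for a baseline" could produce: each run emits a small record carrying the model, revision, seed, metric, and score. The predictors, the model name, and the revision string below are toy placeholders so the harness runs end to end; they are not results.

```python
import json
import random
from dataclasses import dataclass, asdict


@dataclass
class RunRecord:
    model: str
    revision: str
    method: str
    seed: int
    metric: str
    score: float


def run(method, predict, examples, model, revision, seed=0):
    """Score predict(prompt) -> str by exact match and return a self-describing record."""
    random.seed(seed)  # fixes any sampling inside predict so the run is repeatable
    hits = sum(predict(prompt) == expected for prompt, expected in examples)
    return RunRecord(model, revision, method, seed, "exact_match", hits / len(examples))


def last_token_baseline(prompt):
    """Toy baseline: always answer with the last token of the prompt."""
    return prompt.split()[-1]


def placeholder_method(prompt):
    """Stand-in for the method under test; replace with the real call."""
    return prompt.split()[0]


# Toy (prompt, expected) pairs, only to exercise the harness.
examples = [("a b c", "c"), ("d e f", "d"), ("g h i", "i")]
records = [
    run("baseline", last_token_baseline, examples, "org/model-name", "<commit-sha>"),
    run("method", placeholder_method, examples, "org/model-name", "<commit-sha>"),
]
for record in records:
    print(json.dumps(asdict(record)))
```

Printing one JSON line per run makes it easy to paste the output into a forum post or turn it into the small table mentioned in the bundle.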

For example, the forum ask could be: I am sharing the current code/docs for TIS. I am not asking for a full research review yet. I would mainly like feedback on the artifact structure, the minimal reproduction, the LITM/eval protocol, and which baseline or comparison should be added first.


Possible validation checklist for LITM / KV-cache-style work

Since you mention Lost in the Middle and a KV-cache-compressor-like direction, I would make the evaluation protocol explicit.

I would expect people to ask for:

  • base model name and exact revision
  • context length
  • where the relevant information is placed
  • bucket boundaries / position sweep design
  • dataset construction method
  • prompt format
  • generation settings
  • metric
  • no-compression baseline
  • simple recency baseline, if relevant
  • simple position/bucket baseline, if relevant
  • compression or memory budget, if relevant
  • latency / memory / quality tradeoff
  • random seed, if relevant
  • hardware/runtime
  • one command to reproduce one result
  • known failure cases

This does not mean you must solve all of this before posting. But even one tiny reproducible example plus one baseline will make the discussion much more concrete.
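
To illustrate what an explicit position sweep and bucket design might look like, here is a minimal sketch. It assumes the context is a list of passages with one relevant passage inserted at a chosen index, and that buckets are equal-width slices of the position range. Those are my assumptions, not a description of how TIS or any particular LITM benchmark defines them.

```python
from collections import defaultdict


def build_context(distractors, needle, position):
    """Insert the relevant passage at a given index among the distractors."""
    docs = list(distractors)
    docs.insert(position, needle)
    return docs


def bucket_of(position, total, n_buckets=3):
    """Map a 0-based position in a context of `total` items to a bucket index."""
    return min(position * n_buckets // total, n_buckets - 1)


def accuracy_by_bucket(results, total, n_buckets=3):
    """results: iterable of (position, correct) pairs -> {bucket index: accuracy}."""
    hits, counts = defaultdict(int), defaultdict(int)
    for position, correct in results:
        bucket = bucket_of(position, total, n_buckets)
        hits[bucket] += int(correct)
        counts[bucket] += 1
    return {bucket: hits[bucket] / counts[bucket] for bucket in sorted(counts)}


# Sweep the relevant passage across every position of a 9-item context.
distractors = [f"distractor {i}" for i in range(8)]
sweep = [build_context(distractors, "relevant passage", p) for p in range(9)]
print([bucket_of(p, total=9) for p in range(9)])
```

Writing the bucket rule as a function removes one common source of confusion in these threads: readers can see exactly which positions count as "middle".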


I would not put too many related-work links in the first post unless the artifacts are already available. Otherwise the thread can turn into a literature-review discussion before people can run anything.

Forum category

Use Research if the main question is:

  • method framing
  • evaluation design
  • related work
  • whether the problem statement is clear
  • coordination with people interested in the method

The category description says it is for research questions or project coordination: [About the Research category](https://discuss.huggingface.co/t/about-the-research-category/26).

Use Show and Tell if you already have something people can inspect or run:

  • GitHub repo
  • Model repo
  • Dataset repo
  • Space
  • reproducible example
  • release announcement

The category description says it is for sharing projects such as Spaces, Models, and Datasets and getting feedback: About the Show and Tell category.

A practical route could be:

  • Start with Research if the structure/evaluation is still unclear.
  • Publish the code/model/data/demo pieces.
  • Post a clearer Show and Tell thread once people can actually inspect or run the artifacts.

Later: arXiv and HF Paper Pages

I would treat arXiv/Paper Pages as a later step, not the first thing you need to solve.

Once the write-up becomes paper-quality:

  • Publish the preprint through the normal paper route, for example arXiv if appropriate.
  • Add the arXiv link to the relevant HF README/model card/dataset card/Space README.
  • HF can use those links to connect the artifacts with the Paper Page.

So I would not think of it as:

“upload my whole paper/project to HF first”

I would think of it as:

“publish/link the artifacts clearly, then connect them to a paper later if the work reaches that stage”


HF Paper Page note

HF Paper Pages are for finding and discussing artifacts related to a paper, such as models, datasets, and Spaces. The link often starts from the artifact side: if the README/model card/dataset card contains an arXiv or HF Paper Page link, HF can extract the arXiv ID and add the relevant tag.
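
Before publishing a card, it can be worth checking that the paper link is in a form that is easy to pick up. The sketch below is a simple local check, not HF's actual extraction logic, which I have not seen. It looks for arxiv.org and huggingface.co/papers links with a new-style arXiv ID, and the ID in the sample text is a deliberate placeholder.

```python
import re

# New-style arXiv identifiers look like YYMM.NNNNN, optionally followed by a version.
ARXIV_LINK = re.compile(
    r"(?:arxiv\.org/(?:abs|pdf)/|huggingface\.co/papers/)(\d{4}\.\d{4,5})(?:v\d+)?",
    re.IGNORECASE,
)


def find_arxiv_ids(readme_text):
    """Return the distinct arXiv IDs linked from the text, in order of first appearance."""
    seen = []
    for match in ARXIV_LINK.finditer(readme_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# 0000.00000 is a placeholder ID, not a real paper.
readme = (
    "Paper: https://arxiv.org/abs/0000.00000v2\n"
    "Paper page: https://huggingface.co/papers/0000.00000\n"
)
print(find_arxiv_ids(readme))
```

If the check returns an empty list for a card that is supposed to cite a paper, the link is probably missing or written in a form a parser is unlikely to recognize.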

See Paper Pages. Paper indexing and authorship claim issues are separate from the project-structure decision.

If a Paper Page/indexing/authorship step gets stuck later, I would treat that as a website-support issue, not as a sign that the project was organized incorrectly.

There is a useful forum note here: [Papers indexing/authorship claim quick help](https://discuss.huggingface.co/t/papers-indexing-authorship-claim-quick-help/173387).

Very roughly:

  • indexing/paper-not-found issues: wait, retry later, do not repeatedly hammer the button, then post the arXiv link/error/time if it persists
  • authorship claim stuck/rejected/already-claimed issues: check HF profile name/email alignment, then use the email route described in that thread if needed

This is only relevant later if you go through arXiv/Paper Pages.

Blog Articles, Posts, DOI, and citation

Blog Articles or Posts can be useful later for a long-form explanation, release note, or research update. I would not make them the canonical home of the project.

Publishing Blog Articles/Posts may also require PRO or organization permissions, depending on namespace and account type. See Blog Articles and HF PRO.

For priority/citation, I would not rely only on a forum post. Use versioned, citable snapshots where appropriate. Examples:

  • GitHub release + Zenodo DOI for code
  • HF DOI for model/dataset artifacts
  • CITATION.cff for citation metadata
  • arXiv later for a paper/preprint, if appropriate

This is not plagiarism-proofing. It just makes the public record clearer: what was released, when, under which version, and how it should be cited.
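
For the CITATION.cff part, a minimal sketch of generating the file is below. It uses the fields I understand the Citation File Format 1.2.0 to require (cff-version, message, title, authors) plus a few optional ones. The title, author, date, and URL are hypothetical placeholders, and the output is worth validating against the CFF schema before release.

```python
from datetime import date


def render_citation_cff(title, authors, version, released, repo_url):
    """Render a minimal CITATION.cff; authors is a list of (given, family) name pairs."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f"date-released: {released.isoformat()}",
        f'repository-code: "{repo_url}"',
        "authors:",
    ]
    for given, family in authors:
        lines += [f'  - given-names: "{given}"', f'    family-names: "{family}"']
    return "\n".join(lines) + "\n"


# Placeholder values only; replace with the real project details.
cff = render_citation_cff(
    title="TIS",
    authors=[("Given", "Family")],
    version="0.1.0",
    released=date(2026, 1, 1),
    repo_url="https://github.com/your-name/tis",
)
print(cff)
```

Tying the version and date in this file to a tagged release keeps the citation metadata and the archived snapshot in agreement.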


Citation / timestamp note

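For code:

  • A GitHub release plus a Zenodo DOI gives a versioned, citable snapshot.
  • A CITATION.cff file carries the citation metadata.

For HF model/dataset artifacts:

  • HF can generate DOIs for models and datasets.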
  • See Digital Object Identifier (DOI).
  • Be careful: DOI-backed artifacts are intended to be persistent. Deleting, renaming, or changing visibility may require support.

If patent/commercial IP matters, do not treat a public forum post as legal strategy. Be careful before public disclosure. I am not giving legal advice here.

Concrete next action

I would start with one small public bundle:

  • A GitHub README explaining TIS, current status, non-goals, and how to run one example.
  • One minimal runnable example.
  • One baseline command.
  • One tiny result table/plot.
  • A clear license.
  • A CITATION.cff, if you want people to cite the code.
  • A HF Model repo only if there are weights/checkpoints/adapters.
  • A HF Dataset repo only if there are reusable eval examples/traces.
  • A Space only if a demo makes the behavior easier to understand.
  • A Collection once there is more than one artifact.
  • A Forum post with specific feedback questions.

That gives experienced people a much easier path to help.
