Designing the hf CLI as an agent-optimized way to work with the Hub

wpnews.pro

hf

is the official command-line entrypoint to the Hugging Face Hub. Anything you can do on the Hub from the Python SDK, you can do from your terminal: download and upload models, datasets and Spaces; create and manage repos, branches, tags and pull requests; run Jobs on HF infrastructure; manage Buckets, Collections, webhooks and Inference Endpoints. The hf

CLI has been primarily built for our users over the years. But it's now increasingly used by coding agents: Claude Code, Codex, Cursor and more. So we rebuilt it to make it work for both audiences at once. This blog post summarizes what we did, and how we benchmarked it. We found that on complex, multi-step tasks the no-CLI baseline (an agent hand-rolling curl

or the Python SDK) uses up to 6× as many tokens as the hf

CLI.

AI agent traffic on the Hub #

We started tracking agent usage of the Hub in April 2026. The hf

CLI (and the huggingface_hub

Python SDK it's built on) detects when a coding agent is driving it by reading the environment variables agents set: CLAUDECODE

/CLAUDE_CODE

for Claude Code, CODEX_SANDBOX

for Codex, plus Cursor, Gemini, Pi, and the universal AI_AGENT

. That single signal does two jobs: it shapes the CLI's output (more on that below) and it tags each Hub request with an agent/<name>

user-agent, so we can attribute traffic to the agent driving it. The two largest by distinct users are Claude Code and Codex, well ahead of everything else, and they're the two agents we benchmark later in this article.

The bars count distinct users per agent; request volume is the sub-label. Claude Code alone is ~40k users and nearly 49M requests, with Codex close behind. These are early numbers (we only began attributing agent traffic in April 2026), but the scale is already significant, and we expect it to keep growing as coding agents become a standard way to work with the Hub.

Built for humans and agents #

Humans and coding agents expect different outputs for the same hf

commands. A human wants rich terminal output: ANSI color, padded tables truncated to fit the screen, a green ✅ on success, ✔

for booleans, progress bars, prose hints. An agent wants the inverse: no ANSI, nothing truncated, every value in full since an agent can handle far denser output than a human, kept compact and structured to stay light on tokens. It also can't answer a CLI prompt and will happily re-run a command after a timeout. The rest of this section is how hf

gives each side what it needs. We introduced agent-mode output in hf

v1.9.0 and have been migrating the rest of the CLI to it gradually in the following releases.

One command, multiple renderings

When hf

auto-detects agent use (via the environment variables mentioned above), it renders the same command differently. It optimizes output format for humans or agents without passing a flag:

> hf models ls --author Qwen --sort downloads --limit 3
ID                       CREATED_AT DOWNLOADS LIBRARY_NAME LIKES PIPELINE_TAG    PRIVATE TAGS
------------------------ ---------- --------- ------------ ----- --------------- ------- -------------------------
Qwen/Qwen3-0.6B          2025-04-27  21156913 transformers  1285 text-generation         transformers, safetens...
Qwen/Qwen2.5-1.5B-Ins... 2024-09-17  15143953 transformers   725 text-generation         transformers, safetens...
Qwen/Qwen3-4B            2025-04-27  14808352 transformers   625 text-generation         transformers, safetens...
Hint: Use `--no-truncate` or `--format json` to display full values.

$ hf models ls --author Qwen --sort downloads --limit 3
id      created_at      downloads       library_name    likes   pipeline_tag    private tags
Qwen/Qwen3-0.6B 2025-04-27T03:40:08+00:00       21156913        transformers    1285    text-generation False   ['transformers', 'safetensors', 'qwen3', 'text-generation', 'conversational', 'arxiv:2505.09388', 'base_model:Qwen/Qwen3-0.6B-Base', 'base_model:finetune:Qwen/Qwen3-0.6B-Base', 'license:apache-2.0', 'text-generation-inference', 'endpoints_compatible', 'deploy:azure', 'region:us']
Qwen/Qwen2.5-1.5B-Instruct      2024-09-17T14:10:29+00:00       15143953        transformers    725     text-generation False['transformers', 'safetensors', 'qwen2', 'text-generation', 'chat', 'conversational', 'en', 'arxiv:2407.10671', 'base_model:Qwen/Qwen2.5-1.5B', 'base_model:finetune:Qwen/Qwen2.5-1.5B', 'license:apache-2.0', 'text-generation-inference', 'endpoints_compatible', 'deploy:azure', 'region:us']
Qwen/Qwen3-4B   2025-04-27T03:41:29+00:00       14808352        transformers    625     text-generation False   ['transformers', 'safetensors', 'text-generation', 'arxiv:2309.00071', 'arxiv:2505.09388', 'base_model:Qwen/Qwen3-4B-Base', 'base_model:finetune:Qwen/Qwen3-4B-Base', 'license:apache-2.0', 'endpoints_compatible', 'deploy:azure', 'region:us']

A human gets an aligned table, truncated to fit the terminal, plus a hint on how to see more, with color cues for status (a green ✓

on success, red on error). An agent gets the complete record as TSV: full repo ids, full ISO timestamps, every tag, no ANSI codes, nothing truncated, clean to parse and light on tokens.

In practice, we've implemented logging methods like .table(...)

, .result(...)

, .json()

, etc., which take raw data as input and handle the formatting. In addition to human and agent modes, we've introduced --json

and --quiet

options to make it easier to pipe commands together. The default mode is automatically chosen based on context, but users can always force the format of their choice with --format human | agent | json | quiet

.

Next-command hints

CLI commands rarely run in isolation: one step usually implies the next (git add

, then git commit

). Many hf

commands now end with a hint: the exact next command to run, pre-filled with the IDs you just used, so a user or agent can chain straight to the next step instead of working it out from scratch. Start a Job in the background and it points you to its logs; create a Space and it points you to its boot status:

$ hf jobs run --detach python:3.12 python train.py
✓ Job started
  id: 6f3a1c2e9b
  url: https://huggingface.co/jobs/celinah/6f3a1c2e9b
Hint: Use `hf jobs logs 6f3a1c2e9b` to fetch the logs.

For a human that's a convenience. For an agent it's a rail: the next action is named, parameterized with the right ids, and ready to run, so it takes fewer steps working out what to do. Errors behave the same way, naming the fix instead of just failing:

Error: Not logged in. Run `hf auth login` first.

Hints, warnings and errors all go to stderr while data goes to stdout, so none of this guidance pollutes the output the agent is parsing.

Non-blocking and safe to retry

hf

never sits on an interactive prompt waiting for a key an agent can't press. A destructive command still asks a human to confirm, but in agent mode it fails fast with the fix in the message (Use --yes to skip confirmation.

), and -y

/--yes

skips it. And because agents retry on timeouts and lost context, operations are built to be safe to repeat: hf repos create --exist-ok

is a no-op if the repo already exists, and re-running an upload re-commits cleanly. Separately, the commands that move real data take a --dry-run

that shows exactly what they'll transfer before they run, which proves handy for humans and agents alike, since neither has to commit to a long download or blind sync:

$ hf repos delete my-org/old-model
Error: You are about to permanently delete model 'my-org/old-model'. Proceed? Use --yes to skip confirmation.

$ hf download deepseek-ai/DeepSeek-V4-Pro config.json --dry-run
[dry-run] Will download 1 files (out of 1) totalling 1.8K.
file         size
config.json  1.8K

Discoverable, predictable commands

hf

is built to be probed: run hf

to see the resource groups, run --help

on the one you need, and every --help

ends with real, copy-pasteable examples (which an agent matches against far faster than it parses a description):

$ hf models ls --help
...
Examples
  $ hf models ls --sort downloads --limit 10
  $ hf models ls --search "qwen" --author Qwen
  $ hf models ls Qwen/Qwen3-4B --tree

The command tree is consistent, resource + verb with the obvious aliases (hf models ls

, hf repos create

, hf jobs ps

, hf collections delete

; list

/ls

, remove

/rm

), so once an agent learns one command it can guess the rest. And the output composes: -q

prints one id per line to pipe into the next command, --json

gives you something to hand to jq.

$ hf models ls --author Qwen -q | head -3
Qwen/Qwen3-0.6B
Qwen/Qwen2.5-1.5B-Instruct
Qwen/Qwen3-4B

Benchmarking the hf CLI for Coding Agents #

To find out whether the hf

CLI is really more efficient for agents, we measured it. We built a small evaluation harness and ran the same set of Hub tasks through each way of driving the Hub, many times over, grading every run against the live Hub. Here's the headline before the methodology: across both agents the hf

CLI comes out ahead, most clearly on complex, multi-step tasks where it uses far fewer tokens.

agent	tool	success score
Claude Code (Sonnet 4.6)
`hf` CLI
0.94
baseline	2 / 163
curl / Python SDK	0.84	1.3-1.6× tokens
11 / 163
Codex (GPT-5.5)
`hf` CLI
0.93
baseline	3 / 163
curl / Python SDK	0.92	1.6-1.8× tokens
10 / 163

(self-report error = the agent reported success on the 17 solvable tasks but the Hub said otherwise. The hf CLI rows are the CLI with its skill installed; what the skill adds on top of the bare CLI (chiefly fewer tool calls) is broken out in the skill section below. Representative transcripts are published in this bucket.)

The setup

We defined 18 non-trivial Hub tasks. Not "download a file", but the kind of thing you'd actually ask for: aggregate a trending org's models, inspect a repo's files and their sizes, upload a folder with include/exclude rules, delete files, copy files across repos, open a PR that adds a license, create a repo with a branch and a tag, sync and prune a bucket, build a collection. Each task goes to a fresh coding agent with exactly one way to talk to the Hub:

the hf

CLI, or curl / the Python SDK: nohf

CLI at all, so the agent falls back tocurl

against the REST API or thehuggingface_hub

Python library.

We run the hf

CLI in two configurations, with and without its skill (a generated command reference we come back to in its own section). But the headline comparison below is simply ** hf CLI vs curl / the SDK**; the skill's incremental effect is small enough that we break it out on its own rather than crowd it into the main results.

The config is deliberately clean: a fresh instance per run, no custom MCP servers, no CLAUDE.md

or AGENTS.md

, nothing in context to nudge behavior. The task and the tool go into a single prompt, and the agent finishes with a TASK_COMPLETE

or TASK_FAILED

marker, but we don't trust that marker (an agent will report success on work that never landed), so we grade every run independently by re-querying the live Hub: did the branch really get created, is the file actually gone, does the bucket exist? Each task/tool combination is run 10 times, since coding agents are non-deterministic, about 520 runs per agent (18 tasks × 3 tools × 10 reps, minus a cap on one billable Jobs task) and ~1,000 graded runs in total. We ran the whole thing twice, on the two most popular coding agents (Claude Code with Sonnet 4.6 and OpenAI Codex with GPT-5.5).

The results

The two charts below unpack the table above. First, task success on Sonnet, the agent where curl and the SDK struggle most:

Without the CLI, curl and the SDK trail by ten points, because on Sonnet they simply can't finish parts of the job (the writes, mostly), while the hf

CLI clears them.

The second image shows token impact on GPT-5.5, broken down per task. Each bar is the curl/SDK tokens divided by the CLI's on the same task, so 2.4×

means the non-hf version burned 2.4 times as many tokens to do the same thing:

On a one-shot read (count dataset rows, batch metadata) curl and the SDK are fine, and sometimes lighter. But as tasks get more complex and involve several dependent steps, the agent has to hand-roll the entire chain of REST calls (or dig through the SDK) and the cost blows up: 2.4× to 6× the CLI's on creating a repo with a branch and tag, deleting files, copying across repos, or syncing a bucket. The hf

CLI lets the agent express the task as a few higher-level commands, rather than crafting a complex workflow.

Key findings

The For the same task, at equal-or-better success, curl and the SDK burnhf

CLI is far leaner than curl or the SDK.roughly 1.3× to 1.8× the tokens. On easy reads they're fine, but on real multi-step work they pay** 2× to 6×**: the CLI composes a chain of REST calls into a few high-level commands, while curl or the SDK re-derives the chain by hand every run.On a stronger model, curl and the SDK work but stay wasteful. On Sonnet they can't finish parts of the job (the writes, mostly); on GPT-5.5 they mostly succeed, hand-rolling the REST calls (or using the SDK) correctly, but still pay well over the CLI's token bill.

The hf-cli skill #

hf

ships a skill: a compact reference of the whole command surface that an agent loads as context. It's auto-generated from the live hf

command tree, one line per command (its signature, a one-line description, and the flags that matter), grouped by resource, with a short glossary of common options. It deliberately skips the self-explanatory flags so it stays terse and light on context, and it's regenerated every release. Run hf skills preview

to print it, or install it with:

hf skills add
hf skills add --claude

What does it buy you? Mostly, the agent stops guessing. The clearest single view is how many commands each run takes, with the skill and without:

On both agents that's about ten commands per task down to about seven, roughly 30% fewer tool calls. That's because the agent isn't probing --help

to find the right command and argument. The skill won't cut your token bill, because it prepends a fixed slice of info to the context, so tokens remain about the same or slightly tick up for the same task. The Skill won't make the CLI more reliable either, but it will help the agent spend time running your task rather than finding out how the tool works. This could be particularly helpful when using hf

with local models.

We ran each task in a fresh session, so the skill pays its context cost on every task. In a real multi-task session that cost amortizes (the agent learns the command surface once), so the token picture likely improves there; we didn't measure that case.

Try it yourself #

We benchmarked all this because we think it matters. Agents are becoming real users of the Hub: they train models, build and clean datasets, and ship demos as Spaces, almost always on behalf of a person. A Hub that works well for agents is also a Hub that works better for the people using them. The better an agent's tools are, the more it can do for you.

If your agent interacts with the Hugging Face Hub, we recommend giving it the hf

CLI:

curl -LsSf https://hf.co/cli/install.sh | bash

powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"

Then hand it the skill, so it knows the whole command surface from the first turn:

hf skills add            # Codex, Cursor, OpenCode, Pi and other agents that load skills from .agents/skills
hf skills add --claude   # the above + Claude Code

Then point your agent at the Hub and let it work. Make sure you're logged in (hf auth login

), then hand it a prompt like:

Use `hf` to list my Hugging Face Hub models, datasets, and Spaces.
Take a look at how I am currently using the Hub and suggest a few ways you could help me.

It'll work out the commands on its own and come back with something useful.

The full command reference lives in the hf CLI guide.

Register an agent harness #

Building an agent harness? Get it registered! That's how hf

learns to detect it, and how the Hub attributes its traffic to your harness. You simply need to open a small PR adding an entry to agent-harnesses.ts. Read the

Register your agent harnessguide for more details.

source & further reading

huggingface.co — original article High School Sophomore Seeking arXiv Endorser for Vision Transformer MoE Paper (cs.LG / cs.CV) Reliability check on my own dataset's annotation layer: five machine raters, one definition, answers from 0 to 78 Same model, up to 4.66x different price — full Inference Providers pricing matrix

Designing the hf CLI as an agent-optimized way to work with the Hub

AI agent traffic on the Hub #

Built for humans and agents #

One command, multiple renderings

Next-command hints

Non-blocking and safe to retry

Discoverable, predictable commands

Benchmarking the hf CLI for Coding Agents #

The setup

The results

Key findings

The hf-cli skill #

Try it yourself #

Register an agent harness #

Run your AI side-project on zahid.host