{"slug": "designing-the-hf-cli-as-an-agent-optimized-way-to-work-with-the-hub", "title": "Designing the hf CLI as an agent-optimized way to work with the Hub", "summary": "Hugging Face redesigned its `hf` command-line interface to optimize it for AI coding agents, which now account for significant traffic on the Hub. The new CLI auto-detects when an agent is driving it and adjusts its output format accordingly, stripping ANSI colors and truncation to provide dense, structured data. Benchmarks show the agent-optimized CLI uses up to six times fewer tokens than a baseline approach using `curl` or the Python SDK, making it more efficient for complex, multi-step tasks.", "body_md": "# Designing the hf CLI as an agent-optimized way to work with the Hub\n\n`hf` is the official command-line entrypoint to the Hugging Face Hub. Anything you can do on the Hub from the Python SDK, you can do from your terminal: download and upload models, datasets and Spaces; create and manage repos, branches, tags and pull requests; run Jobs on HF infrastructure; manage Buckets, Collections, webhooks and Inference Endpoints.\n\nOver the years, the `hf` CLI has been built primarily for human users. But it's now increasingly used by **coding agents**: Claude Code, Codex, Cursor and more. So we rebuilt it to make it work for both audiences at once. This blog post summarizes what we did, and how we benchmarked it. We found that on complex, multi-step tasks the no-CLI baseline (an agent hand-rolling `curl` or the Python SDK) uses up to **6× as many tokens** as the `hf` CLI.\n\n## AI agent traffic on the Hub\n\nWe started tracking agent usage of the Hub in April 2026. 
The `hf` CLI (and the `huggingface_hub` Python SDK it's built on) detects when a coding agent is driving it by reading the environment variables agents set: `CLAUDECODE`/`CLAUDE_CODE` for Claude Code, `CODEX_SANDBOX` for Codex, plus Cursor, Gemini, Pi, and the universal `AI_AGENT`. That single signal does two jobs: it shapes the CLI's output (more on that below) and it tags each Hub request with an `agent/<name>` user-agent, so we can attribute traffic to the agent driving it. The two largest by distinct users are **Claude Code and Codex**, well ahead of everything else, and they're the two agents we benchmark later in this article.\n\nThe bars count distinct users per agent; request volume is the sub-label. Claude Code alone is ~40k users and nearly 49M requests, with Codex close behind. These are early numbers (we only began attributing agent traffic in April 2026), but the scale is already significant, and we expect it to keep growing as coding agents become a standard way to work with the Hub.\n\n## Built for humans and agents\n\nHumans and coding agents expect different outputs for the same `hf` commands. A human wants rich terminal output: ANSI color, padded tables truncated to fit the screen, a green ✅ on success, `✔` for booleans, progress bars, prose hints. An agent wants the inverse: no ANSI, nothing truncated, every value in full since an agent can handle far denser output than a human, kept compact and structured to stay light on tokens. It also can't answer a CLI prompt and will happily re-run a command after a timeout. The rest of this section is how `hf` gives each side what it needs. We introduced agent-mode output in `hf` v1.9.0 and have been migrating the rest of the CLI to it gradually in the following releases.\n\n### One command, multiple renderings\n\nWhen `hf` auto-detects agent use (via the environment variables mentioned above), it renders the **same command** differently. 
It optimizes the output format for humans or agents, with no flag required:\n\n```\n# human (default in a terminal): aligned table, truncated to fit, with a hint\n$ hf models ls --author Qwen --sort downloads --limit 3\nID                       CREATED_AT DOWNLOADS LIBRARY_NAME LIKES PIPELINE_TAG    PRIVATE TAGS\n------------------------ ---------- --------- ------------ ----- --------------- ------- -------------------------\nQwen/Qwen3-0.6B          2025-04-27  21156913 transformers  1285 text-generation         transformers, safetens...\nQwen/Qwen2.5-1.5B-Ins... 2024-09-17  15143953 transformers   725 text-generation         transformers, safetens...\nQwen/Qwen3-4B            2025-04-27  14808352 transformers   625 text-generation         transformers, safetens...\nHint: Use `--no-truncate` or `--format json` to display full values.\n\n# agent (auto-detected): TSV, full ids + ISO timestamps + every tag, nothing truncated\n$ hf models ls --author Qwen --sort downloads --limit 3\nid      created_at      downloads       library_name    likes   pipeline_tag    private tags\nQwen/Qwen3-0.6B 2025-04-27T03:40:08+00:00       21156913        transformers    1285    text-generation False   ['transformers', 'safetensors', 'qwen3', 'text-generation', 'conversational', 'arxiv:2505.09388', 'base_model:Qwen/Qwen3-0.6B-Base', 'base_model:finetune:Qwen/Qwen3-0.6B-Base', 'license:apache-2.0', 'text-generation-inference', 'endpoints_compatible', 'deploy:azure', 'region:us']\nQwen/Qwen2.5-1.5B-Instruct      2024-09-17T14:10:29+00:00       15143953        transformers    725     text-generation False   ['transformers', 'safetensors', 'qwen2', 'text-generation', 'chat', 'conversational', 'en', 'arxiv:2407.10671', 'base_model:Qwen/Qwen2.5-1.5B', 'base_model:finetune:Qwen/Qwen2.5-1.5B', 'license:apache-2.0', 'text-generation-inference', 'endpoints_compatible', 'deploy:azure', 'region:us']\nQwen/Qwen3-4B   2025-04-27T03:41:29+00:00       14808352        transformers    625     text-generation 
False   ['transformers', 'safetensors', 'text-generation', 'arxiv:2309.00071', 'arxiv:2505.09388', 'base_model:Qwen/Qwen3-4B-Base', 'base_model:finetune:Qwen/Qwen3-4B-Base', 'license:apache-2.0', 'endpoints_compatible', 'deploy:azure', 'region:us']\n```\n\nA **human** gets an aligned table, truncated to fit the terminal, plus a hint on how to see more, with color cues for status (a green `✓` on success, red on error). An **agent** gets the complete record as TSV: full repo ids, full ISO timestamps, every tag, no ANSI codes, nothing truncated, clean to parse and light on tokens.\n\nIn practice, we've implemented logging methods like `.table(...)`, `.result(...)`, `.json()`, etc., which take raw data as input and handle the formatting. In addition to human and agent modes, we've introduced `--json` and `--quiet` options to make it easier to pipe commands together. The default mode is automatically chosen based on context, but users can always force the format of their choice with `--format human | agent | json | quiet`.\n\n### Next-command hints\n\nCLI commands rarely run in isolation: one step usually implies the next (`git add`, then `git commit`). Many `hf` commands now end with a **hint**: the exact next command to run, pre-filled with the IDs you just used, so a user or agent can chain straight to the next step instead of working it out from scratch. Start a Job in the background and it points you to its logs; create a Space and it points you to its boot status:\n\n```bash\n$ hf jobs run --detach python:3.12 python train.py\n✓ Job started\n  id: 6f3a1c2e9b\n  url: https://huggingface.co/jobs/celinah/6f3a1c2e9b\nHint: Use `hf jobs logs 6f3a1c2e9b` to fetch the logs.\n```\n\nFor a human that's a convenience. For an agent it's a rail: the next action is named, parameterized with the right ids, and ready to run, so it takes fewer steps working out what to do. 
Errors behave the same way, naming the fix instead of just failing:\n\n```\nError: Not logged in. Run `hf auth login` first.\n```\n\nHints, warnings and errors all go to stderr while data goes to stdout, so none of this guidance pollutes the output the agent is parsing.\n\n### Non-blocking and safe to retry\n\n`hf` never sits on an interactive prompt waiting for a key an agent can't press. A destructive command still asks a human to confirm, but in agent mode it *fails fast* with the fix in the message (`Use --yes to skip confirmation.`), and passing `-y`/`--yes` skips the confirmation. And because agents retry on timeouts and lost context, operations are built to be safe to repeat: `hf repos create --exist-ok` is a no-op if the repo already exists, and re-running an upload re-commits cleanly. Separately, the commands that move real data take a `--dry-run` flag that shows exactly what they'll transfer before they run, which proves handy for humans and agents alike, since neither has to commit to a long download or blind sync:\n\n```\n# agent mode: a destructive command without --yes refuses, with the fix in the message\n$ hf repos delete my-org/old-model\nError: You are about to permanently delete model 'my-org/old-model'. Proceed? 
Use --yes to skip confirmation.\n\n# commands that move data take --dry-run to preview the transfer first\n$ hf download deepseek-ai/DeepSeek-V4-Pro config.json --dry-run\n[dry-run] Will download 1 files (out of 1) totalling 1.8K.\nfile         size\nconfig.json  1.8K\n```\n\n### Discoverable, predictable commands\n\n`hf` is built to be probed: run `hf` to see the resource groups, run `--help` on the one you need, and every `--help` ends with real, copy-pasteable examples (which an agent matches against far faster than it parses a description):\n\n```bash\n$ hf models ls --help\n...\nExamples\n  $ hf models ls --sort downloads --limit 10\n  $ hf models ls --search \"qwen\" --author Qwen\n  $ hf models ls Qwen/Qwen3-4B --tree\n```\n\nThe command tree is consistent, **resource + verb** with the obvious aliases (`hf models ls`, `hf repos create`, `hf jobs ps`, `hf collections delete`; `list`/`ls`, `remove`/`rm`), so once an agent learns one command it can guess the rest. And the output composes: `-q` prints one id per line to pipe into the next command, `--json` gives you something to hand to [jq](https://jqlang.org/).\n\n```bash\n$ hf models ls --author Qwen -q | head -3\nQwen/Qwen3-0.6B\nQwen/Qwen2.5-1.5B-Instruct\nQwen/Qwen3-4B\n```\n\n## Benchmarking the hf CLI for Coding Agents\n\nTo find out whether the `hf` CLI is really more efficient for agents, we measured it. We built a small evaluation harness and ran the same set of Hub tasks through each way of driving the Hub, many times over, grading every run against the live Hub. 
Here's the headline before the methodology: across both agents the `hf` CLI comes out ahead, most clearly on complex, multi-step tasks where it uses far fewer tokens.\n\n| agent | tool | success score | token usage | self-report error |\n|---|---|---|---|---|\n| Claude Code (Sonnet 4.6) | `hf` CLI | 0.94 | baseline | 2 / 163 |\n| Claude Code (Sonnet 4.6) | curl / Python SDK | 0.84 | 1.3-1.6× tokens | 11 / 163 |\n| Codex (GPT-5.5) | `hf` CLI | 0.93 | baseline | 3 / 163 |\n| Codex (GPT-5.5) | curl / Python SDK | 0.92 | 1.6-1.8× tokens | 10 / 163 |\n\n*(self-report error = runs on the 17 solvable tasks where the agent reported success but the Hub said otherwise. The hf CLI rows are the CLI with its skill installed; what the skill adds on top of the bare CLI (chiefly fewer tool calls) is broken out in the skill section below. Representative transcripts are published in this bucket.)*\n\n### The setup\n\nWe defined **18 non-trivial Hub tasks**. Not \"download a file\", but the kind of thing you'd actually ask for: aggregate a trending org's models, inspect a repo's files and their sizes, upload a folder with include/exclude rules, delete files, copy files across repos, open a PR that adds a license, create a repo with a branch and a tag, sync and prune a bucket, build a collection. Each task goes to a fresh coding agent with exactly **one** way to talk to the Hub:\n\n- the `hf` CLI, or\n- **curl / the Python SDK**: no `hf` CLI at all, so the agent falls back to `curl` against the REST API or the `huggingface_hub` Python library.\n\nWe run the `hf` CLI in two configurations, with and without its skill (a generated command reference we come back to in [its own section](#the-hf-cli-skill)). 
But the headline comparison is simply **hf CLI vs curl / the SDK**; the skill's incremental effect is small enough that we break it out on its own rather than crowd it into the main results.\n\nThe config is deliberately clean: a fresh instance per run, no custom MCP servers, no `CLAUDE.md` or `AGENTS.md`, nothing in context to nudge behavior. The task and the tool go into a single prompt, and the agent finishes with a `TASK_COMPLETE` or `TASK_FAILED` marker. We don't trust that marker (an agent will report success on work that never landed), so we grade every run independently by **re-querying the live Hub**: did the branch really get created, is the file actually gone, does the bucket exist? Because coding agents are non-deterministic, each task/tool combination is run **10 times**: about **520 runs per agent** (18 tasks × 3 tools × 10 reps, minus a cap on one billable Jobs task) and ~1,000 graded runs in total. We ran the whole thing twice, on the two most popular coding agents (**Claude Code** with Sonnet 4.6 and **OpenAI Codex** with GPT-5.5).\n\n### The results\n\nThe two charts below unpack the table above. First, **task success on Sonnet**, the agent where curl and the SDK struggle most:\n\nWithout the CLI, curl and the SDK trail by ten points, because on Sonnet they simply can't finish parts of the job (the writes, mostly), while the `hf` CLI clears them.\n\nThe second chart shows **token impact on GPT-5.5**, broken down per task. Each bar is the curl/SDK tokens divided by the CLI's on the same task, so `2.4×` means the non-hf version burned 2.4 times as many tokens to do the same thing:\n\nOn a one-shot read (count dataset rows, batch metadata) curl and the SDK are fine, and sometimes lighter. 
But as tasks get more complex and involve several dependent steps, the agent has to hand-roll the entire chain of REST calls (or dig through the SDK) and the cost blows up: **2.4× to 6× the CLI's** on creating a repo with a branch and tag, deleting files, copying across repos, or syncing a bucket. The `hf` CLI lets the agent express the task as a few higher-level commands, rather than crafting a complex workflow.\n\n### Key findings\n\n- **The `hf` CLI is far leaner than curl or the SDK.** For the same task, at equal-or-better success, curl and the SDK burn **roughly 1.3× to 1.8× the tokens**. On easy reads they're fine, but on real multi-step work they pay **2× to 6×**: the CLI composes a chain of REST calls into a few high-level commands, while curl or the SDK re-derives the chain by hand every run.\n- **On a stronger model, curl and the SDK work but stay wasteful.** On Sonnet they can't finish parts of the job (the writes, mostly); on GPT-5.5 they mostly succeed, hand-rolling the REST calls (or using the SDK) correctly, but still pay well over the CLI's token bill.\n\n## The hf-cli skill\n\n`hf` ships a **skill**: a compact reference of the whole command surface that an agent loads as context. It's **auto-generated** from the live `hf` command tree, one line per command (its signature, a one-line description, and the flags that matter), grouped by resource, with a short glossary of common options. It deliberately skips the self-explanatory flags so it stays terse and light on context, and it's regenerated every release. Run `hf skills preview` to print it, or install it with:\n\n```\n# for Codex, Cursor, OpenCode, Pi and other agents that load skills from `.agents/skills`\nhf skills add\n# includes the above + Claude Code\nhf skills add --claude\n```\n\nWhat does it buy you? Mostly, the agent stops guessing. 
The clearest single view is how many commands each run takes, with the skill and without:\n\nOn both agents, the skill takes runs from about ten commands per task down to about seven, roughly 30% fewer tool calls. That's because the agent isn't probing `--help` to find the right command and argument. The skill won't cut your token bill, because it prepends a fixed slice of info to the context, so tokens remain about the same or tick up slightly for the same task. The skill won't make the CLI more reliable either, but it will help the agent spend time running your task rather than finding out how the tool works. This could be particularly helpful when using `hf` with local models.\n\nWe ran each task in a fresh session, so the skill pays its context cost on every task. In a real multi-task session that cost amortizes (the agent learns the command surface once), so the token picture likely improves there; we didn't measure that case.\n\n## Try it yourself\n\nWe benchmarked all this because we think it matters. Agents are becoming real users of the Hub: they train models, build and clean datasets, and ship demos as Spaces, almost always on behalf of a person. A Hub that works well for agents is also a Hub that works better for the people using them. The better an agent's tools are, the more it can do for you.\n\nIf your agent interacts with the Hugging Face Hub, we recommend giving it the `hf` CLI:\n\n```\n# macOS / Linux\ncurl -LsSf https://hf.co/cli/install.sh | bash\n\n# Windows (PowerShell)\npowershell -ExecutionPolicy ByPass -c \"irm https://hf.co/cli/install.ps1 | iex\"\n```\n\nThen hand it the skill, so it knows the whole command surface from the first turn:\n\n```\nhf skills add            # Codex, Cursor, OpenCode, Pi and other agents that load skills from .agents/skills\nhf skills add --claude   # the above + Claude Code\n```\n\nThen point your agent at the Hub and let it work. 
Make sure you're logged in (`hf auth login`), then hand it a prompt like:\n\n```\nUse `hf` to list my Hugging Face Hub models, datasets, and Spaces.\nTake a look at how I am currently using the Hub and suggest a few ways you could help me.\n```\n\nIt'll work out the commands on its own and come back with something useful.\n\nThe full command reference lives in the [hf CLI guide](https://huggingface.co/docs/huggingface_hub/guides/cli).\n\n## Register an agent harness\n\nBuilding an agent harness? **Get it registered!** That's how `hf` learns to detect it, and how the Hub attributes its traffic to your harness. You simply need to open a small PR adding an entry to [agent-harnesses.ts](https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/agent-harnesses.ts). Read the [Register your agent harness](https://huggingface.co/docs/hub/agents-overview#register-your-agent-harness) guide for more details.", "url": "https://wpnews.pro/news/designing-the-hf-cli-as-an-agent-optimized-way-to-work-with-the-hub", "canonical_source": "https://huggingface.co/blog/hf-cli-for-agents", "published_at": "2026-06-04 00:00:00+00:00", "updated_at": "2026-06-04 15:41:51.222055+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure", "mlops", "large-language-models"], "entities": ["Hugging Face", "Claude Code", "Codex", "Cursor", "Gemini", "Pi", "hf CLI", "huggingface_hub"], "alternates": {"html": "https://wpnews.pro/news/designing-the-hf-cli-as-an-agent-optimized-way-to-work-with-the-hub", "markdown": "https://wpnews.pro/news/designing-the-hf-cli-as-an-agent-optimized-way-to-work-with-the-hub.md", "text": "https://wpnews.pro/news/designing-the-hf-cli-as-an-agent-optimized-way-to-work-with-the-hub.txt", "jsonld": "https://wpnews.pro/news/designing-the-hf-cli-as-an-agent-optimized-way-to-work-with-the-hub.jsonld"}}