Source: discuss.huggingface.co

Inference Providers: 3 cents per request?

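Hugging Face's Inference Providers feature currently relies on imperfect approximation heuristics for usage accounting. For some providers this deducts a flat $0.03 per request from the included free credits regardless of token count, which does not reflect actual provider pricing. Hugging Face says it is not yet billing for Inference Providers usage and that accurate pricing will go live within the next week.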

Published Jun 24, 2026

Wexly: Basically, this is the question: the posts about Inference Providers seem to indicate that the price is the same as using the providers directly. But when I run a small sample with the single word “Hello”, $0.03 appears to be deducted from my account. The provider’s pricing for this model says “$0.10 / M tokens”, which is a far cry from $0.03 for the couple of tokens it takes to send and receive “hello”.

Also, the amount does not seem to depend on the actual number of tokens: if I attach a 1 MB image, I still get charged the same 3 cents.

So the question is whether this is a bug in billing or whether there really is a charge of $0.03 per request. At $0.03 per request, the cost adds up pretty fast…
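To put the gap in numbers, here is a rough sketch of the arithmetic behind the question. The $0.03 flat deduction and the $0.10 per million tokens come from the post; the 20-token size of a "Hello" round trip is purely an assumption for illustration, and the function names are hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def per_token_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Dollar cost of `tokens` tokens at a per-million-token price."""
    return price_per_million * tokens / MILLION


def break_even_tokens(flat_fee: Decimal, price_per_million: Decimal) -> int:
    """Token count at which metered pricing would equal the flat fee."""
    return int(flat_fee * MILLION / price_per_million)


FLAT_FEE = Decimal("0.03")            # deduction per request, as observed in the post
PRICE_PER_MILLION = Decimal("0.10")   # provider's listed price, as quoted in the post
ASSUMED_TOKENS = 20                   # assumption: prompt + reply size for "Hello"

metered = per_token_cost(ASSUMED_TOKENS, PRICE_PER_MILLION)
ratio = FLAT_FEE / metered
break_even = break_even_tokens(FLAT_FEE, PRICE_PER_MILLION)

print(f"metered cost for {ASSUMED_TOKENS} tokens: ${metered}")
print(f"flat fee vs metered cost: {ratio:.0f}x")
print(f"break-even request size: {break_even:,} tokens")
```

At the quoted price, $0.03 corresponds to 300,000 tokens, so a flat deduction of that size only matches metered pricing for very large requests. Decimal is used instead of float so the small dollar amounts stay exact.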

This seems to be the case for all the providers I tried. Here’s my code in case I’m doing something wrong:

import os
from huggingface_hub import InferenceClient

model_name = "Qwen/Qwen2-VL-7B-Instruct"

# Route the request through one provider; swap the active line to try another.
client = InferenceClient(
    model=model_name,
    #provider="fireworks-ai",
    provider="hyperbolic",
    #provider="nebius",
    api_key=os.environ["HF_TOKEN"],
)

# A single user turn containing the one-word prompt.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Hello"
            }
        ]
    }
]

# Non-streaming call: the result is a complete response, not a stream.
completion = client.chat.completions.create(
    messages=messages,
    max_tokens=500,
)

print([choice.message.content for choice in completion.choices])
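One way to check whether the deduction tracks request size is to log the token counts for each call. The sketch below assumes the response object returned by `client.chat.completions.create` carries a `usage` attribute with `prompt_tokens` and `completion_tokens`; whether a given provider fills it in is an assumption. The helper name `summarize_usage` is hypothetical, and a stand-in object replaces a live response so the snippet runs without network access.

```python
from types import SimpleNamespace


def summarize_usage(usage) -> dict:
    """Collect token counts from a chat-completion usage object.

    `usage` is any object exposing `prompt_tokens` and `completion_tokens`.
    """
    prompt = usage.prompt_tokens
    completion = usage.completion_tokens
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


# Stand-in for `response.usage`; the counts here are made up for illustration.
fake_usage = SimpleNamespace(prompt_tokens=9, completion_tokens=11)
summary = summarize_usage(fake_usage)
print(summary)
```

With a live call, passing the response's `usage` attribute to the helper would let you record token counts next to the credit balance before and after each request.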

@meganariley Question about pricing.

Hi @wexly,

We’re not billing for Inference Providers usage yet (only the free included credits are consumed), so we are using imperfect approximation heuristics for some of the providers.

We will be shipping accurate pricing, and it will go live within the next week. I’ll post here when it’s live.

Wexly: I see, thank you very much. I was excited about this feature but got a bit scared when I saw the numbers… Sorry for the false alarm, looking forward to using this feature!
