Inference Providers: 3 cents per request?


Source: discuss.huggingface.co · Published: 2026-06-24T16:47Z · Topic: ai-infrastructure

Hugging Face's Inference Providers feature currently estimates usage with imperfect heuristics: for some providers, a flat $0.03 per request is deducted from the free included credits regardless of token count, which does not reflect actual provider pricing. Usage beyond the included credits is not billed yet, and the company plans to ship accurate pricing within the next week.


Wexly: Basically, this is the question: the posts about Inference Providers seem to indicate that the price is the same as using the providers directly. But when I run a small sample with the single word “Hello”, $0.03 is deducted from my account. The provider’s pricing for this model says “$0.10 / M tokens”, which is a far cry from $0.03 for the couple of tokens needed to send and receive “Hello”.

Also, the amount does not seem to depend on the actual number of tokens: if I attach a 1 MB image, I still get charged the same 3 cents.

So the question is whether this is a billing bug or whether there really is a charge of $0.03 per request. At $0.03 per request, the cost adds up pretty fast…
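To put the gap in numbers, here is a minimal sketch using only the two figures quoted above ($0.03 per request and $0.10 per million tokens). The token count for a “Hello” exchange is an assumption made for illustration (20 tokens, prompt plus reply), and the function names are hypothetical.

```python
from decimal import Decimal

# Figures quoted in the post.
FLAT_CHARGE = Decimal("0.03")        # USD deducted per request
PRICE_PER_MILLION = Decimal("0.10")  # USD per 1M tokens (provider's listed price)

# ASSUMPTION: roughly 20 tokens in total (prompt + reply) for a "Hello" exchange.
ASSUMED_TOKENS = 20


def per_token_cost(tokens: int) -> Decimal:
    """Cost in USD of `tokens` tokens at the provider's listed per-token rate."""
    return PRICE_PER_MILLION * tokens / Decimal(1_000_000)


def break_even_tokens() -> Decimal:
    """Token count at which the listed per-token rate equals the flat charge."""
    return FLAT_CHARGE / PRICE_PER_MILLION * Decimal(1_000_000)


if __name__ == "__main__":
    listed = per_token_cost(ASSUMED_TOKENS)
    print("listed-rate cost:", listed)
    print("flat charge / listed-rate cost:", FLAT_CHARGE / listed)
    print("break-even tokens:", break_even_tokens())
```

Under that assumption, the listed rate works out to $0.000002 for the request, so the flat deduction is 15,000 times larger. A request would need 300,000 tokens before the per-token price reached $0.03.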

This seems to be the case for all the providers I tried. Here’s my code in case I’m doing something wrong:

import os
from huggingface_hub import InferenceClient
import base64

model_name= "Qwen/Qwen2-VL-7B-Instruct"

client = InferenceClient(
    model=model_name,
    #provider="fireworks-ai",
    provider="hyperbolic",
    #provider="nebius",
    api_key=os.environ['HF_TOKEN']
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": """Hello"""
            }
        ]
    }
]

# Not a streaming call: this returns a single completion object.
response = client.chat.completions.create(
    #model=model_name,
    messages=messages,
    max_tokens=500,
    #temperature=0.0,
    #stream=False
)

print([choice.message.content for choice in response.choices])
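The post mentions attaching an image, but the snippet above only sends text. The following is a sketch of how such a message might be built, assuming the OpenAI-style `image_url` content part with an inline base64 data URL. The helper name `image_message` is hypothetical, and whether a given provider accepts data URLs of this size is also an assumption.

```python
import base64


def image_message(text: str, image_bytes: bytes, mime_type: str = "image/png") -> list[dict]:
    """Build a one-message chat payload with a text part and an inline image part."""
    # Encode the raw image bytes as an ASCII base64 string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # ASSUMPTION: the provider accepts images passed as data URLs.
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]
```

The returned list could be passed as `messages` to the same `client.chat.completions.create(...)` call, which makes it easy to check whether the deducted amount changes with payload size.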

@meganariley Question about pricing.

Hi @wexly,

We’re not billing Inference Providers usage yet (it only draws on the free included credits), so we are using imperfect approximation heuristics for some of the providers.

We will be shipping accurate pricing, and it will go live within the next week. I’ll post here when it’s live.

Wexly: I see, thank you very much. I was excited about this feature but got a bit scared when I saw the numbers… Sorry for the false alarm, looking forward to using this feature!


Read original on discuss.huggingface.co → discuss.huggingface.co/t/inference-providers-3-c…
