{"slug": "inference-providers-3-cents-per-request", "title": "Inference Providers: 3 cents per request?", "summary": "Hugging Face's Inference Providers feature is currently using imperfect billing heuristics, charging a flat $0.03 per request regardless of token count, which does not reflect actual provider pricing. The company plans to ship accurate pricing within the next week.", "body_md": "[Wexly](https://discuss.huggingface.co/u/Wexly)\n1\nBasically, this is the question: the posts about using [Inference Providers](https://huggingface.co/blog/inference-providers) seem to indicate that the price is the same as using the providers directly. But when I run a small sample with the single word “Hello”, $0.03 appears to be deducted from my account. The provider’s pricing for this model says “$0.10 / M tokens”, which is a far cry from $0.03 for the couple of tokens needed to send and receive “hello”.\n\nAlso, the amount does not seem to depend on the actual number of tokens: if I attach a 1 MB image, I still get charged the same 3 cents.\n\nSo the question is whether this is a bug in billing or whether there really is a charge of $0.03 per request. The $0.03/request seems to add up pretty fast…\n\nThis seems to be the case for all the providers I tried. 
Here’s my code in case I’m doing something wrong:\n\n``` python\nimport os\nfrom huggingface_hub import InferenceClient\n\nmodel_name = \"Qwen/Qwen2-VL-7B-Instruct\"\n\n# The model is fixed on the client; switch the active provider line to compare providers.\nclient = InferenceClient(\n    model=model_name,\n    #provider=\"fireworks-ai\",\n    provider=\"hyperbolic\",\n    #provider=\"nebius\",\n    api_key=os.environ['HF_TOKEN']\n)\n\n# A minimal one-word, text-only prompt.\nmessages = [\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"\"\"Hello\"\"\"\n            }\n        ]\n    }\n]\n\n# Non-streaming call: the full completion comes back in a single response object.\nresponse = client.chat.completions.create(\n    #model=model_name,\n    messages=messages,\n    max_tokens=500,\n    #temperature=0.0,\n    #stream=False\n)\n\nprint([choice.message.content for choice in response.choices])\n```\n\n[@meganariley](/u/meganariley) Question about pricing.\n\nHi [@wexly](/u/wexly),\n\nWe’re not billing Inference Providers usage yet (it’s only free included credits), so we are using imperfect approximation heuristics for some of the providers.\n\nWe will be shipping accurate pricing, and it will go live within the next week. 
I’ll post here when it’s live.\n\n[Wexly](https://discuss.huggingface.co/u/Wexly)\n4\nI see, thank you very much. I was excited about this feature but got a bit scared when I saw the numbers… Sorry for the false alarm, looking forward to using this feature!", "url": "https://wpnews.pro/news/inference-providers-3-cents-per-request", "canonical_source": "https://discuss.huggingface.co/t/inference-providers-3-cents-per-request/142396#post_6", "published_at": "2026-06-24 16:47:42+00:00", "updated_at": "2026-06-24 17:17:28.257391+00:00", "lang": "en", "topics": ["ai-infrastructure", "ai-tools"], "entities": ["Hugging Face", "Inference Providers", "Qwen/Qwen2-VL-7B-Instruct", "fireworks-ai", "hyperbolic", "nebius"], "alternates": {"html": "https://wpnews.pro/news/inference-providers-3-cents-per-request", "markdown": "https://wpnews.pro/news/inference-providers-3-cents-per-request.md", "text": "https://wpnews.pro/news/inference-providers-3-cents-per-request.txt", "jsonld": "https://wpnews.pro/news/inference-providers-3-cents-per-request.jsonld"}}