The Chinese AI lab is introducing surge pricing for its flagship models, a first-of-its-kind move that could reshape how developers budget for API costs.
DeepSeek’s V4 series, expected to fully launch in mid-July, will implement a tiered pricing structure that doubles token rates during designated peak periods.
How the pricing works
The peak windows are set from 9:00 to 12:00 and 14:00 to 18:00 Beijing Time. During those hours, both input and output token costs will double across the two V4 models: deepseek-v4-pro and deepseek-v4-flash.
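As a rough illustration of the rule described above, the sketch below checks whether a given moment falls inside one of the stated peak windows and returns the matching price multiplier. The function names `is_peak` and `price_multiplier` are hypothetical, not part of any DeepSeek SDK. The sketch also assumes the windows apply every day and are half-open (start inclusive, end exclusive), neither of which the announcement spells out.

```python
from datetime import datetime, time, timedelta, timezone

# Beijing Time is UTC+8 all year (China does not observe daylight saving).
BEIJING = timezone(timedelta(hours=8))

# Peak windows as reported, expressed in Beijing Time.
PEAK_WINDOWS = [(time(9, 0), time(12, 0)), (time(14, 0), time(18, 0))]


def is_peak(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls in a peak window."""
    local = moment.astimezone(BEIJING).time()
    # Assumption: windows are half-open, so 12:00 sharp counts as off-peak.
    return any(start <= local < end for start, end in PEAK_WINDOWS)


def price_multiplier(moment: datetime) -> int:
    """Return 2 during peak windows (rates double), otherwise 1."""
    return 2 if is_peak(moment) else 1


# 02:00 UTC is 10:00 in Beijing, which is inside the morning window.
sample = datetime(2026, 7, 15, 2, 0, tzinfo=timezone.utc)
print(price_multiplier(sample))
```

Pass timezone-aware datetimes only; a naive datetime would be interpreted in the machine's local time zone and could give a misleading answer.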
V4-Flash currently sits at roughly $0.14 per million input tokens (on cache misses) and $0.28 per million output tokens. For V4-Pro, peak output pricing lands around 12 yuan per million tokens, approximately $1.76. V4-Flash peak output comes in at about 4 yuan per million tokens, or roughly $0.59.
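To make the budgeting impact concrete, here is a minimal sketch that estimates the cost of a V4-Flash request from the dollar rates quoted above. It assumes those figures are the off-peak rates and that peak pricing is a flat 2x on both input and output; `estimate_cost_usd` is a hypothetical helper, and cache-hit discounts are ignored.

```python
# Off-peak V4-Flash rates quoted in the article, in USD per million tokens.
# Assumption: these are the base rates that double during peak windows.
FLASH_INPUT_USD_PER_M = 0.14   # cache-miss input rate
FLASH_OUTPUT_USD_PER_M = 0.28


def estimate_cost_usd(input_tokens: int, output_tokens: int, peak: bool) -> float:
    """Estimate the cost of one request, doubling the rates during peak."""
    multiplier = 2 if peak else 1
    base = (
        input_tokens * FLASH_INPUT_USD_PER_M
        + output_tokens * FLASH_OUTPUT_USD_PER_M
    ) / 1_000_000
    return base * multiplier


# A job with 50M input tokens and 10M output tokens.
off_peak = estimate_cost_usd(50_000_000, 10_000_000, peak=False)
on_peak = estimate_cost_usd(50_000_000, 10_000_000, peak=True)
print(f"off-peak: ${off_peak:.2f}, peak: ${on_peak:.2f}")
```

Doubling the quoted $0.28 output rate gives $0.56 per million tokens, close to but not identical to the roughly $0.59 figure converted from yuan; the gap presumably reflects the exchange rate used.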
DeepSeek says users will receive 24-hour advance notification before any price changes take effect.
What V4 actually brings to the table
The V4 series first appeared as a preview on April 24, 2026. Both models use a Mixture of Experts (MoE) architecture. V4-Pro packs 1.6 trillion total parameters. V4-Flash comes in at 284 billion total parameters. Both support context windows of 1 million tokens and were trained on a dataset exceeding 32 trillion tokens. Both ship under an MIT license.
Why surge pricing matters for the AI industry
Beijing Time peak hours correspond to late evening and overnight in the US (roughly 9 PM to midnight and 2 AM to 6 AM Eastern Daylight Time), which means American developers running workloads during their own business day would hit DeepSeek’s off-peak rates.
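The time zone arithmetic can be checked with a short sketch. It uses fixed UTC offsets rather than a time zone database: UTC+8 for Beijing and UTC-4 for US Eastern, which assumes daylight time is in effect, as it is in mid-July. The date is arbitrary and `to_edt` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))
# Assumption: US Eastern is on daylight time (UTC-4), as in mid-July.
# In winter the offset is UTC-5 and every result shifts one hour earlier.
EDT = timezone(timedelta(hours=-4))


def to_edt(beijing_hour: int) -> str:
    """Convert an on-the-hour Beijing time to an HH:MM string in EDT."""
    moment = datetime(2026, 7, 15, beijing_hour, 0, tzinfo=BEIJING)
    return moment.astimezone(EDT).strftime("%H:%M")


for start, end in [(9, 12), (14, 18)]:
    print(f"{start:02d}:00-{end:02d}:00 Beijing = {to_edt(start)}-{to_edt(end)} EDT")
# Prints:
# 09:00-12:00 Beijing = 21:00-00:00 EDT
# 14:00-18:00 Beijing = 02:00-06:00 EDT
```

Both windows land outside a typical 9-to-5 Eastern workday, with the first one starting on the previous calendar evening.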
Disclosure: This article was edited by Editorial Team.