Source: dev.to · Topic: artificial-intelligence

AI API Price War: DeepSeek V4-Pro Cuts 75% & Gemini 3.5 Flash Lands

On May 31, 2026, DeepSeek's 75% discount on V4-Pro API pricing became the permanent rate, with output costs of $0.87 per million tokens: about 11.5x cheaper than GPT-4o and about 5.7x cheaper than Claude Haiku 4.5. Meanwhile, Google launched Gemini 3.5 Flash at I/O 2026, offering multimodal support and a 1M context window at $9.00 per million output tokens, still roughly 10x more expensive than DeepSeek for text-only tasks. The AI API price war is intensifying due to inference optimization, market competition, and developer demand for lower costs.

3 min read · published Jun 22, 2026

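May 31, 2026 is shaping up to be a landmark day in the AI API market. Two developments are converging: DeepSeek's 75% discount on V4-Pro becomes the permanent rate, and Google's Gemini 3.5 Flash, unveiled at I/O 2026, arrives as a new high-volume option.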

The message is clear: the AI API price war is no longer simmering — it's boiling over.

Back on May 22, DeepSeek dropped a bombshell: V4-Pro API pricing would permanently lock in at roughly one-quarter of its original price. The 75% discount that was supposed to expire on May 31? It's now the permanent rate.

Here's what the new pricing looks like:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| DeepSeek V4-Pro | $0.435 | $0.87 | 128K |
| DeepSeek V3 | $0.14 | $0.28 | 64K |
| Gemini 3.5 Flash | $1.50 | $9.00 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| GPT-4o | $2.50 | $10.00 | 128K |

Pricing accurate as of May 2026. Sources: official API docs and third-party aggregators.

DeepSeek's V4-Pro output price of $0.87/M tokens is about 11.5x cheaper than GPT-4o ($10.00/M) and about 5.7x cheaper than Claude Haiku 4.5 ($5.00/M). For developers building AI agents, chatbots, or automated workflows that generate thousands of tokens per request, the savings compound fast.
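The ratios can be checked directly from the listed prices. The snippet below is a minimal sketch: the dictionary simply copies the output prices from the pricing table, and the helper name `output_price_ratio` is made up for illustration.

```python
# Output prices in USD per 1M tokens, copied from the pricing table (May 2026).
OUTPUT_PRICE_PER_M = {
    "DeepSeek V4-Pro": 0.87,
    "DeepSeek V3": 0.28,
    "Gemini 3.5 Flash": 9.00,
    "Claude Haiku 4.5": 5.00,
    "GPT-4o": 10.00,
}


def output_price_ratio(model: str, baseline: str = "DeepSeek V4-Pro") -> float:
    """Return how many times more expensive `model` is than `baseline` on output tokens."""
    return OUTPUT_PRICE_PER_M[model] / OUTPUT_PRICE_PER_M[baseline]


if __name__ == "__main__":
    for name in ("GPT-4o", "Gemini 3.5 Flash", "Claude Haiku 4.5"):
        print(f"{name}: {output_price_ratio(name):.1f}x the V4-Pro output price")
```

This only compares output prices. Input prices differ by smaller factors, so the real gap for a given app depends on its input/output mix.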

This isn't just another "we're reducing prices" announcement. Three things make DeepSeek's move different:

Not to be outdone, Google used I/O 2026 to unveil Gemini 3.5 Flash, and the numbers are impressive: $1.50 per million input tokens, $9.00 per million output tokens, and a 1M-token context window.

Google is positioning Flash as the high-volume workhorse: fast enough for real-time applications, cheap enough to run at scale, and multimodal (text, vision, video, audio all supported natively).

The trade-off? At $9.00/M output, it's still 10x more expensive than DeepSeek V4-Pro for pure text workloads. If your app doesn't need multimodal capabilities, the cost difference is hard to ignore.
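To see what that gap means for a text-only workload, here is a rough estimate. It is a sketch under stated assumptions: the workload (10,000 requests a day, 1,000 input and 500 output tokens each) is hypothetical, the function name `monthly_cost_usd` is illustrative, and the prices are the ones listed in the table.

```python
def monthly_cost_usd(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    days: int = 30,
) -> float:
    """Estimate the monthly API bill from per-request token counts and per-1M-token prices."""
    per_request = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return per_request * requests_per_day * days


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests/day, 1,000 input + 500 output tokens each.
    deepseek = monthly_cost_usd(10_000, 1_000, 500, 0.435, 0.87)
    gemini = monthly_cost_usd(10_000, 1_000, 500, 1.50, 9.00)
    print(f"DeepSeek V4-Pro:  ${deepseek:,.2f}/month")
    print(f"Gemini 3.5 Flash: ${gemini:,.2f}/month")
```

Under these assumptions the bill comes to about $261 a month on DeepSeek V4-Pro versus about $1,800 on Gemini 3.5 Flash. That is roughly a 7x gap rather than 10x, because the input-price difference is smaller than the output-price difference.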

This isn't random. Three structural forces are driving prices down across the board: inference optimization, market competition, and developer demand for lower costs.

Techniques like speculative decoding, quantization, and kernel fusion are squeezing more tokens per GPU-second. DeepSeek's own V4-Pro architecture is reportedly several times more inference-efficient than V3.

The market has gone from "OpenAI and everyone else" to a legitimate free-for-all:

HN threads, Reddit discussions, and Twitter debates show that API pricing is a top-3 concern for AI builders. Providers who ignore pricing lose developer mindshare fast.

Here's the practical takeaway for anyone building AI-powered applications:

If you're cost-sensitive (most of us are):

Start with DeepSeek V4-Pro. At $0.87/M output tokens, you can serve thousands of users before API costs become a concern. The OpenAI-compatible API means you can swap providers with minimal code changes.

If you need multimodal (vision, audio, video):

Gemini 3.5 Flash is the obvious choice — native multimodal support with a 1M context window at competitive pricing. No other model in this price range handles images and video natively.

If you're in a regulated industry (GDPR, HIPAA):

Consider Claude via AWS Bedrock or Azure's managed offerings. The compliance overhead is worth the premium.

The hybrid approach (recommended):

Use DeepSeek V4-Pro as your default, with fallback to Gemini Flash for multimodal tasks. This gives you the best of both worlds: cheap text, powerful vision — and no single-provider lock-in.

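import openai

def route_request(prompt: str, needs_vision: bool = False):
    """Send multimodal work to Gemini Flash and everything else to DeepSeek."""
    if needs_vision:
        # Gemini's OpenAI-compatible endpoint lives under the /openai/ path.
        client = openai.OpenAI(
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
            api_key="YOUR_GEMINI_KEY"
        )
        model = "gemini-3.5-flash"
    else:
        client = openai.OpenAI(
            base_url="https://api.deepseek.com/v1",
            api_key="YOUR_DEEPSEEK_KEY"
        )
        model = "deepseek-v4-pro"

    # Both providers accept the same chat-completions request shape.
    # Note: this sends text only; image input needs a multi-part message.
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )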
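The router above passes only a text prompt, so a vision request still needs an image attached to the message. The sketch below shows one way to build that payload using the OpenAI chat-completions content-part format (`text` and `image_url` parts). The helper name `build_messages` is hypothetical, and it assumes the provider's OpenAI-compatible endpoint accepts `image_url` parts, which is worth confirming in the provider's docs.

```python
from typing import Any, Optional


def build_messages(prompt: str, image_url: Optional[str] = None) -> list[dict[str, Any]]:
    """Build a chat-completions `messages` list, attaching an image when one is given."""
    if image_url is None:
        # Plain text request: content is just the prompt string.
        return [{"role": "user", "content": prompt}]
    # Multimodal request: content becomes a list of typed parts.
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


if __name__ == "__main__":
    print(build_messages("Summarize this article."))
    print(build_messages("What is in this picture?", "https://example.com/cat.png"))
```

The returned list can be passed as the `messages` argument in `route_request` in place of the hard-coded single text message.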

Here's the uncomfortable truth behind all these price cuts: cheap API access doesn't matter if you can't get access at all.

DeepSeek's official API still requires a Chinese phone number for registration. Google's API is geo-restricted in several regions. And most international developers can't pay with regional payment methods.

That's exactly the problem AiCredits was built to solve.

We provide OpenAI-compatible access to DeepSeek V4-Pro with:

Need stable DeepSeek API access? Try AiCredits: OpenAI-compatible, no Chinese phone number, PayPal accepted. Plans start at $3 for 5M tokens.

Originally published on AiCredits Blog.
