AI API Price War: DeepSeek V4-Pro Cuts 75% & Gemini 3.5 Flash Lands


Source: dev.to · Published: 2026-06-22T01:21Z · Topic: artificial-intelligence


DeepSeek has made the 75% discount on its V4-Pro API permanent, so the rate that was due to expire on May 31, 2026 is now the standing price: $0.87 per million output tokens, roughly 11x lower than GPT-4o and nearly 6x lower than Claude Haiku 4.5. Meanwhile, Google launched Gemini 3.5 Flash at I/O 2026, offering multimodal support and a 1M context window at $9.00 per million output tokens, still about 10x more expensive than DeepSeek for text-only tasks. The AI API price war is intensifying due to inference optimization, market competition, and developer demand for lower costs.

3 min read · Published Jun 22, 2026

May 31, 2026 is shaping up to be a landmark day in the AI API market. Two developments are converging: DeepSeek's 75% V4-Pro discount becoming permanent, and Google's launch of Gemini 3.5 Flash.

The message is clear: the AI API price war is no longer simmering — it's boiling over.

Back on May 22, DeepSeek dropped a bombshell: V4-Pro API pricing would permanently lock in at roughly one-quarter of its original price. The 75% discount that was supposed to expire on May 31? It's now the permanent rate.

Here's what the new pricing looks like:

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context window
DeepSeek V4-Pro	$0.435	$0.87	128K
DeepSeek V3	$0.14	$0.28	64K
Gemini 3.5 Flash	$1.50	$9.00	1M
Claude Haiku 4.5	$1.00	$5.00	200K
GPT-4o	$2.50	$10.00	128K

Pricing accurate as of May 2026. Sources: official API docs and third-party aggregators.

DeepSeek's V4-Pro output price of $0.87/M tokens is roughly 11x lower than GPT-4o's $10.00 and nearly 6x lower than Claude Haiku 4.5's $5.00. For developers building AI agents, chatbots, or automated workflows that generate thousands of tokens per request, the savings compound fast.
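To make the multiples concrete, here is a minimal sketch that recomputes them from the output prices in the table above. The helper names `price_ratio` and `output_cost` are illustrative, not part of any provider SDK, and the prices are only as current as the table.

```python
# Output prices in USD per 1M tokens, copied from the pricing table above.
OUTPUT_PRICE_PER_M = {
    "DeepSeek V4-Pro": 0.87,
    "DeepSeek V3": 0.28,
    "Gemini 3.5 Flash": 9.00,
    "Claude Haiku 4.5": 5.00,
    "GPT-4o": 10.00,
}


def price_ratio(model: str, baseline: str = "DeepSeek V4-Pro") -> float:
    """How many times the baseline's output price `model` charges."""
    return OUTPUT_PRICE_PER_M[model] / OUTPUT_PRICE_PER_M[baseline]


def output_cost(model: str, output_tokens: int) -> float:
    """USD cost of generating `output_tokens` tokens with `model`."""
    return OUTPUT_PRICE_PER_M[model] * output_tokens / 1_000_000


if __name__ == "__main__":
    for name in OUTPUT_PRICE_PER_M:
        print(f"{name}: {price_ratio(name):.1f}x the DeepSeek V4-Pro output price")
```

Rounded to one decimal, the ratios come out at about 11.5x for GPT-4o, 5.7x for Claude Haiku 4.5, and 10.3x for Gemini 3.5 Flash, which is where the "10x" and "5x" shorthand comes from.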

This isn't just another "we're reducing prices" announcement. Three things make DeepSeek's move different:

Not to be outdone, Google used I/O 2026 to unveil Gemini 3.5 Flash, and the numbers are impressive:

Google is positioning Flash as the high-volume workhorse: fast enough for real-time applications, cheap enough to run at scale, and multimodal (text, vision, video, audio all supported natively).

The trade-off? At $9.00/M output, it's still 10x more expensive than DeepSeek V4-Pro for pure text workloads. If your app doesn't need multimodal capabilities, the cost difference is hard to ignore.

This isn't random. Three structural forces are driving prices down across the board:

Inference optimization. Techniques like speculative decoding, quantization, and kernel fusion are squeezing more tokens per GPU-second. DeepSeek's own V4-Pro architecture is reportedly several times more inference-efficient than V3.

Market competition. The market has gone from "OpenAI and everyone else" to a legitimate free-for-all.

Developer demand. HN threads, Reddit discussions, and Twitter debates show that API pricing is a top-3 concern for AI builders. Providers who ignore pricing lose developer mindshare fast.

Here's the practical takeaway for anyone building AI-powered applications:

If you're cost-sensitive (most of us are):

Start with DeepSeek V4-Pro. At $0.87/M output tokens, you can serve thousands of users before API costs become a concern. The OpenAI-compatible API means you can swap providers with minimal code changes.
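How far "thousands of users" stretches depends on your traffic, so it helps to run the numbers. The sketch below estimates a monthly bill at the V4-Pro prices from the table above. The `Workload` fields and the example figures in the usage note are hypothetical assumptions, so substitute your own measurements.

```python
from dataclasses import dataclass

# DeepSeek V4-Pro prices in USD per 1M tokens, from the pricing table above.
INPUT_USD_PER_M = 0.435
OUTPUT_USD_PER_M = 0.87


@dataclass(frozen=True)
class Workload:
    """Hypothetical traffic profile; every field is an assumption you supply."""
    users: int
    requests_per_user_per_day: int
    input_tokens_per_request: int
    output_tokens_per_request: int
    days: int = 30


def cost_per_request(w: Workload) -> float:
    """USD cost of one request, counting both input and output tokens."""
    token_cost = (
        w.input_tokens_per_request * INPUT_USD_PER_M
        + w.output_tokens_per_request * OUTPUT_USD_PER_M
    )
    return token_cost / 1_000_000


def monthly_cost(w: Workload) -> float:
    """USD cost of the whole workload over `w.days` days."""
    total_requests = w.users * w.requests_per_user_per_day * w.days
    return total_requests * cost_per_request(w)
```

For example, `monthly_cost(Workload(users=5000, requests_per_user_per_day=10, input_tokens_per_request=1000, output_tokens_per_request=500))` works out to roughly $1,305 for 1.5 million requests, or under a tenth of a cent per request.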

If you need multimodal (vision, audio, video):

Gemini 3.5 Flash is the obvious choice — native multimodal support with a 1M context window at competitive pricing. No other model in this price range handles images and video natively.

If you're in a regulated industry (GDPR, HIPAA):

Consider Claude via AWS Bedrock or Azure's managed offerings. The compliance overhead is worth the premium.

The hybrid approach (recommended):

Use DeepSeek V4-Pro as your default, with fallback to Gemini Flash for multimodal tasks. This gives you the best of both worlds: cheap text, powerful vision — and no single-provider lock-in.

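import os

from openai import OpenAI


def route_request(prompt: str, needs_vision: bool = False):
    """Send multimodal requests to Gemini Flash and everything else to DeepSeek."""
    # The environment variable names below are placeholders; use your own.
    if needs_vision:
        # Gemini's OpenAI-compatible endpoint lives under the /openai/ path.
        client = OpenAI(
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
            api_key=os.environ["GEMINI_API_KEY"],
        )
        model = "gemini-3.5-flash"
    else:
        client = OpenAI(
            base_url="https://api.deepseek.com/v1",
            api_key=os.environ["DEEPSEEK_API_KEY"],
        )
        model = "deepseek-v4-pro"

    # Both providers accept the same chat-completions call shape.
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )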
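What the hybrid setup costs depends on how much of your output is routed to Gemini. As a rough sketch, the blended output price is a weighted average of the two rates from the pricing table. The function name and the `multimodal_share` parameter are illustrative assumptions, and the share is measured in output tokens, not requests.

```python
def blended_output_price(
    multimodal_share: float,
    text_price: float = 0.87,        # DeepSeek V4-Pro, USD per 1M output tokens
    multimodal_price: float = 9.00,  # Gemini 3.5 Flash, USD per 1M output tokens
) -> float:
    """Average USD per 1M output tokens when `multimodal_share` of the
    output tokens (a fraction between 0 and 1) is routed to Gemini."""
    if not 0.0 <= multimodal_share <= 1.0:
        raise ValueError("multimodal_share must be between 0 and 1")
    return (1.0 - multimodal_share) * text_price + multimodal_share * multimodal_price
```

With 10% of output tokens going to Gemini, the blended rate is about $1.68 per million, still well below running everything on Gemini 3.5 Flash at $9.00.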

Here's the uncomfortable truth behind all these price cuts: cheap API access doesn't matter if you can't get access at all.

DeepSeek's official API still requires a Chinese phone number for registration. Google's API is geo-restricted in several regions. And many international developers cannot use the regional payment methods these providers accept.

That's exactly the problem AiCredits was built to solve.

We provide OpenAI-compatible access to DeepSeek V4-Pro with:

Need stable DeepSeek API access? Try AiCredits: OpenAI-compatible, no Chinese phone number, PayPal accepted. Plans start at $3 for 5M tokens.

Originally published on AiCredits Blog.


Read original on dev.to → dev.to/yanlong_wang/ai-api-price-war-deepseek-v4…
