Gemini 3.5 Flash Is Now GA: Three API Traps to Know

wpnews.pro

Gemini 3.5 Flash is now generally available to all developers via the Gemini API and Google AI Studio. The headline writes itself: a Flash model that outperforms last year’s Pro on coding and agentic benchmarks. The numbers mostly hold up. What Google’s blog post buries, though, is that migrating from gemini-3-flash-preview

to gemini-3.5-flash

introduces three breaking changes that will silently degrade your outputs before you notice anything is wrong.

What Actually Changed: The Benchmark Story Is Lopsided on Purpose #

Gemini 3.5 Flash doesn’t beat 3.1 Pro across the board. It beats it specifically on the tasks that production AI agents actually run.

On agentic and coding benchmarks, the lead is real: Terminal-Bench 2.1 jumps from 70.3% to 76.2%, MCP Atlas from 78.2% to 83.6%, and Finance Agent v2 from 43.0% to 57.9%. It runs at 289 tokens per second — roughly four times faster than other frontier models. For anything involving tool calls, parallel subagents, or high-throughput document processing, Gemini 3.5 Flash is the clear choice in its price bracket.

Where it trails 3.1 Pro: Humanity’s Last Exam (40.2% vs 44.4%), ARC-AGI-2 (72.1% vs 77.1%), and long-context retrieval at 128k tokens (77.3% vs 84.9%). If your workload is primarily complex reasoning — hard math, multi-step logic chains, research synthesis — don’t migrate yet. The new model sacrifices some of that depth in exchange for speed. That’s a reasonable trade for most production agents, and a bad one for tasks that aren’t agents at all.

The Three API Traps #

These are not edge cases. All three will hit most codebases on migration.

Trap 1: The Thinking Default Changed Silently

In gemini-3-flash-preview

, the default thinking effort was equivalent to high

. In gemini-3.5-flash

, it defaults to medium

. The new parameter name is thinking_level

— a string enum that replaces the old thinking_budget

integer. If you copy-paste a migration without setting thinking_level

explicitly, you’ll get quieter, less capable outputs with no error message. The model just answers differently.

You cannot pass both thinking_level

and thinking_budget

in the same request — that returns a 400 error.

Trap 2: Sampling Parameters Are Gone

temperature

, top_p

, and top_k

are no longer accepted. Remove them from all requests entirely. Any code copied from pre-3.x Gemini examples — which covers most existing codebases — needs a cleanup pass before migration.

Trap 3: Function Responses Now Require id and name

Every FunctionResponse

must now include both the id

from the corresponding FunctionCall

and a name

that matches exactly. This one fails loudly — you’ll see errors immediately — but it’s easy to miss if you have multiple tool-calling paths in your codebase. Check the official migration docs for the full function calling spec.

Which thinking_level Should You Actually Use? #

The counter-intuitive answer from Google’s own documentation: for agentic coding workflows, use “low”

, not “medium”

.

Google specifically retuned the low

setting for code and tool-calling workloads. It’s faster, cheaper, and on coding benchmarks comparable to medium

. Reaching for “high”

as a default is expensive and adds latency without proportional gains for most agent tasks.

Level	Best for
`minimal`	High-volume classification, trivial chat queries
`low`	Agentic coding loops, tool-calling workflows (recommended)
`medium`	Complex coding tasks, general default
`high`	Hard math, complex reasoning, research synthesis

One thing worth watching on long-running sessions: internal reasoning tokens are preserved automatically across multi-turn conversations. That improves coherence but inflates costs 30–50% on extended agent loops. Monitor the ThoughtsTokenCount

metric; if it exceeds 40% of PromptTokenCount

on later turns, restarting the session is cheaper than continuing it.

The Pricing Reality #

Hacker News noticed the price increase immediately. At $1.50/$9.00 per million input/output tokens, Gemini 3.5 Flash costs three times more than Gemini 3 Flash Preview and six times more than Flash-Lite. That’s worth acknowledging.

The relevant comparison, though, is against what it’s replacing in practice. At $2.00/$12.00 per million tokens, Gemini 3.1 Pro costs about 25% more for inferior performance on coding tasks. For teams already on Pro, 3.5 Flash is a pay cut with better results.

The biggest lever is caching. At $0.15 per million cached input tokens — a 90% discount — agents with large, stable system prompts can recover most of the premium. Google demoed 93 parallel subagents completing 15,000+ requests in 12 hours for under $1,000. That math works with aggressive caching in place.

Migration Checklist #

Update model ID: gemini-3-flash-preview

→gemini-3.5-flash

Replace thinking_budget

withthinking_level

(string:“minimal”

,“low”

,“medium”

,“high”

) - Set thinking_level

explicitly — don’t rely on the medium default - Remove temperature

,top_p

,top_k

from all requests - Add id

andname

to everyFunctionResponse

Keep gemini-3-flash-preview

running for: Computer Use, image generation, audio generation, and Live API - Watch ThoughtsTokenCount

growth across multi-turn sessions

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=prompt,
    config={
        "thinking_config": {
            "thinking_level": "low"  # optimal for agentic coding
        }
    },
)

The Verdict #

Migrate for agentic and coding workloads. The performance gains are real, the speed advantage is significant, and the pricing is defensible if you’re already on Pro. Hold on gemini-3-flash-preview

for Computer Use and Live API workloads — those aren’t supported in 3.5 yet.

The three breaking changes are concrete and fixable in an afternoon. The thinking_level

default change is the one most likely to ship silently to production. Set it explicitly on every request, run your evals before flipping the switch, and review Appwrite’s independent benchmark analysis if you need a second opinion beyond Google’s own numbers. The full technical spec is available on the Google DeepMind model card.

source & further reading

byteiota.com — original article Microsoft’s AI Coding Agent Study: 24% More PRs Linux Desktop Hits 10% in North America: Real or Artifact? Diátaxis: The Documentation Framework AI Agents Need