Gemini 3.5 Flash is now generally available to all developers via the Gemini API and Google AI Studio. The headline writes itself: a Flash model that outperforms last year’s Pro on coding and agentic benchmarks. The numbers mostly hold up. What Google’s blog post buries, though, is that migrating from `gemini-3-flash-preview` to `gemini-3.5-flash` introduces three breaking changes that will silently degrade your outputs before you notice anything is wrong.
What Actually Changed: The Benchmark Story Is Lopsided on Purpose
Gemini 3.5 Flash doesn’t beat 3.1 Pro across the board. It beats it specifically on the tasks that production AI agents actually run.
On agentic and coding benchmarks, the lead is real: Terminal-Bench 2.1 jumps from 70.3% to 76.2%, MCP Atlas from 78.2% to 83.6%, and Finance Agent v2 from 43.0% to 57.9%. It runs at 289 tokens per second — roughly four times faster than other frontier models. For anything involving tool calls, parallel subagents, or high-throughput document processing, Gemini 3.5 Flash is the clear choice in its price bracket.
Where it trails 3.1 Pro: Humanity’s Last Exam (40.2% vs 44.4%), ARC-AGI-2 (72.1% vs 77.1%), and long-context retrieval at 128k tokens (77.3% vs 84.9%). If your workload is primarily complex reasoning — hard math, multi-step logic chains, research synthesis — don’t migrate yet. The new model sacrifices some of that depth in exchange for speed. That’s a reasonable trade for most production agents, and a bad one for tasks that aren’t agents at all.
The Three API Traps
These are not edge cases. All three will hit most codebases on migration.
Trap 1: The Thinking Default Changed Silently
In `gemini-3-flash-preview`, the default thinking effort was equivalent to `high`. In `gemini-3.5-flash`, it defaults to `medium`. The new parameter name is `thinking_level`, a string enum that replaces the old `thinking_budget` integer. If you copy-paste a migration without setting `thinking_level` explicitly, you’ll get quieter, less capable outputs with no error message. The model just answers differently.
You cannot pass both `thinking_level` and `thinking_budget` in the same request; that returns a 400 error.
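One way to make this change explicit during migration is to rewrite request configs mechanically. The sketch below is a minimal, hypothetical helper: the function name `migrate_thinking_config`, the dict-shaped config, and especially the budget-to-level cut-offs are assumptions for illustration, not values from Google's documentation.

```python
# Illustrative cut-offs for mapping an old integer budget to a level.
# ASSUMPTION: these numbers are not from Google's docs; tune them with your evals.
ASSUMED_BUDGET_CUTOFFS = [(0, "minimal"), (1024, "low"), (8192, "medium")]
VALID_LEVELS = {"minimal", "low", "medium", "high"}


def migrate_thinking_config(thinking_config=None, default_level="high"):
    """Return a copy of the config that carries thinking_level and no thinking_budget."""
    cfg = dict(thinking_config or {})
    budget = cfg.pop("thinking_budget", None)  # never send both fields together

    if "thinking_level" not in cfg:
        if budget is None or budget < 0:
            # No usable budget: pin the old high-effort behaviour explicitly.
            cfg["thinking_level"] = default_level
        else:
            # First cut-off the budget fits under wins; anything larger is "high".
            cfg["thinking_level"] = next(
                (name for limit, name in ASSUMED_BUDGET_CUTOFFS if budget <= limit),
                "high",
            )

    if cfg["thinking_level"] not in VALID_LEVELS:
        raise ValueError(f"unknown thinking_level: {cfg['thinking_level']!r}")
    return cfg
```

Defaulting to `"high"` when no budget was set preserves the preview model's behaviour; pass a different `default_level` once your evals show a cheaper level is good enough.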
Trap 2: Sampling Parameters Are Gone
`temperature`, `top_p`, and `top_k` are no longer accepted. Remove them from all requests entirely. Any code copied from pre-3.x Gemini examples (which covers most existing codebases) needs a cleanup pass before migration.
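A cleanup pass like this can be automated if your request configs are plain dicts. The following is a minimal sketch under that assumption; `strip_sampling_params` is a hypothetical helper name, and it only handles the snake_case keys named above, so camelCase variants in raw REST payloads would need their own entries.

```python
REMOVED_SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def strip_sampling_params(config):
    """Return (clean_config, removed_keys) without mutating the input dict."""
    clean = {k: v for k, v in config.items() if k not in REMOVED_SAMPLING_KEYS}
    removed = sorted(k for k in config if k in REMOVED_SAMPLING_KEYS)
    return clean, removed
```

Returning the removed keys makes it easy to log which call sites still carried old parameters, rather than dropping them silently.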
Trap 3: Function Responses Now Require id and name
Every `FunctionResponse` must now include both the `id` from the corresponding `FunctionCall` and a `name` that matches exactly. This one fails loudly (you’ll see errors immediately), but it’s easy to miss if you have multiple tool-calling paths in your codebase. Check the official migration docs for the full function calling spec.
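To catch this across several tool-calling paths before a request goes out, a small pre-flight check can help. The sketch below models calls and responses as plain dicts with `id` and `name` keys; that shape and both function names are assumptions for illustration, not the SDK's actual types.

```python
def build_function_response(call, result):
    """Build a response dict that echoes the originating call's id and name."""
    return {"id": call["id"], "name": call["name"], "response": result}


def find_mismatched_responses(calls, responses):
    """Return a list of problems; an empty list means every response pairs up."""
    expected = {c["id"]: c["name"] for c in calls}
    problems = []
    for r in responses:
        rid, rname = r.get("id"), r.get("name")
        if rid is None or rname is None:
            problems.append(f"response missing id or name: {r!r}")
        elif rid not in expected:
            problems.append(f"no FunctionCall with id {rid!r}")
        elif expected[rid] != rname:
            problems.append(f"id {rid!r}: name {rname!r} != {expected[rid]!r}")
    return problems
```

Building every response through one helper such as `build_function_response` keeps the `id` and `name` copied from the call instead of retyped at each call site.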
Which thinking_level Should You Actually Use?
The counter-intuitive answer from Google’s own documentation: for agentic coding workflows, use `"low"`, not `"medium"`.
Google specifically retuned the `low` setting for code and tool-calling workloads. It’s faster, cheaper, and on coding benchmarks comparable to `medium`. Reaching for `"high"` as a default is expensive and adds latency without proportional gains for most agent tasks.
| Level | Best for |
|---|---|
| `minimal` | High-volume classification, trivial chat queries |
| `low` | Agentic coding loops, tool-calling workflows (recommended) |
| `medium` | Complex coding tasks, general default |
| `high` | Hard math, complex reasoning, research synthesis |
One thing worth watching on long-running sessions: internal reasoning tokens are preserved automatically across multi-turn conversations. That improves coherence but inflates costs 30–50% on extended agent loops. Monitor the `ThoughtsTokenCount` metric; if it exceeds 40% of `PromptTokenCount` on later turns, restarting the session is cheaper than continuing it.
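The 40% rule of thumb is easy to encode as a guard in an agent loop. This is a minimal sketch: `should_restart_session` is a hypothetical name, and it assumes you read the two token counts from whatever usage metadata your SDK reports for each turn.

```python
def should_restart_session(thoughts_tokens, prompt_tokens, threshold=0.40):
    """True when preserved reasoning tokens exceed `threshold` of the prompt tokens."""
    if prompt_tokens <= 0:
        return False  # nothing meaningful to compare against yet
    return thoughts_tokens / prompt_tokens > threshold
```

Called once per turn, it lets the loop summarise state and start a fresh session when the ratio is exceeded, instead of carrying the accumulated reasoning forward.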
The Pricing Reality
Hacker News noticed the price increase immediately. At $1.50/$9.00 per million input/output tokens, Gemini 3.5 Flash costs three times more than Gemini 3 Flash Preview and six times more than Flash-Lite. That’s worth acknowledging.
The relevant comparison, though, is against what it’s replacing in practice. At $2.00/$12.00 per million tokens, Gemini 3.1 Pro costs about 33% more for inferior performance on coding tasks. For teams already on Pro, 3.5 Flash is a 25% price cut with better results.
The biggest lever is caching. At $0.15 per million cached input tokens — a 90% discount — agents with large, stable system prompts can recover most of the premium. Google demoed 93 parallel subagents completing 15,000+ requests in 12 hours for under $1,000. That math works with aggressive caching in place.
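To see how much caching moves the per-request bill, the arithmetic can be sketched directly from the prices quoted above. The function name and the example token counts are illustrative assumptions, and the sketch ignores any cache storage fees or tiered pricing.

```python
# Prices in USD per million tokens, as quoted in this article.
INPUT_PER_M = 1.50
OUTPUT_PER_M = 9.00
CACHED_INPUT_PER_M = 0.15


def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate one request's cost in USD; cached_tokens is the cached share of the input."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


# Hypothetical agent turn: 20,000 input tokens (18,000 of them a stable
# system prompt) and 1,000 output tokens.
uncached = request_cost(20_000, 1_000)
cached = request_cost(20_000, 1_000, cached_tokens=18_000)
```

For that hypothetical turn the estimate drops from about $0.039 to about $0.0147, with output tokens becoming the dominant cost once the prompt is cached.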
Migration Checklist
- Update model ID: `gemini-3-flash-preview` → `gemini-3.5-flash`
- Replace `thinking_budget` with `thinking_level` (string: `"minimal"`, `"low"`, `"medium"`, `"high"`)
- Set `thinking_level` explicitly; don’t rely on the `medium` default
- Remove `temperature`, `top_p`, `top_k` from all requests
- Add `id` and `name` to every `FunctionResponse`
- Keep `gemini-3-flash-preview` running for: Computer Use, image generation, audio generation, and Live API
- Watch `ThoughtsTokenCount` growth across multi-turn sessions
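```python
from google import genai

# Assumes the google-genai SDK and an API key in the environment (e.g. GEMINI_API_KEY).
client = genai.Client()
prompt = "Refactor this function to remove the global state."  # placeholder prompt

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=prompt,
    config={
        "thinking_config": {
            "thinking_level": "low"  # optimal for agentic coding
        }
    },
)
print(response.text)
```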
The Verdict
Migrate for agentic and coding workloads. The performance gains are real, the speed advantage is significant, and the pricing is defensible if you’re already on Pro. Stay on `gemini-3-flash-preview` for Computer Use and Live API workloads; those aren’t supported in 3.5 yet.
The three breaking changes are concrete and fixable in an afternoon. The `thinking_level` default change is the one most likely to ship silently to production. Set it explicitly on every request, run your evals before flipping the switch, and review Appwrite’s independent benchmark analysis if you need a second opinion beyond Google’s own numbers. The full technical spec is available on the Google DeepMind model card.