Gemini 3.5 Flash Is Now GA: Three API Traps to Know

Google has made Gemini 3.5 Flash generally available, with faster performance and better agentic and coding benchmark scores than its predecessor. Developers migrating from the preview version must navigate three breaking API changes: a silent shift in the default thinking effort, the removal of sampling parameters, and stricter function response requirements. The model excels at high-throughput agentic tasks but trails on complex reasoning, and Google recommends the "low" thinking level for coding workflows to optimize speed and cost.

Gemini 3.5 Flash is now generally available (https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/) to all developers via the Gemini API and Google AI Studio. The headline writes itself: a Flash model that outperforms last year’s Pro on coding and agentic benchmarks. The numbers mostly hold up. What Google’s blog post buries, though, is that migrating from gemini-3-flash-preview to gemini-3.5-flash introduces three breaking changes, at least one of which will silently degrade your outputs before you notice anything is wrong.

What Actually Changed: The Benchmark Story Is Lopsided on Purpose

Gemini 3.5 Flash doesn’t beat 3.1 Pro across the board. It beats it specifically on the tasks that production AI agents actually run.

On agentic and coding benchmarks, the lead is real: Terminal-Bench 2.1 jumps from 70.3% to 76.2%, MCP Atlas from 78.2% to 83.6%, and Finance Agent v2 from 43.0% to 57.9%. It runs at 289 tokens per second, roughly four times faster than other frontier models. For anything involving tool calls, parallel subagents, or high-throughput document processing, Gemini 3.5 Flash is the clear choice in its price bracket.

Where it trails 3.1 Pro: Humanity’s Last Exam (40.2% vs 44.4%), ARC-AGI-2 (72.1% vs 77.1%), and long-context retrieval at 128k tokens (77.3% vs 84.9%). If your workload is primarily complex reasoning (hard math, multi-step logic chains, research synthesis), don’t migrate yet. The new model sacrifices some of that depth in exchange for speed. That’s a reasonable trade for most production agents, and a bad one for tasks that aren’t agents at all.

The Three API Traps

These are not edge cases. All three will hit most codebases on migration.

Trap 1: The Thinking Default Changed Silently

In gemini-3-flash-preview, the default thinking effort was equivalent to high. In gemini-3.5-flash, it defaults to medium. The new parameter name is thinking_level, a string enum that replaces the old thinking_budget integer.
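
To make the Trap 1 fix concrete, here is a minimal sketch of a helper that pins the thinking level on a request config. The helper name with_thinking_level and the dict-shaped config are assumptions for illustration, not part of any Gemini SDK; the key names thinking_config, thinking_level, and thinking_budget follow this article's usage and should be checked against the official docs.

```python
# Hypothetical helper for illustration; not part of the Gemini SDK.
VALID_THINKING_LEVELS = ("minimal", "low", "medium", "high")


def with_thinking_level(config: dict, level: str) -> dict:
    """Return a copy of `config` with an explicit thinking level set."""
    if level not in VALID_THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level!r}")

    migrated = dict(config)
    thinking = dict(migrated.get("thinking_config", {}))

    # Drop the legacy integer budget so the request never carries
    # both the old and the new thinking parameter.
    thinking.pop("thinking_budget", None)
    thinking["thinking_level"] = level

    migrated["thinking_config"] = thinking
    return migrated


legacy_config = {"thinking_config": {"thinking_budget": 8192}}
print(with_thinking_level(legacy_config, "high"))
```

Passing "high" here mirrors the effort the preview model applied by default, which may be a sensible baseline for evals before trying lower levels.
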
If you copy-paste a migration without setting thinking_level explicitly, you’ll get shallower, less capable outputs with no error message. The model just answers differently. You cannot pass both thinking_level and thinking_budget in the same request; that returns a 400 error.

Trap 2: Sampling Parameters Are Gone

temperature, top_p, and top_k are no longer accepted. Remove them from all requests entirely. Any code copied from pre-3.x Gemini examples, which covers most existing codebases, needs a cleanup pass before migration.

Trap 3: Function Responses Now Require id and name

Every FunctionResponse must now include both the id from the corresponding FunctionCall and a name that matches exactly. This one fails loudly (you’ll see errors immediately), but it’s easy to miss if you have multiple tool-calling paths in your codebase. Check the official migration docs (https://ai.google.dev/gemini-api/docs/interactions/whats-new-gemini-3.5) for the full function calling spec.

Which thinking_level Should You Actually Use?

The counter-intuitive answer from Google’s own documentation: for agentic coding workflows, use “low”, not “medium”. Google specifically retuned the low setting for code and tool-calling workloads. It’s faster, cheaper, and, on coding benchmarks, comparable to medium. Reaching for “high” as a default is expensive and adds latency without proportional gains for most agent tasks.

| Level | Best for |
|---|---|
| minimal | High-volume classification, trivial chat queries |
| low | Agentic coding loops, tool-calling workflows (recommended) |
| medium | Complex coding tasks, general default |
| high | Hard math, complex reasoning, research synthesis |

One thing worth watching on long-running sessions: internal reasoning tokens are preserved automatically across multi-turn conversations. That improves coherence but inflates costs 30–50% on extended agent loops.
Monitor the ThoughtsTokenCount metric; if it exceeds 40% of PromptTokenCount on later turns, restarting the session is cheaper than continuing it.

The Pricing Reality

Hacker News noticed the price increase immediately. At $1.50/$9.00 per million input/output tokens, Gemini 3.5 Flash costs three times as much as Gemini 3 Flash Preview and six times as much as Flash-Lite. That’s worth acknowledging.

The relevant comparison, though, is against what it’s replacing in practice. At $2.00/$12.00 per million tokens, Gemini 3.1 Pro costs about 33% more for inferior performance on coding tasks. For teams already on Pro, 3.5 Flash is a price cut with better results.

The biggest lever is caching. At $0.15 per million cached input tokens, a 90% discount, agents with large, stable system prompts can recover most of the premium. Google demoed 93 parallel subagents completing 15,000+ requests in 12 hours for under $1,000. That math works with aggressive caching in place.

Migration Checklist

- Update model ID: gemini-3-flash-preview → gemini-3.5-flash
- Replace thinking_budget with a thinking_level string: “minimal”, “low”, “medium”, “high”
- Set thinking_level explicitly; don’t rely on the medium default
- Remove temperature, top_p, top_k from all requests
- Add id and name to every FunctionResponse
- Keep gemini-3-flash-preview running for: Computer Use, image generation, audio generation, and Live API
- Watch ThoughtsTokenCount growth across multi-turn sessions

A minimal request with the thinking level set explicitly:

    from google import genai

    client = genai.Client()  # reads the API key from the environment

    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=prompt,  # prompt: your input text, defined elsewhere
        config={
            "thinking_config": {
                "thinking_level": "low"  # optimal for agentic coding
            }
        },
    )

The Verdict

Migrate for agentic and coding workloads. The performance gains are real, the speed advantage is significant, and the pricing is defensible if you’re already on Pro. Hold on gemini-3-flash-preview for Computer Use and Live API workloads; those aren’t supported in 3.5 yet. The three breaking changes are concrete and fixable in an afternoon.
The thinking_level default change is the one most likely to ship silently to production. Set it explicitly on every request, run your evals before flipping the switch, and review Appwrite’s independent benchmark analysis (https://appwrite.io/blog/post/gemini-3-5-flash-deep-dive) if you need a second opinion beyond Google’s own numbers. The full technical spec is available on the Google DeepMind model card (https://deepmind.google/models/gemini/flash/).
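
As a companion to the migration checklist, the three traps can be sketched as a pre-flight check that runs before any request leaves your codebase. This is a rough illustration under stated assumptions: the function lint_request and the request shape (a plain dict with "config" and "function_responses" keys) are hypothetical and do not mirror the real SDK types. It only checks that id and name are present, not that they match the originating FunctionCall.

```python
# Hypothetical pre-flight lint; the request shape is an assumption,
# not the Gemini SDK's actual types.
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def lint_request(request: dict) -> list:
    """Return a list of migration problems found in a request dict."""
    problems = []
    config = request.get("config", {})
    thinking = config.get("thinking_config", {})

    # Trap 2: sampling parameters are no longer accepted.
    for name in SAMPLING_PARAMS:
        if name in config:
            problems.append(f"remove sampling parameter: {name}")

    # Trap 1: the legacy budget must go, and the level should be explicit.
    if "thinking_budget" in thinking:
        problems.append("replace thinking_budget with thinking_level")
    if "thinking_level" not in thinking:
        problems.append("set thinking_level explicitly")

    # Trap 3: every function response needs both an id and a name.
    for index, response in enumerate(request.get("function_responses", [])):
        for field in ("id", "name"):
            if not response.get(field):
                problems.append(f"function response {index} is missing {field}")

    return problems


legacy_request = {
    "config": {"temperature": 0.2, "thinking_config": {"thinking_budget": 1024}},
    "function_responses": [{"name": "get_weather", "response": {"temp_c": 21}}],
}
for problem in lint_request(legacy_request):
    print(problem)
```

Running a check like this in CI or in a request wrapper is one way to catch the silent thinking-level change, since the API itself will not raise an error for it.
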