# Gemini 3.5 Flash Is Now GA: Three API Traps to Know

> Source: <https://byteiota.com/gemini-3-5-flash-is-now-ga-three-api-traps-to-know/>
> Published: 2026-06-18 15:10:31+00:00

Gemini 3.5 Flash is now [generally available](https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/) to all developers via the Gemini API and Google AI Studio. The headline writes itself: a Flash model that outperforms last year’s Pro on coding and agentic benchmarks. The numbers mostly hold up. What Google’s blog post buries, though, is that migrating from `gemini-3-flash-preview` to `gemini-3.5-flash` introduces three breaking changes, at least one of which will silently degrade your outputs before you notice anything is wrong.

## What Actually Changed: The Benchmark Story Is Lopsided on Purpose

Gemini 3.5 Flash doesn’t beat 3.1 Pro across the board. It beats it specifically on the tasks that production AI agents actually run.

On agentic and coding benchmarks, the lead is real: Terminal-Bench 2.1 jumps from 70.3% to 76.2%, MCP Atlas from 78.2% to 83.6%, and Finance Agent v2 from 43.0% to 57.9%. It runs at 289 tokens per second — roughly four times faster than other frontier models. For anything involving tool calls, parallel subagents, or high-throughput document processing, Gemini 3.5 Flash is the clear choice in its price bracket.

Where it trails 3.1 Pro: Humanity’s Last Exam (40.2% vs 44.4%), ARC-AGI-2 (72.1% vs 77.1%), and long-context retrieval at 128k tokens (77.3% vs 84.9%). If your workload is primarily complex reasoning (hard math, multi-step logic chains, research synthesis), don’t migrate yet. The new model sacrifices some of that depth in exchange for speed. That’s a reasonable trade for most production agents, and a bad one for workloads that aren’t agentic at all.

## The Three API Traps

These are not edge cases. All three will hit most codebases on migration.

### Trap 1: The Thinking Default Changed Silently

In `gemini-3-flash-preview`, the default thinking effort was equivalent to `high`. In `gemini-3.5-flash`, it defaults to `medium`. The new parameter name is `thinking_level`, a string enum that replaces the old `thinking_budget` integer. If you copy-paste a migration without setting `thinking_level` explicitly, you’ll get shallower, less capable outputs with no error message. The model just answers differently.

You cannot pass both `thinking_level` and `thinking_budget` in the same request; doing so returns a 400 error.
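A minimal sketch of that migration step, assuming request configs are built as plain dicts in the same shape as the config example later in this article. `migrate_thinking_config` is a hypothetical helper, not part of any SDK: it drops the legacy `thinking_budget` (which would otherwise trigger the 400) and forces the caller to choose a `thinking_level` instead of inheriting the new `medium` default. The level is passed in rather than guessed from the old budget number.

```python
VALID_THINKING_LEVELS = ("minimal", "low", "medium", "high")


def migrate_thinking_config(config: dict, level: str) -> dict:
    """Hypothetical helper: return a copy of `config` that uses thinking_level.

    The caller must pass the level explicitly so the request never falls back
    to the new "medium" default by accident.
    """
    if level not in VALID_THINKING_LEVELS:
        # Also catches typographic quotes pasted from docs, e.g. "“low”".
        raise ValueError(f"unknown thinking_level: {level!r}")
    migrated = dict(config)  # shallow copy; the caller's dict is not mutated
    thinking = dict(migrated.get("thinking_config", {}))
    # Sending thinking_budget alongside thinking_level is rejected with a 400.
    thinking.pop("thinking_budget", None)
    thinking["thinking_level"] = level
    migrated["thinking_config"] = thinking
    return migrated


legacy = {"thinking_config": {"thinking_budget": 8192}}
print(migrate_thinking_config(legacy, "high"))
# {'thinking_config': {'thinking_level': 'high'}}
```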

### Trap 2: Sampling Parameters Are Gone

`temperature`, `top_p`, and `top_k` are no longer accepted. Remove them from all requests. Any code copied from pre-3.x Gemini examples (which covers most existing codebases) needs a cleanup pass before migration.
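A cleanup pass could look like the sketch below, again assuming dict-based configs. `strip_sampling_params` is hypothetical; it checks both the snake_case keys used with the Python SDK and the camelCase `topP`/`topK` spellings used in raw REST payloads, and it reports what it removed so you can log which code paths were still sending them.

```python
# Sampling parameters that gemini-3.5-flash no longer accepts, in both the
# snake_case (Python SDK) and camelCase (REST JSON) spellings.
REMOVED_SAMPLING_PARAMS = ("temperature", "top_p", "top_k", "topP", "topK")


def strip_sampling_params(config: dict) -> tuple[dict, list[str]]:
    """Hypothetical helper: return (cleaned copy of config, keys that were removed)."""
    removed = [key for key in REMOVED_SAMPLING_PARAMS if key in config]
    cleaned = {k: v for k, v in config.items() if k not in REMOVED_SAMPLING_PARAMS}
    return cleaned, removed


old_config = {"temperature": 0.2, "top_k": 40, "max_output_tokens": 2048}
new_config, dropped = strip_sampling_params(old_config)
print(new_config, dropped)
# {'max_output_tokens': 2048} ['temperature', 'top_k']
```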

### Trap 3: Function Responses Now Require id and name

Every `FunctionResponse` must now include both the `id` from the corresponding `FunctionCall` and a `name` that matches the call’s name exactly. This one fails loudly, so you’ll see errors as soon as a malformed response is sent, but it’s easy to miss a path if your codebase has multiple tool-calling paths and your tests don’t exercise all of them. Check [the official migration docs](https://ai.google.dev/gemini-api/docs/interactions/whats-new-gemini-3.5) for the full function calling spec.
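One way to avoid missing a path is to route every tool result through a single helper that copies both fields from the originating call. The sketch below is hypothetical and uses plain dicts shaped like function-call and function-response parts (`id`, `name`, `args` / `response`); it is not the SDK’s own builder, so check the migration docs for the exact wire format.

```python
from typing import Callable


def build_function_response(function_call: dict, result: dict) -> dict:
    """Hypothetical helper: echo the call's id and exact name into the response."""
    call_id = function_call.get("id")
    name = function_call.get("name")
    if not call_id or not name:
        raise ValueError(f"function call is missing 'id' or 'name': {function_call!r}")
    return {"function_response": {"id": call_id, "name": name, "response": result}}


def answer_calls(calls: list[dict], tools: dict[str, Callable[..., dict]]) -> list[dict]:
    """Run each requested tool and return one correctly tagged response per call."""
    responses = []
    for call in calls:
        # Look up the tool by the call's name and invoke it with the call's args.
        result = tools[call["name"]](**call.get("args", {}))
        responses.append(build_function_response(call, result))
    return responses


tools = {"get_weather": lambda city: {"city": city, "temp_c": 18}}
calls = [{"id": "call_1", "name": "get_weather", "args": {"city": "Paris"}}]
print(answer_calls(calls, tools))
```

Keeping the pairing in one function means parallel calls can’t get their ids crossed, and a call without an `id` fails in your code rather than at the API.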

## Which thinking_level Should You Actually Use?

The counter-intuitive answer from Google’s own documentation: for agentic coding workflows, use `"low"`, not `"medium"`.

Google specifically retuned the `low` setting for code and tool-calling workloads. It’s faster and cheaper, and on coding benchmarks it’s comparable to `medium`. Reaching for `"high"` as a default is expensive and adds latency without proportional gains for most agent tasks.

| Level | Best for |
|---|---|
| `minimal` | High-volume classification, trivial chat queries |
| `low` | Agentic coding loops, tool-calling workflows (recommended) |
| `medium` | Complex coding tasks, general default |
| `high` | Hard math, complex reasoning, research synthesis |
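If different parts of a codebase call the model for different jobs, the table can be encoded once so that no request depends on the default. A sketch; the workload names are hypothetical labels for the table rows, and an unknown workload raises instead of silently falling back to `medium`.

```python
# Hypothetical workload labels mapped to the levels recommended in the table above.
THINKING_LEVEL_BY_WORKLOAD = {
    "classification": "minimal",
    "simple_chat": "minimal",
    "agentic_coding": "low",
    "tool_calling": "low",
    "complex_coding": "medium",
    "hard_reasoning": "high",
}


def thinking_config_for(workload: str) -> dict:
    """Return an explicit thinking_config; unknown workloads fail instead of defaulting."""
    try:
        level = THINKING_LEVEL_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"no thinking_level chosen for workload {workload!r}") from None
    return {"thinking_config": {"thinking_level": level}}


print(thinking_config_for("tool_calling"))
# {'thinking_config': {'thinking_level': 'low'}}
```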

One thing worth watching on long-running sessions: internal reasoning tokens are preserved automatically across multi-turn conversations. That improves coherence but inflates costs by 30–50% on extended agent loops. Monitor the `ThoughtsTokenCount` metric; if it exceeds 40% of `PromptTokenCount` on later turns, restarting the session is cheaper than continuing it.
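The 40% rule of thumb is easy to automate. A sketch, assuming you feed it each turn’s counts; in the google-genai Python SDK these come from `response.usage_metadata.thoughts_token_count` and `response.usage_metadata.prompt_token_count`. Either field may be `None`, which the helper treats as zero. `should_restart_session` and its threshold default are illustrative, not an SDK feature.

```python
from typing import Optional


def should_restart_session(
    thoughts_tokens: Optional[int],
    prompt_tokens: Optional[int],
    threshold: float = 0.40,
) -> bool:
    """True when this turn's thinking tokens exceed `threshold` times its prompt tokens."""
    thoughts = thoughts_tokens or 0  # treat a missing count as zero
    prompt = prompt_tokens or 0
    if prompt == 0:
        return False  # nothing to compare against yet
    return thoughts / prompt > threshold


# Per-turn (thoughts, prompt) counts from a hypothetical long agent loop.
turns = [(900, 4_000), (2_100, 6_500), (3_900, 8_800)]
for turn, (thoughts, prompt) in enumerate(turns, start=1):
    if should_restart_session(thoughts, prompt):
        print(f"turn {turn}: thinking at {thoughts / prompt:.0%} of prompt; consider restarting")
```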

## The Pricing Reality

Hacker News noticed the price increase immediately. At $1.50/$9.00 per million input/output tokens, Gemini 3.5 Flash costs three times more than Gemini 3 Flash Preview and six times more than Flash-Lite. That’s worth acknowledging.

The relevant comparison, though, is against what it’s replacing in practice. At $2.00/$12.00 per million tokens, Gemini 3.1 Pro costs about a third more (put the other way, 3.5 Flash is 25% cheaper) while trailing it on coding tasks. For teams already on Pro, 3.5 Flash is a price cut with better results.
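The comparison is easy to check with the list prices quoted above. A minimal calculator, assuming those per-million-token rates and ignoring caching, batch discounts, and any other pricing tiers:

```python
# Per-million-token list prices (input, output) in USD, as quoted in this article.
PRICES_PER_MILLION = {
    "gemini-3.5-flash": (1.50, 9.00),
    "gemini-3.1-pro": (2.00, 12.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list price (no caching or other discounts)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example agent turn: 50k tokens in, 4k tokens out.
flash = request_cost("gemini-3.5-flash", 50_000, 4_000)
pro = request_cost("gemini-3.1-pro", 50_000, 4_000)
print(f"flash ${flash:.4f}, pro ${pro:.4f}, pro/flash = {pro / flash:.2f}")
# flash $0.1110, pro $0.1480, pro/flash = 1.33
```

Because both of Pro’s rates are exactly 4/3 of Flash’s, the ratio comes out at 1.33 for any mix of input and output tokens.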

The biggest lever is caching. At $0.15 per million cached input tokens — a 90% discount — agents with large, stable system prompts can recover most of the premium. Google demoed 93 parallel subagents completing 15,000+ requests in 12 hours for under $1,000. That math works with aggressive caching in place.
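To see how much of the premium caching recovers, blend the cached and uncached input rates. A sketch assuming the $1.50 base and $0.15 cached rates above; it ignores any cache storage charges and treats the hit rate as a simple fraction of input tokens, so read the result as a rough lower bound rather than a bill estimate.

```python
BASE_INPUT_PER_MILLION = 1.50    # USD, uncached input
CACHED_INPUT_PER_MILLION = 0.15  # USD, cached input (90% discount)


def input_cost(input_tokens: int, cached_fraction: float) -> float:
    """USD input cost when `cached_fraction` of the tokens are served from cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return (uncached * BASE_INPUT_PER_MILLION + cached * CACHED_INPUT_PER_MILLION) / 1_000_000


for fraction in (0.0, 0.5, 0.9):
    print(f"{fraction:.0%} cached: ${input_cost(10_000_000, fraction):.2f} per 10M input tokens")
```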

## Migration Checklist

- Update model ID: `gemini-3-flash-preview` → `gemini-3.5-flash`
- Replace `thinking_budget` with `thinking_level` (string: `"minimal"`, `"low"`, `"medium"`, `"high"`)
- Set `thinking_level` explicitly; don’t rely on the `medium` default
- Remove `temperature`, `top_p`, `top_k` from all requests
- Add `id` and `name` to every `FunctionResponse`
- Keep `gemini-3-flash-preview` running for Computer Use, image generation, audio generation, and Live API
- Watch `ThoughtsTokenCount` growth across multi-turn sessions

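```python
from google import genai

# genai.Client() reads the API key from the environment (e.g. GEMINI_API_KEY).
client = genai.Client()

prompt = "Explain why this unit test fails and propose a minimal fix."

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=prompt,
    config={
        "thinking_config": {
            "thinking_level": "low"  # recommended for agentic coding loops
        }
    },
)
print(response.text)
```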

## The Verdict

Migrate for agentic and coding workloads. The performance gains are real, the speed advantage is significant, and the pricing is defensible if you’re already on Pro. Stay on `gemini-3-flash-preview` for Computer Use and Live API workloads; those aren’t supported in 3.5 yet.

The three breaking changes are concrete and fixable in an afternoon. The `thinking_level` default change is the one most likely to ship silently to production. Set it explicitly on every request, run your evals before flipping the switch, and review [Appwrite’s independent benchmark analysis](https://appwrite.io/blog/post/gemini-3-5-flash-deep-dive) if you need a second opinion beyond Google’s own numbers. The full technical spec is available on the [Google DeepMind model card](https://deepmind.google/models/gemini/flash/).
