{"slug": "google-s-gemini-3-5-flash-is-4x-faster-than-other-frontier-models-here-is-how-to", "title": "Google's Gemini 3.5 Flash is 4x faster than other frontier models. Here is how to call it from TypeScript.", "summary": "Google shipped Gemini 3.5 Flash on May 19 at Google I/O 2026, claiming four times the output tokens per second of other frontier models. The model, positioned as the fast tier in the 3.5 family, scored 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas for agentic and coding workloads. Google provides a TypeScript SDK via the `@google/genai` package, supporting both blocking and streaming calls for latency-sensitive applications like agentic loops and code generation.", "body_md": "Google shipped Gemini 3.5 Flash on May 19 at Google I/O 2026. The headline claim is four times the output tokens per second of other frontier models. That is not a marketing tier label but a throughput number, and for latency-sensitive work like streaming chat, code generation, or agentic loops, it changes which model is worth reaching for.\n\nHere is what the model actually is, how to wire it up in TypeScript, and what the cost and rate-limit picture looks like before you depend on it in production.\n\n| Dimension | Gemini 3.5 Flash | Gemini 2.5 Flash |\n|---|---|---|\n| Positioning | 4x faster output than other frontier models (Google's claim) | Best price-performance for high-volume tasks |\n| Primary use | Agentic workflows, coding, long-horizon tasks | Cost-sensitive, high-volume, reasoning tasks |\n| Input price | $1.50 per 1M tokens | $0.30 per 1M tokens |\n| Output price | $9.00 per 1M tokens | $2.50 per 1M tokens |\n| Free tier | Yes (limited) | Yes (standard rate limits) |\n| SDK package | `@google/genai` | `@google/genai` |\n| Model ID | `gemini-3.5-flash` | `gemini-2.5-flash` |\n| Released | May 19, 2026 | 2025 |\n\nGoogle positions Gemini 3.5 Flash as the fast tier in the 3.5 family. 
The framing from the announcement is \"frontier intelligence with action,\" which is a wordy way of saying: this model runs complex agentic tasks fast enough that model latency stops being the bottleneck.\n\nThe benchmarks Google published cover the capability half of that claim, not the speed half. On Terminal-Bench 2.1, 3.5 Flash scores 76.2%. On MCP Atlas it hits 83.6%. On CharXiv Reasoning, a multimodal benchmark, it reaches 84.2%. Google framed those scores around agentic and coding workloads, not general chat.\n\nWhere does it fit against the rest of the lineup? The 2.5 Flash is cheaper per token and designed for high-volume reasoning tasks where cost per call matters more than raw throughput. The 3.5 Flash costs more per token but shortens the wall-clock time of an agentic loop, and if it also finishes a task in fewer turns or tokens, the per-task cost can come out lower despite the higher rate. Google's own framing is \"often at less than half the cost of other frontier models\" for full tasks, not individual calls.\n\nFor most TypeScript projects, the decision point is: does a user wait for the output, or does a pipeline consume it? If a user is staring at a cursor, speed matters and 3.5 Flash is worth the price premium. If a background job is processing documents at scale, 2.5 Flash is likely the right call.\n\nThe SDK is `@google/genai`. It requires Node.js 18 or later.\n\n``` bash\nnpm install @google/genai\n```\n\nSet your API key from [Google AI Studio](https://aistudio.google.com):\n\n``` bash\nexport GEMINI_API_KEY=\"your-key-here\"\n```\n\nBasic call:\n\n``` ts\nimport { GoogleGenAI } from \"@google/genai\";\n\nconst ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });\n\nconst response = await ai.models.generateContent({\n  model: \"gemini-3.5-flash\",\n  contents: \"Summarize the key breaking changes in Node.js 22 for a TypeScript developer.\",\n});\n\nconsole.log(response.text);\n```\n\nThat is the whole surface for a one-shot request. 
The `GoogleGenAI` constructor accepts the key directly, or reads `GEMINI_API_KEY` from the environment when called with an empty object `{}`. Prefer the explicit key reference so your intent is clear at the call site.\n\nWorth noting: `response.text` is a convenience accessor. The full response tree lives at `response.candidates[0].content.parts`. You only need to go that deep when handling multimodal outputs or function call responses.\n\nThe 4x output speed matters most when you stream. A blocking `generateContent` call returns nothing until the model finishes, so for a 1,000-token response the user still waits for the whole thing, even at high throughput. Streaming forwards each chunk to the client as the model produces it.\n\n``` ts\nimport { GoogleGenAI } from \"@google/genai\";\n\nconst ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });\n\nasync function streamToStdout(prompt: string): Promise<void> {\n  const stream = await ai.models.generateContentStream({\n    model: \"gemini-3.5-flash\",\n    contents: prompt,\n  });\n\n  // Each chunk is a partial response; chunk.text is undefined when a chunk has no text parts.\n  for await (const chunk of stream) {\n    process.stdout.write(chunk.text ?? \"\");\n  }\n\n  process.stdout.write(\"\\n\");\n}\n\nawait streamToStdout(\"Write a TypeScript function that retries a promise up to N times with exponential backoff.\");\n```\n\nIn a Next.js route handler, wrap the stream in a `ReadableStream` and return it as the response body; in an Express server, write each `chunk.text` to `res` instead. The example below sends plain text. If your client consumes Server-Sent Events, set `Content-Type: text/event-stream` and frame each chunk as a `data:` line. Either way the pattern is the same: iterate the async generator, forward each chunk.\n\n``` ts\n// app/api/generate/route.ts (Next.js App Router route handler)\nimport { NextRequest } from \"next/server\";\nimport { GoogleGenAI } from \"@google/genai\";\n\nconst ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! 
});\n\nexport async function POST(req: NextRequest) {\n  const { prompt } = await req.json();\n\n  const stream = await ai.models.generateContentStream({\n    model: \"gemini-3.5-flash\",\n    contents: prompt,\n  });\n\n  // Reuse one encoder for every chunk.\n  const encoder = new TextEncoder();\n\n  const readable = new ReadableStream({\n    async start(controller) {\n      for await (const chunk of stream) {\n        controller.enqueue(encoder.encode(chunk.text ?? \"\"));\n      }\n      controller.close();\n    },\n  });\n\n  return new Response(readable, {\n    headers: { \"Content-Type\": \"text/plain; charset=utf-8\" },\n  });\n}\n```\n\nThe 4x throughput claim shows up in the time between the first chunk and the last. At high output speeds, the stream feels snappy from the user's side even when the total token count is large.\n\nFunction calling is a round trip: you declare the tool, the model returns a function call request, you execute it and send the result back, and the model turns that result into its final answer.\n\nOne thing to know before you write any code: Gemini 3 model APIs attach a unique `id` to every function call. You must echo that `id` back in the function response, or the model cannot match results to calls. This changed in the 3.x API line.\n\n``` ts\nimport { GoogleGenAI, Type } from \"@google/genai\";\n\nconst ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });\n\n// Step 1: Declare the tool\nconst getWeatherDeclaration = {\n  name: \"get_weather\",\n  description: \"Returns current weather conditions for a city.\",\n  parameters: {\n    type: Type.OBJECT,\n    properties: {\n      city: {\n        type: Type.STRING,\n        description: \"City name, e.g. 
Tokyo\",\n      },\n      units: {\n        type: Type.STRING,\n        description: \"Temperature unit: celsius or fahrenheit\",\n      },\n    },\n    required: [\"city\"],\n  },\n};\n\n// Step 2: Send the initial request\nconst response = await ai.models.generateContent({\n  model: \"gemini-3.5-flash\",\n  contents: \"What is the weather in Oslo right now?\",\n  config: {\n    tools: [{ functionDeclarations: [getWeatherDeclaration] }],\n  },\n});\n\n// Step 3: Handle the function call\nif (response.functionCalls && response.functionCalls.length > 0) {\n  const call = response.functionCalls[0];\n\n  // Execute the tool yourself (placeholder implementation below)\n  const weatherData = await fetchWeatherFromYourAPI(call.args as { city: string; units?: string });\n\n  // Build conversation history with the function result\n  const history = [\n    { role: \"user\", parts: [{ text: \"What is the weather in Oslo right now?\" }] },\n    // The model's turn, passed back unchanged so any thought signatures stay attached\n    response.candidates![0].content!,\n    {\n      role: \"user\",\n      parts: [\n        {\n          functionResponse: {\n            id: call.id,       // Required in Gemini 3.x\n            name: call.name,\n            response: { result: weatherData },\n          },\n        },\n      ],\n    },\n  ];\n\n  // Step 4: Get the final natural-language response\n  const final = await ai.models.generateContent({\n    model: \"gemini-3.5-flash\",\n    contents: history,\n    config: {\n      tools: [{ functionDeclarations: [getWeatherDeclaration] }],\n    },\n  });\n\n  console.log(final.text);\n}\n\nasync function fetchWeatherFromYourAPI(args: { city: string; units?: string }) {\n  // Placeholder. Replace with your actual weather API call.\n  return { temperature: 12, condition: \"cloudy\", city: args.city };\n}\n```\n\nTwo practical notes. Use the `Type` enum imported from `@google/genai` for every `type` field in the parameter schema; the SDK types those fields as the enum, so raw strings like `\"object\"` will not type-check. 
`functionDeclarations` is an array, so you can declare more than one function if your agentic workflow needs to route between them.\n\nFor parallel tool calls in a single turn, the model may return more than one entry in `response.functionCalls`. Iterate the array, execute each call, and send all results back in one follow-up request, with each `functionResponse` carrying the `id` of the call it answers.\n\nThe prices in the table above come from Google AI Studio's pricing page as of May 2026. Two practical caveats before you budget anything.\n\nGemini 3.5 Flash costs $1.50 per million input tokens and $9.00 per million output tokens on the paid tier. Output pricing includes thinking tokens when the model uses internal reasoning steps, so a short visible answer can still bill a large number of output tokens. In chat or code-generation workflows, output can run 2 to 4 times the input token count, so budget accordingly.\n\nThe 2.5 Flash at $0.30 input / $2.50 output is a meaningful difference at scale. A task that generates 10,000 output tokens costs $0.025 on 2.5 Flash (10,000 × $2.50 / 1M) and $0.09 on 3.5 Flash (10,000 × $9.00 / 1M). That is 3.6x more per call. Speed alone does not close that gap, because billing is per token, not per second. It closes only if 3.5 Flash finishes a multi-turn agentic task in fewer turns or fewer total tokens. Test against your actual workload rather than extrapolating from single-call pricing.\n\nBoth models have a free tier through the Gemini API with rate limits Google does not publish precisely on the pricing page. The paid tier raises those limits. If you are prototyping, the free tier is enough. If you are running production traffic, use a paid project and set up a budget with alerts in the Google Cloud console.\n\nOne hard ceiling worth knowing: Google Search grounding requests share a 5,000-prompt monthly quota across all Gemini 3 models on the free tier, then cost $14 per 1,000 queries on paid. 
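\n\nThe per-token prices and the grounding quota above can be turned into a quick budget check. Here is a minimal sketch, assuming the prices quoted in this post and assuming the first 5,000 grounded prompts in a month are unbilled with $14 per 1,000 after that; verify both against your own billing tier. `tokenCostUsd` and `groundingCostUsd` are hypothetical helpers, not part of `@google/genai`.\n\n``` ts\n// Prices quoted in this post, in USD per 1M tokens. Check Google's pricing page before relying on them.\nconst PRICES = {\n  \"gemini-3.5-flash\": { input: 1.5, output: 9.0 },\n  \"gemini-2.5-flash\": { input: 0.3, output: 2.5 },\n} as const;\n\ntype ModelId = keyof typeof PRICES;\n\n// Hypothetical helper: token cost of one call in USD.\n// Count thinking tokens in outputTokens, since they bill at the output rate.\nfunction tokenCostUsd(model: ModelId, inputTokens: number, outputTokens: number): number {\n  const price = PRICES[model];\n  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;\n}\n\n// Hypothetical helper. Assumption: the first 5,000 grounded prompts per month are not billed,\n// and later prompts cost $14 per 1,000 (prorated here, not rounded up to whole blocks).\nfunction groundingCostUsd(groundedPromptsPerMonth: number): number {\n  const billable = Math.max(0, groundedPromptsPerMonth - 5_000);\n  return (billable / 1_000) * 14;\n}\n\nconsole.log(tokenCostUsd(\"gemini-2.5-flash\", 0, 10_000)); // 0.025\nconsole.log(tokenCostUsd(\"gemini-3.5-flash\", 0, 10_000)); // 0.09\nconsole.log(groundingCostUsd(7_000)); // 28\n```\n\n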
If your tool-calling setup routes through Search grounding, that quota burns faster than you expect.\n\nGemini 3.5 Flash is worth adding to your model comparison list. The 4x output speed is Google's own claim, and its published benchmarks line up with the agentic workload focus. The TypeScript SDK is straightforward. The function calling API has one new rule compared to older Gemini versions: always echo the `id` field back in your function response.\n\nThe price premium over 2.5 Flash is real. Whether it pays back depends on whether your users wait for output and whether your agentic loops shrink enough in wall-clock time to offset the per-token cost difference. Run both models against your actual task shape before committing either to production.\n\nWhat kind of workload are you considering Gemini 3.5 Flash for? Drop a comment, especially if you have run latency comparisons against other frontier models.\n\n**GDS K S** · [thegdsks.com](https://thegdsks.com) · follow on X [@thegdsks](https://x.com/thegdsks)\n\n*Speed is only free if you would have paid for the wall-clock time anyway.*", "url": "https://wpnews.pro/news/google-s-gemini-3-5-flash-is-4x-faster-than-other-frontier-models-here-is-how-to", "canonical_source": "https://dev.to/thegdsks/googles-gemini-35-flash-is-4x-faster-than-other-frontier-models-here-is-how-to-call-it-from-2ih5", "published_at": "2026-05-27 17:20:41+00:00", "updated_at": "2026-05-27 17:41:19.466390+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "generative-ai", "ai-products", "ai-tools"], "entities": ["Google", "Gemini 3.5 Flash", "Gemini 2.5 Flash", "Google I/O 2026", "Terminal-Bench 2.1", "TypeScript"], "alternates": {"html": "https://wpnews.pro/news/google-s-gemini-3-5-flash-is-4x-faster-than-other-frontier-models-here-is-how-to", "markdown": "https://wpnews.pro/news/google-s-gemini-3-5-flash-is-4x-faster-than-other-frontier-models-here-is-how-to.md", "text": 
"https://wpnews.pro/news/google-s-gemini-3-5-flash-is-4x-faster-than-other-frontier-models-here-is-how-to.txt", "jsonld": "https://wpnews.pro/news/google-s-gemini-3-5-flash-is-4x-faster-than-other-frontier-models-here-is-how-to.jsonld"}}