Google's Gemini 3.5 Flash is 4x faster than other frontier models. Here is how to call it from TypeScript.

Google shipped Gemini 3.5 Flash on May 19 at Google I/O 2026, claiming output throughput, in tokens per second, four times that of other frontier models. The model, positioned as the fast tier in the 3.5 family, scored 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas, two benchmarks aimed at agentic and coding workloads. Google provides a TypeScript SDK via the `@google/genai` package, supporting both blocking and streaming calls for latency-sensitive applications like agentic loops and code generation.

Google shipped Gemini 3.5 Flash on May 19 at Google I/O 2026. The headline claim is output at four times the tokens per second of other frontier models. That is not a marketing tier label. The claim is a throughput number, and for latency-sensitive work like streaming chat, code generation, or agentic loops, it changes what is worth reaching for. Here is what the model actually is, how to wire it up in TypeScript, and what the cost and rate limit picture looks like before you depend on it in production.

| Dimension | Gemini 3.5 Flash | Gemini 2.5 Flash |
|---|---|---|
| Output speed | 4x faster than other frontier models | Best price-performance for high-volume tasks |
| Primary use | Agentic workflows, coding, long-horizon tasks | Cost-sensitive, high-volume, reasoning tasks |
| Input price | $1.50 per 1M tokens | $0.30 per 1M tokens |
| Output price | $9.00 per 1M tokens | $2.50 per 1M tokens |
| Free tier | Yes (limited) | Yes (standard rate limits) |
| SDK package | `@google/genai` | `@google/genai` |
| Model ID | `gemini-3.5-flash` | `gemini-2.5-flash` |
| Released | May 19, 2026 | 2025 |

Google positions Gemini 3.5 Flash as the fast tier in the 3.5 family. The framing from the announcement is "frontier intelligence with action," which is a wordy way of saying: this model runs complex agentic tasks at a speed where latency is no longer the bottleneck.

The benchmarks Google published back this up. On Terminal-Bench 2.1, 3.5 Flash scores 76.2%. On MCP Atlas it hits 83.6%. On CharXiv Reasoning, a multimodal benchmark, it reaches 84.2%. Google published those scores for agentic and coding workloads, not general chat.

Where does it fit against the rest of the lineup? The 2.5 Flash is cheaper per token and designed for high-volume reasoning tasks where cost per call matters more than raw throughput.
The 3.5 Flash costs more but delivers output fast enough that the wall-clock time for an agentic loop shrinks, which can lower your per-task cost even at a higher per-token rate. Google's own framing is "often at less than half the cost of other frontier models" for full tasks, not individual calls.

For most TypeScript projects, the decision point is: does your user wait for the output, or does a pipeline consume it? If a user is staring at a cursor, speed matters and 3.5 Flash is worth the price premium. If a background job is processing documents at scale, 2.5 Flash is likely the right call.

The SDK is `@google/genai`. Node.js 18 or later is required.

```
npm install @google/genai
```

Set your API key from [Google AI Studio](https://aistudio.google.com):

```
export GEMINI_API_KEY="your-key-here"
```

Basic call:

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Summarize the key breaking changes in Node.js 22 for a TypeScript developer.",
});

console.log(response.text);
```

That is the whole surface for a one-shot request. The `GoogleGenAI` constructor accepts the key directly or reads `GEMINI_API_KEY` from the environment when called with an empty object `{}`. Prefer the explicit key reference so your intent is clear at the call site.

Worth noting: `response.text` is a convenience accessor. The full response tree lives at `response.candidates[0].content.parts`. You only need to go that deep when handling multi-modal outputs or function call responses.

Four times faster output speed matters most when you stream. A blocking `generateContent` call holds the connection open until the model finishes. For a 1,000-token response at high throughput, that is still a perceivable wait for a user. Streaming pipes each chunk to the client as the model produces it.
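One way to see the difference between waiting for a full response and receiving chunks is to time the first chunk separately from the whole stream. The sketch below is illustrative only: `ChunkLike`, `StreamTiming`, and `measureStream` are hypothetical names, not part of `@google/genai`. It assumes only that the stream is an async iterable of objects with an optional `text` field, and it uses a stand-in stream so it runs without an API key.

```typescript
// Hypothetical minimal shape of a streamed chunk: only the text accessor is used.
interface ChunkLike {
  text?: string;
}

interface StreamTiming {
  firstChunkMs: number | null; // null if the stream produced no chunks
  totalMs: number;
  chunks: number;
  text: string;
}

// Times a stream from the moment it is opened until the last chunk arrives.
// `open` is called inside the timed region so request setup is included.
async function measureStream(
  open: () => Promise<AsyncIterable<ChunkLike>>
): Promise<StreamTiming> {
  const start = Date.now();
  const stream = await open();

  let firstChunkMs: number | null = null;
  let chunks = 0;
  let text = "";

  for await (const chunk of stream) {
    if (firstChunkMs === null) {
      firstChunkMs = Date.now() - start;
    }
    chunks += 1;
    text += chunk.text ?? "";
  }

  return { firstChunkMs, totalMs: Date.now() - start, chunks, text };
}

// Stand-in stream so the sketch runs without calling any API.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { text: "Hello, " };
  yield { text: "world" };
}

measureStream(async () => fakeStream()).then((timing) => {
  console.log(
    `first chunk after ${timing.firstChunkMs} ms, ` +
      `${timing.chunks} chunks, ${timing.totalMs} ms total`
  );
});
```

To try it against the real API, pass a function that opens the stream, for example `() => ai.models.generateContentStream({ model: "gemini-3.5-flash", contents: prompt })`, with `ai` constructed as in the basic call. `Date.now()` has millisecond resolution and the numbers include network latency, so treat them as rough comparisons rather than benchmarks. The plain streaming call, without any timing, looks like this: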
```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function streamToStdout(prompt: string): Promise<void> {
  const stream = await ai.models.generateContentStream({
    model: "gemini-3.5-flash",
    contents: prompt,
  });

  // Write each chunk as soon as it arrives.
  for await (const chunk of stream) {
    process.stdout.write(chunk.text ?? "");
  }
  process.stdout.write("\n");
}

await streamToStdout(
  "Write a TypeScript function that retries a promise up to N times with exponential backoff."
);
```

In a Next.js route handler or an Express server, you would pipe `chunk.text` into a `ReadableStream` and set a streaming-friendly `Content-Type`: `text/plain` for raw text, as in the example below, or `text/event-stream` if you frame each chunk as a server-sent event. The pattern is the same: iterate the async generator, forward each chunk.

```ts
// app/api/generate/route.ts (Next.js App Router example)
import { NextRequest } from "next/server";
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

export async function POST(req: NextRequest) {
  const { prompt } = await req.json();

  const stream = await ai.models.generateContentStream({
    model: "gemini-3.5-flash",
    contents: prompt,
  });

  // Reuse one encoder for every chunk.
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk.text ?? ""));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

The 4x throughput claim shows up in the time between the first chunk and the last. At high output speeds, the stream feels snappy from the user's side even when the total token count is large.

Gemini 3.5 Flash handles function calling with a three-step cycle: you declare the tool, the model returns a function call request, you execute and send back the result.

One thing to know before you write any code: Gemini 3 model APIs attach a unique `id` to every function call. You must echo that `id` back in the function response or the model cannot match results to calls. This changed in the 3.x API line.
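The id-echo rule is easy to isolate from the rest of the request cycle. The following is a minimal sketch, assuming simplified local types (`FunctionCallLike`, `FunctionResponsePart`) that stand in for the SDK's own; `toFunctionResponsePart` is a hypothetical helper name, not an SDK function. It copies `id` and `name` from the call into the response part, so the pairing cannot be forgotten at an individual call site.

```typescript
// Simplified stand-ins for the SDK's function-call shapes (assumptions, not SDK types).
interface FunctionCallLike {
  id?: string;
  name?: string;
  args?: Record<string, unknown>;
}

interface FunctionResponsePart {
  functionResponse: {
    id?: string;
    name?: string;
    response: { result: unknown };
  };
}

// Builds the part that is sent back to the model for one function call.
function toFunctionResponsePart(
  call: FunctionCallLike,
  result: unknown
): FunctionResponsePart {
  return {
    functionResponse: {
      id: call.id, // echoed unchanged so the result can be matched to its call
      name: call.name,
      response: { result },
    },
  };
}

// Made-up call object, shaped like one entry of response.functionCalls.
const exampleCall: FunctionCallLike = {
  id: "call-123",
  name: "get_weather",
  args: { city: "Oslo" },
};

const examplePart = toFunctionResponsePart(exampleCall, {
  temperature: 12,
  condition: "cloudy",
});

console.log(JSON.stringify(examplePart, null, 2));
```

When the model returns several calls in one turn, the same helper can be mapped over the array so every response carries its own `id`. The full request cycle, using the SDK directly, follows.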
```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Step 1: Declare the tool
const getWeatherDeclaration = {
  name: "get_weather",
  description: "Returns current weather conditions for a city.",
  parameters: {
    type: Type.OBJECT,
    properties: {
      city: {
        type: Type.STRING,
        description: "City name, e.g. Tokyo",
      },
      units: {
        type: Type.STRING,
        description: "Temperature unit: celsius or fahrenheit",
      },
    },
    required: ["city"],
  },
};

// Step 2: Send the initial request
const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "What is the weather in Oslo right now?",
  config: {
    tools: [{ functionDeclarations: [getWeatherDeclaration] }],
  },
});

// Step 3: Handle the function call
if (response.functionCalls && response.functionCalls.length > 0) {
  const call = response.functionCalls[0];

  // The model's own turn (which contains the function call) is replayed in the history.
  const modelTurn = response.candidates?.[0]?.content;
  if (!modelTurn) throw new Error("Model returned no content");

  // Your real implementation here
  const weatherData = await fetchWeatherFromYourAPI(
    call.args as { city: string; units?: string }
  );

  // Build conversation history with the function result
  const history = [
    { role: "user", parts: [{ text: "What is the weather in Oslo right now?" }] },
    modelTurn,
    {
      role: "user",
      parts: [
        {
          functionResponse: {
            id: call.id, // Required in Gemini 3.x
            name: call.name,
            response: { result: weatherData },
          },
        },
      ],
    },
  ];

  // Step 4: Get the final natural-language response
  const final = await ai.models.generateContent({
    model: "gemini-3.5-flash",
    contents: history,
    config: {
      tools: [{ functionDeclarations: [getWeatherDeclaration] }],
    },
  });

  console.log(final.text);
}

async function fetchWeatherFromYourAPI(args: { city: string; units?: string }) {
  // Placeholder. Replace with your actual weather API call.
  return { temperature: 12, condition: "cloudy", city: args.city };
}
```

Two practical notes. The `Type` enum imported from `@google/genai` is mandatory for the parameter schema. Do not pass raw strings like `"object"` for the `type` field.
The request also accepts an array of tool declarations, so you can include more than one function if your agentic workflow needs to route between them.

For parallel tool calls in a single turn, the model may return more than one entry in `response.functionCalls`. Iterate the array, execute each, and send all results back in one follow-up request.

The pricing numbers in the TL;DR table above come from Google AI Studio's pricing page as of May 2026. Two practical caveats before you budget anything.

Gemini 3.5 Flash costs $1.50 per million input tokens and $9.00 per million output tokens on the paid tier. Output pricing includes thinking tokens if the model uses internal reasoning steps. In a chat or code-generation workflow, output typically runs 2 to 4 times the input token count, so budget accordingly.

Against 2.5 Flash at $0.30 input / $2.50 output, that is a meaningful difference at scale. A task that generates 10,000 output tokens costs $0.025 on 2.5 Flash and $0.09 on 3.5 Flash. That is 3.6x more per call. The gap can close if the 4x speed advantage means 3.5 Flash completes a multi-turn agentic task in fewer wall-clock seconds and the task itself needs fewer total tokens because the model gets there faster. Test against your actual workload rather than extrapolating from single-call pricing.

Both models have a free tier through the Gemini API, with rate limits that Google does not publish precisely on the pricing page. The paid tier removes the per-day caps. If you are prototyping, the free tier is enough. If you are running production traffic, use a paid project and set a monthly spend cap in the Google Cloud console.

One hard ceiling worth knowing: Google Search grounding requests share a 5,000-prompt monthly quota across all Gemini 3 models on the free tier, then cost $14 per 1,000 queries on paid. If your tool-calling setup routes through Search grounding, that quota burns faster than you expect.

Gemini 3.5 Flash is worth adding to your model comparison list.
The 4x output speed figure is Google's own, and the published benchmark scores line up with the agentic workload focus. The TypeScript SDK is straightforward. The function calling API has one new rule compared to older Gemini versions: always echo the `id` field back in your function response.

The price premium over 2.5 Flash is real. Whether it pays back depends on whether your users wait for output and whether your agentic loops shrink enough in wall-clock time to offset the per-token cost difference. Run both models against your actual task shape before committing either to production.

What kind of workload are you considering Gemini 3.5 Flash for? Drop a comment, especially if you have run latency comparisons against other frontier models.

GDS K S · [thegdsks.com](https://thegdsks.com) · follow on X [@thegdsks](https://x.com/thegdsks)

Speed is only free if you would have paid for the wall-clock time anyway.
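For anyone running that comparison, the per-call arithmetic from the pricing figures quoted earlier can be scripted. The following is a minimal sketch, not an official calculator: `ModelPricing`, `PRICING`, and `estimateCostUsd` are hypothetical names, and the prices are copied from this article's table (May 2026), so check them against the current pricing page before relying on the output.

```typescript
// USD per 1M tokens, copied from the comparison table in this article (May 2026).
// These values are assumptions to re-check before budgeting.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PRICING: Record<string, ModelPricing> = {
  "gemini-3.5-flash": { inputPerMillion: 1.5, outputPerMillion: 9.0 },
  "gemini-2.5-flash": { inputPerMillion: 0.3, outputPerMillion: 2.5 },
};

// Estimated cost in USD of one call (or one whole task) from its token counts.
function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const pricing = PRICING[model];
  if (!pricing) {
    throw new Error(`No pricing entry for model "${model}"`);
  }
  const inputCost = inputTokens * pricing.inputPerMillion;
  const outputCost = outputTokens * pricing.outputPerMillion;
  return (inputCost + outputCost) / 1000000;
}

// The article's example: 10,000 output tokens, input tokens ignored.
const fastCost = estimateCostUsd("gemini-3.5-flash", 0, 10000);
const cheapCost = estimateCostUsd("gemini-2.5-flash", 0, 10000);

// prints: 3.5 Flash: $0.090, 2.5 Flash: $0.025, ratio: 3.6x
console.log(
  `3.5 Flash: $${fastCost.toFixed(3)}, ` +
    `2.5 Flash: $${cheapCost.toFixed(3)}, ` +
    `ratio: ${(fastCost / cheapCost).toFixed(1)}x`
);
```

On output price alone, 3.5 Flash would have to finish a task in roughly 28% of the output tokens (2.50 / 9.00) to match 2.5 Flash on cost. Plugging in token counts measured from real runs of your own task gives a more honest picture than the single-call figures.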