Google's Gemini 3.5 Flash is 4x faster than other frontier models. Here is how to call it from TypeScript.

Google shipped Gemini 3.5 Flash on May 19 at Google I/O 2026. The headline claim is four times faster output tokens per second compared to other frontier models. That is not a marketing tier label. The claim is a throughput number, and for latency-sensitive work like streaming chat, code generation, or agentic loops, it changes what is worth reaching for.

Here is what the model actually is, how to wire it up in TypeScript, and what the cost and rate limit picture looks like before you depend on it in production.

| Dimension | Gemini 3.5 Flash | Gemini 2.5 Flash |
|---|---|---|
| Output speed | 4x faster than other frontier models | Best price-performance for high-volume tasks |
| Primary use | Agentic workflows, coding, long-horizon tasks | Cost-sensitive, high-volume, reasoning tasks |
| Input price | $1.50 per 1M tokens | $0.30 per 1M tokens |
| Output price | $9.00 per 1M tokens | $2.50 per 1M tokens |
| Free tier | Yes (limited) | Yes (standard rate limits) |
| SDK package | `@google/genai` | `@google/genai` |
| Model ID | `gemini-3.5-flash` | `gemini-2.5-flash` |
| Released | May 19, 2026 | Earlier in 2026 |

Google positions Gemini 3.5 Flash as the fast tier in the 3.5 family. The framing from the announcement is "frontier intelligence with action," which is a wordy way of saying: this model runs complex agentic tasks at a speed where the latency is not the bottleneck anymore.
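To make the price rows in the table above concrete, here is a minimal sketch of a per-request cost estimate. The per-token prices are copied from the table; the `PRICES` map, the `estimateCostUsd` helper, and the request size (10,000 input tokens, 2,000 output tokens) are hypothetical names and values chosen for illustration, not measurements.

```typescript
// Per-1M-token prices in USD, copied from the comparison table.
type ModelId = "gemini-3.5-flash" | "gemini-2.5-flash";

interface Price {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PRICES: Record<ModelId, Price> = {
  "gemini-3.5-flash": { inputPerMillion: 1.5, outputPerMillion: 9.0 },
  "gemini-2.5-flash": { inputPerMillion: 0.3, outputPerMillion: 2.5 },
};

// Hypothetical helper: estimates the cost of one request from its token counts.
function estimateCostUsd(
  model: ModelId,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICES[model];
  return (
    (inputTokens / 1e6) * price.inputPerMillion +
    (outputTokens / 1e6) * price.outputPerMillion
  );
}

// Illustrative request size (an assumption, not a benchmark).
const sampleInputTokens = 10000;
const sampleOutputTokens = 2000;

for (const model of Object.keys(PRICES) as ModelId[]) {
  const cost = estimateCostUsd(model, sampleInputTokens, sampleOutputTokens);
  console.log(`${model}: $${cost.toFixed(4)} per request`);
}
```

At this assumed request size the estimate comes to about $0.033 for 3.5 Flash and about $0.008 for 2.5 Flash. This only compares individual calls; it says nothing about how many calls a full task needs.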
The benchmarks Google published back up the agentic positioning. On Terminal-Bench 2.1, 3.5 Flash scores 76.2%. On MCP Atlas it hits 83.6%. On CharXiv Reasoning, a multimodal benchmark, it reaches 84.2%. Google published those scores for agentic and coding workloads, not general chat.

Where does it fit against the rest of the lineup? Gemini 2.5 Flash is cheaper per token and designed for high-volume reasoning tasks where cost per call matters more than raw throughput. Gemini 3.5 Flash costs more but delivers output fast enough that the wall-clock time for an agentic loop shrinks, which can lower your per-task cost even at a higher per-token rate. Google's own framing is "often at less than half the cost of other frontier models" for full tasks, not individual calls.

For most TypeScript projects, the decision point is: does your user wait for the output, or does a pipeline consume it? If a user is staring at a cursor, speed matters and 3.5 Flash is worth the price premium. If a background job is processing documents at scale, 2.5 Flash is likely the right call.

The SDK is `@google/genai`. Node.js 18 or later is required.

```
npm install @google/genai
```

Set your API key from [Google AI Studio](https://aistudio.google.com):

```
export GEMINI_API_KEY="your-key-here"
```

Basic call:

```js
import { GoogleGenAI } from "@google/genai";

// Pass the key explicitly so the dependency is visible at the call site.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Top-level await: run this as an ES module.
const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Summarize the key breaking changes in Node.js 22 for a TypeScript developer.",
});

// response.text is the concatenated text of the reply.
console.log(response.text);
```

That is the whole surface for a one-shot request. The `GoogleGenAI` constructor accepts the key directly or reads `GEMINI_API_KEY` from the environment when called with an empty object `{}`. Prefer the explicit key reference so your intent is clear at the call site.

Worth noting: `response.text` is a convenience accessor. The full response tree lives at `response.candidates[0].content.parts`.
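As a rough sketch of what walking that tree looks like, the helpers below collect the text parts and any function call names from the first candidate. The `PartLike` and `ResponseLike` interfaces are simplified, hypothetical stand-ins that mirror only the `candidates[0].content.parts` path named above; they are not the SDK's own types, and the sample object is made up for illustration.

```typescript
// Simplified, hypothetical shapes: only the fields this sketch reads.
interface PartLike {
  text?: string;
  functionCall?: { name: string; args?: Record<string, unknown> };
}

interface ResponseLike {
  candidates?: Array<{ content?: { parts?: PartLike[] } }>;
}

// Returns the parts of the first candidate, or an empty array if any level is missing.
function firstCandidateParts(response: ResponseLike): PartLike[] {
  return response.candidates?.[0]?.content?.parts ?? [];
}

// Joins every text part of the first candidate into one string.
function collectText(response: ResponseLike): string {
  return firstCandidateParts(response)
    .map((part) => part.text ?? "")
    .join("");
}

// Lists the names of any function calls found in the first candidate.
function collectFunctionCallNames(response: ResponseLike): string[] {
  const names: string[] = [];
  for (const part of firstCandidateParts(response)) {
    if (part.functionCall) {
      names.push(part.functionCall.name);
    }
  }
  return names;
}

// Made-up sample response mixing text parts and one function call part.
const sampleResponse: ResponseLike = {
  candidates: [
    {
      content: {
        parts: [
          { text: "Hello, " },
          { functionCall: { name: "lookupWeather", args: { city: "Oslo" } } },
          { text: "world" },
        ],
      },
    },
  ],
};

console.log(collectText(sampleResponse));
console.log(collectFunctionCallNames(sampleResponse));
```

The optional chaining keeps the helpers safe when a response has no candidates or an empty content block, which is worth guarding against before indexing into `candidates[0]` directly.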
You only need to go that deep when handling multimodal outputs or function call responses.

Four times faster output speed matters most when you stream. A blocking `generateContent` call holds the connection open until the model finishes. For a 1,000-token response at high throughput, that is still a perceivable wait for a user. Streaming pipes each chunk to the client as the model produces it.

js import { GoogleGenAI } from "@google/genai"; const ai = new GoogleGenAI { apiKey: process.env.GEMINI API KEY } ; async function streamToStdout prompt: string : Promise