Nano Banana Pro (Gemini 3 Pro Image): Developer Guide & API 2026

Google shipped Nano Banana Pro (formerly Gemini 3 Pro Image) to general availability in June 2026, positioning it as the most capable reasoning-driven image model with a public API. The model, priced at $0.134 per 1K or 2K image, introduces native image editing through a joint reasoning-generation process and achieves industry-leading text rendering in generated images. Google grounds the model's output in Search data for factual accuracy, and its 2–5 second generation speed significantly outperforms the 15–30 second latency of Imagen 4 Ultra for iterative workflows.

Google shipped Nano Banana Pro to general availability in June 2026 and nobody made a big deal of it. The I/O keynote spotlight went to Gemini Omni and Managed Agents. But for anyone building an app that generates or edits images, the model formerly known as Gemini 3 Pro Image is now the most capable reasoning-driven image model with a public API, at $0.134 per 1K or 2K image and $0.24 for 4K.

The name is a Google internal codename that leaked and stuck. Nano Banana 2 (Gemini 3.1 Flash Image) is the cheaper, faster sibling. Nano Banana Pro is the high-quality lane. Both are now generally available in the Gemini API.

Most image generation models work the same way: you send a text prompt, they return pixels. Nano Banana Pro adds a layer that matters if you build anything beyond basic generation: native image editing through a joint reasoning-generation process. You don't patch pixels externally. You send the original image plus an instruction in natural language, and the model applies changes while preserving everything you didn't ask it to touch. That sounds incremental.

The specific thing it does better than the alternatives is text rendering. Accurate text inside generated images (product labels, UI mockups, infographic callouts, signage) has been an industry failure mode since the original Stable Diffusion era. Nano Banana Pro is the first model where "add the text 'Sale' in bold white on the product" reliably produces readable text rather than decorative gibberish.

Google grounds its image generation in Search data, which means when you ask for "the Eiffel Tower at sunset, autumn 2026" you get factual geometry and verified lighting, not an impressionist interpretation. For factual data visualizations and product mockups, this grounding is genuinely useful. For surreal or stylized output, it's a constraint; Imagen 4 Ultra performs better there.

| Model | API ID | Speed | Best For | Price/image |
|---|---|---|---|---|
| Nano Banana Pro | `gemini-3-pro-image-preview` | 2–5s | Text rendering, editing, complex scenes | $0.134 (2K) |
| Nano Banana 2 | `gemini-3-1-flash-image` | <2s | High-volume, quick iterations | $0.02–$0.04 |
| Imagen 4 Ultra | `imagen-4.0-ultra-generate-001` | 15–30s | Photorealism, portraits, product photography | $0.06 |

The speed gap is the real story. Nano Banana Pro generates in 2–5 seconds. Imagen 4 Ultra takes 15–30 seconds. A designer exploring 20–30 creative directions with Nano Banana Pro generates all of them in the time Imagen 4 Ultra takes to produce 3. For iterative workflows (agency mockups, A/B variant generation, UI wireframe illustration) that throughput difference compounds quickly.

The quality trade-off is real too. In independent user testing from June 2026, 78% of participants preferred Imagen 4 Ultra for portrait photography (skin texture, eye detail), and 73% chose it for product shots (material accuracy, lighting). But 54% preferred Nano Banana Pro for stylized and creative output. The honest read: if you need photographic realism for headshots or luxury product shots, Imagen 4 Ultra wins. If you need volume, text accuracy, or editing control, Nano Banana Pro wins.

You need Python SDK version 1.52+ or the JavaScript/TypeScript SDK version 1.30+.
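
One way to confirm the installed Python SDK meets that minimum before making any calls is sketched below. It assumes the SDK is installed from PyPI under the name `google-genai` (the package behind `from google import genai`); the helper names `parse_version`, `meets_minimum`, and `installed_sdk_version` are illustrative and not part of the SDK.

```python
import re
from importlib import metadata

# Minimum Python SDK version quoted in this guide (1.52+).
MIN_PYTHON_SDK = (1, 52)


def parse_version(text):
    """Turn '1.52.0' into (1, 52, 0); stop at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_PYTHON_SDK):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def installed_sdk_version(package="google-genai"):
    """Return the installed version string, or None if the package is absent.

    The default package name is an assumption about how the SDK was installed.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_sdk_version()
    if version is None:
        print("SDK not installed")
    elif meets_minimum(version):
        print(f"SDK {version} meets the 1.52+ requirement")
    else:
        print(f"SDK {version} is older than 1.52; upgrade before continuing")
```

Comparing integer tuples rather than raw strings avoids the classic trap where "1.9" sorts above "1.52" lexicographically.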
The generation call is synchronous. Unlike Veo 3.1's async video generation, images come back directly:

```python
from google import genai
from google.genai import types
import base64

client = genai.Client()

response = client.models.generate_images(
    model='gemini-3-pro-image-preview',
    prompt='A close-up product shot of a matte black coffee mug with the text "FOCUS" in minimalist serif font, white background, studio lighting',
    config=types.GenerateImagesConfig(
        number_of_images=1,
        output_mime_type='image/png',
        aspect_ratio='1:1',
    ),
)

# Save the image
for i, image in enumerate(response.generated_images):
    with open(f'output_{i}.png', 'wb') as f:
        f.write(image.image.image_bytes)
```

The `aspect_ratio` parameter accepts `'1:1'`, `'16:9'`, `'9:16'`, `'4:3'`, and `'3:4'`. For 4K output, set `output_image_config={'width': 4096, 'height': 4096}`; billing jumps to $0.24 per image at 4K.

The editing model uses a separate endpoint ID: `gemini-3-pro-image-preview-edit`. You pass the original image as base64 alongside the instruction. The model preserves everything you didn't explicitly ask to change, which makes it genuinely useful for iterative design work:

```python
from google import genai
from google.genai import types
import base64

client = genai.Client()

# Load existing image
with open('product_shot.png', 'rb') as f:
    image_bytes = base64.b64encode(f.read()).decode()

response = client.models.generate_images(
    model='gemini-3-pro-image-preview-edit',
    prompt='Change the background to a warm wooden kitchen countertop, keep the mug identical',
    config=types.GenerateImagesConfig(
        reference_images=[
            types.ReferenceImage(
                reference_image=types.Image(
                    image_bytes=base64.b64decode(image_bytes),
                    mime_type='image/png',
                )
            )
        ],
        number_of_images=1,
    ),
)

# Save each edited image
for i, image in enumerate(response.generated_images):
    with open(f'edited_{i}.png', 'wb') as f:
        f.write(image.image.image_bytes)
```

The catch: complex inpainting (editing a specific masked region while leaving the rest untouched) still behaves inconsistently if the instruction is ambiguous. "Change the background to wood" works well because the foreground subject is unambiguous. "Make the shadow slightly softer" is less reliable; the model occasionally interprets it as "change the entire lighting setup." Be literal with editing instructions. If you want targeted changes, describe exactly what you want and what should stay the same.

Two API surfaces exist. The Gemini API (ai.google.dev) is simpler: one API key, no project configuration. The Vertex AI path requires `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`, and `GOOGLE_GENAI_USE_VERTEXAI=True`. Vertex adds enterprise features (VPC Service Controls, data residency, CMEK) plus access to the Batch/Flex route pricing.

If you're building a prototype or internal tool: use the Gemini API. If you're building a production app with 500 image generations per day, run the numbers on Vertex Batch mode first. Batch/Flex pricing cuts standard rates in half ($0.067 per 2K image instead of $0.134) at the cost of async delivery. For non-realtime workflows (nightly product image refresh, bulk content generation), the savings stack up fast. 1,000 images per day at standard pricing costs $134/day. At Batch pricing: $67/day. That's about $2,010 in savings over a 30-day month on a modest workload.

Every image generated by Nano Banana Pro ships with an invisible SynthID watermark embedded in the pixel data: no visible mark, no impact on image quality, but detectable by Google's verification tools. This is non-optional. You cannot generate without the watermark. For most use cases, this is a feature: you can verify your own AI-generated assets, comply with emerging disclosure requirements, and trace misuse.

The one scenario where it matters negatively: if a client explicitly requires undetectable AI image generation for contractual or competitive reasons, Nano Banana Pro is not the right tool. Alternatives like Midjourney v8 or Flux Pro don't embed detectable watermarks in the same way.
Google's SynthID verification API is also public, so third-party tools can detect Nano Banana Pro output. Factor that into workflows where the AI-generated nature of images needs to stay undisclosed.

The per-image pricing hides some complexity. $0.134 per image applies at 1K and 2K resolution because both consume approximately 1,120 output tokens in Google's billing model, and image output is priced at $120.00 per million tokens (1,120 × $120 / 1,000,000 ≈ $0.134). 4K images consume around 2,000 tokens, which at the same rate comes to the $0.24 published rate.

The token-based billing matters if you're mixing image and text generation in a single session. Input tokens (your prompt plus any reference images) bill at $2.00 per million. Complex editing prompts that include high-resolution reference images can add meaningful token cost on top of the per-image rate. For a batch pipeline: benchmark your average session token count before committing to volume pricing tiers.

Three scenarios where it's clearly the right choice right now.

UI and product mockups at scale. If you're generating dozens of marketing variants, social media assets, or app screenshots, the 2–5 second generation time and reliable text rendering make Nano Banana Pro the only reasonable option. Imagen 4 is too slow for iteration; DALL-E 4 still struggles with text in most configurations.

Content production pipelines. Blogs, newsletters, and content sites that need custom illustrations for every article can automate thumbnail and header image generation. At $0.134 per image and 3 seconds per call, a site publishing 10 articles per day spends $1.34/day on image generation, effectively replacing stock photo subscriptions.

Product image variation. E-commerce teams can generate background variants, seasonal styling, and locale-specific adaptations from a single hero product shot. The editing model preserves product identity across variations with reasonable consistency.
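
The per-image arithmetic behind these scenarios can be sketched as a small estimator. This is a rough sketch under stated assumptions: it uses only the published per-image rates quoted in this guide ($0.134 at 1K/2K, $0.24 at 4K), assumes Batch/Flex is exactly half the standard rate at every resolution, and ignores input-token charges for prompts and reference images. The names `PRICE_PER_IMAGE`, `daily_cost`, and `monthly_cost` are hypothetical.

```python
from decimal import Decimal

# Published per-image rates quoted in this guide (USD).
PRICE_PER_IMAGE = {
    "1K": Decimal("0.134"),
    "2K": Decimal("0.134"),
    "4K": Decimal("0.24"),
}

# Assumption: Batch/Flex halves the standard rate at every resolution.
BATCH_MULTIPLIER = Decimal("0.5")


def daily_cost(images_per_day, resolution="2K", batch=False):
    """Estimated daily spend on image output, excluding input tokens."""
    price = PRICE_PER_IMAGE[resolution]
    if batch:
        price = price * BATCH_MULTIPLIER
    return price * images_per_day


def monthly_cost(images_per_day, resolution="2K", batch=False, days=30):
    """Estimated spend over `days` days at a constant daily volume."""
    return daily_cost(images_per_day, resolution, batch) * days


if __name__ == "__main__":
    # The content-site example: 10 header images per day at 2K.
    print("standard/day:", daily_cost(10))
    print("batch/day:   ", daily_cost(10, batch=True))
    print("standard/mo: ", monthly_cost(10))
```

`Decimal` is used instead of floats so that currency sums do not pick up binary rounding noise; the 10-images-per-day case reproduces the $1.34/day figure from the content pipeline scenario.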
Where it's not the right choice: photorealistic human portraits (Imagen 4 Ultra), anything requiring the surreal aesthetic typical of Midjourney v8, or use cases where SynthID detectability is a deal-breaker. The model also has no video output capability; that's Veo 3.1's lane, and the two models are separate API calls with no native chaining.

Nano Banana Pro is generally available today. The API is stable, pricing is published, and the editing endpoint works in production. It is not the highest-quality image model available: Imagen 4 Ultra beats it on photorealism, and Midjourney v8 beats it on artistic range. What it is: the fastest, most controllable, best-at-text-rendering model with a Gemini API key and no waitlist.

Originally published at wowhow.cloud