{"slug": "i-gave-gemini-3-5-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same", "title": "I gave Gemini 3.5 Flash a CVE-fix PR to review. It found another bug in the same file.", "summary": "The author built a code-review agent using Google's newly announced Gemini 3.5 Flash model and tested it on three real production pull requests. The model successfully identified three legitimate bugs with zero hallucinations, including an unrelated regex bug in the same file as a patch for a known Fastify security vulnerability (CVE-2026-25223). The entire agent was built in roughly two hours using approximately 80 lines of TypeScript and the `@google/genai` SDK with structured JSON output.", "body_md": "*This is a submission for the Google I/O Writing Challenge*\n\nAcross **3 real production PRs**, I asked Gemini 3.5 Flash to do a code review. The model, announced this week at Google I/O 2026, caught **3 legitimate bugs, hallucinated 0**, in roughly 4 seconds per PR. The middle PR was the patch for a known security vulnerability in Fastify (CVE-2026-25223, a validation bypass). The model flagged a second, unrelated regex bug **in the exact file being patched**.\n\nHere's what I learned building a code-review agent in about 2 hours with Google's new model.\n\n## Why I tested this\n\nAt the I/O keynote, Sundar Pichai pitched Gemini 3.5 Flash as \"frontier intelligence combined with action\", optimized for agentic coding and long-horizon tasks. Code review is the perfect stress test: it requires reasoning about code semantics, cross-file context, and judgment about what matters.\n\nReading another 50 hype threads on X felt pointless. 
So I built the smallest possible agent that could actually use the model on real code, ran it on three concrete PRs, and counted what it got right, what it made up, and what it missed.\n\n## The architecture\n\nThree stages, ~80 lines of TypeScript, runs on Node 20+:\n\n```\nINPUT                  PROCESSING                       OUTPUT\n─────                  ──────────                       ──────\nowner/repo#N    →      1. fetch the .diff URL      →    stdout (colored summary)\n                       2. truncate if > 150k chars      out/{slug}.json\n                       3. build prompt + schema         out/{slug}.md\n                       4. Gemini 3.5 Flash call\n                       5. Zod-parse the response\n```\n\nNo GitHub token (public PRs use the unauthenticated `.diff` URL). No octokit. No frameworks. Just the new `@google/genai` SDK with structured output.\n\n## The core\n\nThe heart of the pipeline is a single `review()` function. Pass it a diff and get back a typed review, a summary plus an array of issues:\n\n``` ts\nimport { GoogleGenAI } from \"@google/genai\";\nimport { z } from \"zod\";\nimport { zodToJsonSchema } from \"zod-to-json-schema\";\n\n// The API key is read from the environment.\nconst ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });\n\n// Shape of a single finding. Every key is required; line and suggestion may be null.\nconst IssueSchema = z.object({\n  file: z.string(),\n  line: z.number().nullable(),\n  severity: z.enum([\"low\", \"medium\", \"high\", \"critical\"]),\n  category: z.enum([\"bug\", \"security\", \"performance\", \"style\", \"logic\", \"maintainability\"]),\n  message: z.string(),\n  suggestion: z.string().nullable(),\n});\n\n// Shape of the whole review: a summary plus the list of findings.\nconst ReviewSchema = z.object({\n  summary: z.string(),\n  issues: z.array(IssueSchema),\n});\n\nconst SYSTEM_PROMPT = `You are a senior code reviewer. 
Analyze the unified git\ndiff below and produce a JSON review.\n\nRules:\n- Flag REAL issues only — no nitpicks, no style preferences.\n- Prefer fewer, higher-quality issues over volume.\n- Each \"message\" must explain WHY it matters (impact, not just observation).\n- If you cannot see enough context to be sure, lower the severity.\n\nReturn the full review as JSON matching the provided schema.`;\n\nasync function review(diff: string) {\n  const res = await ai.models.generateContent({\n    model: \"gemini-3.5-flash\",\n    contents: `${SYSTEM_PROMPT}\\n\\n--- DIFF ---\\n${diff}`,\n    config: {\n      responseMimeType: \"application/json\",\n      responseJsonSchema: zodToJsonSchema(ReviewSchema),\n    },\n  });\n  // Re-validate with Zod so callers get a typed, checked object.\n  return ReviewSchema.parse(JSON.parse(res.text ?? \"{}\"));\n}\n```\n\nA few details worth flagging:\n\n- **Model string:** `\"gemini-3.5-flash\"`. GA since May 19, 2026.\n- **Structured output:** use `responseJsonSchema` (not the older `responseSchema`). It constrains the response to the Zod-derived schema and returns conformant JSON. No regex-parsing the response, no try/catch for malformed output.\n- **No temperature tuning:** Google explicitly recommends not setting `temperature`, `top_p`, or `top_k` on the 3.5 family; the model handles sampling internally.\n\nFull repo at the end. 
Now the interesting part.\n\n## The three PRs\n\nI picked PRs with very different shapes to see how the model behaved across contexts.\n\n| PR | Type | Lines | Why |\n|---|---|---|---|\n| [fastify#6414](https://github.com/fastify/fastify/pull/6414) | | | |\n| [express#6100](https://github.com/expressjs/express/pull/6100) | | | |\n\n### Final scorecard\n\n```\nPR #1 (express#6190):    +0  −0   Model agreed: no issues\nPR #2 (fastify#6414):    +3  −0   3 hits, 0 hallucinations\nPR #3 (express#6100):    +0  −0   Model agreed: no issues\n──────────────────────────────────────────────────────────────\nTotal:                   +3  −0   Zero false positives.\n```\n\n## What it caught: the headline\n\nPR #2 is the one that mattered. Fastify pull #6414 rewrote the entire content-type parser to fix a security flaw (CVE-2026-25223) where attackers could bypass body validation by appending a tab character to `Content-Type` (e.g. `application/json\\tx`). The fix introduced a new `ContentType` class and replaced the old loose string-matching logic.\n\nThis is exactly the kind of high-stakes, security-sensitive refactor where an automated reviewer either earns its place or doesn't.\n\nThe model flagged three issues. 
Here's each one, verified against the actual code.\n\n### Hit 1: inconsistent variable use in `existingParser`\n\n**MEDIUM · logic:** The `existingParser` method checks `contentType === \"application/json\"` and `this.customParsers.has(contentType)` using the original `contentType` string instead of the newly calculated, normalized `ct` variable.\n\nLooking at the new code in `lib/content-type-parser.js`:\n\n```\nContentTypeParser.prototype.existingParser = function (contentType) {\n  if (typeof contentType === 'string') {\n    const ct = new ContentType(contentType).toString()\n    if (contentType === 'application/json' && this.customParsers.has(contentType)) {\n      return this.customParsers.get(ct).fn !== this[kDefaultJsonParse]\n    }\n    if (contentType === 'text/plain' && this.customParsers.has(contentType)) {\n      return this.customParsers.get(ct).fn !== defaultPlainTextParser\n    }\n  }\n  return this.hasParser(contentType)\n}\n```\n\nThe model is right. `ct` is the normalized version, but the conditional guards still test the raw `contentType`. Since `customParsers` only holds normalized keys (see line 85: `this.customParsers.set(normalizedContentType, parser)`), any header with a different case or trailing parameters silently skips the fast path. Subtle, easy to miss in review.\n\n### Hit 2: a regex missing its end anchor\n\n**HIGH · security:** The `subtypeNameReg` regular expression is missing a trailing `$` anchor. Consequently, any string starting with a valid subtype will match successfully.\n\nThis one is the headline. In the **brand new file** `lib/content-type.js`, the patch defines two parallel regexes:\n\n``` js\nconst typeNameReg     = /^[\\w!#$%&'*+.^`|~-]+$/      // has $\nconst subtypeNameReg  = /^[\\w!#$%&'*+.^`|~-]+\\s*/    // no $\n```\n\nThe subtype regex anchors at the start but not at the end. Inputs like `application/json/extra` pass the validation gate where they shouldn't. 
In a PR whose entire purpose is fixing a validation-bypass CVE, a senior reviewer would put this in red on the first pass. The model put it in HIGH on the first pass.\n\nI am not claiming this is itself exploitable at the same severity as the original CVE; the downstream parsers may not be reachable in a way that materializes the bug. But the pattern is exactly the class of issue that *did* materialize as CVE-2026-25223. Pattern-recognition of dangerous shapes is half of what code review is.\n\n### Hit 3: stateful global regex\n\n**MEDIUM · bug:** The `keyValuePairsReg` regex is defined globally with the `/g` flag. Because of this, it is stateful and relies on `lastIndex`. If parsing throws an exception or future modifications exit the loop early, `lastIndex` will not reset to 0.\n\nConfirmed at the top of `lib/content-type.js`:\n\n``` js\nconst keyValuePairsReg = /([\\w!#$%&'*+.^`|~-]+)=([^;]*)/gm\n```\n\nIt is used inside a class constructor, with `.exec()` in a loop. In healthy execution, `lastIndex` resets to 0 when `exec` returns `null`. But in the failure mode (an exception inside the loop body, or any future `break`) a stale `lastIndex` is left behind on the shared regex, and it silently corrupts the next parse, which starts matching from that stale offset. The model's suggested fix (use `matchAll` instead) is exactly the JavaScript-idiomatic answer.\n\nThis is a latent footgun, not a live bug. MEDIUM is arguably too high a severity. But it's a real thing the model saw.\n\n## What it didn't catch: the honest part\n\nTwo failure modes worth being honest about.\n\n**Cross-file context.** The model only sees the diff. It can't tell whether a function called by the changed code is safe, whether a removed branch was load-bearing somewhere else, or whether tests actually cover the new behavior. For PR #6414 in particular, the upstream callers of the new `ContentType` class are not in the diff, and the model never reasoned about them.\n\n**Severity calibration is rough.** The regex-without-anchor is HIGH. 
The stateful `/g` is MEDIUM. In practice, the stateful one probably belongs a notch lower: the regex one is a clear pattern with security relevance, while the global-regex one is a latent footgun unlikely to fire. Junior-reviewer instincts.\n\nI also can't conclusively measure what the model missed without reviewing every comment thread on the PR by hand. The merged commit went through multiple rounds of feedback (commits like \"address feedback\", \"refactor algorithm\", \"appease coverage\"), so reviewers did catch things. But how many of those are in-diff issues a tool could have seen, versus broader design decisions, I'd need another afternoon to know.\n\n## What I'd actually use this for\n\nThree takeaways after running this on real code:\n\n- **It earns a place as a first-layer pre-review.** Specifically: PRs that touch parsers, validators, or anything that consumes external input. The cost is around $0.003 per PR. The cost of *not* running it is shipping a regex without an anchor on a security-sensitive code path.\n- **It does not replace human reviewers.** It cannot reason about distributed state, concurrency, transactions, or anything that requires understanding multiple files in concert.\n- **Hallucination rate was zero in this sample**, but the sample is tiny. The literature on similar models suggests false positives in the 15-25% range on real-world PRs. Three out of three being valid is great but is not a benchmark.\n\nThe 80 lines of TypeScript that produced this run are on [GitHub](https://github.com/vicente-r-junior/gemini-code-review). Two things that are non-obvious about the setup:\n\n- `@google/genai` v2 uses `responseJsonSchema`, not `responseSchema`. Easy to get wrong if you're translating tutorial code from an older Gemini release.\n- Public GitHub PRs expose a `.diff` endpoint that requires no auth. You don't need octokit for an MVP.\n\nIf you try it on PRs with shapes I didn't test (concurrency-heavy, multi-file, generated code), tell me what you find. 
The interesting question is where the model breaks, not where it works.\n\n*Built and tested in May 2026 with Gemini 3.5 Flash, GA two days before publication.*", "url": "https://wpnews.pro/news/i-gave-gemini-3-5-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same", "canonical_source": "https://dev.to/vicente_junior_dev/i-gave-gemini-35-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same-file-1g24", "published_at": "2026-05-22 22:40:14+00:00", "updated_at": "2026-05-22 23:02:10.772560+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools", "cybersecurity", "open-source"], "entities": ["Gemini 3.5 Flash", "Google", "Fastify", "Sundar Pichai", "TypeScript", "Node"], "alternates": {"html": "https://wpnews.pro/news/i-gave-gemini-3-5-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same", "markdown": "https://wpnews.pro/news/i-gave-gemini-3-5-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same.md", "text": "https://wpnews.pro/news/i-gave-gemini-3-5-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same.txt", "jsonld": "https://wpnews.pro/news/i-gave-gemini-3-5-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same.jsonld"}}