{"slug": "claude-vs-gemini-across-4-security-domains-a-dead-heat-and-the-hardening-63-of", "title": "Claude vs Gemini Across 4 Security Domains: A Dead Heat — and the Hardening 63% of AI Code Skips", "summary": "A developer's comparison of Claude and Gemini across four security domains found both AI models missed the same critical hardening steps, with 63% of 700 AI-generated functions shipping with a vulnerability. The test, using ESLint security plugins mapped to CWEs, showed a statistical dead heat between Gemini 2.5 Flash and Claude Sonnet 4.6, with one Gemini win, two ties, and one split. The most concerning finding was that both models omitted audience validation in JWT authentication middleware, a gap that typically survives human code review.", "body_md": "The interesting result isn't who won. It's that across four security domains, Claude and Gemini missed **the same hardening steps** — and if you've shipped AI-generated auth middleware this year, your code almost certainly has the same gaps, and your review didn't catch them either.\n\nFor the record, the scoreboard: **one Gemini win, two ties, one split — a statistical dead heat.** That's the last time the *winner* matters in this article.\n\nHere's the number that should bother you more than any leaderboard: across 700 AI-generated functions scored by the rules I'm about to use, **63% shipped a vulnerability**. So \"which model writes more secure code?\" is mostly the wrong question — I've [run that leaderboard myself](https://dev.to/ofri-peretz/we-ranked-5-ai-models-by-security-the-leaderboard-is-wrong-5a4o) and argued it's the wrong frame. But people keep asking it, so I ran it properly — on the ESLint security plugins I wrote specifically to catch these bugs, each mapped to a CWE — to show you what actually matters.\n\nFour domains, four of my plugins. 
For each, the *same* feature-only prompt (no \"make it secure\" hint — that's how people actually use these tools), generated once by **Gemini 2.5 Flash via the Gemini CLI** and once by **Claude Sonnet 4.6 via the Claude CLI**, then linted with the domain's plugin on its `recommended` config.\n\n*Method honesty: this is Gemini Flash vs Claude Sonnet — the comparable price/latency tier each vendor's CLI defaults to (Pro and Opus are a separate bracket; more on that below). It compares CLI tooling, system prompt included, not raw models under controlled decoding. n=1 per domain — but I re-ran the JWT round, and both models landed on 5 findings again with the same core misses, so treat these as directional with stable failure modes, not ±0 gospel.*\n\n| Domain | Prompt | Plugin | Gemini findings | Claude findings |\n|---|---|---|---|---|\n| NestJS service | users + auth + admin | `nestjs-security` | 2 | 6 |\n| JWT auth | login + verify middleware | `jwt` | 5 | 5 |\n| MongoDB data layer | Mongoose model + search | `mongodb-security` | 8 | 8 |\n| General API (injection) | import + search + reset | `secure-coding` | 9 | 13* |\n\nOne Gemini win, two dead heats, one split. The frontier security gap is **smaller than the discourse suggests** — and the count is the least interesting number here.\n\n*Table legend below: ✗ = one violation of that rule, ✗✗ = two, ✗✗✗ = three, — = rule didn't fire (clean).*\n\nThe NestJS round is the one clean win, [written up in full separately](https://dev.to/ofri-peretz/i-ran-the-same-nestjs-prompt-on-claude-and-gemini-one-got-6-security-errors-heres-what-both-1fnf). Short version: asked for a users service, Gemini's CLI reached for idiomatic NestJS: class-level `@UseGuards`, `@Exclude()` on the password field, `class-validator` on every DTO. `nestjs-security` found **2** issues. Claude wrote functionally equivalent code with none of that scaffolding and drew **6**.\n\nIn an opinionated framework, Gemini defaults to the secure idiom. 
Hold that thought.\n\nBoth wrote clean `jsonwebtoken` code: a signed login token, middleware that *verifies* (no `jwt.decode` shortcut, no `alg: none`, no hardcoded secret — every catastrophic JWT footgun avoided by both). Then both stopped at exactly the same place:\n\n| `jwt` rule | CWE | Gemini | Claude |\n|---|---|---|---|\n| `require-algorithm-whitelist` | | | |\n| `require-audience-validation` | | | |\n| `require-issuer-validation` | | | |\n| `require-max-age` | | | |\n| `no-sensitive-payload` | | | |\n\nHere's *why it survives review*: a reviewer reading `jwt.verify(token, secret)` sees a verify call and ships it. Nobody asks the next question — verifies *for whom?* Without an `audience` option, a token signed with the same key but minted for a *different* API sails straight through. That blind spot is exactly what `require-audience-validation` encodes, and it's why both models — and most human review — walk past it. Call the round 5–5.\n\nThe finding that should make you check your own repo first: both models wrote the search to return **whole documents — password hashes included — with no projection**.\n\n```js\n// Both models, essentially:\nconst results = await User.find(filter);   // ships passwordHash to the caller\n\n// The fix neither wrote: exclude the hash and return plain objects\nconst safeResults = await User.find(filter).select('-passwordHash').lean();\n```\n\nThat's `require-projection` (CWE-200) and `no-select-sensitive-fields` firing on both sides. The pleasant surprise: the prompt invites handing a user-supplied search object straight into a Mongoose query, a textbook `$where`/operator-injection trap, and **both models sidestepped it.** Zero `no-operator-injection`, zero `no-unsafe-where`, zero `no-unsafe-query` on either side. 
The frontier has internalized \"don't interpolate untrusted input into a query.\" It just hasn't internalized \"don't hand back the password column.\"\n\n| `mongodb-security` rule | CWE | Gemini | Claude |\n|---|---|---|---|\n| `require-schema-validation` | CWE-20 | ✗✗✗ | ✗ |\n| `require-projection` | CWE-200 | | |\n| `require-lean-queries` | CWE-400 | | |\n| `no-select-sensitive-fields` | | | |\n| `no-unbounded-find` | | | |\n| `no-bypass-middleware` | | | |\n\nDifferent distribution, same total (8–8) — but one cell deserves an honest call-out, because it cuts *against* my own headline: `require-schema-validation` fired **three times on Gemini and once on Claude**. Here, Claude was the more disciplined one — it wired up more of Mongoose's schema-level validation, where Gemini leaned on looser typing. \"Gemini is frontier-grade\" doesn't mean \"Gemini wins every cell\"; this is a cell it lost. (And yes, `require-lean-queries` is CWE-400, not classic injection: `.lean()` returns plain objects instead of hydrated Mongoose documents, so skipping it on an unbounded search inflates the memory cost of every match. That's a real memory-exhaustion lever, which is why it's scored as a resource control, not a nice-to-have.)\n\nAbout the asterisk on Claude's 13: on a raw injection-prone API (JSON/XML import, dynamic search, password reset), `secure-coding` flagged Gemini **9** and Claude **13**, but that count points the wrong way. Claude's extra findings came from Claude *doing more*: it explicitly rejected XML `DOCTYPE`/`ENTITY` (XXE-hardened), allowlisted the search field, and actually implemented token verification. And here's the honest part — it implemented some of that *insecurely*:\n\n```ts\nimport { createHash, timingSafeEqual } from 'crypto';\n\n// Claude's reset flow: CWE-208, timing-unsafe\nif (providedToken === storedToken) { /* ...reset... */ }\n\n// The fix: hash both to a fixed length first, then compare in constant time\nconst hash = (s: string) => createHash('sha256').update(s).digest();\nif (timingSafeEqual(hash(providedToken), hash(storedToken))) { /* ...reset... 
*/ }\n// Direct timingSafeEqual(Buffer.from(a), Buffer.from(b)) throws if lengths differ,\n// leaking token length to an attacker — always normalise lengths first.\n```\n\nClaude wrote that `===` comparison **five times** (`no-insecure-comparison`, CWE-208). It's the one *real* vulnerability either model introduced across this entire benchmark — and it exists precisely *because* Claude built the verification surface at all. Gemini's leaner 97 lines issued a token and never compared one, so it had no surface to get wrong. Count favored Gemini; substance is genuinely mixed: Claude hardened more **and** shipped the only real bug.\n\nBefore anyone screenshots \"Gemini ties Claude on security\" — that holds for *realistic, structured* tasks. On **isolated, security-sensitive functions** it inverts. In a [separate 700-function run](https://dev.to/ofri-peretz/aggregate-benchmarks-lie-heres-what-700-ai-functions-look-like-by-security-domain-1hgj) scored by these same plugins, the average vulnerability rate was **63%** — and **Gemini 2.5 Pro was the most vulnerable model at 72.9%** (Flash sat mid-pack at 63.6%). Build a\n\n(The whole method rests on \"scored by the plugins I wrote,\" so a fair question is whether the *scorer* is trustworthy — [here's what ground truth caught that my own unit tests missed](https://dev.to/ofri-peretz/what-ground-truth-caught-that-unit-tests-missed-3-real-bugs-in-9-flagship-lint-rules-o0b).)\n\nStrip out the leaderboard and two things are left:\n\n`alg: none`, no `jwt.decode`-without-verify, no `eval`, no hardcoded credentials, in any domain. (The lone `aud`/`iss` validation — is the one most appsec engineers would patch first. \"Hardening\" undersells it; I'm flagging it as the missing control, not as harmless.) 
If you're building with Gemini, you're starting from a credible security baseline. Which is the whole point of static analysis: it asks the questions your prompt didn't.\n\n```js\n// eslint.config.mjs\nimport jwt from 'eslint-plugin-jwt';\nimport mongodbSecurity from 'eslint-plugin-mongodb-security';\nimport nestjsSecurity from 'eslint-plugin-nestjs-security';\nimport secureCoding from 'eslint-plugin-secure-coding';\nimport tsParser from '@typescript-eslint/parser';\n\nexport default [\n  // TypeScript parser so decorators and type annotations parse\n  { files: ['**/*.ts'], languageOptions: { parser: tsParser } },\n  // Each plugin ships a flat `recommended` preset (plugin + rules)\n  jwt.configs.recommended,\n  mongodbSecurity.configs.recommended,\n  nestjsSecurity.configs.recommended,\n  secureCoding.configs.recommended,\n];\n```\n\n```bash\nnpm install --save-dev eslint @typescript-eslint/parser eslint-plugin-jwt \\\n  eslint-plugin-mongodb-security eslint-plugin-nestjs-security eslint-plugin-secure-coding\nnpx eslint src/\n```\n\nEvery rule maps to a CWE so an AI agent and a human read the same signal. Full docs at [eslint.interlace.tools](https://eslint.interlace.tools).\n\nWhich hardening step does *your* AI-generated code skip most — the algorithm allowlist, the audience check, or the query projection? Open the file and look. I'll bet it's at least two of the three. Tell me which ones — I'm collecting scorecards.\n\n*Part of the AI Security Benchmark Series:*\n\n📦 [eslint-plugin-jwt](https://www.npmjs.com/package/eslint-plugin-jwt) · `eslint-plugin-mongodb-security` · `eslint-plugin-nestjs-security` · `eslint-plugin-secure-coding`\n\n[GitHub](https://github.com/ofri-peretz) | [X](https://x.com/ofriperetzdev) | [LinkedIn](https://linkedin.com/in/ofri-peretz) | [Dev.to](https://dev.to/ofri-peretz) | [ofriperetz.dev](https://ofriperetz.dev)", "url": "https://wpnews.pro/news/claude-vs-gemini-across-4-security-domains-a-dead-heat-and-the-hardening-63-of", "canonical_source": "https://dev.to/ofri-peretz/claude-vs-gemini-across-4-security-domains-a-dead-heat-and-the-hardening-63-of-ai-code-skips-mpp", "published_at": "2026-05-31 03:39:06+00:00", "updated_at": "2026-05-31 04:12:00.667870+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-safety", "ai-research", "ai-tools"], "entities": ["Claude", "Gemini", "ESLint", "CWE", "Gemini 2.5 Flash", "Claude Sonnet 4.6", "Gemini CLI", "Claude CLI"], "alternates": {"html": "https://wpnews.pro/news/claude-vs-gemini-across-4-security-domains-a-dead-heat-and-the-hardening-63-of", "markdown": "https://wpnews.pro/news/claude-vs-gemini-across-4-security-domains-a-dead-heat-and-the-hardening-63-of.md", "text": "https://wpnews.pro/news/claude-vs-gemini-across-4-security-domains-a-dead-heat-and-the-hardening-63-of.txt", "jsonld": "https://wpnews.pro/news/claude-vs-gemini-across-4-security-domains-a-dead-heat-and-the-hardening-63-of.jsonld"}}