{"slug": "llm-agents-are-now-finding-zero-days-how-ai-is-autonomously-rewriting-the-rules", "title": "LLM Agents Are Now Finding Zero-Days: How AI is Autonomously Rewriting the Rules of Vulnerability Research", "summary": "LLM agents are now autonomously hunting zero-day vulnerabilities at massive scale, with Anthropic's Claude Mythos Preview finding over 10,000 critical or high-severity CVEs in under a month. In a landmark achievement, Apple credited Calif.io in collaboration with Claude and Anthropic Research for discovering CVE-2026-28952, a kernel-level privilege escalation vulnerability in macOS Tahoe 26.5 that allows arbitrary apps to gain root access. Unlike traditional scanners that match patterns, these AI agents reason about programmer intent versus actual behavior, chaining multiple low-severity bugs into high-severity exploit chains that human security teams had missed.", "body_md": "💡 TL;DR: LLM agents like Claude Mythos Preview and GPT-5.5 are now autonomously hunting zero-days at massive scale: 10,000+ critical or high-severity vulnerabilities found in under a month. This post breaks down the agentic harness architecture and real-world results, and gives you runnable code to deploy your own AI security pipeline today.\n\n*Published: May 26, 2026 · ⏱️ 18 min read · Tags: security, llm, ai-agents, vulnerability-research, devops, cybersecurity*\n\nOn May 11, 2026, about two weeks ago, Apple published its security advisory for macOS Tahoe 26.5. Tucked among dozens of credited human researchers was one unusual line:\n\nCVE-2026-28952: An integer overflow addressed with improved input validation. Impact: An app may be able to gain root privileges.\n\nDiscovered by: Calif.io in collaboration with Claude and Anthropic Research.\n\nRead that again. A kernel-level privilege escalation vulnerability, the kind that allows arbitrary apps to gain root access on macOS, was credited in part to an **AI model**.\n\nThis wasn't a toy benchmark or a controlled research sandbox. 
This was a real CVE, assigned and now patched by Apple, found in critical kernel code by a large language model operating as an autonomous security research agent. The same week, Anthropic's Project Glasswing announced that Claude Mythos Preview had found over **10,000 critical or high-severity vulnerabilities** across the world's most systemically important software in under a month.\n\nIf you're a security engineer, a platform developer, or anyone who ships software that other people depend on, this changes your threat model. Permanently. This post breaks down exactly what happened, how these LLM vulnerability research agents work under the hood, and what you need to do about it right now.\n\nBefore LLMs, automated vulnerability detection fell into well-understood categories:\n\nLLM vulnerability research is none of these, and all of them at once.\n\nWhat makes frontier LLMs different is **contextual reasoning at scale**. A traditional SAST scanner matches patterns. An LLM *understands* what the code is *trying* to do, can reason about multi-file call graphs, can hypothesize about trust boundaries, and can generate the proof that a bug is exploitable, all in a single reasoning pass.\n\nThe key insight that the research community has arrived at in 2026 is this: **LLMs don't just find bugs by recognizing patterns. They find bugs by understanding programmer intent vs. actual behavior, and finding where those diverge.**\n\nFuzzers didn't miss a 20-year-old XSLT bug in Firefox for lack of input coverage. They missed it because understanding the bug required knowing that `reentrant key() calls cause a hash table rehash that frees its backing store while a raw entry pointer is still in use`, a multi-step logical chain that requires *semantic* understanding of the codebase's memory model. Claude Mythos found it.\n\nThis is the paradigm shift. We're no longer talking about automated scanners. 
We're talking about **AI agents that reason like senior security researchers**.\n\nCloudflare's security team spent weeks with Mythos Preview on their own infrastructure, and their writeup identified two capabilities that distinguish it from all prior tooling:\n\nReal exploits rarely use a single vulnerability. They chain multiple primitives together — a use-after-free (UAF) becomes an arbitrary read/write primitive, which enables control-flow hijacking, which enables a full sandbox escape. Each step is individually low-severity; together they're critical.\n\nTraditional scanners report bugs in isolation. Mythos Preview **reasons about how to chain them**. Given a set of identified primitives, it evaluates:\n\nCloudflare observed the model taking bugs that would normally sit ignored in a low-severity backlog and constructing high-severity exploit chains that their own security team hadn't considered. This isn't just vulnerability *finding* — it's vulnerability *weaponization*, in service of defenders understanding true risk.\n\nFinding a bug and *proving* it's exploitable are two very different things. Mythos Preview closes this gap with an autonomous PoC generation loop:\n\nThis loop runs autonomously. Cloudflare described watching the model read compiler errors, adjust its exploit logic, and retry — behavior that previously required a human researcher sitting at a terminal. 
The result is a **finding backed by a working proof of concept**, not a speculative observation hedged with \"might\" and \"potentially.\"\n\nThe numbers from Project Glasswing's first month are genuinely staggering:\n\n| Organization | Bugs Found | Severity | Notes |\n|---|---|---|---|\n| Project Glasswing Partners (~50 orgs) | 10,000+ | Critical/High | Collectively across critical infrastructure |\n| Cloudflare | 2,000 | 400 Critical/High | Scanned 50+ internal repos |\n| Mozilla Firefox | 271 | Mixed | 10x more than Firefox 148 with Opus 4.6 |\n| Open Source Projects (1,000+) | 6,202 (high/critical est.) | High/Critical | 90.6% true-positive rate after triage |\n| Palo Alto Networks | 5x normal patch volume | n/a | Accelerated release cadence |\n\nMozilla's Hacks blog published their harness methodology and even disclosed specific bug IDs, an unusual level of transparency that gives us a rare window into what AI-found bugs actually look like in practice. A few highlights:\n\n- A bug in the `<legend>` HTML element, triggered by an intricate orchestration of recursion stack depth limits, expando properties, and cycle collection across distant parts of the browser.\n\nThese aren't simple buffer overflows. These are **complex, multi-system, architecture-aware bugs** that require deep understanding of browser internals. Fuzzers, which work by exploring input space, simply can't reason about the semantic relationships between components that make these bugs possible.\n\nOne important caveat: early LLM-based security scanning (2024–early 2025 era models) was plagued by **AI-generated slop bug reports**: plausible-sounding but entirely wrong findings that wasted maintainer time. Several open-source projects created policies explicitly rejecting AI-generated issues.\n\nMythos Preview represents a step-change improvement here. 
Cloudflare reported that the model's output had **noticeably higher quality: fewer hedged findings, clearer reproduction steps, and less work to reach a fix-or-dismiss decision.** Critically, findings backed by a working PoC have a false-positive rate that approaches zero by definition — if the exploit runs and produces the expected output, the bug is real.\n\nThe key lesson from all successful deployments is this: **naïvely pointing an LLM at a repository and asking \"find bugs\" doesn't work well.** The quality of results scales dramatically with the sophistication of the harness around the model. Here's the architecture that state-of-the-art practitioners are converging on:\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│                    SECURITY AGENT PIPELINE                  │\n├─────────────────────────────────────────────────────────────┤\n│  1. THREAT MODELER          │  Maps codebase, identifies    │\n│     (LLM + static analysis) │  attack surfaces, prioritizes │\n├─────────────────────────────────────────────────────────────┤\n│  2. SCANNER ORCHESTRATOR    │  Spins up parallel sub-agents │\n│     (Agent coordinator)     │  per module/subsystem         │\n├─────────────────────────────────────────────────────────────┤\n│  3. VULN DETECTOR           │  Per-file/function analysis   │\n│     (LLM sub-agent)         │  with semantic reasoning      │\n├─────────────────────────────────────────────────────────────┤\n│  4. EXPLOIT SYNTHESIZER     │  Generates PoC code,          │\n│     (LLM + code executor)   │  compiles, and runs in sandbox│\n├─────────────────────────────────────────────────────────────┤\n│  5. TRIAGE ENGINE           │  Multi-model consensus,       │\n│     (Ensemble of models)    │  severity rating, dedup       │\n├─────────────────────────────────────────────────────────────┤\n│  6. 
REPORT GENERATOR        │  CVE-formatted output,        │\n│     (LLM)                   │  fix suggestions, CVSS scoring│\n└─────────────────────────────────────────────────────────────┘\n```\n\nThe single biggest productivity multiplier is spending compute on **threat modeling before scanning**. Ask the LLM to:\n\nThis turns unfocused scanning into targeted analysis. Mozilla's team found this dramatically improved signal-to-noise: instead of 10,000 low-confidence findings across the whole codebase, they got 500 high-confidence findings in the highest-risk subsystems.\n\nEach high-priority module gets its own sub-agent instance with:\n\n``` python\n# Simplified sub-agent invocation pattern.\n# llm_client, SecurityContext, Finding, FindingSchema and the helper functions\n# are placeholders for your own harness, not a published API.\nasync def scan_module(module_path: str, context: SecurityContext) -> list[Finding]:\n    \"\"\"\n    Launch a sandboxed LLM sub-agent to analyze a single module.\n    Returns structured findings with severity, description, and PoC.\n    \"\"\"\n    system_prompt = build_security_analyst_prompt(\n        language=context.language,\n        vulnerability_classes=context.priority_vuln_classes,\n        trust_model=context.trust_model,\n        output_schema=FindingSchema\n    )\n\n    file_content = load_with_dependencies(module_path, context.repo_root)\n\n    findings = await llm_client.chat(\n        model=\"claude-opus-4-7\",           # or gpt-5.5 for high-value targets\n        system=system_prompt,\n        messages=[{\n            \"role\": \"user\",\n            \"content\": f\"Analyze this module for security vulnerabilities:\\n\\n{file_content}\"\n        }],\n        response_schema=list[Finding],      # structured output keeps findings machine-parseable\n        max_tokens=8192,\n        timeout=120\n    )\n\n    return findings\n```\n\nThis is where the magic happens, and where the false-positive rate collapses:\n\n``` python\nasync def validate_finding(finding: Finding, sandbox: SandboxEnv) -> ValidatedFinding:\n    \"\"\"\n    Attempt to generate and run a PoC for a suspected 
vulnerability.\n    A finding backed by a working PoC has effectively 0% false positive rate.\n    \"\"\"\n    max_iterations = 5\n\n    for _ in range(max_iterations):\n        # Step 1: Synthesize PoC code\n        poc_code = await llm_client.chat(\n            model=\"claude-opus-4-7\",\n            messages=[{\n                \"role\": \"user\",\n                \"content\": f\"\"\"\n                Write a minimal proof-of-concept that triggers this vulnerability:\n\n                Finding: {finding.description}\n                Affected code: {finding.code_snippet}\n                Expected behavior: {finding.expected_trigger}\n\n                Write executable {finding.language} code only. No explanations.\n                \"\"\"\n            }]\n        )\n\n        # Step 2: Execute in isolated sandbox\n        result = await sandbox.execute(\n            code=poc_code,\n            language=finding.language,\n            timeout=30,\n            memory_limit=\"512mb\"\n        )\n\n        # Step 3: Did it trigger the expected vulnerability?\n        if result.crashed and matches_expected_behavior(result, finding):\n            return ValidatedFinding(\n                finding=finding,\n                poc_code=poc_code,\n                execution_result=result,\n                confidence=\"HIGH\",\n                false_positive=False\n            )\n\n        # Step 4: Iterate by feeding the failure back to the model\n        finding = await refine_hypothesis(finding, result, llm_client)\n\n    # Couldn't reproduce after max_iterations: unconfirmed, which is not the same\n    # as disproven, so leave false_positive unset for human review\n    return ValidatedFinding(finding=finding, confidence=\"LOW\", false_positive=None)\n```\n\nOne of the most powerful techniques for reducing false positives, borrowed from [Milvus's research on AI code review](https://milvus.io/blog/ai-code-review-gets-better-when-models-debate-claude-vs-gemini-vs-codex-vs-qwen-vs-minimax.md), is **running multiple independent models and requiring consensus**. 
A finding reported by Claude Opus, GPT-5.5, *and* Gemini independently is far more likely to be real than one reported by a single model.\n\n``` python\nasync def triage_with_consensus(\n    finding: Finding,\n    models: tuple[str, ...] = (\"claude-opus-4-7\", \"gpt-5.5\", \"gemini-2.5-pro\")\n) -> ConsensusResult:\n    \"\"\"\n    Submit a finding to multiple models for independent verification.\n    Require 2/3 agreement to advance to human review queue.\n    \"\"\"\n    # Query every model concurrently; each verdict is produced independently\n    verdicts = await asyncio.gather(*[\n        verify_finding_with_model(finding, model)\n        for model in models\n    ])\n\n    confirmed_count = sum(1 for v in verdicts if v.is_valid)\n\n    return ConsensusResult(\n        finding=finding,\n        verdicts=verdicts,\n        consensus_reached=confirmed_count >= 2,\n        confidence_score=confirmed_count / len(models),\n        advance_to_human_review=confirmed_count >= 2\n    )\n```\n\nAs of May 2026, two models dominate the LLM vulnerability research space. Here's how they compare based on independent benchmarks and real-world deployments:\n\n| Capability | Claude Mythos Preview | GPT-5.5 |\n|---|---|---|\n| Availability | Restricted (Project Glasswing / Enterprise) | Generally available |\n| Vulnerability Miss Rate | ~5-8% (est.) | 10% (XBOW benchmark) |\n| Black-box performance | Excellent | Excellent; outperforms GPT-5 with source code |\n| White-box performance | Best-in-class | \"Effectively killed\" XBOW's benchmark |\n| Exploit chain construction | ✅ Core capability | ✅ Strong |\n| PoC generation | ✅ Autonomous loop | ✅ Strong |\n| Persist vs. pivot decision-making | Strong | Improved (50% fewer bad persist decisions vs. 
GPT-5.4) |\n| Consistency/guardrails | Inconsistent organic refusals | More consistent behavior |\n| Token efficiency | \"Absolutely unprecedented precision\" (XBOW) | Good |\n\n**The key practical difference today:** Claude Mythos Preview is not publicly available; it's restricted to Project Glasswing partners and enterprise security teams with a verified use case. GPT-5.5 is generally available and, per XBOW's benchmarks, delivers Mythos-class performance in white-box scenarios.\n\nFor most security teams *today*, **GPT-5.5 in a well-architected harness is the path to production**. If your organization qualifies for Anthropic's Cyber Verification Program or Claude Security enterprise beta, Mythos-class capabilities are accessible via Claude Opus 4.7 as well.\n\nHere's the uncomfortable truth that Project Glasswing has surfaced for the entire software industry:\n\n**AI has solved the hard part. The bottleneck is now entirely human.**\n\nFor decades, the security community's limiting factor was *finding* vulnerabilities: it required expensive, senior human expertise and took weeks per codebase. That constraint has evaporated. Mythos Preview is finding critical bugs faster than any team of human researchers could. The new constraint is triage, disclosure, patch development, and deployment.\n\nSome maintainers in Project Glasswing's open-source scanning initiative have **asked Anthropic to slow down disclosure** because they can't keep up. That's an extraordinary sentence. A world-class AI is producing so much valid, actionable security research that human maintainers are asking it to slow down.\n\nThe downstream implications for your engineering organization:\n\n**Shorten patch cycles aggressively.** The 90-day standard disclosure window was designed for the old world. 
As AI-found bugs become public CVEs faster, the exploitation window is compressing.\n\n**Invest in automated patch generation pipelines.** Claude Security (now in public beta for Enterprise) can generate proposed fixes, not just identify bugs. This is the next frontier for reducing the triage burden.\n\n**Memory-safe languages matter more than ever.** Both Cloudflare's and Mozilla's data confirm significantly higher false-positive rates and more severe findings in C/C++ codebases vs. Rust or Go. The ROI on memory-safe rewrites just got a lot more concrete.\n\n**Staged rollout policies are critical.** With AI accelerating both attack and defense, end users need to be able to receive patches faster. Frictionless update mechanisms aren't just a UX concern; they're a security posture.\n\nYou don't need access to Mythos Preview to start today. Here's a practical starting point using generally available models:\n\n``` python\n#!/usr/bin/env python3\n\"\"\"\nminimal_vuln_scanner.py\nA basic LLM-powered vulnerability scanner for CI/CD integration.\nRequires: anthropic>=0.30.0 (pip install anthropic)\n\"\"\"\n\nimport asyncio\nimport json\nfrom pathlib import Path\nfrom anthropic import AsyncAnthropic\n\n# Reads the API key from the ANTHROPIC_API_KEY environment variable\nclient = AsyncAnthropic()\n\nSECURITY_SYSTEM_PROMPT = \"\"\"You are an expert security researcher performing a\nwhite-box vulnerability audit. Analyze the provided code for:\n\n1. Memory safety issues (buffer overflows, UAF, null deref; especially in C/C++)\n2. Injection vulnerabilities (SQL, command, LDAP, path traversal)\n3. Authentication/authorization bypasses\n4. Race conditions and TOCTOU bugs\n5. Cryptographic weaknesses\n6. Unsafe deserialization\n7. Integer overflow/underflow conditions\n8. 
Logic bugs affecting security-critical code paths\n\nFor each finding, provide a JSON object with these keys:\n- vulnerability_class: vulnerability class (CWE ID if applicable)\n- severity: Critical/High/Medium/Low\n- location: affected code location (file:line)\n- root_cause: root cause explanation (2-3 sentences)\n- poc_trigger: how an attacker would trigger this\n- recommended_fix: recommended fix\n\nReturn your response as a JSON array of findings. If no vulnerabilities are found,\nreturn an empty array []. Do NOT speculate; only report findings you are confident about.\"\"\"\n\nasync def scan_file(file_path: Path) -> list[dict]:\n    \"\"\"Scan a single file for vulnerabilities using Claude.\"\"\"\n\n    content = file_path.read_text(errors='replace')\n\n    # Skip files that are too short to be meaningful\n    if len(content.strip()) < 50:\n        return []\n\n    message = await client.messages.create(\n        model=\"claude-opus-4-5\",  # Use claude-opus-4-7 for higher accuracy\n        max_tokens=4096,\n        system=SECURITY_SYSTEM_PROMPT,\n        messages=[{\n            \"role\": \"user\",\n            \"content\": f\"File: {file_path}\\n\\n```\\n{content[:50000]}\\n```\"\n            # Truncate to 50k chars; for large files, chunk by function\n        }]\n    )\n\n    response_text = message.content[0].text.strip()\n\n    try:\n        # Extract JSON array from response\n        start = response_text.find('[')\n        end = response_text.rfind(']') + 1\n        if start != -1 and end > start:\n            findings = json.loads(response_text[start:end])\n            # Annotate each finding with source file\n            for f in findings:\n                f['source_file'] = str(file_path)\n            return findings\n    except json.JSONDecodeError:\n        # Model returned malformed JSON; treat the file as having no findings\n        pass\n\n    return []\n\nasync def scan_repository(repo_path: str, extensions: list[str] = None) -> dict:\n    \"\"\"\n    Scan an entire repository for vulnerabilities.\n\n    Args:\n        repo_path: Path to the repository root\n      
  extensions: File extensions to scan (default: common security-relevant types)\n\n    Returns:\n        Dict with findings grouped by severity\n    \"\"\"\n    if extensions is None:\n        extensions = ['.c', '.cpp', '.h', '.py', '.js', '.ts', '.go', '.rs', '.java']\n\n    repo = Path(repo_path)\n    files_to_scan = [\n        f for f in repo.rglob('*')\n        if f.suffix in extensions\n        and '.git' not in f.parts\n        and 'node_modules' not in f.parts\n        and 'vendor' not in f.parts\n    ]\n\n    print(f\"[*] Scanning {len(files_to_scan)} files in {repo_path}\")\n\n    # Scan files concurrently (respect API rate limits)\n    semaphore = asyncio.Semaphore(5)  # Max 5 concurrent API calls\n\n    async def scan_with_limit(f):\n        async with semaphore:\n            print(f\"    Scanning: {f.relative_to(repo)}\")\n            return await scan_file(f)\n\n    all_results = await asyncio.gather(*[scan_with_limit(f) for f in files_to_scan])\n\n    # Flatten and group by severity\n    all_findings = [f for sublist in all_results for f in sublist]\n\n    grouped = {\n        'critical': [f for f in all_findings if f.get('severity', '').lower() == 'critical'],\n        'high':     [f for f in all_findings if f.get('severity', '').lower() == 'high'],\n        'medium':   [f for f in all_findings if f.get('severity', '').lower() == 'medium'],\n        'low':      [f for f in all_findings if f.get('severity', '').lower() == 'low'],\n    }\n\n    return grouped\n\nasync def main():\n    import sys\n    repo_path = sys.argv[1] if len(sys.argv) > 1 else '.'\n\n    results = await scan_repository(repo_path)\n\n    total = sum(len(v) for v in results.values())\n    print(f\"\\n{'='*60}\")\n    print(f\"SCAN COMPLETE — {total} findings\")\n    print(f\"{'='*60}\")\n    print(f\"  🔴 Critical: {len(results['critical'])}\")\n    print(f\"  🟠 High:     {len(results['high'])}\")\n    print(f\"  🟡 Medium:   {len(results['medium'])}\")\n    print(f\"  🟢 Low:      
{len(results['low'])}\")\n    print(f\"{'='*60}\\n\")\n\n    # Print critical and high findings in detail\n    for severity in ['critical', 'high']:\n        for finding in results[severity]:\n            print(f\"[{finding['severity'].upper()}] {finding.get('vulnerability_class', 'Unknown')}\")\n            print(f\"  File: {finding.get('source_file')}\")\n            print(f\"  {finding.get('root_cause', 'No description')}\")\n            print(f\"  Fix: {finding.get('recommended_fix', 'See full report')}\\n\")\n\n    # Save full report\n    with open('security_report.json', 'w') as f:\n        json.dump(results, f, indent=2)\n    print(\"[*] Full report saved to security_report.json\")\n\nif __name__ == '__main__':\n    asyncio.run(main())\n```\n\n``` yaml\n# .github/workflows/ai-security-scan.yml\nname: AI Security Scan\n\non:\n  pull_request:\n    types: [opened, synchronize]\n  schedule:\n    - cron: '0 2 * * 1'  # Weekly full scan every Monday at 02:00 UTC\n\njobs:\n  llm-vuln-scan:\n    runs-on: ubuntu-latest\n    permissions:\n      contents: read  # needed by actions/checkout once permissions are restricted\n      pull-requests: write\n      security-events: write\n\n    steps:\n      - uses: actions/checkout@v4\n        with:\n          fetch-depth: 0  # Full history for diff-based scanning on PRs\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.12'\n\n      - name: Install dependencies\n        # Quoted so the shell does not treat >= as an output redirect\n        run: pip install \"anthropic>=0.30.0\"\n\n      - name: Run AI Security Scanner\n        env:\n          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}\n        run: |\n          # On PRs: scan only changed files for speed.\n          # NOTE: this branch assumes minimal_vuln_scanner.py has been extended with a\n          # --files-list option; the script as shown only accepts a repository path.\n          if [ \"${{ github.event_name }}\" = \"pull_request\" ]; then\n            git diff --name-only origin/${{ github.base_ref }}...HEAD > changed_files.txt\n            python minimal_vuln_scanner.py --files-list changed_files.txt\n          else\n            # On scheduled run: full repository scan\n            python minimal_vuln_scanner.py .\n          fi\n\n      - name: 
Check for Critical Findings\n        run: |\n          CRITICAL_COUNT=$(python -c \"\n          import json\n          with open('security_report.json') as f:\n              report = json.load(f)\n          print(len(report.get('critical', [])))\n          \")\n          echo \"Critical findings: $CRITICAL_COUNT\"\n          # Fail the build on critical findings\n          if [ \"$CRITICAL_COUNT\" -gt \"0\" ]; then\n            echo \"::error::$CRITICAL_COUNT critical security vulnerabilities found!\"\n            exit 1\n          fi\n\n      - name: Post PR Comment with Findings\n        # always() keeps this step running when the check above fails the build,\n        # which is exactly when the comment matters most\n        if: always() && github.event_name == 'pull_request'\n        uses: actions/github-script@v7\n        with:\n          script: |\n            const fs = require('fs');\n            const report = JSON.parse(fs.readFileSync('security_report.json', 'utf8'));\n            const total = Object.values(report).flat().length;\n\n            const body = `## 🔍 AI Security Scan Results\n\n            | Severity | Count |\n            |---|---|\n            | 🔴 Critical | ${report.critical?.length || 0} |\n            | 🟠 High | ${report.high?.length || 0} |\n            | 🟡 Medium | ${report.medium?.length || 0} |\n            | 🟢 Low | ${report.low?.length || 0} |\n\n            ${total === 0 ? '✅ No vulnerabilities found!' 
: '⚠️ Review findings in the security_report.json artifact.'}`;\n\n            await github.rest.issues.createComment({\n              issue_number: context.issue.number,\n              owner: context.repo.owner,\n              repo: context.repo.repo,\n              body: body\n            });\n```\n\nFor high-value codebases, the production-grade approach is multi-model consensus, which pushes false-positive rates toward zero:\n\n``` python\n# multi_model_consensus.py\n# Run findings through two models; only surface results where both agree.\n# Requires ANTHROPIC_API_KEY and OPENAI_API_KEY env vars.\n\nimport asyncio\nimport json\nfrom anthropic import AsyncAnthropic\nfrom openai import AsyncOpenAI\n\nanthropic_client = AsyncAnthropic()\nopenai_client = AsyncOpenAI()\n\nVERIFICATION_PROMPT = \"\"\"You are an expert security researcher verifying whether\na reported vulnerability is real or a false positive.\n\nGiven the following finding and source code, answer:\n1. Is this vulnerability real? (yes/no/uncertain)\n2. If real: can an attacker trigger it from an untrusted context? (yes/no/uncertain)\n3. 
Confidence: (high/medium/low)\n\nRespond in JSON (use false for \"no\" or \"uncertain\"): {\"is_real\": bool, \"triggerable\": bool, \"confidence\": \"high\"|\"medium\"|\"low\", \"reasoning\": \"one sentence\"}\"\"\"\n\nasync def verify_with_claude(finding: dict, source_code: str) -> dict:\n    msg = await anthropic_client.messages.create(\n        model=\"claude-opus-4-5\",\n        max_tokens=512,\n        system=VERIFICATION_PROMPT,\n        messages=[{\"role\": \"user\", \"content\": f\"Finding:\\n{json.dumps(finding)}\\n\\nCode:\\n{source_code}\"}]\n    )\n    # Assumes the reply is bare JSON; add parsing guards before relying on this\n    return json.loads(msg.content[0].text)\n\nasync def verify_with_gpt(finding: dict, source_code: str) -> dict:\n    resp = await openai_client.chat.completions.create(\n        model=\"gpt-4.1\",\n        messages=[\n            {\"role\": \"system\", \"content\": VERIFICATION_PROMPT},\n            {\"role\": \"user\", \"content\": f\"Finding:\\n{json.dumps(finding)}\\n\\nCode:\\n{source_code}\"}\n        ],\n        max_tokens=512,\n        response_format={\"type\": \"json_object\"}\n    )\n    return json.loads(resp.choices[0].message.content)\n\nasync def consensus_verify(finding: dict, source_code: str) -> dict:\n    \"\"\"Verify a finding with multiple models; return consensus result.\"\"\"\n    claude_result, gpt_result = await asyncio.gather(\n        verify_with_claude(finding, source_code),\n        verify_with_gpt(finding, source_code)\n    )\n\n    # Require both to agree the finding is real. Compare against True explicitly:\n    # a string answer such as \"no\" or \"uncertain\" would otherwise count as truthy.\n    both_confirm = claude_result.get('is_real') is True and gpt_result.get('is_real') is True\n\n    return {\n        \"finding\": finding,\n        \"consensus\": both_confirm,\n        \"claude_verdict\": claude_result,\n        \"gpt_verdict\": gpt_result,\n        \"advance_to_human_review\": both_confirm,\n        \"false_positive_probability\": \"low\" if both_confirm else \"high\"\n    }\n```\n\nIt would be irresponsible to discuss this technology without addressing the elephant in the room: **the same capability that finds bugs for defenders can be 
used by attackers**.\n\nAnthropic has been explicit about this tension. From their Glasswing update:\n\n\"Models as capable as Mythos Preview will soon be developed by many different AI companies. At present, no company — including Anthropic — has developed safeguards strong enough to prevent such models from being misused.\"\n\nThis is why Mythos Preview is not publicly released. But it's also why this matters: **the capability genie is not going back in the bottle**. The question isn't whether powerful AI vulnerability research tools will exist — they will. The question is whether defenders can gain and hold an asymmetric advantage before those tools proliferate to malicious actors.\n\nKey ethical considerations for engineers building in this space:\n\n**Responsible disclosure, always.** AI is going to accelerate vulnerability discovery dramatically. The 90-day disclosure standard exists for good reason — it gives end users time to patch. Don't let the excitement of AI-found bugs shortcut this process.\n\n**Scope your harness carefully.** Ensure your scanning pipeline only touches infrastructure you own or have explicit written authorization to test. The fact that a tool is effective doesn't change the legal and ethical requirements for authorization.\n\n**Verify before you disclose.** Submit only confirmed, PoC-backed findings to maintainers. The open-source community is already overwhelmed by low-quality AI-generated bug reports. Be part of the solution, not the problem.\n\n**Watch for model inconsistency.** Cloudflare's team documented that Mythos Preview's organic guardrails are inconsistent — the same task framed differently could produce completely different refusal behavior. 
Don't treat model-level safeguards as a substitute for process-level controls.\n\nBased on the current trajectory, here's what the next 12–18 months look like for LLM vulnerability research:\n\n**Near-term (3–6 months):**\n\n**Medium-term (6–18 months):**\n\n**Long-term:**\n\nWe are living through a genuine phase transition in software security. The tools that found a kernel CVE in macOS, 271 latent bugs in Firefox, and 2,000 vulnerabilities across Cloudflare's infrastructure in weeks — these are not research prototypes. They are production systems, available today, finding real bugs in real code.\n\nThe LLM vulnerability research agent isn't coming. **It's here.** And if you're shipping software that other people depend on, the question is not whether to engage with this technology — it's whether you engage with it proactively, as a defender, or reactively, after an attacker already has.\n\n**Three things you can do this week:**\n\n**Run the minimal scanner above** against your most critical service. Set your ANTHROPIC_API_KEY, point it at a repo, and see what it finds. The marginal cost of a scan is a few API dollars. The marginal cost of an unpatched critical is not.\n\n**Set up the GitHub Actions workflow** for your team's most security-sensitive repositories. Automated scanning on every PR is now table stakes.\n\n**Apply to Anthropic's Cyber Verification Program** if your organization does legitimate security research, red-teaming, or penetration testing. Access to higher-capability models in this domain is now a significant professional advantage.\n\nThe Glasswing era of software security has begun. The organizations that understand the architecture behind these tools — not just that they exist, but *how* they work and how to deploy them effectively — will have a structural security advantage for the next decade.\n\n**The bugs are being found. The question is who finds them first.**\n\n*Found this useful? 
Drop a ⭐ on the companion GitHub repo with the full harness implementation, contribute to the discussion in the comments, and share this with the security engineer on your team who hasn't heard about Project Glasswing yet.*\n\n**Tags:** `llm-vulnerability-research` `generative-ai` `cybersecurity` `agentic-ai` `claude` `gpt-5` `devsecops` `security-engineering` `zero-day` `project-glasswing`", "url": "https://wpnews.pro/news/llm-agents-are-now-finding-zero-days-how-ai-is-autonomously-rewriting-the-rules", "canonical_source": "https://dev.to/monuminu/llm-agents-are-now-finding-zero-days-how-ai-is-autonomously-rewriting-the-rules-of-vulnerability-3d2i", "published_at": "2026-05-26 04:21:27+00:00", "updated_at": "2026-05-26 04:33:20.929334+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-safety", "ai-research"], "entities": ["Apple", "Claude", "Anthropic", "Calif.io", "Claude Mythos Preview", "GPT-5.5", "Project Glasswing", "macOS Tahoe"], "alternates": {"html": "https://wpnews.pro/news/llm-agents-are-now-finding-zero-days-how-ai-is-autonomously-rewriting-the-rules", "markdown": "https://wpnews.pro/news/llm-agents-are-now-finding-zero-days-how-ai-is-autonomously-rewriting-the-rules.md", "text": "https://wpnews.pro/news/llm-agents-are-now-finding-zero-days-how-ai-is-autonomously-rewriting-the-rules.txt", "jsonld": "https://wpnews.pro/news/llm-agents-are-now-finding-zero-days-how-ai-is-autonomously-rewriting-the-rules.jsonld"}}