# What I learned from my first AI-assisted bug bounty submissions

> Source: <https://dev.to/taiman724/what-i-learned-from-my-first-ai-assisted-bug-bounty-submissions-4fh>
> Published: 2026-05-29 04:08:57+00:00

Third post in my "AI-assisted OSS contribution" series. The first two were about

[pre-fork due diligence]and[shipping a fix to ONNX with my own PR scanner]. This one is about a harder game: security research and coordinated disclosure.

For a while my AI-assisted open-source work was about *contributions* — typo fixes, docs, small bug fixes, the occasional feature. Pull requests have a forgiving feedback loop: if a PR is wrong, a maintainer comments and you iterate. Bug bounty work is different. The feedback loop is slower, the bar for "novel and correct" is much higher, and a lot of the difficulty has nothing to do with the vulnerability itself.

I ran a small experiment: use Claude (Opus) to help me find, verify, and write up vulnerabilities in **public, in-scope open-source bug bounty programs** — the kind that publish a scope and a safe-harbor policy and explicitly invite testing. Here's what actually mattered, mostly the things I didn't expect.

The single biggest risk to a bounty submission is not "is it a real bug" — it's "did someone already report it." And you usually *cannot see* the answer.

I built a small novelty-checking toolchain around the assistant: query published advisories (GHSA via the GitHub API), aggregate cross-ecosystem advisory data (OSV), search the target repository's own issues and PRs, and pull recent security-research feeds. It catches a lot. But it has a fundamental blind spot: **privately submitted reports are invisible until they're disclosed.** One of my submissions was closed as a duplicate of a report filed months earlier that I had no way of seeing. The finding was correct. It just wasn't *first*.

The lesson isn't "check harder." Public OSINT can only ever reduce duplicate risk, never eliminate it. The realistic takeaways:

It is very easy to read code, build a clean mental model of a bug, write a confident report — and be wrong, because the runtime doesn't behave the way the reference manual says it does. I got burned by exactly this kind of gap between "what the spec says" and "what the implementation does."

The discipline that fixed it: **no claim without a runnable proof of concept, executed against the actual runtime.** Not pseudocode. Not "this should work." A minimal, contained reproduction on my own machine — localhost only, no third-party or production systems touched — that either fires or it doesn't. An AI assistant is genuinely good at the first 80% of building that PoC fast; the last 20% (does it *actually* reproduce?) is non-negotiable and is where most false positives die.

Modern bounty platforms ration your ability to submit. New researchers get a limited number of "trial" reports, and a reputation/signal score that drops when you file invalid or duplicate reports — low enough, and you get blocked from submitting at all.

This completely changes the optimal strategy. When submissions are cheap, volume wins. When each submission costs scarce signal, **quality dominates volume**, and a single duplicate or "informative" close is genuinely expensive. With an AI assistant that can generate plausible-looking reports quickly, this is the most important guardrail: the bottleneck must be *verification*, not generation.

Some honest context, because it shaped my results. The open-source bounty landscape contracted noticeably in early 2026:

A widely-cited reason: AI-assisted discovery started producing vulnerability reports faster than open-source maintainers could triage and remediate them. The irony isn't lost on me — the same tooling that makes an individual researcher more productive, in aggregate, helped congest the system that pays them. If you're starting now, plan for fewer open programs and lower-but-real payouts than the headline numbers from a year ago.

I disclose AI assistance in every submission. Not as a disclaimer-shaped apology — as a fact, the same way you'd note any tool in your methodology. Two practical reasons beyond honesty:

The model does the heavy lifting on code review, hypothesis generation, and drafting. I own scope selection, the decision to submit, the ethics, and the final verification. That division of labor is the whole point.

I'm still early — a couple of submissions in, one under triage as I write this, plenty unproven. But the meta-lessons above transferred cleanly from the PR work in my earlier posts: the assistant compresses the *mechanical* effort, and that just relocates all the value to judgment — what to look at, whether it's really true, and whether you should hit submit.

*Developed with AI assistance (Claude Opus); all findings were reviewed, reproduced locally, and verified by me before submission. No unpatched or undisclosed vulnerability details are included in this post.*