How I hunt for vulnerabilities with AI

Tsvetan Stoychev, a former principal software engineer at Akamai Technologies, used AI tools including GitHub Copilot, Claude Opus, and Gemini to find vulnerabilities in the ClickHouse codebase, successfully reporting real bugs to ClickHouse's bug bounty program. His approach involved prompting AI to review code for security issues, leveraging his naivety as a newcomer to explore deeper than seasoned researchers might.

This is a guest post from Tsvetan Stoychev, ex-Principal Software Engineer at Akamai Technologies.

TL;DR: I’m an experienced software engineer, but I’m not a seasoned bug bounty hunter. I used GitHub Copilot in combination with Claude Opus and Gemini models to search for vulnerabilities in the ClickHouse codebase (a large C++ codebase), generate hypotheses, and speed up validation in local environments. It worked remarkably well; I found a few real vulnerabilities and reported them to the ClickHouse bug bounty program. This post doesn’t focus on real vulnerabilities in the ClickHouse codebase; it describes how I approach using AI for vulnerability research and what I’ve learned so far.

In late 2025, my manager, Nic Jansma (https://www.linkedin.com/in/nicjansma/), asked our team a simple question: “How are you using AI in your day-to-day work?” I had the usual answers - autocomplete, quick prototypes, and documentation lookups - but he encouraged me to explore more.

Around the same time, I was also inspired by a colleague, Rajesh Sharma (https://www.linkedin.com/in/dude0/). For the last two years he’s been doing bug bounty and attending CTF competitions for fun, and he’s also helped identify and fix security issues in our day job. Conversations with him helped me build intuition for common terminology in the software security space, such as path traversal, “null byte”, SSRF, XSS and RCE.

We actually poked at ClickHouse together in mid-2025 to see whether there was anything interesting to explore. His take after auditing the code was: this is professional-grade C++ - there’s no obvious low-hanging fruit, and anything real would likely be subtle and hard to exploit. In simple words: one has to work hard to find something in the ClickHouse codebase.

My assumption was that if there was anything interesting, it would probably be hiding in very old, rarely touched code - or in the newest features.
ClickHouse ships fast, adds new capabilities constantly, and grows every month. Rapid growth is great, but it also means fresh code paths and integrations that haven’t had years of battle testing.

I’m a ClickHouse user for a personal project, and I’m not an experienced security researcher. In fact, I strongly believe that my naivety as a newcomer helped me explore deeper. Where an expert might spot a mitigation and immediately think “dead end”, I often didn’t know enough to give up. I kept pushing Copilot to “try harder” or “explain why”. This led us down complex paths that a more seasoned researcher might dismiss with “nah, waste of time”.

My initial experience with AI-assisted vulnerability research was with GitHub Copilot, but after a few months I switched to Claude and ChatGPT subscriptions. To use both ChatGPT and Claude effectively for vulnerability research, I went through their cybersecurity verification / trusted-access processes.

My approach

I follow a single-threaded approach with a few steps:

- In Visual Studio Code, prompt GitHub Copilot to review the ClickHouse codebase.
  - Sometimes the prompt is open-ended: “Check if you see any vulnerability issues in the code introduced in the last 2 weeks.”
  - Sometimes the prompt is more specific: “Do you see any interesting candidates for memory corruption among the ClickHouse aggregate functions?”
- Observe how GitHub Copilot browses folders, reads file contents, reasons and displays a few paragraphs of intermediate summaries.
  - By observing how Copilot works I learn more about the ClickHouse file structure and features. If I see something interesting I look at the ClickHouse source code myself or consult the official documentation.
- When Copilot finishes the review I carefully read the summary and select interesting and novel ideas.
  - Over time Copilot will start “sharing” the same ideas or will produce complete hallucinations. An example of a hallucination is the model claiming that ClickHouse could be hacked when the default admin user has an empty password, because anyone would be able to log in as the admin user. This is an unrealistic scenario: it is not a ClickHouse security problem but a misconfiguration introduced by the engineer who provisioned the ClickHouse server.
- When I pick an interesting idea I become curious and often fall into a short loop where I ask “Why?” or “How?” a few more times. Usually the questions relate to the current code review session, but occasionally I ask about something I noticed during previous review sessions.
  - “How does idea 1 relate to idea 5?”
  - “Do you think that idea 3 is applicable in the Y ClickHouse subsystem?”
    - In this case I am asking about “Y ClickHouse subsystem” because I learned something from a previous GitHub Copilot session.
- When I am happy with all the generated ideas I ask Copilot to write a summary of them to a markdown file. This allows me to come back later and explore something that I didn’t find interesting at the time of the Copilot-assisted review.
- I pick an idea and explicitly ask Copilot to use Python for any scripts that eventually reproduce the vulnerability on a local ClickHouse instance running in Docker.
  - At this point Copilot is on cruise control: it autonomously creates Python scripts, runs them against the running ClickHouse Docker container and corrects the scripts based on the ClickHouse runtime behaviour.
- When the previous step produces convincing results I manually review the produced Python scripts and run them myself to make sure we didn’t get a false positive.
- When I find a real vulnerability I send an email to security@clickhouse.com and, if the vulnerability is in scope of the bug bounty program, I submit a report at https://bugcrowd.com/engagements/clickhouse.

Example

I am a strong believer in the “learning by doing” approach: if we put enough time into a problem, the worst thing that can happen is that we learn something new. Things are changing so quickly in the LLM space that what we think is the way to do things today could become obsolete in a few weeks when the frontier labs release a new model. That’s why my personal belief is that there is no absolute rule for what your first prompt should be, but if you are inexperienced like me, try something open-ended and use it to gain more context for writing your next prompt.

In this example I will be using Visual Studio Code + GitHub Copilot in Agent mode with the Opus 4.6 model. My assumption is that we are not familiar with the ClickHouse code and subsystems; we would like to explore, gather ideas and later start diving in.

The first prompt

Let’s try something like: “In summary, what is the code that the ClickHouse team shipped last month and come with 3 hypotheses: what vulnerability risk could be introduced?”

The agent will start running “git log” commands and gathering intelligence:

    $ git log --oneline --since="2026-02-21" --until="2026-03-21" | head -80

    1bc68528ea9 Merge pull request #98540 from ClickHouse/test-column-alias-set-operations
    3099161b0c1 Merge pull request #98544 from ClickHouse/test-lc-null-full-join
    4020adae771 Merge pull request #98704 from ClickHouse/fix-minio-restart-timeout-hang
    c90fedd0932 Merge pull request #98709 from ClickHouse/fix-flaky-test-hedged-requests-send-data
    …

After a few minutes of work the agent will generate a report like:

| # | Hypothesis | Status | CWE | Severity | Attack Vector |
|---|---|---|---|---|---|
| 1 | Wire-format decoder trusts caller-controlled field width as source length → OOB read past remaining input buffer | Not confirmed | CWE-125 | CRITICAL | Tampered on-disk or in-flight data |
| 2 | Untrusted module triggers arithmetic wrap in host allocator + logging side-channel | Not confirmed | CWE-190, CWE-117 | HIGH | User-uploaded executable payload |
| 3 | Shared acceleration layer doesn't incorporate row-filter identity into hash | Not confirmed | CWE-200 | MEDIUM-HIGH | Parallel sessions with differing visibility rules |

Note: The hypotheses in the table above were anonymized. All the models I experimented with returned very convincing findings labeled as high and critical severity, but often these were false positives or hallucinations.

Testing a hypothesis

The hypotheses look convincing, but we need to dig deeper to understand if they are real and reproducible:

    Let’s explore hypothesis 1 “Wire-format decoder trusts caller-controlled field width as source length → OOB read past remaining input buffer”.

    A few hints:
    1. We have Docker installed on this computer and in order to test the hypothesis you are allowed to run the ClickHouse in a Docker container from the latest official ClickHouse image.

    2. Please write any proof of concept code in Python.

    3. Make sure that you explore the hypothesis as a low-privileged ClickHouse user when connecting to our ClickHouse Docker container.

At this point I observe carefully because sometimes the agent decides to take unexpected paths. A few times I noticed the agent reading /proc/<pid>/mem from inside the running ClickHouse Docker container, pulling bytes from the heap and declaring victory. Reading /proc/<pid>/mem can be used as an exploratory technique, but it must not be used to trigger the vulnerability we are going to report, because it bypasses the actual security boundary. In such cases I stop the agent and ask it for a handoff prompt: “What do we need to make it work without /proc/<pid>/mem? Please, generate a handoff prompt describing our direction, what was already tried and instructions that we must not use /proc/<pid>/mem.” After that I start a fresh agent session with the handoff prompt.

Sometimes the agent gives up too quickly and I give it a little nudge with one of the following prompts:

- Great progress and I think that we are getting close. What else do you think we could try?
- These findings are marginal. Please expand your approach and try again.

Most of the time we reach a dead end. This doesn’t mean that the time was wasted: I ask the agent to write a summary in a markdown file of what was tried and what blocked us.

From time to time I try something unusual, either when we reach a dead end or even when a hypothesis turns out to be valid: “Do you see similar symptoms or bugs in sibling classes of the classes we’ve already explored?” To my surprise, a few times this led to a real discovery.

If the original hypothesis is valid I move to the next step.
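
The /proc/<pid>/mem pitfall described above can also be caught mechanically before a "successful" PoC is trusted. The following is a minimal sketch, not part of my actual tooling: it scans generated PoC sources for patterns that bypass the security boundary. The names `FORBIDDEN_PATTERNS`, `find_boundary_bypasses` and `scan_directory`, and the pattern list itself, are assumptions chosen for illustration. It only catches literal paths, so a path assembled by string concatenation would slip through.

```python
import re
from pathlib import Path

# Hypothetical deny-list: techniques that read server memory from inside the
# container instead of going through a low-privileged ClickHouse connection.
FORBIDDEN_PATTERNS = {
    "proc-mem": re.compile(r"/proc/(?:\d+|self|\{[^}]*\}|\$\w+)/mem\b"),
    "debugger-attach": re.compile(r"\b(?:ptrace|gdb\s+-p|gdb\s+--pid)\b"),
}


def find_boundary_bypasses(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, rule_name, line_text) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in FORBIDDEN_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule, line.strip()))
    return hits


def scan_directory(root: str) -> dict[str, list[tuple[int, str, str]]]:
    """Scan Python and shell scripts under root; keep only files with hits."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in {".py", ".sh"}:
            hits = find_boundary_bypasses(path.read_text(errors="replace"))
            if hits:
                report[str(path)] = hits
    return report
```

Running `scan_directory(".")` on the workspace gives a quick list of scripts to reject or rework. An empty result does not prove that a PoC is legitimate, so the manual review still applies.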

Preparing to report

Over time I found what works well for the Bugcrowd and ClickHouse teams when sharing a report of a given vulnerability. The hypothesis we are exploring is about an “out-of-bounds read” vulnerability, which allows an adversary to read memory that doesn’t belong to them, such as data from other queries or tenants on the same ClickHouse instance. A triager needs clear evidence that the exploit is not run as a ClickHouse administrator, and that “target” and “adversary” authenticate as different ClickHouse users with different grants.

I prompt with the following template, which produces the files required to demonstrate how the “out-of-bounds read” vulnerability works:

    Bug-bounty PoC bundle - prompt template

    Produce a self-contained PoC bundle in the current directory. A triager should be able to run the scripts in order against a stock container of the target and watch a low-privileged adversary retrieve a secret it has no legitimate path to.

    Files

    - requirements.txt: pinned Python deps.
    - 01_setup_users.py: admin-driven setup. Creates target and adversary users/DBs/tables. Writes the usernames and passwords required for the next steps to poc_config.json.
    - 02_target_activity.py: constantly seeds a synthetic secret (clearly labeled DEMO / example-) into target storage.
    - 03_adversary_exploit.py: three phases:
      1. Privilege probe: runs actual CAN-DO / CANNOT-DO / CROSS-TENANT probes live and prints each result (proves the adversary has no direct path to the secret).
      2. Fire the primitive: single request using only built-in functions of the target; rotate tunables if they broaden coverage; loop until the secret is recovered or timeout.
      3. Structured report: labeled block printing the recovered secret verbatim alongside whatever collateral leaked. Exit non-zero if the secret was never seen.
    - 04_crash_trace.py (only if a variant cleanly crashes the target): fires the crash payload, restarts the container, pulls the fault block from logs, prints resolved frames.
    - README.md: triager-facing repro: docker run, venv, numbered steps, expected output excerpt showing the recovered secret, scope.
    - WRITEUP.md: engineer-facing RCA: defect file:line + excerpt, any guard bypassed and why, worked example if arithmetic, affected-versions list ("verified live" vs "source-verified"), related public state.

    Conventions

    - Target the latest stable clickhouse/clickhouse-server image (on Docker Hub), unmodified. Look up the current stable tag at submission time (do not assume the tag baked into an older PoC is still latest); pin the exact tag in README.md so the run is reproducible months later. No sanitizer, no debug symbols, no custom build.
    - Scripts share state only via poc_config.json. No hidden config.
    - Synthetic seeded secrets labeled DEMO / example- so they cannot be mistaken for real credentials.
    - README and WRITEUP cite source file:line for every defect claim.

    Acceptance

    Running the scripts in order on a clean host ends with the adversary printing the seeded secret it had no grant to read; the privilege probe shows every direct path to that secret denied.

This prompt generates a few files:

- 01_setup_users.py
- 02_target_activity.py
- 03_adversary_exploit.py
- README.md
- WRITEUP.md

I read and review the generated README.md and WRITEUP.md, and sometimes ask the current or a different model for help where I don’t understand something.

I manually go through all the steps that a Bugcrowd triager would go through. I run all the Python scripts myself and identify things that need to be fine-tuned. A few times I found that the 02_target_activity.py script wasn’t writing secrets to the database frequently enough, so 03_adversary_exploit.py wasn’t able to capture them.
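
The seeding-frequency problem above is easier to tune when the cadence is an explicit parameter. Below is a minimal sketch of what the core loop of the target-activity script might look like; it is an illustration under stated assumptions, not the generated script. The `seed_secrets` name, the injected `execute` and `sleep` callables, and the default interval are hypothetical. The table `target_db.sensitive` with `id` and `secret` columns is borrowed from the example output in the report template later in this post.

```python
import time
from typing import Callable


def seed_secrets(
    execute: Callable[[str], None],
    rounds: int,
    interval_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Insert clearly labeled synthetic secrets at a fixed cadence.

    `execute` is a hypothetical hook that sends one SQL statement to ClickHouse
    as the target user; it is injected so the loop can be exercised without a
    running server.
    """
    seeded = []
    for i in range(rounds):
        # DEMO / example- labels make it obvious these are not real credentials.
        secret = f"DEMO-example-API_KEY-{i:04d}"
        execute(
            "INSERT INTO target_db.sensitive (id, secret) "
            f"VALUES ({i}, '{secret}')"
        )
        seeded.append(secret)
        # Assumption: a shorter interval leaves more recent copies of the
        # secret in server memory for the adversary script to observe.
        sleep(interval_s)
    return seeded
```

In a real run, `execute` would wrap a request to the ClickHouse HTTP interface using the target user's credentials from poc_config.json, and the loop would run until interrupted rather than for a fixed number of rounds. Keeping `interval_s` as a parameter makes it quick to retry with a faster cadence when the adversary script comes back empty.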

I also record a short screen recording that I later attach to the vulnerability report. This helps the triager understand the flow and the steps to reproduce.

Submitting a vulnerability report

The report template below has proved over time to work well when submitting a report to the Bugcrowd team. It clearly separates where ClickHouse runs from where the adversary acts, and shows how a limited ClickHouse tenant can read heap data used by other tenants. The template is highly obfuscated and doesn’t contain real code from the ClickHouse codebase. Read it with that in mind: it only demonstrates the structure of a Bugcrowd report.

    =======================================================================
    This is an AI assisted report.

    The PoC scripts and code analysis of the root cause were AI-generated and assisted. The report was hand-written and a few snippets copied from AI-generated code.

    Manually tested and verified before submitting.
    =======================================================================

    Summary

    In a PoC that supplies tampered content in a simple SQL query, we demonstrated that we can read bytes from the heap and access cross-tenant data.

    Example query:

    SELECT x
    FROM XXXXXXXXXXX

    Video evidence: xxxxxxxx-video-evidence.mp4

    PoC

    For the PoC we will need:

    - Docker
    - Python 3.9+

    Required files:

    - requirements.txt
    - 01_setup_users.py
    - 02_target_activity.py
    - 03_adversary_exploit.py

    The PoC uses 2 tenants - regular_tenant and limited_adversary. The regular_tenant is sending queries to ClickHouse and the limited_adversary is sending queries that read from the leaked heap.

    Steps:

    1 - Run the ClickHouse in a Docker container:

    docker run -d --name ch-x86-lts \
      -e CLICKHOUSE_USER=default \
      -e CLICKHOUSE_PASSWORD=clickhouse \
      -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 \
      -p 0.0.0.0:8123:8123 \
      --ulimit nofile=262144:262144 \
      clickhouse/clickhouse-server:26.3

    2 - Create Python venv and install the dependencies.

    python3 -m venv .venv
    source .venv/bin/activate
    pip3 install -r requirements.txt

    3 - Create regular_tenant and limited_adversary:

    python3 01_setup_users.py

    This will create 2 ClickHouse users and a poc_config.json file with the user credentials required for the next steps:

    ==============================================================================
     target    : regular_tenant     password: bug_1778537821_pswd_91dc609e
     adversary : limited_adversary  password: att_1778537821_d26ccb0b
     config    : /xxxxxxxxxxxx/poc_config.json
    ==============================================================================

    The limited_adversary can do mostly simple SELECT queries but can’t read from tables owned by regular_tenant.

    4 - In one terminal simulate regular_tenant activity where the user will be writing secrets:

    python3 02_target_activity.py

    Example output:

    target regular_tenant active; mix of SELECTs and INSERTs against target_db
    target INSERTed secret values land in CH's AST / query-text heap
    target q 110 SELECT id, secret FROM target_db.sensitive WHERE id = 2 2 OAUTH=eyJhbGciOiJIUzI1NiJ9.v

    5 - While 02_target_activity.py is running, open another terminal and run the 03_adversary_exploit.py script, which reads from the heap and finds data belonging to other tenants.

    Activate the venv in the other terminal:

    source .venv/bin/activate

    Run the script:

    python3 03_adversary_exploit.py

    Example output:

    ==============================================================================
     adversary identity : limited_adversary
     target tenant      : regular_tenant  database: target_db
     goal               : read target_db data via shared-memory leak
    ==============================================================================

    --- privilege probe run by adversary ---

     ✓ WHAT ADVERSARY CAN DO:
       ✓ Run SELECT queries
       ✓ Use unhex to make bytes
       ✓ Use hex to read as hex
       ✓ Use format table function
       ✓ Use aggregate functions
       ✓ Use CAST

     ✗ WHAT ADVERSARY CANNOT DO:
       ✗ CREATE USER - Access Denied
       ✗ DROP USER - Access Denied
       ✗ Query system.users - Access Denied
       ✗ SHOW GRANTS for other users - Access Denied
       ✗ Query system.query_log - Access Denied
       ✗ Query system.processes - Access Denied
       ✗ Use file table function - Access Denied
       ✗ Use url table function - Access Denied

     ✗ CROSS-TENANT ACCESS (RBAC ISOLATION):
       ✗ Query target_db.sensitive - Access Denied
       ✗ Access default database - Access Denied (resource hidden)
       ✗ INSERT into target_db - Access Denied
       ✗ DROP TABLE in target_db - Access Denied
    ...
    ...
    ...
    ==============================================================================
     CROSS-TENANT DATA RECOVERED FROM PROCESS HEAP
    ==============================================================================
     ✓ target secret API_KEY= : 9 distinct value(s)
       'API_KEY=sk-target-prod-7d3f9a-row-21API_KEY'
       'API_KEY=sk-target-prod-7d3f9a-row-15API_KEY'
       'API_KEY=sk-target-prod-7d3f9a-row-33API_KEY'
     ✓ target secret OAUTH= : 1 distinct value(s)
       'OAUTH=eyJhbGciOiJIUzI1NiJ9.targetDB_PWD'
     ✓ target secret DB_PWD= : 1 distinct value(s)
       'DB_PWD=target-mysql-2026API_KEY=sk-target-prod-7d3f9a-row-3API_KEY=sk-target-prod-7d3f9'
     target INSERT statement: not seen
     target table path: not seen
     target query WHERE clause: not seen
     ✓ target username in heap: 1 distinct value(s)
       'regular_tenant'

    Root cause

    File: drivers/usb/diag/endpoint_summary.c
    Function: format_endpoint_summary (lines 312-338 in v6.8 / mainline)

    static int format_endpoint_summary(
            const struct usb_endpoint_descriptor *ep,
            const char *interface_name,
            char *outbuf, size_t outbuf_size)
    {
        char line[80];
        int n;

        n = snprintf(line, sizeof(line),                       /* (1) */
                     "iface=%s ep=0x%02x maxpkt=%u type=%u",
                     interface_name,
                     ep->bEndpointAddress,
                     le16_to_cpu(ep->wMaxPacketSize),
                     ep->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK);

        if (n < 0)                                             /* (2) */
            return -EINVAL;
        if ((size_t)n > outbuf_size)
            return -ENOSPC;

        memcpy(outbuf, line, n);                               /* (3) */
        return n;
    }

    1. (1) snprintf itself is bounded against sizeof(line) == 80, so the write into line is safe: at most 79 chars plus a terminator land in the buffer. The trap is its return value: ISO C99 / POSIX specify that snprintf returns the number of bytes that would have been written had the buffer been unbounded, not the number actually emitted. An adversary-supplied interface_name of 300 characters (delivered as a USB string descriptor and surfaced through the diag node) drives n well past sizeof(line) even though line itself was correctly truncated.

    2. (2) The only guards on n are n < 0 (snprintf error) and n > outbuf_size (caller's buffer too small). Neither compares n against sizeof(line). The function carries on as if n valid bytes are sitting in the local stack buffer.

    3. (3) memcpy(outbuf, line, n) reads n bytes from an 80-byte stack array. When the formatted-but-truncated length exceeds 80, the copy walks off the end of line and through whatever sits below it in the current stack frame: saved frame pointer, return address, caller-side spills, neighbouring locals from usb_set_configuration() further up the call chain. Those bytes are then handed to userspace through outbuf, which the diag node makes readable via a sysfs attribute and an ioctl. The result is a kernel-stack read primitive triggered each time userspace reads the endpoint summary for an adversary-controlled USB device; no privilege is required beyond plugging in the device.

Reducing noise

I would like to share a few tricks that helped me save time and avoid dead ends. I learned them by observing the agent’s work for long enough.

Deleting agent instructions

The ClickHouse source code contains markdown files like .github/copilot-instructions.md and others that serve as instructions for AI. These instructions are useful if you’re developing ClickHouse, but having these files inside the worktree means our agent could inherit instructions that are not relevant to our security research. I typically delete:

- .claude
- .cursor
- .github
- AGENTS.md

Deleting other files

The ClickHouse codebase contains test folders, configurations and utilities. I noticed a few times that Copilot explored these parts of the codebase and reported false positives.
For example, Copilot suggested that I could use clickhouse-local (https://clickhouse.com/docs/operations/utilities/clickhouse-local) to run a query that would later give me an RCE. This was a hallucination: clickhouse-local is a program run from the command line on the host where the ClickHouse server runs, so being able to run commands via clickhouse-local means we must have already gained control over that host. That’s why I usually delete the following folders:

- benchmark
- ci
- cmake
- docker
- docs
- packages
- programs
- tests
- utils

Temporary local patches

Over time I noticed the agent finding the same things over and over again. For example, the ClickHouse url function was confidently flagged as usable for an SSRF attack, or a given function was flagged as vulnerable to buffer overflow, when in reality this was a false positive because the function was well protected and never accepted user input. In other cases the agent was detecting the same vulnerability that we had just reported to the ClickHouse team and Bugcrowd. It takes time for the ClickHouse team to provide an official patch and mitigation, which meant that I was working on code where the vulnerability still existed. In such cases I asked Copilot to create a patch or to delete a file completely. The quality of the patch didn’t matter; it only had to ensure that Copilot wouldn’t keep reporting things we’d already found.

Cleanup or archive agent’s artifacts

The agent tends to create many scripts and markdown files in the workspace across all the iterations. In my case some of the markdown files happened to be root cause analyses of vulnerabilities that were discovered in prior sessions.
This is a double-edged sword: I use these markdown files when I continue exploring a path that was put on hold, but on a few occasions the agent discovered them and declared victory: “Jackpot! We already found a heap memory disclosure. We should stop here and report to Bugcrowd the vulnerability we documented in RAC_OOB_READ_XXXXXX.MD.” At this point, it is my call. Sometimes I decide to start working from a clean workspace, and sometimes I acknowledge the risk and continue working in the same workspace.

From discovery to report

Earlier I noted that I need a few hours to manually verify a finding and prepare the report that I send to the ClickHouse team and Bugcrowd. It usually takes 3 to 4 hours because I want to make sure that the report is factual and that it will be well understood. This is the part of the process that I still can’t, and probably shouldn’t, optimize, because there is great value in manually writing a report and understanding the exact root cause.

A few months back I was finding mostly simple bugs that were easy to understand because I’d been patching such vulnerabilities in personal projects (e.g. path traversal). I believe that over time most of the simple bugs have been discovered by the ClickHouse team and security researchers, and today we are left with the more interesting ones - Out-of-bounds Write and Out-of-bounds Read.

Out-of-bounds Write and Out-of-bounds Read are really interesting but sophisticated at the same time. They require clever ways to be triggered, a good understanding of how memory allocators work, and knowledge of CPU architectures and instructions. An experienced C++ engineer would have an easy time understanding and reporting such vulnerabilities, but I work with programming languages that do not require a deep understanding of how computer memory works. The Out-of-bounds Write and Out-of-bounds Read bugs consume most of my time.
They are challenging but keep me curious. I use the opportunity to understand them and to research whether there are similar issues in other parts of the ClickHouse codebase. As a matter of fact, this extended research led me to a bug in a third-party library that ClickHouse uses. Of course I reported the bug to the vendor.

Writing a quality report is probably the most important part of AI assisted vulnerability research. It leaves a good impression on the Bugcrowd team and on the engineers who are going to patch the vulnerable code.

As a last step, I recently added to my workflow a check where I combine Opus and GPT models and ask them whether the report is factual. Every time, one of the models suggests something that the other misses.

Creating a PoC became easy

In vulnerability research a PoC (Proof of Concept) is the program that demonstrates a vulnerability. Sometimes a PoC is a simple program, but at other times it can be complex and run to hundreds or thousands of lines of code. A few times I worked on PoCs where I needed to “talk” binary protocols or to prepare a payload by tampering with binary file formats that I wasn’t familiar with at all. The latest frontier models created such PoCs in a matter of minutes, where without AI assistance I would probably have needed days of coding and studying file formats.

AI assistance is a huge time saver when creating PoCs. It frees up a lot of time for exploration work, and it’s definitely an enabler for researchers like me who got interested in vulnerability research in the era of AI assisted coding.

One thing that I changed to save time was asking the agent to write the PoCs in Python. Between December 2025 and January 2026 I was instructing the agent to write the PoCs in Node.js, but I was often getting code with syntax errors. I decided to change to Python and since then I haven’t had any issues.
I suppose that nowadays the models have improved and no longer make syntax errors with Node.js, but as general advice I would say: try a different programming language if the agent produces PoCs with syntax errors.

Frontier labs outages

Almost every month there are a few outages in the data centers of the frontier labs. Sometimes the outages are short, just a few minutes; other times they last a few hours. Based on the status pages as of June 4, 2026:

- Claude - 98.82% uptime in the last 90 days.
- ChatGPT - 99.89% uptime in the last 30 days.

At other times the models became slower or, let’s say, less capable for a short time and practically unusable for AI assisted vulnerability research.

Earlier I was using a subscription from only one of the frontier labs, and in case of an outage I took a break or continued exploring the ClickHouse codebase manually. At the moment I am using subscriptions from both ChatGPT and Claude, and if one of the providers has an outage I switch to the other.

This is the reality at the moment, but I am considering experimenting with local LLMs and subscribing to OpenRouter (https://openrouter.ai/) to experiment with other models.

Closing thoughts

In closing, I would like to share a few things that could be helpful for other researchers or anyone who would like to give AI assisted vulnerability research a try.

Start small and scale your tooling gradually

About 6 months ago I was using GitHub Copilot Pro+ for $39, but today I use Claude Max 20x for $200 and ChatGPT Pro for $200 because my usage increased. This investment definitely paid off for me, but I reached this point gradually. I believe that a $100 Claude or ChatGPT subscription could be a good start for someone who would like to give AI assisted vulnerability research a try.

Trusted access matters for security research

I also had to go through a verification process for both ChatGPT and Claude in order to use their models for vulnerability research. Presenting evidence of prior vulnerability discoveries helped during the process, and I was lucky that I had already found a few vulnerabilities with the help of GitHub Copilot. I am not sure how easy it is to complete the verification process nowadays, but if that is not possible, I would suggest trying some of the AI vendors available at OpenRouter (https://openrouter.ai/).

Guided single-agent workflows worked best for me

I experimented with automating more and chaining different models in agentic workflows, which led to a few new discoveries but no major breakthroughs. I will keep trying to improve the automation, but so far the most productive approach, the one where I found high-severity issues, was guiding a single agent.

Stay flexible because the tooling changes fast

The AI space is changing rapidly and I personally don’t think in absolute terms. What one has built today could become obsolete in a few weeks when a new harness or a new model drops from the frontier AI labs. I personally witnessed a case where an engineer had built their own memory system for agentic workflows, and after a few months the frontier labs released a generally available memory system in the form of MEMORY.MD. I had to accept that I should not stick to a single frontier lab and that I should play with every new capability and model as it becomes available.

The pace is exciting

These are definitely interesting times for the security industry. I find novel vulnerabilities every time a new release comes out. It’s a bit scary but exciting at the same time, as powerful cybersecurity-focused models such as Claude Mythos Preview become available to more vetted organizations and researchers.

Persistence is key

I have been exploring the ClickHouse codebase for six months.
I made my first discovery after two weeks of research, and it turned out to be out of scope for the bug bounty program. However, this made me more curious, and I continued exploring. A few weeks later, I found an “out-of-bounds read” bug, which was a real finding and led to a payout from the ClickHouse bug bounty program. I continued researching, and over time I became more comfortable and built a mental map of the ClickHouse codebase. Today I discover vulnerabilities more often, but to reach this point I had to be persistent and invest time.

Manual verification is what builds trust

Last but not least, AI-generated vulnerability reports may contain non-factual information or could be complete AI hallucinations. There are already a few cases where popular software projects ended their public bug bounty programs because of too many false AI-generated submissions. I strongly believe that manually written and verified reports are the key to learning and to building a good reputation.