Security auditing is broken.
If you’ve ever run a static analysis tool (SAST) on a large codebase, you know the pain: thousands of alerts, zero context, and a 90% false-positive rate. On the other end of the spectrum, hiring human penetration testers is incredibly expensive and impossible to scale alongside modern CI/CD pipelines.
For the **Qwen Cloud Global AI Hackathon**, we decided to rethink the problem entirely. What if, instead of using a single monolithic AI to "find bugs," we built an entire specialized *civilization* of agents?
Meet NEXUS, an autonomous society of 10 distinct AI agents that discovers, triages, exploits, patches, and reports security vulnerabilities in real open-source software.
Instead of asking an LLM to "find a bug and fix it" (which usually results in hallucinations), we split the vulnerability lifecycle into 10 distinct, highly-specialized roles.
Powered by the DashScope API (using Qwen-Max
and Qwen-Plus
), our pipeline looks like this:
Qwen-Plus
):`Qwen-Plus`
):`Qwen-Max`
):`Qwen-Max`
):`Qwen-Max`
):`Qwen-Max`
):`Qwen-Max`
):`Qwen-Plus`
):By forcing the system to generate a PoC and independently verify it, we shifted from a model of guessing bugs to proving them. Zero false positives by design.
One of the coolest features we built is the Governance Council.
When the Hunter agent finds a verified vulnerability, we don't just ask a single LLM to rate its severity. Instead, we spin up three distinct agents with completely different system prompts:
These three agents independently evaluate the finding, and the orchestrator mathematically averages their scores to reach a consensus. Watching them debate a vulnerability in real-time on our dashboard feels like a glimpse into the future of autonomous organizations.
To make NEXUS actually learn from its scans, we couldn't just rely on context windows. We built a 3-tier memory engine:
pgvector
):pgvector
. Over time, NEXUS actively learns what vulnerabilities "look" like across different codebases.NEXUS isn't just an API wrapper; it's deeply integrated into the Alibaba Cloud ecosystem.
dashscope-intl.aliyuncs.com
API for Qwen inference. We routed high-reasoning tasks to Qwen-Max
and summarization/routing tasks to Qwen-Plus
to optimize our API credits.oss2
Python SDK so that the moment the Report agent finishes its job, the final Markdown advisory is immutably uploaded to We built a Next.js "Mission Control" dashboard that connects to our backend via WebSockets. When you paste a GitHub URL into NEXUS, you get to sit back and watch 10 AI agents systematically dismantle, exploit, and patch the codebase in real-time.
Building NEXUS taught us that the future of AI isn't a single, omniscient chatbot. It's specialized, communicative, and governed societies of agents working together to solve problems that humans simply don't have the scale to tackle alone.
Built for the Qwen Cloud Global AI Hackathon 2026. Check out the code on GitHub!