Building a 10-Agent Security Civilization with Qwen and Alibaba Cloud 🛡️🤖

A team of developers built NEXUS, an autonomous society of 10 specialized AI agents for security auditing, using Qwen models via Alibaba Cloud's DashScope API. The system discovers, triages, exploits, patches, and reports vulnerabilities in real open-source software with zero false positives by design. NEXUS features a governance council for severity scoring and a three-tier memory engine for learning across codebases.

Security auditing is broken. If you’ve ever run a static analysis (SAST) tool on a large codebase, you know the pain: thousands of alerts, zero context, and a 90% false-positive rate. At the other end of the spectrum, hiring human penetration testers is expensive and impossible to scale alongside modern CI/CD pipelines.

For the Qwen Cloud Global AI Hackathon, we decided to rethink the problem entirely. What if, instead of using a single monolithic AI to "find bugs," we built an entire civilization of specialized agents?

Meet NEXUS, an autonomous society of 10 distinct AI agents that discovers, triages, exploits, patches, and reports security vulnerabilities in real open-source software. Asking an LLM to "find a bug and fix it" usually results in hallucinations. Instead, we split the vulnerability lifecycle into 10 distinct, highly specialized roles. The pipeline is powered by the DashScope API, and each agent runs on either Qwen-Max or Qwen-Plus.

The system must generate a proof of concept (PoC) and then independently verify it. That requirement shifted us from guessing at bugs to proving them. Zero false positives by design.

One of the coolest features we built is the Governance Council. When the Hunter agent finds a verified vulnerability, we don't just ask a single LLM to rate its severity. Instead, we spin up three distinct agents with completely different system prompts. These three agents evaluate the finding independently, and the orchestrator averages their scores to reach a consensus. Watching them debate a vulnerability in real time on our dashboard feels like a glimpse into the future of autonomous organizations.

To make NEXUS actually learn from its scans, we couldn't rely on context windows alone, so we built a three-tier memory engine built partly on pgvector.
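The post doesn't describe how the memory tiers are queried. Below is a minimal sketch of the retrieval idea only, assuming past findings are stored alongside embedding vectors. `Finding`, `recall_similar`, and the toy three-dimensional embeddings are hypothetical illustrations, not NEXUS code. In a pgvector-backed store, the database would do this ranking (for example with `ORDER BY embedding <=> query LIMIT k`, where `<=>` is pgvector's cosine-distance operator) rather than Python.

```python
import math
from dataclasses import dataclass


@dataclass
class Finding:
    """A past, verified finding kept in long-term memory (hypothetical schema)."""
    summary: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_similar(memory: list[Finding], query: list[float], k: int = 3) -> list[Finding]:
    """Return up to k stored findings ranked by similarity to the query embedding.

    In-process stand-in for a pgvector query along the lines of
    SELECT summary FROM findings ORDER BY embedding <=> :query LIMIT :k
    where <=> is pgvector's cosine-distance operator (smaller = closer).
    """
    ranked = sorted(memory, key=lambda f: cosine_similarity(f.embedding, query), reverse=True)
    return ranked[:k]


# Toy 3-dimensional embeddings (assumed); a real system would use an embedding model.
memory = [
    Finding("SQL injection in ORM raw query", [0.9, 0.1, 0.0]),
    Finding("Path traversal in file upload", [0.1, 0.9, 0.1]),
    Finding("XSS in template rendering", [0.2, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # embedding of a new candidate finding

for f in recall_similar(memory, query, k=2):
    print(f.summary)
# SQL injection in ORM raw query
# XSS in template rendering
```

Ranking by cosine similarity rather than raw dot product ignores vector magnitude. The score reflects direction in embedding space, that is, what kind of bug a finding resembles.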
Over time, NEXUS actively learns what vulnerabilities "look" like across different codebases.

NEXUS isn't just an API wrapper; it's deeply integrated into the Alibaba Cloud ecosystem. We call the dashscope-intl.aliyuncs.com API for Qwen inference. High-reasoning tasks go to Qwen-Max, while summarization and routing tasks go to Qwen-Plus to conserve API credits. We also use the oss2 Python SDK so that the moment the Report agent finishes its job, the final Markdown advisory is immutably uploaded to Alibaba Cloud Object Storage Service (OSS).

We built a Next.js "Mission Control" dashboard that connects to our backend via WebSockets. When you paste a GitHub URL into NEXUS, you can sit back and watch 10 AI agents systematically dismantle, exploit, and patch the codebase in real time.

Building NEXUS taught us that the future of AI isn't a single, omniscient chatbot. It's specialized, communicative, and governed societies of agents working together to solve problems that humans simply don't have the scale to tackle alone.

Built for the Qwen Cloud Global AI Hackathon 2026. Check out the code on GitHub.