I was reviewing a system prompt for an MCP agent I'd written three weeks earlier when something hit me hard: the prompt was accepting instructions from the output of an external tool. No sanitization. No validation. No limits whatsoever on what it could do with that output. The tool called a public API, got back JSON, and that JSON landed directly in the model's context.
That's when I opened the OWASP LLM Top 10 and stopped reading it like a list of best practices — and started using it for what it actually is: an audit framework.
My thesis is simple: most posts about the OWASP LLM Top 10 explain the ten risks to you. None of them show you how to run them against your own stack and what you actually find when you do it seriously. That's the difference between "reading the checklist" and "auditing the pipeline." This post is the second thing.
Before getting into the checklist, some context: I have a TypeScript agent pipeline with three layers that interact with each other:
Each layer has a different attack surface. That's exactly what the OWASP LLM Top 10 let me see with surgical precision.
LLM01 (prompt injection) was the biggest finding. My MCP agent was receiving output from external tools and injecting it directly into context with no sanitization layer. In an adversarial scenario, any API the agent queried could return text crafted specifically to override the system prompt's instructions.
The broken pattern looked like this:
// ❌ Insecure pattern: external output goes straight into context
async function fetchContextAndInject(url: string): Promise<string> {
  const response = await fetch(url);
  const data = await response.json();
  // data.content reaches the model context with no filtering whatsoever
  return data.content;
}
What I changed it to:
// ✅ Structural validation before injecting into context
import { z } from "zod";
const ExternalResponseSchema = z.object({
  // Only accept known fields with bounded lengths; no free-form passthrough
  title: z.string().max(200),
  summary: z.string().max(1000),
  // Anything not in the schema gets discarded (z.object strips unknown keys by default)
});
async function fetchContextSafe(url: string): Promise<string> {
  const response = await fetch(url);
  const raw: unknown = await response.json();
  // safeParse does not throw: on failure the agent gets a structured error, not the raw payload
  const parsed = ExternalResponseSchema.safeParse(raw);
  if (!parsed.success) {
    return "Error: external response failed schema validation";
  }
  return `Title: ${parsed.data.title}\nSummary: ${parsed.data.summary}`;
}
I used Zod — which was already in the stack for API validation — as the first line of defense. It's not a complete solution to prompt injection, but it reduces the structural attack surface.
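A possible second layer on top of the schema, sketched under stated assumptions: treat whatever passes validation as data, never as instructions. The helper names (`looksLikeInstruction`, `wrapAsUntrustedData`), the pattern list, and the `<external_data>` delimiter are all hypothetical; none of them come from my pipeline or from any library. Pattern matching like this is easy to evade, so it works as a tripwire for logging rather than as a defense.

```typescript
// Hypothetical heuristics: phrases that often appear in injection attempts.
// Illustrative and trivially bypassable; treat a hit as a signal to log and review.
const SUSPICIOUS_PATTERNS: RegExp[] = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /you are now/i,
  /system prompt/i,
];

function looksLikeInstruction(text: string): boolean {
  return SUSPICIOUS_PATTERNS.some((pattern) => pattern.test(text));
}

// Wrap external text in explicit delimiters so the system prompt can tell the
// model to treat everything inside them as data only.
// Assumption: `source` is a trusted, developer-chosen label (e.g. a tool name).
function wrapAsUntrustedData(source: string, text: string): string {
  // Escape angle brackets so the payload cannot emit its own closing delimiter.
  const escaped = text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  const flagged = looksLikeInstruction(escaped);
  return [
    `<external_data source="${source}" flagged="${flagged}">`,
    escaped,
    "</external_data>",
  ].join("\n");
}
```

The wrapper only helps if the system prompt explicitly states that content inside the delimiters must never be followed as instructions, and even then it lowers the odds rather than closing the hole.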
The second problem (LLM02, insecure output handling): agent output was reaching the UI without escaping. For an agent that generates HTML or Markdown, that's potential XSS if the output is rendered directly.
I traced every place where model output touched the DOM and added explicit sanitization before any render. If the agent generates code, that code goes into a `<pre>` block with escaped characters, not into `innerHTML`.
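A minimal sketch of that escaping step, assuming the output ends up as an HTML string (the function names `escapeHtml` and `renderCodeBlock` are illustrative, not the ones in my codebase):

```typescript
// Escape the five characters that matter in HTML text and attribute contexts.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(untrusted: string): string {
  return untrusted.replace(/[&<>"']/g, (char) => HTML_ESCAPES[char] ?? char);
}

// Render model-generated code as inert text inside a <pre> block.
function renderCodeBlock(modelOutput: string): string {
  return `<pre><code>${escapeHtml(modelOutput)}</code></pre>`;
}
```

When the code runs in the browser and you hold a DOM node, assigning to `textContent` instead of `innerHTML` gets the same result without hand-rolled escaping.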
LLM03 (training data poisoning) points to risks in the base model, not the application. In my case the model is Claude via API, so I don't control the fine-tuning or the dataset. My only action was to document this dependency explicitly: if Anthropic has a problem here, I have a problem here. No system prompt compensates for that.
Honest limit: you can't audit this from the application layer. It's a dependency you take as a trust boundary.
For LLM04 (model denial of service), I checked whether I had rate limiting on the endpoints that trigger model calls. I didn't, at least not in the local testing context. In a production scenario this is critical: a badly designed loop or a tool that calls itself recursively can fire dozens of model requests in seconds.
I added a simple iteration cap to the agent loop:
// Iteration control to prevent infinite loops in the agent
const MAX_ITERATIONS = 10;
let iterations = 0;
while (agentShouldContinue && iterations < MAX_ITERATIONS) {
  iterations++;
  const result = await runAgentStep();
  agentShouldContinue = result.continueLoop;
}
// Warn only if the cap cut the loop short, not when the agent finished on its own
if (agentShouldContinue && iterations >= MAX_ITERATIONS) {
  // Explicit log: I want to know if this ever fires
  console.warn("[agent] Iteration limit reached: review loop");
}
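The iteration cap bounds a single run. For the "dozens of requests in seconds" case, a budget on model calls across runs is a natural complement. What follows is a sketch of a sliding-window limiter, not what I have deployed; the class name `ModelCallLimiter` and its parameters are assumptions for illustration.

```typescript
// Sliding-window limiter: allow at most `maxCalls` model calls per `windowMs`.
class ModelCallLimiter {
  private timestamps: number[] = [];
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // `now` is injectable so the limiter can be tested without real waiting.
  constructor(maxCalls: number, windowMs: number, now: () => number = () => Date.now()) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the call if there is budget left in the window.
  tryAcquire(): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Drop calls that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxCalls) {
      return false;
    }
    this.timestamps.push(current);
    return true;
  }
}
```

In the agent loop, a `false` from `tryAcquire()` would be a reason to stop and log rather than to retry immediately. This state lives in one process; several instances behind a load balancer would need a shared store.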
LLM05 (supply chain) made me look at two things: the npm packages I use to interact with the model API, and the dependencies of my MCP tools. With pnpm workspaces (something I covered in the monorepo with Railway post) you get lockfile visibility, but that's not the same as auditing.
What I added: `pnpm audit` as an explicit CI step before deploying any agent. It doesn't eliminate the risk, but it makes it visible.
LLM06 (sensitive information disclosure) is where the second uncomfortable finding showed up: my system prompts contained configuration context that included names of internal tools, data structure details, and some system defaults. That context reaches the model, and if the model echoes it in its output, it's exposed.
The rule I applied: nothing you wouldn't want to see in a public log should be in a system prompt without explicit confidentiality marking. And even that isn't a guarantee — it's mitigation.
// Separate technical config from agent instructions
const SYSTEM_PROMPT_PUBLIC = `
You are a development assistant. You can use available tools
to answer technical questions.
`;
// This does NOT go into the system prompt; it lives in a separate config layer
const AGENT_CONFIG_PRIVATE = {
  toolEndpoints: process.env.TOOL_ENDPOINTS,
  internalSchema: process.env.INTERNAL_SCHEMA,
};
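Keeping config out of the prompt covers the input side. Since the concern is the model echoing sensitive context, an outbound check is a reasonable companion. This is a sketch with hypothetical helper names (`findLeakedValues`, `redactLeaks`); it only catches exact, case-sensitive matches, so a paraphrased or re-encoded leak will slip through.

```typescript
// Return the sensitive values that appear verbatim in the model output.
function findLeakedValues(output: string, sensitiveValues: string[]): string[] {
  return sensitiveValues.filter(
    (value) => value.length > 0 && output.includes(value),
  );
}

// Replace each leaked value with a placeholder before the output leaves the pipeline.
function redactLeaks(output: string, sensitiveValues: string[]): string {
  let redacted = output;
  for (const value of findLeakedValues(output, sensitiveValues)) {
    // split/join replaces every occurrence without any regex-escaping concerns.
    redacted = redacted.split(value).join("[REDACTED]");
  }
  return redacted;
}
```

The list of sensitive values could be built from the same private config object, for example internal tool names and endpoint URLs. A non-empty result from `findLeakedValues` is worth logging as a finding in its own right.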
LLM07 (insecure plugin design) applies because my MCP tools are essentially plugins. The risk here is that a tool has broader permissions than it actually needs. I reviewed each tool and applied least privilege: a tool that reads files doesn't need write access; a tool that queries an API doesn't need filesystem access.
This connects directly to what I wrote about OAuth scope creep — the same audit pattern applies to an agent's tools.
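One way to make the per-tool review enforceable rather than a one-off reading exercise is a capability manifest checked at call time. The sketch below is an assumption about how that could look; the `Capability` names, the `queryApi` tool, and the helpers are hypothetical.

```typescript
type Capability = "fs:read" | "fs:write" | "network" | "shell";

// Hypothetical manifest: each tool declares the minimum capabilities it needs.
// A Map avoids accidental hits on inherited object keys such as "constructor".
const TOOL_CAPABILITIES = new Map<string, readonly Capability[]>([
  ["readFile", ["fs:read"]],
  ["writeFile", ["fs:read", "fs:write"]],
  ["queryApi", ["network"]],
]);

// Deny by default: unknown tools and undeclared capabilities are both rejected.
function isAllowed(toolName: string, requested: Capability): boolean {
  const granted = TOOL_CAPABILITIES.get(toolName) ?? [];
  return granted.includes(requested);
}

// Throwing variant for use at the point where a tool touches a resource.
function assertAllowed(toolName: string, requested: Capability): void {
  if (!isAllowed(toolName, requested)) {
    throw new Error(`Tool "${toolName}" is not granted "${requested}"`);
  }
}
```

The manifest also doubles as audit documentation: a diff on it shows exactly when a tool's permissions grew.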
LLM08 (excessive agency) is the risk that concerns me most, specifically with Cline. The agent has terminal access, can execute commands, and can modify files. If the reasoning loop fails, it can cause real damage.
What I implemented: "confirm before execute" mode for any tool with an irreversible side effect. It's not automatable — it requires deliberate human friction. And that friction is the entire point.
// Explicit tool classification by impact
type ToolImpact = "read-only" | "reversible" | "destructive";
const TOOL_IMPACT_MAP: Record<string, ToolImpact> = {
  readFile: "read-only",
  listDirectory: "read-only",
  writeFile: "reversible",
  deleteFile: "destructive",
  runCommand: "destructive",
};
async function executeTool(toolName: string, args: unknown) {
  // Unknown tools are treated as destructive (safe fallback)
  const impact = TOOL_IMPACT_MAP[toolName] ?? "destructive";
  if (impact === "destructive") {
    // Pause and wait for human confirmation before executing
    await requireHumanApproval(toolName, args);
  }
  return runTool(toolName, args);
}
LLM09 (overreliance) isn't a purely technical risk; it's organizational. The problem is trusting the agent's output without external validation. In my pipeline, any output going to production passes through a structural validation layer before it's used as input to another system. The model can be fine, the pipeline can be fine, and the output can still be wrong.
This risk doesn't close with code. It closes with process and human review at critical nodes.
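The structural validation layer mentioned above can be as small as a parse-and-check function in front of the downstream system. The example below is a sketch: the `DeployPlan` shape and its limits are invented for illustration and are not my actual schema. It checks structure only; whether the plan is the right one is still a human call.

```typescript
// Hypothetical shape of an agent result that another system will consume.
interface DeployPlan {
  service: string;
  replicas: number;
  environment: "staging" | "production";
}

type ValidationResult =
  | { ok: true; value: DeployPlan }
  | { ok: false; error: string };

// Parse the agent's raw text and check every field before anything downstream sees it.
function validateAgentOutput(rawOutput: string): ValidationResult {
  let data: unknown;
  try {
    data = JSON.parse(rawOutput);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "output is not an object" };
  }
  const { service, replicas, environment } = data as Record<string, unknown>;
  if (typeof service !== "string" || service.length === 0) {
    return { ok: false, error: "service must be a non-empty string" };
  }
  if (typeof replicas !== "number" || !Number.isInteger(replicas) || replicas < 1 || replicas > 10) {
    return { ok: false, error: "replicas must be an integer between 1 and 10" };
  }
  if (environment !== "staging" && environment !== "production") {
    return { ok: false, error: "environment must be staging or production" };
  }
  return { ok: true, value: { service, replicas, environment } };
}
```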
For LLM10 (model theft), in my TypeScript agent context this mainly applies to protecting system prompts. A well-crafted system prompt represents real work, and if it leaks, it can be replicated or used to bypass restrictions.
What I implemented: system prompts don't live in frontend code. They're served from an authenticated endpoint, they're not logged in plain text, and they don't get exposed in the client bundle.
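For the "not logged in plain text" part, one option is to log a fingerprint of the prompt instead of the prompt itself, so you can still tell which version was in use. This is a sketch under assumptions: `fingerprint` and `describePromptForLog` are hypothetical names, and the hash is an FNV-1a-style function over UTF-16 code units, which is non-cryptographic. A proper cryptographic hash would be the safer choice if the prompt is short or guessable.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units. Non-cryptographic:
// enough to tell prompt versions apart in logs, not to protect secrets.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

interface PromptLogEntry {
  promptName: string;
  length: number;
  fingerprint: string;
}

// Build the log entry from metadata only; the prompt text itself is never included.
function describePromptForLog(promptName: string, prompt: string): PromptLogEntry {
  return { promptName, length: prompt.length, fingerprint: fingerprint(prompt) };
}
```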
Here's what the list doesn't resolve on its own:
It doesn't tell you the priority order for your stack. LLM01 (prompt injection) was critical in my case; LLM03 (training data poisoning) is irrelevant from the application layer. Without applying it against your concrete architecture, you don't know which one is urgent.
It gives you no criteria for the trust boundary of the base model. If you use Claude, GPT-4, or any external API, LLM03 and part of LLM05 are dependencies you take as given. The framework names them, but the mitigation is out of your hands.
It doesn't distinguish between runtime risks and design risks. LLM01 and LLM02 are problems you can detect and mitigate at runtime. LLM08 (excessive agency) is a design problem — if the agent has too many permissions, a runtime patch doesn't fix it.
I have a post on OpenTelemetry in Next.js where I talk about traces that survive the edge. That kind of observability helps here too: if you can't see which tools the agent called and with what args, you can't audit LLM08 in production.
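Even without a tracing stack, the minimum needed to audit LLM08 is a record of each tool call and its arguments. The wrapper below is a simplified sketch: `withAudit` and `toolCallLog` are hypothetical names, the sink is an in-memory array, and it handles synchronous tools only to keep the example short. An async version would `await` the tool inside the `try`.

```typescript
interface ToolCallRecord {
  tool: string;
  args: unknown;
  startedAt: string; // ISO timestamp
  durationMs: number;
  ok: boolean;
}

// In-memory sink for the sketch; a real pipeline would export these as spans or logs.
const toolCallLog: ToolCallRecord[] = [];

// Wrap a synchronous tool so every call is recorded, including failures.
function withAudit<A, R>(tool: string, fn: (args: A) => R): (args: A) => R {
  return (args: A): R => {
    const started = Date.now();
    let ok = false;
    try {
      const result = fn(args);
      ok = true;
      return result;
    } finally {
      // Runs on both success and throw, so failed calls are logged too.
      toolCallLog.push({
        tool,
        args,
        startedAt: new Date(started).toISOString(),
        durationMs: Date.now() - started,
        ok,
      });
    }
  };
}
```

Arguments can themselves contain sensitive data, so in practice the `args` field may need the same redaction treatment as any other log line.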
| Risk | State Found | Action Taken |
|---|---|---|
| LLM01 Prompt Injection | ❌ Vulnerable | Zod schema on external tool output |
| LLM02 Insecure Output | ⚠️ Partial | Explicit escaping before render |
| LLM03 Training Data | 🔵 Out of scope | Documented as trust boundary |
| LLM04 Model DoS | ⚠️ No limit | Added max iterations + log |
| LLM05 Supply Chain | ⚠️ Invisible | `pnpm audit` in CI |
| LLM06 Info Disclosure | ❌ Leaky prompts | Separated config from system prompt |
| LLM07 Plugin Flaws | ⚠️ Partial | Permission review per tool |
| LLM08 Excessive Agency | ⚠️ No friction | Confirm before execute on destructive tools |
| LLM09 Overreliance | 🔵 Process | Human validation at critical nodes |
| LLM10 Model Theft | ⚠️ Prompts exposed | Prompts moved to authenticated endpoint |
❌ = critical finding | ⚠️ = partial mitigation | 🔵 = outside application control
Does the OWASP LLM Top 10 apply to agents built on Claude or GPT-4 via API?
Yes, with nuance. LLM01, LLM02, LLM06, LLM07, LLM08, and LLM10 are application-layer risks — they apply regardless of which model you use. LLM03 (training data) and part of LLM05 are provider risks: if you use an external API, you take them as a trust boundary. The audit starts with the risks you can actually control.
Is Zod enough to mitigate prompt injection?
No. Zod validates the structure of external output before it reaches context — that reduces the surface area, but it doesn't eliminate the risk. A well-formed adversarial payload can pass schema validation. Zod is one layer, not a complete solution. Real mitigation combines schema validation, system prompt constraints, and human review at critical points.
Is Cline safe to use in production as an agent orchestrator?
Cline has access to the filesystem, terminal, and other tools with real effects. That's not inherently unsafe — it's the functionality that makes it useful. The risk (LLM08) is in the design: if the agent can execute destructive commands without human confirmation, the risk is real regardless of how well Cline is configured. My rule: any tool with an irreversible effect requires explicit approval.
How often should you run this audit?
Every time you change the agent's architecture: you add a new tool, change the system prompt, or modify how the agent consumes external outputs. It's not a one-time audit — it's a checklist that runs against every structural change. If you add observability (OpenTelemetry is one option), you can catch runtime anomalies between audits.
Does the OWASP LLM Top 10 cover multi-agent risks or just single-agent?
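A note on versions first: the numbering used in this post (LLM03 Training Data, LLM04 Model DoS, LLM10 Model Theft) matches the 2023 v1.1 list; the 2025 revision renames and renumbers several entries. In both, the focus is primarily per-application and per-agent risk. In multi-agent architectures, LLM01's surface multiplies: each agent can become an injection vector for the others. The framework names the risk, but the mitigation detail for multi-agent pipelines is left to each team.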
Which risk should I tackle first if I have limited time?
LLM01 (prompt injection) if your agent consumes external output — it's the most exploitable and the most overlooked. LLM08 (excessive agency) if the agent has access to tools with irreversible effects — it's the one that can do the most damage when something goes wrong. The rest depend on your specific stack, but these two are the absolute floor.
My position is clear: the OWASP LLM Top 10 is not something you read and consider covered. It's something you bring into a review session with the architecture diagram open in front of you, and you ask — for each risk — exactly where in the pipeline that could fail.
What I don't buy is the idea that "following best practices" is enough. Practices are abstract; the pipeline is concrete. In my case, LLM01 and LLM06 were real problems I wouldn't have found without doing the systematic audit exercise. I would have discovered them when someone motivated enough decided to exploit them.
If you already have TypeScript agents with MCP tools or elaborate system prompts, do the exercise: open the OWASP LLM Top 10, open the architecture diagram, and ask risk by risk. The result will be more interesting than the list itself.
Concrete next step: take the checklist from this table, replace the states with your own, and document the findings. An audit that isn't documented doesn't exist.
Original source:
This article was originally published on juanchi.dev