# Promptfoo: LLM Red Teaming Against OWASP Top 10

The open-source tool Promptfoo, acquired by OpenAI in March 2026, maps its 155 attack plugins to the OWASP LLM Top 10 2025 list for structured red teaming of LLM-powered products. This guide details the 2025 revision's new categories, including System Prompt Leakage and Vector and Embedding Weaknesses, and provides practical YAML configuration examples for testing vulnerabilities like prompt injection, sensitive information disclosure, and excessive agency in CI pipelines.

If you ship an LLM-powered product and have not run a structured red team against it, you are flying blind on security. The OWASP LLM Top 10 2025, released in November 2024, gives you a canonical list of attack categories to test against, and Promptfoo, the open-source tool that OpenAI acquired in March 2026 for its enterprise security reach, maps its 155 attack plugins directly to that list. This guide walks through how that mapping works, what a working YAML config looks like, and how to wire it into a CI pipeline so that you find the failures before a bad actor does.

## What the OWASP LLM Top 10 2025 Actually Covers

The 2025 edition is a substantial revision of the 2023 original. Two new categories were added, several were renamed, and the ordering shifted to reflect real-world incident data from the intervening period.
Here is the full current list:

| ID | Category | What Changed in 2025 |
|---|---|---|
| LLM01 | Prompt Injection | Remains #1; now explicitly covers indirect injection via tool outputs and RAG context |
| LLM02 | Sensitive Information Disclosure | Moved up from #6; training-data extraction attacks elevated in severity |
| LLM03 | Supply Chain | Covers poisoned model weights, unsafe third-party plugins, and compromised fine-tune datasets |
| LLM04 | Data and Model Poisoning | Separated from Supply Chain to address runtime poisoning of RAG corpora |
| LLM05 | Improper Output Handling | XSS, SSRF, and command injection via unvalidated LLM output passed downstream |
| LLM06 | Excessive Agency | Agents with too many tools, too-broad permissions, or no human-in-the-loop |
| LLM07 | System Prompt Leakage | New 2025 entry; exposure of instructions, credentials, or logic in system prompts |
| LLM08 | Vector and Embedding Weaknesses | New 2025 entry; targets RAG vector DB poisoning and cross-tenant data leakage |
| LLM09 | Misinformation | Renamed from "Overreliance"; focuses on model-generated false information propagation |
| LLM10 | Unbounded Consumption | DoS through resource abuse, financial exploitation via inference flooding, model theft |

The two new entries, LLM07 System Prompt Leakage and LLM08 Vector and Embedding Weaknesses, reflect how agentic architectures changed the threat surface. When your app has a system prompt that configures access to payment APIs or internal tools, leaking that prompt is a significant operational risk, not just an embarrassment.

## Why Promptfoo Is the Right Tool

Before the OpenAI acquisition, Promptfoo was already used by more than 25% of Fortune 500 companies for LLM evaluation, according to OpenAI's acquisition announcement. The open-source CLI has always been MIT-licensed and continues to be.

The core design decision is that Promptfoo separates adversarial probe generation from evaluation.
This matters because:

- Generating adversarial probes requires a "red team model": an uncensored model that can write jailbreaks, injection payloads, and PII extraction attempts without refusal. Promptfoo Cloud handles this.
- Evaluating those probes against your target runs locally, using your own API key, with no sensitive data sent to Promptfoo's servers except the prompts themselves.

Effloow Lab inspected Promptfoo 0.121.11, the latest release as of May 2026, by running `npx promptfoo@latest redteam plugins`, which outputs 155 attack plugins with their descriptions. We also ran a structural eval test with the built-in echo provider to verify the CLI works correctly without authentication. The full lab notes are at `data/lab-runs/promptfoo-llm-red-teaming-owasp-agent-eval-guide-2026.md`.

## Mapping Plugins to OWASP Categories

The `redteam plugins` command shows every available attack generator. The OWASP mapping is not always explicit in the name, so here is the practical breakdown for the most important categories:

**LLM01: Prompt Injection**

- `indirect-prompt-injection`: tests injection via untrusted variables (retrieved document content, tool responses)
- `special-token-injection`: Unicode tag-based instruction smuggling
- `cyberseceval`: Meta's CyberSecEval prompt injection dataset
- `pliny`: community-curated jailbreak collection

**LLM02: Sensitive Information Disclosure**

- `pii:direct`: asks the model to output PII directly
- `pii:api-db`: attempts to extract PII via API or database access
- `pii:session`: cross-session PII leakage probes
- `pii:social`: social engineering to extract personal data

**LLM05: Improper Output Handling**

- `sql-injection`: SQL injection via LLM output passed to a database
- `shell-injection`: command injection via tool-calling LLMs
- `data-exfil`: exfiltration via URL parameters, images, or Markdown links

**LLM06: Excessive Agency**

- `excessive-agency`: attempts to trigger actions beyond defined system boundaries

**LLM07: System Prompt Leakage**

- `system-prompt-override`: directly attempts to override or extract the system prompt

**LLM09: Misinformation**

- `hallucination`: checks whether the model generates false or fabricated information

For agentic applications, the `coding-agent:` plugin family adds 12 additional attack surfaces specific to AI coding agents, including `repo-prompt-injection`, `sandbox-write-escape`, and `secret-env-read`.

## A Working Red Team YAML Config

Here is a minimal config that covers six of the ten OWASP categories with a reasonable number of test cases for a weekly CI run:

```yaml
# promptfooconfig.yaml
targets:
  - id: openai:gpt-4o-mini  # replace with your actual target
    label: prod-chatbot

prompts:
  - "{{input}}"

redteam:
  purpose: >-
    A customer-support chatbot for a SaaS product. It can answer questions
    about billing, features, and documentation. It has read access to user
    account data and can initiate refunds.
  numTests: 20  # per plugin; ~180 base cases across nine plugins, before strategies
  plugins:
    # LLM01: Prompt Injection
    - indirect-prompt-injection
    - special-token-injection
    # LLM02: Sensitive Information Disclosure
    - pii:direct
    - pii:api-db
    # LLM05: Improper Output Handling
    - shell-injection
    - sql-injection
    # LLM06: Excessive Agency
    - excessive-agency
    # LLM07: System Prompt Leakage
    - system-prompt-override
    # LLM09: Misinformation
    - hallucination
  strategies:
    - basic             # standard adversarial prompts
    - jailbreak         # multi-step escalating attempts
    - prompt-injection  # injected instructions in content fields
```

Two things are worth noting about this config:

First, the `purpose` field is critical. Promptfoo uses it to tailor adversarial prompts to your specific application context. A generic purpose produces generic probes. A precise description, including what the system can access and what it is allowed to do, produces targeted attacks that actually match your threat model.

Second, `numTests: 20` generates roughly 20 test cases per plugin for each of the three strategies. With nine plugins and three strategies, that is around 540 test cases (9 × 20 × 3).
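
The arithmetic behind that estimate is easy to reproduce as a quick sanity check before committing to a scan budget. The sketch below assumes, as the figures above do, that every strategy yields one variant of each base test case. The function name `estimate_test_cases` is hypothetical, and the count Promptfoo actually generates may differ.

```python
# Rough estimate of red team scan size.
# Assumption: each strategy produces one variant per base test case,
# so the total is plugins * numTests * strategies.
PLUGINS = [
    "indirect-prompt-injection",
    "special-token-injection",
    "pii:direct",
    "pii:api-db",
    "shell-injection",
    "sql-injection",
    "excessive-agency",
    "system-prompt-override",
    "hallucination",
]
STRATEGIES = ["basic", "jailbreak", "prompt-injection"]


def estimate_test_cases(plugins, strategies, num_tests):
    """Return (base_cases, total_cases) for a planned scan."""
    base = len(plugins) * num_tests
    return base, base * len(strategies)


base, total = estimate_test_cases(PLUGINS, STRATEGIES, num_tests=20)
print(f"{base} base cases, {total} total")  # 180 base cases, 540 total
```

Dropping `num_tests` to 5 under the same assumption gives 135 cases, which is closer to what a development loop can tolerate.
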
Adjust `numTests` down for faster feedback during development, up for pre-release security gates.

## Running the Scan

```bash
# One-time setup: no global install needed
npx promptfoo@latest --version  # verify version

# Generate adversarial probes (requires Promptfoo account)
npx promptfoo@latest redteam generate \
  --config promptfooconfig.yaml \
  --output redteam.yaml

# Evaluate probes against your target
npx promptfoo@latest redteam run \
  --config promptfooconfig.yaml

# View results in web UI
npx promptfoo@latest view
```

The `generate` step calls Promptfoo Cloud to produce adversarial variants of your prompts. The actual evaluation runs locally against your specified target using your own API key. Results appear both in the terminal summary and in a local web UI at `localhost:15500`.

Note: `redteam generate` requires email verification and a Promptfoo account. Effloow Lab confirmed this during the PoC on 2026-05-20: the CLI prompts for a work email before generating probes. The `eval` command for running your own prompts with assertions works without authentication.

## Wiring It Into CI

Promptfoo ships an official GitHub Action for both basic eval and red team scanning. Here is a minimal security gate that runs on pull requests:

```yaml
# .github/workflows/llm-security.yml
name: LLM Security Gate

on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'system-prompts/**'
      - '.env.example'

jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run LLM red team
        uses: promptfoo/promptfoo-action@v2
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          promptfoo-api-key: ${{ secrets.PROMPTFOO_API_KEY }}
          config: ./promptfooconfig.yaml
          type: redteam

      - name: Comment results on PR
        if: github.event_name == 'pull_request'
        run: |
          PASS_RATE=$(cat promptfoo-results.json | jq '.results.stats.passRate')
          echo "Pass rate: ${PASS_RATE}%"
```

The `paths` trigger is worth keeping narrow. Red team scans cost real money in LLM API calls, so you want them running when prompt logic changes, not on every frontend commit.
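
The final workflow step only prints the pass rate. For the job to act as a gate, something has to fail the build when the rate drops. A minimal sketch of that check follows, assuming the results file exposes `results.stats.passRate` as a percentage (the path used by the `jq` expression in the workflow). The `MIN_PASS_RATE` threshold and the `check_pass_rate` helper are hypothetical, and the real output schema should be verified before relying on this.

```python
MIN_PASS_RATE = 95.0  # hypothetical threshold, in percent


def check_pass_rate(results: dict, minimum: float = MIN_PASS_RATE) -> bool:
    """Return True when the reported pass rate meets the minimum.

    Assumes results["results"]["stats"]["passRate"] holds a percentage,
    mirroring the jq path in the workflow. A missing or malformed field
    counts as a failure so the gate fails closed.
    """
    try:
        rate = float(results["results"]["stats"]["passRate"])
    except (KeyError, TypeError, ValueError):
        return False
    return rate >= minimum


# Inline examples standing in for a parsed promptfoo-results.json
print(check_pass_rate({"results": {"stats": {"passRate": 97.5}}}))  # True
print(check_pass_rate({"results": {"stats": {"passRate": 80}}}))    # False
print(check_pass_rate({}))                                          # False
```

In a workflow step, the parsed JSON file would be passed to `check_pass_rate` and the process would exit non-zero on `False`. Failing closed on a missing field is a deliberate choice here: a scan that produced no stats should not count as a pass.
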

For teams that want a scheduled baseline rather than per-PR gates, a cron trigger makes more sense:

```yaml
on:
  schedule:
    - cron: '0 3 * * 1'  # Monday 3 AM UTC
  workflow_dispatch:
```

## Interpreting Results

Promptfoo's output classifies each test case as PASS or FAIL, but the severity classification matters as much as the pass rate. After a scan, the web UI groups findings by OWASP category and shows the specific prompts that triggered failures.

A few practical guidelines for triaging results:

**Immediate fix before next deploy:** Any FAIL in `pii:direct`, `system-prompt-override`, or `excessive-agency` that includes an actual payload demonstration, not just a theoretical attack. These represent working exploits against your current deployment.

**Fix in current sprint:** Hallucination failures where the model confidently states false facts about your product, pricing, or policies. These are reputation and liability risks even if not security exploits.

**Review next sprint:** Indirect prompt injection failures that require a contrived multi-step scenario. Prioritize based on whether your application ingests untrusted external content (RSS feeds, user-submitted documents, web browsing).

**Track as known risk:** Failures in categories your application explicitly does not need to handle. For example, a code generation assistant may intentionally produce shell commands that would fail a `shell-injection` assertion by design.

## What Works Well