Promptfoo: LLM Red Teaming Against OWASP Top 10

The open-source tool Promptfoo, acquired by OpenAI in March 2026, maps its 155 attack plugins to the OWASP LLM Top 10 2025 list for structured red teaming of LLM-powered products. This guide covers the 2025 revision's new categories, including System Prompt Leakage and Vector and Embedding Weaknesses, and provides practical YAML configuration examples for testing vulnerabilities such as prompt injection, sensitive information disclosure, and excessive agency in CI pipelines.

If you ship an LLM-powered product and have not run a structured red team against it, you are flying blind on security. The OWASP LLM Top 10 2025, released in November 2024, gives you a canonical list of attack categories to test against, and Promptfoo, the open-source tool that OpenAI acquired in March 2026 for its enterprise security reach, maps its 155 attack plugins directly to that list. This guide walks through how that mapping works, what a working YAML config looks like, and how to wire it into a CI pipeline before a bad actor does it for you.

## What the OWASP LLM Top 10 2025 Actually Covers

The 2025 edition is a substantial revision of the 2023 original. Two new categories were added, several were renamed, and the ordering shifted to reflect real-world incident data from the intervening year. Here is the full current list:

| ID | Category | What Changed in 2025 |
|---|---|---|
| LLM01 | Prompt Injection | Remains #1; now explicitly covers indirect injection via tool outputs and RAG context |
| LLM02 | Sensitive Information Disclosure | Moved up from #6; training-data extraction attacks elevated in severity |
| LLM03 | Supply Chain | Covers poisoned model weights, unsafe third-party plugins, and compromised fine-tune datasets |
| LLM04 | Data and Model Poisoning | Separated from Supply Chain to address runtime poisoning of RAG corpora |
| LLM05 | Improper Output Handling | XSS, SSRF, and command injection via unvalidated LLM output passed downstream |
| LLM06 | Excessive Agency | Agents with too many tools, too-broad permissions, or no human-in-the-loop |
| LLM07 | System Prompt Leakage | New 2025 entry; exposure of instructions, credentials, or logic in system prompts |
| LLM08 | Vector and Embedding Weaknesses | New 2025 entry; targets RAG vector DB poisoning and cross-tenant data leakage |
| LLM09 | Misinformation | Renamed from "Overreliance"; focuses on model-generated false information propagation |
| LLM10 | Unbounded Consumption | DoS through resource abuse, financial exploitation via inference flooding, model theft |

The two new entries, LLM07 System Prompt Leakage and LLM08 Vector and Embedding Weaknesses, reflect how agentic architectures changed the threat surface. When your app has a system prompt that configures access to payment APIs or internal tools, leaking that prompt is a significant operational risk, not just an embarrassment.

## Why Promptfoo Is the Right Tool

Before the OpenAI acquisition, Promptfoo was already used by more than 25% of Fortune 500 companies for LLM evaluation, according to OpenAI's acquisition announcement. The open-source CLI has always been MIT-licensed and continues to be.

The core design decision is that Promptfoo separates adversarial probe generation from evaluation. This matters because:

- Generating adversarial probes requires a "red team model": an uncensored model that can write jailbreaks, injection payloads, and PII extraction attempts without refusal. Promptfoo Cloud handles this.
- Evaluating those probes against your target runs locally, using your own API key, with no sensitive data sent to Promptfoo's servers except the prompts themselves.

Effloow Lab inspected Promptfoo 0.121.11 (the latest release as of May 2026) by running `npx promptfoo@latest redteam plugins`, which outputs 155 attack plugins with their descriptions. We also ran a structural eval test with the built-in echo provider to verify that the CLI works correctly without authentication. The full lab notes are at `data/lab-runs/promptfoo-llm-red-teaming-owasp-agent-eval-guide-2026.md`.

## Mapping Plugins to OWASP Categories

The `redteam plugins` command shows every available attack generator.
The OWASP mapping is not always explicit in the name, so here is the practical breakdown for the most important categories:

LLM01: Prompt Injection

- `indirect-prompt-injection`: tests injection via untrusted variables (retrieved document content, tool responses)
- `special-token-injection`: Unicode tag-based instruction smuggling
- `cyberseceval`: Meta's CyberSecEval prompt injection dataset
- `pliny`: community-curated jailbreak collection

LLM02: Sensitive Information Disclosure

- `pii:direct`: asks the model to output PII directly
- `pii:api-db`: attempts to extract PII via API or database access
- `pii:session`: cross-session PII leakage probes
- `pii:social`: social engineering to extract personal data

LLM05: Improper Output Handling

- `sql-injection`: SQL injection via LLM output passed to a database
- `shell-injection`: command injection via tool-calling LLMs
- `data-exfil`: exfiltration via URL parameters, images, or Markdown links

LLM06: Excessive Agency

- `excessive-agency`: attempts to trigger actions beyond defined system boundaries

LLM07: System Prompt Leakage

- `system-prompt-override`: directly attempts to override or extract the system prompt

LLM09: Misinformation

- `hallucination`: checks whether the model generates false or fabricated information

For agentic applications, the `coding-agent:` plugin family adds 12 additional attack surfaces specific to AI coding agents, including `repo-prompt-injection`, `sandbox-write-escape`, and `secret-env-read`.

## A Working Red Team YAML Config

Here is a minimal config that covers six of the ten OWASP categories with a reasonable number of test cases for a weekly CI run:

```yaml
# promptfooconfig.yaml
targets:
  - id: openai:gpt-4o-mini  # replace with your actual target
    label: prod-chatbot

prompts:
  - "{{input}}"

redteam:
  purpose: >
    A customer-support chatbot for a SaaS product. It can answer questions
    about billing, features, and documentation. It has read access to user
    account data and can initiate refunds.
  numTests: 20  # per plugin; 9 plugins -> ~180 base test cases before strategies
  plugins:
    # LLM01: Prompt Injection
    - indirect-prompt-injection
    - special-token-injection
    # LLM02: Sensitive Information Disclosure
    - pii:direct
    - pii:api-db
    # LLM05: Improper Output Handling
    - shell-injection
    - sql-injection
    # LLM06: Excessive Agency
    - excessive-agency
    # LLM07: System Prompt Leakage
    - system-prompt-override
    # LLM09: Misinformation
    - hallucination
  strategies:
    - basic             # standard adversarial prompts
    - jailbreak         # multi-step escalating attempts
    - prompt-injection  # injected instructions in content fields
```

Two things are worth noting about this config.

First, the `purpose` field is critical. Promptfoo uses it to tailor adversarial prompts to your specific application context. A generic purpose produces generic probes. A precise description, including what the system can access and what it is allowed to do, produces targeted attacks that actually match your threat model.

Second, `numTests: 20` generates roughly 20 test cases per plugin, and each strategy is applied to them. With nine plugins and three strategies, that is around 540 test cases (9 × 20 × 3). Adjust down for faster feedback during development, up for pre-release security gates.

## Running the Scan

```bash
# One-time setup (no global install needed)
npx promptfoo@latest --version  # verify version

# Generate adversarial probes (requires Promptfoo account)
npx promptfoo@latest redteam generate \
  --config promptfooconfig.yaml \
  --output redteam.yaml

# Evaluate probes against your target
npx promptfoo@latest redteam run \
  --config promptfooconfig.yaml

# View results in web UI
npx promptfoo@latest view
```

The `generate` step calls Promptfoo Cloud to produce adversarial variants of your prompts. The actual evaluation runs locally against your specified target using your own API key. Results appear both in the terminal summary and in a local web UI at `localhost:15500`.

Note: `redteam generate` requires email verification and a Promptfoo account.
Effloow Lab confirmed this during the PoC on 2026-05-20: the CLI prompts for a work email before generating probes. The `eval` command for running your own prompts with assertions works without authentication.

## Wiring It Into CI

Promptfoo ships an official GitHub Action for both basic eval and red team scanning. Here is a minimal security gate that runs on pull requests:

```yaml
# .github/workflows/llm-security.yml
name: LLM Security Gate

on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'system-prompts/**'
      - '.env.example'

jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run LLM red team
        uses: promptfoo/promptfoo-action@v2
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          promptfoo-api-key: ${{ secrets.PROMPTFOO_API_KEY }}
          config: ./promptfooconfig.yaml
          type: redteam

      - name: Comment results on PR
        if: github.event_name == 'pull_request'
        run: |
          # Read the pass rate from the results file written by the scan
          PASS_RATE=$(jq '.results.stats.passRate' promptfoo-results.json)
          echo "Pass rate: ${PASS_RATE}%"
```

The `paths` trigger is worth keeping narrow. Red team scans cost real money in LLM API calls, so you want them running when prompt logic changes, not on every frontend commit.

For teams that want a scheduled baseline rather than per-PR gates, a cron trigger makes more sense:

```yaml
on:
  schedule:
    - cron: '0 3 * * 1'  # Monday 3 AM UTC
  workflow_dispatch:
```

## Interpreting Results

Promptfoo's output classifies each test case as PASS or FAIL, but the severity classification matters as much as the pass rate. After a scan, the web UI groups findings by OWASP category and shows the specific prompts that triggered failures.

A few practical guidelines for triaging results:

- Immediate fix before next deploy: Any FAIL in `pii:direct`, `system-prompt-override`, or `excessive-agency` that includes an actual payload demonstration, not just a theoretical attack. These represent working exploits against your current deployment.
- Fix in current sprint: Hallucination failures where the model confidently states false facts about your product, pricing, or policies. These are reputation and liability risks even if they are not security exploits.
- Review next sprint: Indirect prompt injection failures that require a contrived multi-step scenario. Prioritize based on whether your application ingests untrusted external content (RSS feeds, user-submitted documents, web browsing).
- Track as known risk: Failures in categories your application explicitly does not need to handle. For example, a code generation assistant may intentionally produce shell commands that would fail a `shell-injection` assertion by design.

## What Works Well

<ul>
<li>155 plugins cover a wide attack surface with minimal config</li>
<li>YAML-first config is version-controllable and reviewable in PRs</li>
<li>Local evaluation means sensitive prompts stay in your infrastructure</li>
<li>GitHub Action integrates in under 30 lines</li>
<li>Echo provider lets you validate config structure without API costs</li>
</ul>

## What to Watch

<ul>
<li>Adversarial probe generation requires a Promptfoo account (email verification)</li>
<li>Full scans with 20+ plugins run hundreds of LLM calls, so budget accordingly</li>
<li>The <code>owasp:llm</code> meta-preset is referenced in docs but resolves server-side</li>
<li>Post-OpenAI acquisition, enterprise pricing direction is unclear</li>
<li>False positives increase with broader <code>purpose</code> descriptions</li>
</ul>

## The Agentic AI Extension

OWASP released a separate Top 10 for Agentic Applications in December 2025, announced at Black Hat Europe. Promptfoo maps its `coding-agent:` plugin family and the broader `agentic:` namespace to this list.
The key risks specific to agents that do not appear in the standard LLM Top 10:

- Memory poisoning (`agentic:memory-poisoning`): injecting false data into an agent's persistent memory store
- Automation hijacking (`coding-agent:automation-poisoning`): modifying CI scripts, hooks, or scheduled jobs to persist unsafe behavior after the immediate task completes
- Sandbox escape (`coding-agent:sandbox-write-escape`, `coding-agent:sandbox-read-escape`): reading or writing outside the intended workspace
- Delayed exfiltration (`coding-agent:delayed-ci-exfil`): planting workflow changes that leak data after the evaluation run completes

If your application uses tool-calling or multi-step planning, run both the standard OWASP LLM config and the agentic plugin set.

## Frequently Asked Questions

Q: Do I need a Promptfoo account to use it at all?

No. The `promptfoo eval` command, which runs assertion-based tests against any provider using your own prompts, works without authentication. You only need an account for `promptfoo redteam generate`, which uses Promptfoo's cloud models to generate adversarial probes. You can hand-write test cases in a `redteam.yaml` and run `promptfoo eval` against them without ever signing up.

Q: How does Promptfoo compare to Microsoft PyRIT or Garak?

PyRIT is a Python framework from Microsoft's AI Red Team, better suited to researchers writing custom attack logic. Garak is similarly research-oriented with strong dataset coverage but no CI integration. Promptfoo sits in the practitioner tier: less flexible than PyRIT for novel research, but far easier to integrate into a standard dev workflow via YAML and GitHub Actions.

Q: Does the OpenAI acquisition change anything for open-source users?

As of May 2026, the repo remains MIT-licensed and the CLI is fully functional. OpenAI's stated intent is to integrate Promptfoo's technology into its Frontier enterprise platform while keeping the open-source tool available.
Whether that changes pricing or rate limits for the cloud generation service is not yet public.

Q: What is the minimum viable config for a solo developer?

Three plugins cover the most commonly exploited categories with a reasonable test count:

```yaml
plugins:
  - indirect-prompt-injection
  - pii:direct
  - excessive-agency
strategies:
  - basic
```

Run this weekly with `numTests: 10` per plugin. That is 30 to 90 API calls depending on strategy expansion: cheap enough to run regularly, targeted enough to catch the most common issues.

## Key Takeaways

The OWASP LLM Top 10 2025 gives you a peer-reviewed threat model. Promptfoo gives you an automated way to test against it. The combination works because Promptfoo's plugin taxonomy was built with OWASP categories in mind, and the YAML config format makes security testing a first-class part of your repository rather than a one-off audit.

The practical path for most teams:

- Add `promptfooconfig.yaml` to your repo with the plugins that match your threat model
- Run `promptfoo eval` on every PR that modifies prompt logic (no auth required)
- Run `promptfoo redteam run` on a weekly schedule or before major releases
- Triage FAIL results by category, starting with PII and system prompt leakage

## Bottom Line

Promptfoo 0.121 is the most practical path from "we should test our LLM app for security" to "we have a CI gate that runs 500+ adversarial probes against OWASP categories on every release." The echo provider and local eval work without any account; the full red team needs a Promptfoo login but remains the fastest way to get an OWASP LLM Top 10 scan report on an LLM-powered application.
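
For budgeting, the probe-count arithmetic used throughout this guide (plugins × `numTests` × strategies) can be sketched as a small helper. This is only a rough estimate that mirrors the multiplication above: `estimate_probes` is a hypothetical function, not part of Promptfoo, and the real count depends on how Promptfoo expands each strategy.

```python
def estimate_probes(num_plugins: int, num_tests: int, num_strategies: int = 1) -> int:
    """Rough estimate: one batch of num_tests per plugin, repeated per strategy.

    Assumption: every strategy is applied to every generated test case,
    which is the simplification this guide uses, not a documented guarantee.
    """
    if min(num_plugins, num_tests, num_strategies) < 0:
        raise ValueError("counts must be non-negative")
    return num_plugins * num_tests * num_strategies


# Nine-plugin config from this guide: 9 plugins x 20 tests x 3 strategies.
full_scan = estimate_probes(num_plugins=9, num_tests=20, num_strategies=3)

# Solo-developer config: 3 plugins x 10 tests with the single basic strategy.
solo_scan = estimate_probes(num_plugins=3, num_tests=10)

print(f"full scan: ~{full_scan} probes, solo scan: ~{solo_scan} probes")
```

Multiplying the estimate by your provider's per-call cost gives a first-order budget for a weekly or per-release scan.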