# The Prompts I use for finding Vulnerabilities in Elixir/Erlang projects

> Source: <https://gist.github.com/PJUllrich/c8b3ced91598eeea6e624f5f6bdf7fbf>
> Published: 2026-05-12 10:58:45+00:00

| defmodule MyApp.Prompts.Audit do | |
| @moduledoc """ | |
| Prompts for the audit pipeline. Two entry points: | |
| * `audit_file/4` — embeds a single source file in the prompt and | |
| runs `MyApp.CodingAgent` against it. Style is `:simple` or | |
| `:deep`; the executor picks based on `audit.strategy`. | |
| * `audit_directory/2` — whole-package audit. Spawns the agent with | |
| `:cwd` set to the source dir so it can use Read/Grep/Bash. | |
| """ | |
| alias MyApp.CodingAgent | |
| @per_file_timeout to_timeout(minute: 10) | |
| @whole_timeout to_timeout(hour: 1) | |
| @default_effort "max" | |
| @sink_classes """ | |
| Sink classes — every place dangerous logic could live, regardless of whether | |
| the input currently looks hostile. Enumerate first, judge second. | |
| * Code execution — eval, dynamic dispatch on a computed name (`apply`, | |
| `Code.eval_*`, `:erlang.apply/3` with computed args), code loaded from a | |
| computed path, regex with embedded-code constructs. | |
| * Command execution — `System.cmd`, `:os.cmd`, `Port.open({:spawn, …})`, | |
| shelling out where args are built by concatenation rather than passed as | |
| a list. | |
| * File operations — `File.read/write/rm/cp/ln/chmod` where the path is | |
| computed; `Code.require_file` / `Code.eval_file` with dynamic paths. | |
| * Path handling — `Path.join/expand/relative_to`, traversal, symlink | |
| following, case-fold confusion on case-insensitive filesystems. | |
| * Archive extraction — `:erl_tar`, `:zip`, any unpack where entry names | |
| become filesystem paths (zip-slip). | |
| * Deserialisation — `:erlang.binary_to_term/1` (no `:safe`), | |
| `Plug.Crypto.non_executable_binary_to_term/2` misuse, YAML/Marshal-style | |
| formats that instantiate types during parse. | |
| * Template / interpolation — values reaching another interpreted context | |
| without escaping for it: HTML, SQL via raw fragments, EEx/Phoenix | |
| `raw/1`, shell, regex, format strings, log lines. | |
| * Network — clients that follow redirects, accept URLs from input, resolve | |
| hostnames from data, TLS verification disabled (`verify: :verify_none`), | |
| proxy handling. | |
| * Validation — predicates whose contract is "this is safe": the sink is | |
| the return value, the danger is returning the wrong answer. | |
| * Cryptography — KDF parameters, IV reuse, mode/padding, MAC verification, | |
| `==` on secrets instead of `Plug.Crypto.secure_compare/2`. | |
| * Memory safety — Rust `unsafe`, raw pointers, unchecked indexing, FFI, | |
| transmute. For NIFs: lifetime/aliasing across the BEAM boundary. | |
| * Shared mutable state — `Application.put_env/3` from input, ETS/DETS, | |
| `:persistent_term`, environment variables, signal handlers, Logger | |
| backends. One input poisoning what another sees. | |
| * Concurrency — check-then-act sequences a racer can interleave: file | |
| existence before open, permission before access, GenServer state read | |
| then written without serialisation. | |
| * Resource consumption — atom leaks (`String.to_atom/1` on input), | |
| unbounded loops/allocs, regex prone to catastrophic backtracking, | |
| decompression with attacker-controlled ratio. | |
| * Reflection / metaprogramming gadgets the library installs into the | |
| caller — `__using__` macros, `@before_compile`, telemetry handler | |
| attaches, Logger backends, monkeypatched callbacks. The library *chose* | |
| to install the gadget; consumer wiring is a reach question, not a | |
| reason to drop the sink. | |
| * Round-trip integrity — pairs meant to be inverses: `encode`/` decode`, | |
| `parse`/` serialize`, `marshal`/` unmarshal`. The sink is the pair. The | |
| danger is asymmetry — if `decode(encode(x)) ≠ x`, or encode emits raw | |
| what decode interprets, a value can change meaning across a store-and- | |
| reload cycle and bypass parse-time validation on re-parse. | |
| """ | |
| @per_file_deep_methodology """ | |
| ## Methodology | |
| Two phases. Don't skip phase 1 — skipping it is what makes audits miss bugs. | |
| Phase 1 — inventory. List every sink in this file using the sink classes | |
| below. Don't judge any of them yet — a sink is dangerous-if-input-is-hostile, | |
| regardless of whether you currently think the input is hostile. Grep | |
| exhaustively for the language's primitives in each class. | |
| Phase 2 — for each sink in your inventory, in order: | |
| 1. Trace — where does the value come from? If it's a hardcoded constant | |
| or internal data only, write "internal" and stop. | |
| 2. Boundary — does it originate from a function parameter exposed | |
| publicly, or some other source crossing a trust boundary? The | |
| library's caller is *not* the attacker — but data the caller | |
| forwards from the network, from disk, or from deserialisation is. | |
| 3. Validate — sketch a one-paragraph reproduction (input → effect). | |
| If a guard in the file rules it out, name the guard and stop. | |
| 4. Impact - what is the real-world impact? What can an attacker that | |
| exploits this actually do? Explain this in simple terms and plain language. | |
| 4. Rate — Critical / High / Medium / Low. | |
| Every sink ends up either as a finding or in `## Ruled out` with the | |
| step that disqualified it. | |
| """ | |
| @whole_methodology """ | |
| ## Methodology | |
| Two phases. Phase 1 is an inventory — write it down before judging anything. | |
| Two runs against the same source should produce the same inventory. | |
| ### Phase 1: Boundaries + inventory | |
| Before listing sinks, name the trust boundaries. For a small library this | |
| is one or two lines: who calls it, what they pass, where external data | |
| enters. Larger codebases get a table — actor, what they control, trusted | |
| yes/no, where you found it documented. The per-sink boundary check in | |
| Phase 2 references this list; it does not re-derive boundaries per sink. | |
| Then enumerate every sink. For each: file, line, sink class, what it | |
| consumes. Don't judge any of them yet — a sink is dangerous-if-input-is- | |
| hostile, regardless of whether you currently think the input is hostile. | |
| Grep exhaustively for the language's primitives in each class. | |
| ### Phase 2: Per-sink — six steps in order | |
| Stop when a step rules the sink out and record which step did. Every | |
| inventory sink ends up either in `findings` or in `ruled_out`. | |
| 1. Trace — backwards from sink to a boundary. Name each hop. If the | |
| value never crosses a boundary, write "internal" and stop. | |
| 2. Boundary — which boundary from Phase 1 does it cross? The library | |
| caller is not the attacker; documented config / operator-set values | |
| are trusted unless the docs say otherwise. Cite the doc. Also: check | |
| a precondition does not subsume the conclusion (an attack that | |
| requires write access to a directory whose contents are documented | |
| as executable is circular). | |
| 3. Validate — write a reproduction script. For Elixir, a short `.exs` | |
| under `scripts/{package_name}/{short_description}.exs` runnable via | |
| `Mix.install` is ideal. DO NOT execute it; the human will. Paste the | |
| script in the `validation` field. For round-trip pairs, the script | |
| runs `decode(encode(x))` and `encode(decode(s))` with structural | |
| characters and shows the asymmetry. | |
| 4. Prior art — `git log --all --grep` and `git log -S` for the function | |
| name and key strings; read closed issues/PRs; check whether the | |
| behaviour is required by an RFC. If a maintainer already declined, | |
| quote the comment. | |
| 5. Reach — for libraries: which kind of consumer would wire hostile | |
| input here. You don't have dependents data; reason about plausible | |
| call patterns. "No plausible exposed caller" is data, not a verdict. | |
| 6. Rate — severity + confidence. Critical = works on a fresh install, | |
| no preconditions. High = realistic preconditions a normal deployment | |
| satisfies. Medium = significant attacker positioning, unusual config, | |
| or a chain. Low = unrealistic preconditions or narrow impact. | |
| """ | |
| @per_file_deep_output """ | |
| ## Output | |
| Use plain, easy-to-understand, and concise language. Focus on the real-world | |
| impact of the findings. | |
| If the file has no sinks at all (truly nothing dangerous-looking to even | |
| consider), output exactly: | |
| No findings. | |
| Otherwise, for each finding output one block in this format: | |
| ### <Short title> | |
| **Severity:** Critical | High | Medium | Low | |
| **Location:** <relative/path>:<line> | <relative/path>:<line_start>-<line_end> | |
| **Class:** <sink class> | |
| **Trace:** <one short paragraph backwards from sink to where the | |
| value enters this file> | |
| **Boundary:** <which trust boundary the input crosses, or "internal"> | |
| **Impact:** <a short paragraph on the impact of the finding> | |
| **Validation:** <one short paragraph reproduction sketch — input that | |
| would trigger the sink and what dangerous behaviour follows. If a | |
| guard in the file blocks it, name the guard.> | |
| **Suggested fix:** <one or two sentences> | |
| Then, if any sinks were considered and dropped, append: | |
| ## Ruled out | |
| - `<file>:<line>` (<sink class>, step N) — <one-sentence reason> | |
| Listing ruled-out sinks is required when phase 1 found any — it's how the | |
| audit demonstrates it considered them. No preamble, no overall summary. | |
| """ | |
| @whole_output """ | |
| ## Output | |
| Always output the full report — boundaries and inventory must be present | |
| even when nothing rises to a finding. Format: | |
| ## Trust boundaries | |
| | Actor | Trusted | Controls | Source | | |
| |-------|---------|----------|--------| | |
| | <name> | yes/no/conditional | <what they control> | <doc citation> | | |
| ## Inventory | |
| | ID | Location | Class | Consumes | | |
| |----|----------|-------|----------| | |
| | S1 | <rel/path>:<line> or <rel/path>:<line_start>-<line_end> | <sink class> | <what it consumes> | | |
| ## Findings | |
| ### F1 — <short title> | |
| **Severity:** Critical | High | Medium | Low | |
| **CWE:** CWE-NNN | |
| **Location:** <rel/path>:<line> | <rel/path>:<line_start>-<line_end> | |
| **Sinks:** S1[, S2…] | |
| **Trace:** <markdown> | |
| **Boundary:** <markdown> | |
| **Validation:** <markdown — include the reproduction script verbatim | |
| under a fenced code block. Do NOT execute it; the human will.> | |
| **Prior art:** <markdown — git log / issues / RFC citations> | |
| **Reach:** <markdown — plausible exposed callers> | |
| **Rating:** <markdown — severity + confidence rationale> | |
| **Suggested fix:** <one or two sentences> | |
| ## Ruled out | |
| - **S2, S3** (step N) — <one or two sentences> | |
| Use `## Findings\\n\\n_None._` for a clean report — never omit the section. | |
| Every inventory sink ID must appear in either `Findings → Sinks:` or in | |
| the `Ruled out` list. No preamble, no overall summary, no closing notes. | |
| """ | |
| @always_flag """ | |
| ## Always-flag | |
| Some sinks are dangerous enough on sight that the trace/boundary check is | |
| skipped — flag every occurrence as a finding even if you can't trace where | |
| the input comes from. | |
| * **`:erlang.binary_to_term/1`, or `:erlang.binary_to_term/2` without | |
| `:safe` in the options list.** Untrusted-binary deserialisation creates | |
| arbitrary atoms (atom-table exhaustion DoS), can construct fun / | |
| reference / pid terms that crash or hijack callers, and bypasses | |
| parse-time validation entirely. The safe alternatives are | |
| `:erlang.binary_to_term(bin, [:safe])` and | |
| `Plug.Crypto.non_executable_binary_to_term/2`. Severity: **Critical**. | |
| Report once per call site. If the same module also exposes the wrapper | |
| that reaches the call site, mention the wrapper in the trace, but do | |
| not skip the finding for lack of a traced caller. | |
| * **`:erlang.binary_to_term/2` with `:safe`.** `:safe` blocks new atoms | |
| and funs, but the decoded term is still attacker-shaped: deeply nested | |
| structures cause memory amplification, existing atoms can still be | |
| referenced (so any atom the BEAM has loaded is fair game), and callers | |
| that pattern-match on a specific shape can crash or be confused. Worth | |
| a note so reviewers can confirm the caller validates the result. | |
| Severity: **Low**. | |
| """ | |
| @simple_prompt """ | |
| You are a senior application security engineer auditing one source file from | |
| an open-source Elixir/Erlang or Rust library. Find real, exploitable | |
| vulnerabilities only — no style, no speculation. | |
| You see this one file in isolation. Flag only bugs you can argue from this | |
| file alone. Skim the file with the vector list below in mind and report | |
| what's actually dangerous; don't write up an inventory or methodology. | |
| #{@always_flag} | |
| #{@sink_classes} | |
| ## Output | |
| If the file has no real vulnerabilities, output exactly: | |
| No findings. | |
| Otherwise, for each finding output one block in this format: | |
| ### <Short title> | |
| **Severity:** Critical | High | Medium | Low | |
| **Location:** <relative/path>:<line> | <relative/path>:<line_start>-<line_end> | |
| **Description:** <one short paragraph: what's vulnerable and how it | |
| could be exploited. If a guard in the file blocks the obvious attack, | |
| name the guard.> | |
| **Suggested fix:** <one or two sentences> | |
| No preamble, no overall summary, no ruled-out section. | |
| """ | |
| @deep_prompt """ | |
| You are a senior application security engineer auditing one source file from | |
| an open-source Elixir/Erlang or Rust library. Find real, exploitable bugs | |
| only — no style, no speculation. | |
| You see this one file in isolation. You cannot trace inputs across modules | |
| or check reach. Flag only bugs you can argue from this file alone. | |
| #{@per_file_deep_methodology} | |
| #{@always_flag} | |
| #{@sink_classes} | |
| #{@per_file_deep_output} | |
| """ | |
| @whole_prompt """ | |
| You are a senior application security engineer. Audit the open-source | |
| Elixir/Erlang or Rust library in the current working directory for real, | |
| exploitable vulnerabilities. | |
| Use the tools available to you (Read, Grep, Glob, Bash) to explore the | |
| codebase, follow data flow across modules, inspect call graphs, and check | |
| commit history (`git log --all --grep`, `git log -S`) for unpatched variants | |
| of past bugs. Spend effort proportional to the package's risk surface. | |
| #{@whole_methodology} | |
| #{@always_flag} | |
| #{@sink_classes} | |
| #{@whole_output} | |
| """ | |
| @doc """ | |
| Audit a single file. `style` is `:simple` or `:deep`; `opts` may | |
| override `:effort` and `:timeout_ms`. | |
| """ | |
| def audit_file(rel_path, content, style, opts \\ []) | |
| when is_binary(rel_path) and is_binary(content) and style in [:simple, :deep] do | |
| CodingAgent.run(build_for_file(style, rel_path, content), | |
| effort: Keyword.get(opts, :effort, @default_effort), | |
| timeout_ms: Keyword.get(opts, :timeout_ms, @per_file_timeout), | |
| agent: Keyword.get(opts, :agent) | |
| ) | |
| end | |
| @doc """ | |
| Audit a whole package. `cwd` is the source directory the agent runs | |
| in. `opts` may override `:effort` and `:timeout_ms`. | |
| """ | |
| def audit_directory(cwd, opts \\ []) when is_binary(cwd) do | |
| CodingAgent.run(@whole_prompt, | |
| cwd: cwd, | |
| effort: Keyword.get(opts, :effort, @default_effort), | |
| timeout_ms: Keyword.get(opts, :timeout_ms, @whole_timeout), | |
| agent: Keyword.get(opts, :agent) | |
| ) | |
| end | |
| defp build_for_file(style, rel_path, content) do | |
| Enum.join( | |
| [base_for(style), "", "File path: #{rel_path}", "```", content, "```"], | |
| "\n" | |
| ) | |
| end | |
| defp base_for(:simple), do: @simple_prompt | |
| defp base_for(:deep), do: @deep_prompt | |
| end |