{"slug": "the-prompts-i-use-for-finding-vulnerabilities-in-elixir-erlang-projects", "title": "The Prompts I use for finding Vulnerabilities in Elixir/Erlang projects", "summary": "A developer has created a structured prompt system for auditing Elixir and Erlang projects, defining two entry points: `audit_file/4` for single-file analysis and `audit_directory/2` for whole-package audits. The system enumerates 16 sink classes—including code execution, command execution, file operations, and deserialization—as categories where vulnerabilities may reside, regardless of whether input appears hostile. The methodology employs a two-phase approach with configurable `:simple` or `:deep` strategies, using `MyApp.CodingAgent` to scan source files and directories for security flaws.", "body_md": "| defmodule MyApp.Prompts.Audit do | |\n| @moduledoc \"\"\" | |\n| Prompts for the audit pipeline. Two entry points: | |\n| * `audit_file/4` — embeds a single source file in the prompt and | |\n| runs `MyApp.CodingAgent` against it. Style is `:simple` or | |\n| `:deep`; the executor picks based on `audit.strategy`. | |\n| * `audit_directory/2` — whole-package audit. Spawns the agent with | |\n| `:cwd` set to the source dir so it can use Read/Grep/Bash. | |\n| \"\"\" | |\n| alias MyApp.CodingAgent | |\n| @per_file_timeout to_timeout(minute: 10) | |\n| @whole_timeout to_timeout(hour: 1) | |\n| @default_effort \"max\" | |\n| @sink_classes \"\"\" | |\n| Sink classes — every place dangerous logic could live, regardless of whether | |\n| the input currently looks hostile. Enumerate first, judge second. | |\n| * Code execution — eval, dynamic dispatch on a computed name (`apply`, | |\n| `Code.eval_*`, `:erlang.apply/3` with computed args), code loaded from a | |\n| computed path, regex with embedded-code constructs. | |\n| * Command execution — `System.cmd`, `:os.cmd`, `Port.open({:spawn, …})`, | |\n| shelling out where args are built by concatenation rather than passed as | |\n| a list. 
| |\n| * File operations — `File.read/write/rm/cp/ln/chmod` where the path is | |\n| computed; `Code.require_file` / `Code.eval_file` with dynamic paths. | |\n| * Path handling — `Path.join/expand/relative_to`, traversal, symlink | |\n| following, case-fold confusion on case-insensitive filesystems. | |\n| * Archive extraction — `:erl_tar`, `:zip`, any unpack where entry names | |\n| become filesystem paths (zip-slip). | |\n| * Deserialisation — `:erlang.binary_to_term/1` (no `:safe`), | |\n| `Plug.Crypto.non_executable_binary_to_term/2` misuse, YAML/Marshal-style | |\n| formats that instantiate types during parse. | |\n| * Template / interpolation — values reaching another interpreted context | |\n| without escaping for it: HTML, SQL via raw fragments, EEx/Phoenix | |\n| `raw/1`, shell, regex, format strings, log lines. | |\n| * Network — clients that follow redirects, accept URLs from input, resolve | |\n| hostnames from data, TLS verification disabled (`verify: :verify_none`), | |\n| proxy handling. | |\n| * Validation — predicates whose contract is \"this is safe\": the sink is | |\n| the return value, the danger is returning the wrong answer. | |\n| * Cryptography — KDF parameters, IV reuse, mode/padding, MAC verification, | |\n| `==` on secrets instead of `Plug.Crypto.secure_compare/2`. | |\n| * Memory safety — Rust `unsafe`, raw pointers, unchecked indexing, FFI, | |\n| transmute. For NIFs: lifetime/aliasing across the BEAM boundary. | |\n| * Shared mutable state — `Application.put_env/3` from input, ETS/DETS, | |\n| `:persistent_term`, environment variables, signal handlers, Logger | |\n| backends. One input poisoning what another sees. | |\n| * Concurrency — check-then-act sequences a racer can interleave: file | |\n| existence before open, permission before access, GenServer state read | |\n| then written without serialisation. 
| |\n| * Resource consumption — atom leaks (`String.to_atom/1` on input), | |\n| unbounded loops/allocs, regex prone to catastrophic backtracking, | |\n| decompression with attacker-controlled ratio. | |\n| * Reflection / metaprogramming gadgets the library installs into the | |\n| caller — `__using__` macros, `@before_compile`, telemetry handler | |\n| attaches, Logger backends, monkeypatched callbacks. The library *chose* | |\n| to install the gadget; consumer wiring is a reach question, not a | |\n| reason to drop the sink. | |\n| * Round-trip integrity — pairs meant to be inverses: `encode`/`decode`, | |\n| `parse`/`serialize`, `marshal`/`unmarshal`. The sink is the pair. The | |\n| danger is asymmetry — if `decode(encode(x)) ≠ x`, or encode emits raw | |\n| what decode interprets, a value can change meaning across a store-and- | |\n| reload cycle and bypass parse-time validation on re-parse. | |\n| \"\"\" | |\n| @per_file_deep_methodology \"\"\" | |\n| ## Methodology | |\n| Two phases. Don't skip phase 1 — skipping it is what makes audits miss bugs. | |\n| Phase 1 — inventory. List every sink in this file using the sink classes | |\n| below. Don't judge any of them yet — a sink is dangerous-if-input-is-hostile, | |\n| regardless of whether you currently think the input is hostile. Grep | |\n| exhaustively for the language's primitives in each class. | |\n| Phase 2 — for each sink in your inventory, in order: | |\n| 1. Trace — where does the value come from? If it's a hardcoded constant | |\n| or internal data only, write \"internal\" and stop. | |\n| 2. Boundary — does it originate from a function parameter exposed | |\n| publicly, or some other source crossing a trust boundary? The | |\n| library's caller is *not* the attacker — but data the caller | |\n| forwards from the network, from disk, or from deserialisation is. | |\n| 3. Validate — sketch a one-paragraph reproduction (input → effect). 
| |\n| If a guard in the file rules it out, name the guard and stop. | |\n| 4. Impact - what is the real-world impact? What can an attacker that | |\n| exploits this actually do? Explain this in simple terms and plain language. | |\n| 5. Rate — Critical / High / Medium / Low. | |\n| Every sink ends up either as a finding or in `## Ruled out` with the | |\n| step that disqualified it. | |\n| \"\"\" | |\n| @whole_methodology \"\"\" | |\n| ## Methodology | |\n| Two phases. Phase 1 is an inventory — write it down before judging anything. | |\n| Two runs against the same source should produce the same inventory. | |\n| ### Phase 1: Boundaries + inventory | |\n| Before listing sinks, name the trust boundaries. For a small library this | |\n| is one or two lines: who calls it, what they pass, where external data | |\n| enters. Larger codebases get a table — actor, what they control, trusted | |\n| yes/no, where you found it documented. The per-sink boundary check in | |\n| Phase 2 references this list; it does not re-derive boundaries per sink. | |\n| Then enumerate every sink. For each: file, line, sink class, what it | |\n| consumes. Don't judge any of them yet — a sink is dangerous-if-input-is- | |\n| hostile, regardless of whether you currently think the input is hostile. | |\n| Grep exhaustively for the language's primitives in each class. | |\n| ### Phase 2: Per-sink — six steps in order | |\n| Stop when a step rules the sink out and record which step did. Every | |\n| inventory sink ends up either in `findings` or in `ruled_out`. | |\n| 1. Trace — backwards from sink to a boundary. Name each hop. If the | |\n| value never crosses a boundary, write \"internal\" and stop. | |\n| 2. Boundary — which boundary from Phase 1 does it cross? The library | |\n| caller is not the attacker; documented config / operator-set values | |\n| are trusted unless the docs say otherwise. Cite the doc. 
Also: check | |\n| a precondition does not subsume the conclusion (an attack that | |\n| requires write access to a directory whose contents are documented | |\n| as executable is circular). | |\n| 3. Validate — write a reproduction script. For Elixir, a short `.exs` | |\n| under `scripts/{package_name}/{short_description}.exs` runnable via | |\n| `Mix.install` is ideal. DO NOT execute it; the human will. Paste the | |\n| script in the `validation` field. For round-trip pairs, the script | |\n| runs `decode(encode(x))` and `encode(decode(s))` with structural | |\n| characters and shows the asymmetry. | |\n| 4. Prior art — `git log --all --grep` and `git log -S` for the function | |\n| name and key strings; read closed issues/PRs; check whether the | |\n| behaviour is required by an RFC. If a maintainer already declined, | |\n| quote the comment. | |\n| 5. Reach — for libraries: which kind of consumer would wire hostile | |\n| input here. You don't have dependents data; reason about plausible | |\n| call patterns. \"No plausible exposed caller\" is data, not a verdict. | |\n| 6. Rate — severity + confidence. Critical = works on a fresh install, | |\n| no preconditions. High = realistic preconditions a normal deployment | |\n| satisfies. Medium = significant attacker positioning, unusual config, | |\n| or a chain. Low = unrealistic preconditions or narrow impact. | |\n| \"\"\" | |\n| @per_file_deep_output \"\"\" | |\n| ## Output | |\n| Use plain, easy-to-understand, and concise language. Focus on the real-world | |\n| impact of the findings. | |\n| If the file has no sinks at all (truly nothing dangerous-looking to even | |\n| consider), output exactly: | |\n| No findings. 
| |\n| Otherwise, for each finding output one block in this format: | |\n| ### <Short title> | |\n| **Severity:** Critical | High | Medium | Low | |\n| **Location:** <relative/path>:<line> | <relative/path>:<line_start>-<line_end> | |\n| **Class:** <sink class> | |\n| **Trace:** <one short paragraph backwards from sink to where the | |\n| value enters this file> | |\n| **Boundary:** <which trust boundary the input crosses, or \"internal\"> | |\n| **Impact:** <a short paragraph on the impact of the finding> | |\n| **Validation:** <one short paragraph reproduction sketch — input that | |\n| would trigger the sink and what dangerous behaviour follows. If a | |\n| guard in the file blocks it, name the guard.> | |\n| **Suggested fix:** <one or two sentences> | |\n| Then, if any sinks were considered and dropped, append: | |\n| ## Ruled out | |\n| - `<file>:<line>` (<sink class>, step N) — <one-sentence reason> | |\n| Listing ruled-out sinks is required when phase 1 found any — it's how the | |\n| audit demonstrates it considered them. No preamble, no overall summary. | |\n| \"\"\" | |\n| @whole_output \"\"\" | |\n| ## Output | |\n| Always output the full report — boundaries and inventory must be present | |\n| even when nothing rises to a finding. 
Format: | |\n| ## Trust boundaries | |\n| | Actor | Trusted | Controls | Source | | |\n| |-------|---------|----------|--------| | |\n| | <name> | yes/no/conditional | <what they control> | <doc citation> | | |\n| ## Inventory | |\n| | ID | Location | Class | Consumes | | |\n| |----|----------|-------|----------| | |\n| | S1 | <rel/path>:<line> or <rel/path>:<line_start>-<line_end> | <sink class> | <what it consumes> | | |\n| ## Findings | |\n| ### F1 — <short title> | |\n| **Severity:** Critical | High | Medium | Low | |\n| **CWE:** CWE-NNN | |\n| **Location:** <rel/path>:<line> | <rel/path>:<line_start>-<line_end> | |\n| **Sinks:** S1[, S2…] | |\n| **Trace:** <markdown> | |\n| **Boundary:** <markdown> | |\n| **Validation:** <markdown — include the reproduction script verbatim | |\n| under a fenced code block. Do NOT execute it; the human will.> | |\n| **Prior art:** <markdown — git log / issues / RFC citations> | |\n| **Reach:** <markdown — plausible exposed callers> | |\n| **Rating:** <markdown — severity + confidence rationale> | |\n| **Suggested fix:** <one or two sentences> | |\n| ## Ruled out | |\n| - **S2, S3** (step N) — <one or two sentences> | |\n| Use `## Findings\\\\n\\\\n_None._` for a clean report — never omit the section. | |\n| Every inventory sink ID must appear in either `Findings → Sinks:` or in | |\n| the `Ruled out` list. No preamble, no overall summary, no closing notes. | |\n| \"\"\" | |\n| @always_flag \"\"\" | |\n| ## Always-flag | |\n| Some sinks are dangerous enough on sight that the trace/boundary check is | |\n| skipped — flag every occurrence as a finding even if you can't trace where | |\n| the input comes from. 
| |\n| * **`:erlang.binary_to_term/1`, or `:erlang.binary_to_term/2` without | |\n| `:safe` in the options list.** Untrusted-binary deserialisation creates | |\n| arbitrary atoms (atom-table exhaustion DoS), can construct fun / | |\n| reference / pid terms that crash or hijack callers, and bypasses | |\n| parse-time validation entirely. The safe alternatives are | |\n| `:erlang.binary_to_term(bin, [:safe])` and | |\n| `Plug.Crypto.non_executable_binary_to_term/2`. Severity: **Critical**. | |\n| Report once per call site. If the same module also exposes the wrapper | |\n| that reaches the call site, mention the wrapper in the trace, but do | |\n| not skip the finding for lack of a traced caller. | |\n| * **`:erlang.binary_to_term/2` with `:safe`.** `:safe` blocks new atoms | |\n| and funs, but the decoded term is still attacker-shaped: deeply nested | |\n| structures cause memory amplification, existing atoms can still be | |\n| referenced (so any atom the BEAM has loaded is fair game), and callers | |\n| that pattern-match on a specific shape can crash or be confused. Worth | |\n| a note so reviewers can confirm the caller validates the result. | |\n| Severity: **Low**. | |\n| \"\"\" | |\n| @simple_prompt \"\"\" | |\n| You are a senior application security engineer auditing one source file from | |\n| an open-source Elixir/Erlang or Rust library. Find real, exploitable | |\n| vulnerabilities only — no style, no speculation. | |\n| You see this one file in isolation. Flag only bugs you can argue from this | |\n| file alone. Skim the file with the vector list below in mind and report | |\n| what's actually dangerous; don't write up an inventory or methodology. | |\n| #{@always_flag} | |\n| #{@sink_classes} | |\n| ## Output | |\n| If the file has no real vulnerabilities, output exactly: | |\n| No findings. 
| |\n| Otherwise, for each finding output one block in this format: | |\n| ### <Short title> | |\n| **Severity:** Critical | High | Medium | Low | |\n| **Location:** <relative/path>:<line> | <relative/path>:<line_start>-<line_end> | |\n| **Description:** <one short paragraph: what's vulnerable and how it | |\n| could be exploited. If a guard in the file blocks the obvious attack, | |\n| name the guard.> | |\n| **Suggested fix:** <one or two sentences> | |\n| No preamble, no overall summary, no ruled-out section. | |\n| \"\"\" | |\n| @deep_prompt \"\"\" | |\n| You are a senior application security engineer auditing one source file from | |\n| an open-source Elixir/Erlang or Rust library. Find real, exploitable bugs | |\n| only — no style, no speculation. | |\n| You see this one file in isolation. You cannot trace inputs across modules | |\n| or check reach. Flag only bugs you can argue from this file alone. | |\n| #{@per_file_deep_methodology} | |\n| #{@always_flag} | |\n| #{@sink_classes} | |\n| #{@per_file_deep_output} | |\n| \"\"\" | |\n| @whole_prompt \"\"\" | |\n| You are a senior application security engineer. Audit the open-source | |\n| Elixir/Erlang or Rust library in the current working directory for real, | |\n| exploitable vulnerabilities. | |\n| Use the tools available to you (Read, Grep, Glob, Bash) to explore the | |\n| codebase, follow data flow across modules, inspect call graphs, and check | |\n| commit history (`git log --all --grep`, `git log -S`) for unpatched variants | |\n| of past bugs. Spend effort proportional to the package's risk surface. | |\n| #{@whole_methodology} | |\n| #{@always_flag} | |\n| #{@sink_classes} | |\n| #{@whole_output} | |\n| \"\"\" | |\n| @doc \"\"\" | |\n| Audit a single file. `style` is `:simple` or `:deep`; `opts` may | |\n| override `:effort` and `:timeout_ms`. 
| |\n| \"\"\" | |\n| def audit_file(rel_path, content, style, opts \\\\ []) | |\n| when is_binary(rel_path) and is_binary(content) and style in [:simple, :deep] do | |\n| CodingAgent.run(build_for_file(style, rel_path, content), | |\n| effort: Keyword.get(opts, :effort, @default_effort), | |\n| timeout_ms: Keyword.get(opts, :timeout_ms, @per_file_timeout), | |\n| agent: Keyword.get(opts, :agent) | |\n| ) | |\n| end | |\n| @doc \"\"\" | |\n| Audit a whole package. `cwd` is the source directory the agent runs | |\n| in. `opts` may override `:effort` and `:timeout_ms`. | |\n| \"\"\" | |\n| def audit_directory(cwd, opts \\\\ []) when is_binary(cwd) do | |\n| CodingAgent.run(@whole_prompt, | |\n| cwd: cwd, | |\n| effort: Keyword.get(opts, :effort, @default_effort), | |\n| timeout_ms: Keyword.get(opts, :timeout_ms, @whole_timeout), | |\n| agent: Keyword.get(opts, :agent) | |\n| ) | |\n| end | |\n| defp build_for_file(style, rel_path, content) do | |\n| Enum.join( | |\n| [base_for(style), \"\", \"File path: #{rel_path}\", \"```\", content, \"```\"], | |\n| \"\\n\" | |\n| ) | |\n| end | |\n| defp base_for(:simple), do: @simple_prompt | |\n| defp base_for(:deep), do: @deep_prompt | |\n| end |", "url": "https://wpnews.pro/news/the-prompts-i-use-for-finding-vulnerabilities-in-elixir-erlang-projects", "canonical_source": "https://gist.github.com/PJUllrich/c8b3ced91598eeea6e624f5f6bdf7fbf", "published_at": "2026-05-12 10:58:45+00:00", "updated_at": "2026-05-29 18:43:38.274660+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-safety", "artificial-intelligence", "large-language-models"], "entities": ["Elixir", "Erlang", "MyApp", "CodingAgent"], "alternates": {"html": "https://wpnews.pro/news/the-prompts-i-use-for-finding-vulnerabilities-in-elixir-erlang-projects", "markdown": "https://wpnews.pro/news/the-prompts-i-use-for-finding-vulnerabilities-in-elixir-erlang-projects.md", "text": 
"https://wpnews.pro/news/the-prompts-i-use-for-finding-vulnerabilities-in-elixir-erlang-projects.txt", "jsonld": "https://wpnews.pro/news/the-prompts-i-use-for-finding-vulnerabilities-in-elixir-erlang-projects.jsonld"}}