The Prompts I use for finding Vulnerabilities in Elixir/Erlang projects

A developer has created a structured prompt system for auditing Elixir and Erlang projects, defining two entry points: `audit_file/4` for single-file analysis and `audit_directory/2` for whole-package audits. The system enumerates 16 sink classes, including code execution, command execution, file operations, and deserialization, as categories where vulnerabilities may reside, regardless of whether input appears hostile. The methodology employs a two-phase approach with configurable `:simple` or `:deep` strategies, using `MyApp.CodingAgent` to scan source files and directories for security flaws.

    defmodule MyApp.Prompts.Audit do
      @moduledoc """
      Prompts for the audit pipeline. Two entry points:

        * `audit_file/4` — embeds a single source file in the prompt and
          runs `MyApp.CodingAgent` against it. Style is `:simple` or
          `:deep`; the executor picks based on `audit.strategy`.

        * `audit_directory/2` — whole-package audit. Spawns the agent with
          `:cwd` set to the source dir so it can use Read/Grep/Bash.
      """

      alias MyApp.CodingAgent

      @per_file_timeout to_timeout(minute: 10)
      @whole_timeout to_timeout(hour: 1)
      @default_effort "max"

      @sink_classes """
      Sink classes — every place dangerous logic could live, regardless of whether
      the input currently looks hostile. Enumerate first, judge second.

      - Code execution — eval, dynamic dispatch on a computed name (`apply`,
        `Code.eval`, `:erlang.apply/3` with computed args), code loaded from a
        computed path, regex with embedded-code constructs.
      - Command execution — `System.cmd`, `:os.cmd`, `Port.open({:spawn, …})`,
        shelling out where args are built by concatenation rather than passed as
        a list.
      - File operations — `File.read/write/rm/cp/ln/chmod` where the path is
        computed; `Code.require_file` / `Code.eval_file` with dynamic paths.
      - Path handling — `Path.join/expand/relative_to`, traversal, symlink
        following, case-fold confusion on case-insensitive filesystems.
      - Archive extraction — `:erl_tar`, `:zip`, any unpack where entry names
        become filesystem paths (zip-slip).
      - Deserialisation — `:erlang.binary_to_term/1` (no `:safe`),
        `Plug.Crypto.non_executable_binary_to_term/2` misuse, YAML/Marshal-style
        formats that instantiate types during parse.
      - Template / interpolation — values reaching another interpreted context
        without escaping for it: HTML, SQL via raw fragments, EEx/Phoenix
        `raw/1`, shell, regex, format strings, log lines.
      - Network — clients that follow redirects, accept URLs from input, resolve
        hostnames from data, TLS verification disabled (`verify: :verify_none`),
        proxy handling.
      - Validation — predicates whose contract is "this is safe": the sink is
        the return value, the danger is returning the wrong answer.
      - Cryptography — KDF parameters, IV reuse, mode/padding, MAC verification,
        `==` on secrets instead of `Plug.Crypto.secure_compare/2`.
      - Memory safety — Rust `unsafe`, raw pointers, unchecked indexing, FFI,
        transmute. For NIFs: lifetime/aliasing across the BEAM boundary.
      - Shared mutable state — `Application.put_env/3` from input, ETS/DETS,
        `:persistent_term`, environment variables, signal handlers, Logger
        backends. One input poisoning what another sees.
      - Concurrency — check-then-act sequences a racer can interleave: file
        existence before open, permission before access, GenServer state read
        then written without serialisation.
      - Resource consumption — atom leaks (`String.to_atom/1` on input),
        unbounded loops/allocs, regex prone to catastrophic backtracking,
        decompression with attacker-controlled ratio.
      - Reflection / metaprogramming gadgets the library installs into the
        caller — `__using__` macros, `@before_compile`, telemetry handler
        attaches, Logger backends, monkeypatched callbacks. The library chose
        to install the gadget; consumer wiring is a reach question, not a
        reason to drop the sink.
      - Round-trip integrity — pairs meant to be inverses: `encode`/`decode`,
        `parse`/`serialize`, `marshal`/`unmarshal`. The sink is the pair. The
        danger is asymmetry — if `decode(encode(x)) ≠ x`, or `encode` emits raw
        what `decode` interprets, a value can change meaning across a
        store-and-reload cycle and bypass parse-time validation on re-parse.
      """

      @per_file_deep_methodology """
      Methodology

      Two phases. Don't skip phase 1 — skipping it is what makes audits miss bugs.

      Phase 1 — inventory. List every sink in this file using the sink classes
      below. Don't judge any of them yet — a sink is dangerous-if-input-is-hostile,
      regardless of whether you currently think the input is hostile. Grep
      exhaustively for the language's primitives in each class.

      Phase 2 — for each sink in your inventory, in order:

      1. Trace — where does the value come from? If it's a hardcoded constant
         or internal data only, write "internal" and stop.
      2. Boundary — does it originate from a function parameter exposed
         publicly, or some other source crossing a trust boundary? The
         library's caller is not the attacker — but data the caller
         forwards from the network, from disk, or from deserialisation is.
      3. Validate — sketch a one-paragraph reproduction (input → effect).
         If a guard in the file rules it out, name the guard and stop.
      4. Impact - what is the real-world impact? What can an attacker that
         exploits this actually do? Explain this in simple terms and plain language.
      5. Rate — Critical / High / Medium / Low.
      Every sink ends up either as a finding or in Ruled out with the
      step that disqualified it.
      """

      @whole_methodology """
      Methodology

      Two phases. Phase 1 is an inventory — write it down before judging anything.
      Two runs against the same source should produce the same inventory.

      Phase 1: Boundaries + inventory

      Before listing sinks, name the trust boundaries. For a small library this
      is one or two lines: who calls it, what they pass, where external data
      enters. Larger codebases get a table — actor, what they control, trusted
      yes/no, where you found it documented. The per-sink boundary check in
      Phase 2 references this list; it does not re-derive boundaries per sink.

      Then enumerate every sink. For each: file, line, sink class, what it
      consumes. Don't judge any of them yet — a sink is
      dangerous-if-input-is-hostile, regardless of whether you currently think
      the input is hostile. Grep exhaustively for the language's primitives in
      each class.

      Phase 2: Per-sink — six steps in order

      Stop when a step rules the sink out and record which step did. Every
      inventory sink ends up either in `findings` or in `ruled_out`.

      1. Trace — backwards from sink to a boundary. Name each hop. If the
         value never crosses a boundary, write "internal" and stop.
      2. Boundary — which boundary from Phase 1 does it cross? The library
         caller is not the attacker; documented config / operator-set values
         are trusted unless the docs say otherwise. Cite the doc. Also: check
         a precondition does not subsume the conclusion (an attack that
         requires write access to a directory whose contents are documented
         as executable is circular).
      3. Validate — write a reproduction script. For Elixir, a short `.exs`
         under `scripts/{package_name}/{short_description}.exs` runnable via
         `Mix.install` is ideal. DO NOT execute it; the human will.
         Paste the script in the `validation` field. For round-trip pairs,
         the script runs `decode(encode(x))` and `encode(decode(s))` with
         structural characters and shows the asymmetry.
      4. Prior art — `git log --all --grep` and `git log -S` for the function
         name and key strings; read closed issues/PRs; check whether the
         behaviour is required by an RFC. If a maintainer already declined,
         quote the comment.
      5. Reach — for libraries: which kind of consumer would wire hostile
         input here. You don't have dependents data; reason about plausible
         call patterns. "No plausible exposed caller" is data, not a verdict.
      6. Rate — severity + confidence. Critical = works on a fresh install,
         no preconditions. High = realistic preconditions a normal deployment
         satisfies. Medium = significant attacker positioning, unusual config,
         or a chain. Low = unrealistic preconditions or narrow impact.
      """

      @per_file_deep_output """
      Output

      Use plain, easy-to-understand, and concise language. Focus on the real-world
      impact of the findings.

      If the file has no sinks at all (truly nothing dangerous-looking to even
      consider), output exactly:

      No findings.

      Otherwise, for each finding output one block in this format: