Show HN: Ploof – The agent-native CLI for generating images, video, and audio

The agent-native CLI for generating images, video, and audio.

Hand it to Claude Code, Cursor, or Codex: they install it, read `ploof learn`, and create your assets for you. Works great by hand, too.

Ploof turns a prompt into a file, and it's designed to be driven by your coding agent. The usual path isn't typing `ploof` commands yourself; it's telling Claude Code, Cursor, or Codex what you want and letting it install ploof, read the built-in `ploof learn` reference, and generate the assets on your behalf. No SDK wiring, no polling loops, no glue code. It's also a sharp manual CLI when you want it.

- 🤖 Agent-native: built to be operated by coding agents. `ploof learn` self-documents the installed version, output is JSON/JSONL-clean, and flags stay stable.
- 🎨 Every modality: images, video, and audio; generate, edit, extend, transcribe, translate.
- 🔌 Multi-provider: OpenAI today, plus fal.ai's entire model marketplace via `model run`.
- 📦 Batch + parallel: declare assets in YAML, wire up dependencies, run them concurrently with one command.
- 🔑 Local auth profiles: multiple keys per provider in `~/.ploof`, with env-var overrides for CI.
- 🧾 Reproducible: every asset gets a `<file>.json` sidecar recording the prompt, params, and provider metadata.

Provider	Images	Video	Audio	Any endpoint
OpenAI	generate · edit · variations	generate · edit · extend · library · characters	speech (TTS) · transcribe · translate	—
fal.ai	✓	✓	✓	✓ marketplace via `model run`

More providers are planned — the provider registry is built to grow.

Use it with your coding agent
Install
Quick start
Authentication
Images
Video
Audio
Run any model endpoint
Batch manifests
Output and scripting
For AI agents
Configuration
Reference
Contributing

This is the main way to use ploof. You don't run the commands yourself — you tell your coding agent what you want, and it installs ploof, reads the built-in reference, authenticates, and generates the assets for you.

Paste this into Claude Code, Cursor, Codex, or any agent, and fill in the last line:

Use the ploof CLI to generate assets for this project.

Setup:
1. Install it if it isn't already: `bun i -g @miketromba/ploof` (or `npm i -g @miketromba/ploof`).
2. Run `ploof learn` and follow it — that's the canonical, always-current reference for the installed version.
3. If `ploof whoami openai` (or `ploof whoami fal`) shows I'm not authenticated, walk me through `ploof login`.

Task: <describe the asset you want — e.g. "a 1024x1024 hero image of a matte black water bottle on marble, saved to assets/hero.png">

Your agent takes it from `ploof learn` and does the rest. Working in this repo often? Have it run `ploof skill install` once to drop a bootstrap skill so the workflow auto-loads next time.

Why it works: `ploof learn` prints a complete, version-matched guide to stdout, and every command emits clean JSON/JSONL with predictable exit codes, so agents operate ploof reliably instead of guessing or relying on stale training data.

bun i -g @miketromba/ploof

Requires Node 18+ (Bun optional). Your agent normally handles this for you (see above).

npm, pnpm, yarn, or run without installing

npm  install -g @miketromba/ploof
pnpm add     -g @miketromba/ploof
yarn global add @miketromba/ploof

bunx @miketromba/ploof --help
npx  @miketromba/ploof --help

Prefer to drive it yourself — or want to see exactly what your agent will be doing? The manual path:

bun i -g @miketromba/ploof

ploof login openai --api-key sk-...

ploof image generate \
  --prompt "Studio product photo of a matte black water bottle on marble" \
  --out hero.png

`hero.png` lands on disk next to `hero.png.json`, a sidecar recording the exact prompt and parameters used. Run `ploof --help` to see every command, or `ploof learn` for the agent-oriented tour.
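As a rough illustration of the sidecar convention, the sketch below derives the sidecar path from an asset path and loads it if present. It assumes only what the text states, namely that the sidecar is the asset's file name with `.json` appended; the helper names `sidecar_path` and `load_sidecar` are hypothetical, and the keys inside the sidecar are not assumed here.

```python
import json
from pathlib import Path


def sidecar_path(asset_path):
    """Return the sidecar path: the asset's full file name plus ".json"."""
    asset = Path(asset_path)
    return asset.with_name(asset.name + ".json")


def load_sidecar(asset_path):
    """Load the sidecar as a dict, or return None when no sidecar file exists."""
    path = sidecar_path(asset_path)
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # hero.png -> hero.png.json, matching the quick-start example.
    print(sidecar_path("hero.png"))
```

A script can call `load_sidecar("hero.png")` after a run to recover the recorded prompt and parameters without re-parsing CLI output.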

Credentials live in `~/.ploof/credentials.json`. Log in once per provider:

ploof login openai --api-key sk-...
ploof login fal    --api-key <fal-key>

ploof whoami openai      # show the active credential
ploof profiles           # list every stored profile
ploof logout fal         # remove credentials

Omit `--api-key` and Ploof reads the matching env var, or securely prompts (no echo) in an interactive terminal.

Multiple keys? Name them with `--profile`, then select per command:

ploof login openai --api-key sk-personal --profile personal
ploof login openai --api-key sk-work --profile work --no-default
ploof image generate --prompt "..." --profile work --out out.png

Env vars override stored credentials — ideal for CI:

Provider	Variables
OpenAI	`PLOOF_OPENAI_API_KEY` or `OPENAI_API_KEY`
fal.ai	`PLOOF_FAL_KEY` or `FAL_KEY` (or split `PLOOF_FAL_KEY_ID` + `PLOOF_FAL_KEY_SECRET`)
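The override rule can be pictured with a small sketch. This is not Ploof's implementation: it assumes the `PLOOF_`-prefixed variable is checked before the unprefixed one (the table does not state the order), it ignores the split fal.ai key pair, and `resolve_key` is a hypothetical helper name.

```python
import os

# Variable names come from the table above; the lookup order within
# each tuple is an assumption made for this sketch.
ENV_VARS = {
    "openai": ("PLOOF_OPENAI_API_KEY", "OPENAI_API_KEY"),
    "fal": ("PLOOF_FAL_KEY", "FAL_KEY"),
}


def resolve_key(provider, stored_key=None, env=None):
    """Return (key, source): an env var wins over a stored credential."""
    env = os.environ if env is None else env
    for name in ENV_VARS[provider]:
        value = env.get(name)
        if value:
            return value, name
    if stored_key:
        return stored_key, "stored"
    return None, None
```

In CI this means exporting one variable is enough; nothing has to be written to `~/.ploof`.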

OpenAI org / project / base URL can be set with `--organization`, `--project`, `--base-url` (or `PLOOF_OPENAI_ORG`, `PLOOF_OPENAI_PROJECT`, `PLOOF_OPENAI_BASE_URL`).

OpenAI image generation and editing default to `gpt-image-2`. Image inputs accept local paths, `http(s)` URLs, or `-` for stdin.

ploof image generate \
  --prompt "Editorial portrait, dramatic side light" \
  --out assets/portrait.png \
  --size 1024x1024 --quality high

ploof image edit \
  --image product.png --image reference.png --mask mask.png \
  --prompt "Replace the background with a clean marble countertop" \
  --out assets/edited.png

ploof image variation --image product.png --out assets/variation.png

Image flags

Flag	Description
`--model`	Image model (default `gpt-image-2`)
`--size`	e.g. `1024x1024`
`--quality`	e.g. `low`, `medium`, `high`
`--format` / `--output-format`	`png`, `jpeg`, `webp`, …
`--n`	Number of images (`--out` file gets `-1`, `-2`, …)
`--image` (edit)	Input/context image; repeat for multiple
`--mask` (edit)	Mask for inpainting
`--input-fidelity` (edit)	OpenAI input fidelity
`--background`, `--moderation`, `--style`, `--user`, `--stream`, `--output-compression`, `--partial-images`, `--response-format`	Provider settings
`--param key=value` / `--json '{…}'`	Any provider-specific parameter

`variation` is aliased as `variations` and uses OpenAI's legacy endpoint, which currently supports only `dall-e-2`. If it returns a 404, use `image edit` for image-to-image instead.

Video commands use OpenAI's asynchronous Videos API, defaulting to `sora-2`. Pass `--out` (or `--download`) and Ploof waits for the job to finish, then downloads it.

ploof video generate \
  --prompt "Wide tracking shot of a paper city at blue hour" \
  --size 1280x720 --seconds 4 \
  --out assets/clip.mp4

ploof video extend --video-id video_abc123 --seconds 4 \
  --prompt "Camera rises over the rooftops" --out assets/extended.mp4

ploof video list --limit 20
ploof video status video_abc123
ploof video download video_abc123 --variant thumbnail --out thumb.webp
ploof video delete video_abc123

Video flags & characters

Flag	Description
`--model`	`sora-2`, `sora-2-pro`, …
`--size` / `--seconds`	Resolution / duration
`--input-reference <path|url>`	First-frame image reference
`--character <id>`	Reusable character; repeat for several
`--wait` / `--download`	Poll to completion / download after wait
`--variant`	`video`, `thumbnail`, or `spritesheet`
`--poll-interval` / `--timeout`	Polling cadence / max wait (seconds)
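For readers curious what `--wait`, `--poll-interval`, and `--timeout` spare them from writing, here is a generic polling loop of the kind the CLI handles internally. It is a sketch under stated assumptions, not Ploof's code: `wait_for` and `get_status` are hypothetical names, `"completed"` is the status shown in Ploof's result objects, and `"failed"` as a second terminal status is an assumption.

```python
import time

# "completed" appears in Ploof's documented output; "failed" is assumed here.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for(get_status, poll_interval=2.0, timeout=60.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.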

`video edit` and `video extend` accept either `--video-id` (a completed OpenAI video) or `--video` (an uploaded source), where your project is eligible. Reusable characters:

ploof video character create --name Mossy --video character.mp4
ploof video character get char_abc123

Speech defaults to `gpt-4o-mini-tts` / `alloy` / `mp3`. Transcription defaults to `gpt-4o-mini-transcribe`; translation to `whisper-1`.

ploof audio generate --text "Ploof can speak." --voice alloy --out assets/speech.mp3

ploof audio transcribe --audio assets/speech.mp3 --out assets/transcript.json

ploof audio translate --audio assets/spanish.mp3 --format text --out assets/translation.txt

Audio flags

Generate (`generate`, aliased `speech` / `tts`): `--model`, `--voice`, `--voice-id`, `--instructions`, `--format` (`mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`), `--speed`.

Transcribe: `--model`, `--language`, `--prompt`, `--format`, `--temperature`, `--include`, `--timestamp-granularity`, `--chunking-strategy`, `--known-speaker-name`, `--known-speaker-reference`.

Translate: `--model`, `--prompt`, `--format`, `--temperature`.

Ploof writes finished files, so streaming-only transport settings (e.g. `stream=true`) are rejected because they don't produce a complete asset.

`model run` calls a model endpoint directly through the provider's official client, defaulting to fal.ai. Ploof uploads local inputs to provider storage, submits to the queue, polls to completion, and writes the returned files or text to disk.

ploof model run \
  --provider fal --model fal-ai/flux/dev \
  --prompt "Friendly CLI mascot icon, transparent background" \
  --param image_size=square_hd \
  --out assets/icon.png

Map local assets to the endpoint's exact input fields with `--input field=path` (repeatable):

ploof model run --provider fal --model <endpoint-id> \
  --prompt "Animate this into a short loop" \
  --input image_url=assets/source.png --param duration=4 \
  --out assets/loop.mp4

The media commands work against fal too; just pass `--provider fal --model <endpoint-id>`:

ploof image generate --provider fal --model fal-ai/flux/dev \
  --prompt "Soft clay mascot icon" --param image_size=square_hd --out assets/mascot.png

Pass endpoint settings with `--param key=value` or `--json '{…}'`. Queue controls: `--start-timeout`, `--timeout`, `--poll-interval`, `--priority low|normal`, `--storage-expires-in`.

Describe many assets in YAML (or JSON), wire dependencies with `needs`, reuse one task's output as another's input, and run them in parallel:

version: 1
parallel: 4
tasks:
  - id: base
    kind: image.generate
    prompt: "Studio product photo"
    params: { model: gpt-image-2, size: 1024x1024, quality: high }
    output: assets/base.png

  - id: final
    kind: image.edit
    needs: [base]
    inputs:
      images:
        - task: base          # reuse base's output
      mask: ./mask.png
    prompt: "Add a premium background"
    output: assets/final.png

  - id: clip
    kind: video.generate
    prompt: "Slow dolly through a miniature paper city"
    params: { model: sora-2, size: 1280x720, seconds: "4" }
    wait: true
    download: true
    output: assets/clip.mp4

  - id: icon
    kind: model.run
    provider: fal
    model: fal-ai/flux/dev
    prompt: "Small mascot icon"
    params: { image_size: square_hd }
    output: assets/icon.png

ploof run assets.yaml --parallel 4
ploof run assets.yaml --dry-run --output json   # validate the plan, no API calls

Media tasks default to `provider: openai`; `model.run` defaults to `provider: fal`. Relative paths resolve from the manifest's location, and every CLI operation is available as a task kind (`image.*`, `video.*`, `audio.*`, `model.run`).
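To make the effect of `needs` concrete, the sketch below groups the example manifest's tasks into "waves" whose dependencies are already satisfied, using Python's standard `graphlib`. It illustrates dependency ordering in general and is not a description of how Ploof schedules work internally; `execution_waves` is a hypothetical helper, and the task list mirrors only the `id` and `needs` fields of the manifest above.

```python
from graphlib import TopologicalSorter


def execution_waves(tasks):
    """Group task ids into waves; each wave only depends on earlier waves."""
    graph = {task["id"]: set(task.get("needs", [])) for task in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular needs
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


tasks = [
    {"id": "base", "kind": "image.generate"},
    {"id": "final", "kind": "image.edit", "needs": ["base"]},
    {"id": "clip", "kind": "video.generate"},
    {"id": "icon", "kind": "model.run"},
]

if __name__ == "__main__":
    print(execution_waves(tasks))  # [['base', 'clip', 'icon'], ['final']]
```

Here `base`, `clip`, and `icon` have no dependencies and could run concurrently, while `final` has to wait for `base`.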

Task fields & input references

- Fields: `id`, `kind`, `provider`, `profile`, `needs`, `model`, `prompt`, `text`, `output`, `params`, `sidecar`, `inputs`, `videoId`, `characterId`, `name`, `wait`, `download`, `variants`, `pollIntervalMs`, `timeoutMs`.
- `inputs.images` accepts a string, `{ source }`, or `{ task }` (uses that task's first output). `inputs.video(s)`, `inputs.mask`, `inputs.reference`, and `inputs.audio` use the same shape.
- `model.run` preserves exact input keys, so `inputs.image_url` maps to the provider field `image_url`.
- Always `--dry-run` before an expensive batch.
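A quick local pre-check can catch the most obvious manifest mistakes before even a dry run. The sketch below flags duplicate `id` values and `needs` entries that point at no task. It is an illustration only: `manifest_problems` is a hypothetical helper, the checks are assumptions about what makes a manifest sensible, and it is no substitute for `ploof run --dry-run`, which validates the real plan.

```python
def manifest_problems(tasks):
    """Return human-readable problems found in a list of task dicts."""
    problems = []
    seen = set()
    for task in tasks:
        task_id = task.get("id")
        if task_id in seen:
            problems.append(f"duplicate id: {task_id}")
        seen.add(task_id)
    for task in tasks:
        for needed in task.get("needs", []):
            if needed not in seen:
                problems.append(f"{task.get('id')}: unknown dependency {needed}")
    return problems
```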

Human-readable in a terminal, machine-readable in a pipe — automatically:

ploof image generate --prompt "..." --output json
ploof run assets.yaml --output jsonl
ploof video list --fields id,outputs,metadata.video.status

Format	When
`auto` (default)	`table` in a TTY, `compact` when piped
`table`	Human-readable columns
`compact`	One line per asset, easy to grep
`json` / `jsonl`	Programmatic / streaming

Every result is a stable object:

{
  "kind": "video.generate",
  "provider": "openai",
  "outputs": ["assets/clip.mp4"],
  "metadata": { "video": { "id": "video_…", "status": "completed" } }
}
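Because the result shape is stable, a script can consume it with a few lines. The sketch below collects every output path from JSONL text, assuming each non-empty line is one result object shaped like the example above; `collect_outputs` is a hypothetical helper name, and the sample data is made up for illustration.

```python
import json


def collect_outputs(jsonl_text):
    """Return all output paths from JSONL text, one result object per line."""
    paths = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        result = json.loads(line)
        paths.extend(result.get("outputs", []))
    return paths


sample = "\n".join([
    json.dumps({"kind": "image.generate", "provider": "openai",
                "outputs": ["assets/base.png"]}),
    json.dumps({"kind": "video.generate", "provider": "openai",
                "outputs": ["assets/clip.mp4"]}),
])

if __name__ == "__main__":
    print(collect_outputs(sample))
```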

Sidecars: unless disabled, each asset gets a `<output>.json` beside it recording the operation, prompt, params, outputs, and provider metadata, so runs are reproducible by default. Narrow output with `--fields a,b.c`, and set the default format via `--output`, the `PLOOF_OUTPUT` env var, or `ploof config set output …`.
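The dotted syntax in `--fields a,b.c` reads as a path into the result object. As a client-side approximation, the sketch below resolves such paths against a result dict. How Ploof itself shapes the narrowed output is not specified here, so the flat, dotted-key return value is an assumption, and `select_fields` is a hypothetical helper.

```python
def select_fields(result, fields):
    """Pick comma-separated dotted paths from a dict; missing paths are skipped."""
    selected = {}
    for path in (part.strip() for part in fields.split(",")):
        if not path:
            continue
        value = result
        found = True
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                found = False
                break
            value = value[key]
        if found:
            selected[path] = value
    return selected


result = {
    "kind": "video.generate",
    "provider": "openai",
    "outputs": ["assets/clip.mp4"],
    "metadata": {"video": {"id": "video_abc123", "status": "completed"}},
}

if __name__ == "__main__":
    print(select_fields(result, "kind,metadata.video.status"))
```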

The copy-paste setup above is all most agents need. Here's what's happening under the hood — two commands carry the integration:

ploof learn          # canonical, version-matched agent reference (prints to stdout)
ploof skill install  # install a bootstrap skill into your agent

`ploof learn` is the source of truth: it documents every command, default, and gotcha for the exact installed version, so an agent never works from stale memory. The installed skill is intentionally tiny: it just points back at `ploof learn`, keeping guidance in lockstep with the package. Combined with `--output json` (or `jsonl`), `--fields` selection, and predictable exit codes, ploof is built for hands-off automation.
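A script or agent harness might wrap the CLI roughly as follows. The flags used (`--prompt`, `--out`, `--output json`) all appear in this README; the rest is assumption: that a non-zero exit code signals failure, that stdout holds a single JSON document when `--output json` is set, and the helper names `build_generate_command` and `run_ploof`.

```python
import json
import subprocess


def build_generate_command(prompt, out_path):
    """Build the argv for a JSON-emitting `ploof image generate` call."""
    return [
        "ploof", "image", "generate",
        "--prompt", prompt,
        "--out", out_path,
        "--output", "json",
    ]


def run_ploof(argv):
    """Run ploof and parse stdout as JSON (assumes non-zero exit means failure)."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    if completed.returncode != 0:
        message = completed.stderr.strip()
        raise RuntimeError(f"ploof exited with {completed.returncode}: {message}")
    return json.loads(completed.stdout)
```

Passing the command as a list avoids shell quoting problems with prompts that contain spaces or quotes.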

ploof config list
ploof config set output compact
ploof config set defaultParallel 8
ploof config set sidecar false
ploof config reset

Stored at `~/.ploof/config.json`, separate from credentials.

Key	Default	Meaning
`output`	`auto`	Default output format
`defaultParallel`	`4`	Default `run` concurrency
`sidecar`	`true`	Write `<file>.json` metadata
`noColor`	`false`	Disable ANSI color

Global flags

Flag	Description
`-o, --output <format>`	`auto`, `table`, `compact`, `json`, `jsonl`
`-f, --fields <list>`	Comma-separated field selection
`-d, --detail`	Full detail view
`-q, --quiet`	Data only, no hints
`--no-color`	Disable color
`--verbose`	Debug output to stderr
`-y, --yes`	Skip confirmation prompts
`-V, --version` / `-h, --help`	Version / help

Run `ploof <command> --help` for any subcommand.

Environment variables

Variable	Purpose
`PLOOF_OPENAI_API_KEY`, `OPENAI_API_KEY`	OpenAI key
`PLOOF_OPENAI_ORG`, `PLOOF_OPENAI_PROJECT`, `PLOOF_OPENAI_BASE_URL`	OpenAI org / project / base URL
`PLOOF_FAL_KEY`, `FAL_KEY`	fal.ai key
`PLOOF_FAL_KEY_ID` + `PLOOF_FAL_KEY_SECRET` (or `FAL_KEY_ID` + `FAL_KEY_SECRET`)	fal.ai split key
`PLOOF_OUTPUT`	Default output format

bun install
bun run dev -- --help     # run locally
bun test                  # unit + integration (mocked, no API spend)
bun run typecheck
bun run lint
bun run build

The default suite runs real `ploof` commands against a local OpenAI mock plus fal unit tests, so no credits are spent. Live tests are opt-in:

PLOOF_OPENAI_API_KEY=sk-... bun test tests/e2e
PLOOF_FAL_KEY=...           bun test tests/e2e/fal-live.test.ts

Releases publish from GitHub Actions on a `v*` tag via npm Trusted Publishing. See SPEC.md for the full specification and release details.

source & further reading

github.com — original article
