{"slug": "show-hn-ploof-the-agent-native-cli-for-generating-images-video-and-audio", "title": "Show HN: Ploof – The agent-native CLI for generating images, video, and audio", "summary": "Ploof, an agent-native CLI for generating images, video, and audio, has been released on GitHub. The tool is designed to be operated by coding agents like Claude Code, Cursor, or Codex, which can install it and generate assets from prompts without manual command typing. It supports multiple providers including OpenAI and fal.ai, batch processing, and local authentication profiles.", "body_md": "**The agent-native CLI for generating images, video, and audio.**\n\nHand it to Claude Code, Cursor, or Codex: they install it, read `ploof learn`, and create your assets for you. Works great by hand, too.\n\nPloof turns a prompt into a file, and it's designed to be driven by your coding agent. The usual path isn't typing `ploof` commands yourself; it's telling Claude Code, Cursor, or Codex what you want and letting it install ploof, read the built-in `ploof learn` reference, and generate the assets on your behalf. No SDK wiring, no polling loops, no glue code, and it's a sharp manual CLI when you want it.\n\n- 🤖 **Agent-native**: built to be operated by coding agents. `ploof learn` self-documents the *installed* version, output is JSON/JSONL-clean, and flags stay stable.\n- 🎨 **Every modality**: images, video, and audio (generate, edit, extend, transcribe, translate).\n- 🔌 **Multi-provider**: OpenAI today, plus fal.ai's entire model marketplace via `model run`.\n- 📦 **Batch + parallel**: declare assets in YAML, wire up dependencies, run them concurrently with one command.\n- 🔑 **Local auth profiles**: multiple keys per provider in `~/.ploof`, with env-var overrides for CI.\n
- 🧾 **Reproducible**: every asset gets a `<file>.json` sidecar recording the prompt, params, and provider metadata.\n\n| Provider | Images | Video | Audio | Any endpoint |\n|---|---|---|---|---|\n| OpenAI | generate · edit · variations | generate · edit · extend · library · characters | speech (TTS) · transcribe · translate | - |\n| fal.ai | ✓ | ✓ | ✓ | ✓ marketplace via `model run` |\n\nMore providers are planned; the provider registry is built to grow.\n\n- [Use it with your coding agent](#use-it-with-your-coding-agent)\n- [Install](#install)\n- [Quick start](#quick-start)\n- [Authentication](#authentication)\n- [Images](#images)\n- [Video](#video)\n- [Audio](#audio)\n- [Run any model endpoint](#run-any-model-endpoint)\n- [Batch manifests](#batch-manifests)\n- [Output and scripting](#output-and-scripting)\n- [For AI agents](#for-ai-agents)\n- [Configuration](#configuration)\n- [Reference](#reference)\n- [Contributing](#contributing)\n\n## Use it with your coding agent\n\n**This is the main way to use ploof.** You don't run the commands yourself: you tell your coding agent what you want, and it installs ploof, reads the built-in reference, authenticates, and generates the assets for you.\n\nPaste this into Claude Code, Cursor, Codex, or any agent, and fill in the last line:\n\n```\nUse the ploof CLI to generate assets for this project.\n\nSetup:\n1. Install it if it isn't already: `bun i -g @miketromba/ploof` (or `npm i -g @miketromba/ploof`).\n2. Run `ploof learn` and follow it; that's the canonical, always-current reference for the installed version.\n3. If `ploof whoami openai` (or `ploof whoami fal`) shows I'm not authenticated, walk me through `ploof login`.\n\nTask: <describe the asset you want, e.g. \"a 1024x1024 hero image of a matte black water bottle on marble, saved to assets/hero.png\">\n```\n\nYour agent takes it from `ploof learn` and does the rest. Working in this repo often? 
Have it run `ploof skill install` once to drop a bootstrap skill so the workflow auto-loads next time.\n\nWhy it works: `ploof learn` prints a complete, version-matched guide to stdout, and every command emits clean JSON/JSONL with predictable exit codes, so agents operate ploof reliably instead of guessing or relying on stale training data. [More on the agent integration ↓]\n\n## Install\n\n```\nbun i -g @miketromba/ploof\n```\n\nRequires Node 18+ (Bun optional). Your agent normally handles this for you (see [above](#use-it-with-your-coding-agent)).\n\n### npm, pnpm, yarn, or run without installing\n\n```\nnpm  install -g @miketromba/ploof\npnpm add     -g @miketromba/ploof\nyarn global add @miketromba/ploof\n\n# one-off, no install:\nbunx @miketromba/ploof --help\nnpx  @miketromba/ploof --help\n```\n\n## Quick start\n\nPrefer to drive it yourself, or want to see exactly what your agent will be doing? The manual path:\n\n```\n# 1. install\nbun i -g @miketromba/ploof\n\n# 2. authenticate (saved to ~/.ploof/credentials.json)\nploof login openai --api-key sk-...\n\n# 3. make your first asset\nploof image generate \\\n  --prompt \"Studio product photo of a matte black water bottle on marble\" \\\n  --out hero.png\n```\n\n`hero.png` lands on disk next to `hero.png.json`, a sidecar recording the exact prompt and parameters used. Run `ploof --help` to see every command, or `ploof learn` for the agent-oriented tour.\n\n## Authentication\n\nCredentials live in `~/.ploof/credentials.json`. 
Log in once per provider:\n\n```\nploof login openai --api-key sk-...\nploof login fal    --api-key <fal-key>\n\nploof whoami openai      # show the active credential\nploof profiles           # list every stored profile\nploof logout fal         # remove credentials\n```\n\nOmit `--api-key` and Ploof reads the matching env var, or securely prompts (no echo) in an interactive terminal.\n\n**Multiple keys?** Name them with `--profile`, then select per command:\n\n```\nploof login openai --api-key sk-personal --profile personal\nploof login openai --api-key sk-work --profile work --no-default\nploof image generate --prompt \"...\" --profile work --out out.png\n```\n\n**Env vars override stored credentials**, which is ideal for CI:\n\n| Provider | Variables |\n|---|---|\n| OpenAI | `PLOOF_OPENAI_API_KEY` or `OPENAI_API_KEY` |\n| fal.ai | `PLOOF_FAL_KEY` or `FAL_KEY` (or split `PLOOF_FAL_KEY_ID` + `PLOOF_FAL_KEY_SECRET`) |\n\nOpenAI org / project / base URL can be set with `--organization`, `--project`, `--base-url` (or `PLOOF_OPENAI_ORG`, `PLOOF_OPENAI_PROJECT`, `PLOOF_OPENAI_BASE_URL`).\n\n## Images\n\nOpenAI image generation and editing default to `gpt-image-2`. Image inputs accept local paths, `http(s)` URLs, or `-` for stdin.\n\n```\n# generate\nploof image generate \\\n  --prompt \"Editorial portrait, dramatic side light\" \\\n  --out assets/portrait.png \\\n  --size 1024x1024 --quality high\n\n# edit with context images + a mask (repeat --image for references)\nploof image edit \\\n  --image product.png --image reference.png --mask mask.png \\\n  --prompt \"Replace the background with a clean marble countertop\" \\\n  --out assets/edited.png\n\n# variations\nploof image variation --image product.png --out assets/variation.png\n```\n\n### Image flags\n\n| Flag | Description |\n|---|---|\n| `--model` | Image model (default `gpt-image-2`) |\n| `--size` | e.g. `1024x1024` |\n| `--quality` | e.g. 
`low`, `medium`, `high` |\n| `--format` / `--output-format` | `png`, `jpeg`, `webp`, … |\n| `--n` | Number of images (`--out` file gets `-1`, `-2`, …) |\n| `--image` (edit) | Input/context image; repeat for multiple |\n| `--mask` (edit) | Mask for inpainting |\n| `--input-fidelity` (edit) | OpenAI input fidelity |\n| `--background`, `--moderation`, `--style`, `--user`, `--stream`, `--output-compression`, `--partial-images`, `--response-format` | Provider settings |\n| `--param key=value` / `--json '{…}'` | Any provider-specific parameter |\n\n`variation` is aliased as `variations` and uses OpenAI's legacy endpoint, which currently supports only `dall-e-2`. If it returns a 404, use `image edit` for image-to-image instead.\n\n## Video\n\nVideo commands use OpenAI's asynchronous Videos API, defaulting to `sora-2`. Pass `--out` (or `--download`) and Ploof waits for the job to finish, then downloads it.\n\n```\nploof video generate \\\n  --prompt \"Wide tracking shot of a paper city at blue hour\" \\\n  --size 1280x720 --seconds 4 \\\n  --out assets/clip.mp4\n\n# continue an existing clip\nploof video extend --video-id video_abc123 --seconds 4 \\\n  --prompt \"Camera rises over the rooftops\" --out assets/extended.mp4\n\n# library + lifecycle\nploof video list --limit 20\nploof video status video_abc123\nploof video download video_abc123 --variant thumbnail --out thumb.webp\nploof video delete video_abc123\n```\n\n### Video flags & characters\n\n| Flag | Description |\n|---|---|\n| `--model` | `sora-2`, `sora-2-pro`, … |\n| `--size` / `--seconds` | Resolution / duration |\n| `--input-reference <path\\|url\\|file-id>` | First-frame image reference |\n| `--character <id>` | Reusable character; repeat for several |\n| `--wait` / `--download` | Poll to completion / download after wait |\n| `--variant` | `video`, `thumbnail`, or `spritesheet` |\n| `--poll-interval` / `--timeout` | Polling cadence / max wait (seconds) |\n\n`video edit` and `video extend` accept either 
`--video-id` (a completed OpenAI video) or `--video` (an uploaded source), where your project is eligible. Reusable characters:\n\n```\nploof video character create --name Mossy --video character.mp4\nploof video character get char_abc123\n```\n\n## Audio\n\nSpeech defaults to `gpt-4o-mini-tts` / `alloy` / `mp3` (model / voice / format). Transcription defaults to `gpt-4o-mini-transcribe`; translation to `whisper-1`.\n\n```\n# text → speech\nploof audio generate --text \"Ploof can speak.\" --voice alloy --out assets/speech.mp3\n\n# speech → text\nploof audio transcribe --audio assets/speech.mp3 --out assets/transcript.json\n\n# any language → English text\nploof audio translate --audio assets/spanish.mp3 --format text --out assets/translation.txt\n```\n\n### Audio flags\n\n**Generate** (`generate`, aliased `speech` / `tts`): `--model`, `--voice`, `--voice-id`, `--instructions`, `--format` (`mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`), `--speed`.\n\n**Transcribe**: `--model`, `--language`, `--prompt`, `--format`, `--temperature`, `--include`, `--timestamp-granularity`, `--chunking-strategy`, `--known-speaker-name`, `--known-speaker-reference`.\n\n**Translate**: `--model`, `--prompt`, `--format`, `--temperature`.\n\nPloof writes finished files, so streaming-only transport settings (e.g. `stream=true`) are rejected because they don't produce a complete asset.\n\n## Run any model endpoint\n\n`model run` calls a model endpoint directly through the provider's official client, defaulting to **fal.ai**. 
Ploof uploads local inputs to provider storage, submits to the queue, polls to completion, and writes the returned files or text to disk.\n\n```\nploof model run \\\n  --provider fal --model fal-ai/flux/dev \\\n  --prompt \"Friendly CLI mascot icon, transparent background\" \\\n  --param image_size=square_hd \\\n  --out assets/icon.png\n```\n\nMap local assets to the endpoint's exact input fields with `--input field=path` (repeatable):\n\n```\nploof model run --provider fal --model <endpoint-id> \\\n  --prompt \"Animate this into a short loop\" \\\n  --input image_url=assets/source.png --param duration=4 \\\n  --out assets/loop.mp4\n```\n\nThe media commands work against fal too; just pass `--provider fal --model <endpoint-id>`:\n\n```\nploof image generate --provider fal --model fal-ai/flux/dev \\\n  --prompt \"Soft clay mascot icon\" --param image_size=square_hd --out assets/mascot.png\n```\n\nPass endpoint settings with `--param key=value` or `--json '{…}'`. Queue controls: `--start-timeout`, `--timeout`, `--poll-interval`, `--priority low|normal`, `--storage-expires-in`.\n\n## Batch manifests\n\nDescribe many assets in YAML (or JSON), wire dependencies with `needs`, reuse one task's output as another's input, and run them in parallel:\n\n```\nversion: 1\nparallel: 4\ntasks:\n  - id: base\n    kind: image.generate\n    prompt: \"Studio product photo\"\n    params: { model: gpt-image-2, size: 1024x1024, quality: high }\n    output: assets/base.png\n\n  - id: final\n    kind: image.edit\n    needs: [base]\n    inputs:\n      images:\n        - task: base          # reuse base's output\n      mask: ./mask.png\n    prompt: \"Add a premium background\"\n    output: assets/final.png\n\n  - id: clip\n    kind: video.generate\n    prompt: \"Slow dolly through a miniature paper city\"\n    params: { model: sora-2, size: 1280x720, seconds: \"4\" }\n    wait: true\n    download: true\n    output: assets/clip.mp4\n\n  - id: icon\n    kind: model.run\n    
provider: fal\n    model: fal-ai/flux/dev\n    prompt: \"Small mascot icon\"\n    params: { image_size: square_hd }\n    output: assets/icon.png\n```\n\n```\nploof run assets.yaml --parallel 4\nploof run assets.yaml --dry-run --output json   # validate the plan, no API calls\n```\n\nMedia tasks default to `provider: openai`; `model.run` defaults to `provider: fal`. Relative paths resolve from the manifest's location, and every CLI operation is available as a task kind (`image.*`, `video.*`, `audio.*`, `model.run`).\n\n### Task fields & input references\n\n- **Fields:** `id`, `kind`, `provider`, `profile`, `needs`, `model`, `prompt`, `text`, `output`, `params`, `sidecar`, `inputs`, `videoId`, `characterId`, `name`, `wait`, `download`, `variants`, `pollIntervalMs`, `timeoutMs`.\n- `inputs.images` accepts a string, `{ source }`, or `{ task }` (uses that task's first output). `inputs.video(s)`, `inputs.mask`, `inputs.reference`, and `inputs.audio` use the same shape.\n- `model.run` preserves exact input keys, so `inputs.image_url` maps to the provider field `image_url`.\n- Always `--dry-run` before an expensive batch.\n\n## Output and scripting\n\nHuman-readable in a terminal, machine-readable in a pipe, automatically:\n\n```\nploof image generate --prompt \"...\" --output json\nploof run assets.yaml --output jsonl\nploof video list --fields id,outputs,metadata.video.status\n```\n\n| Format | When |\n|---|---|\n| `auto` (default) | `table` in a TTY, `compact` when piped |\n| `table` | Human-readable columns |\n| `compact` | One line per asset, easy to grep |\n| `json` / `jsonl` | Programmatic / streaming |\n\nEvery result is a stable object:\n\n```\n{\n  \"kind\": \"video.generate\",\n  \"provider\": \"openai\",\n  \"outputs\": [\"assets/clip.mp4\"],\n  \"metadata\": { \"video\": { \"id\": \"video_…\", \"status\": \"completed\" } }\n}\n```\n\n**Sidecars:** unless disabled, each asset gets a 
`<output>.json` beside it recording the operation, prompt, params, outputs, and provider metadata, so runs are reproducible by default. Narrow output with `--fields a,b.c`, and set the default format via `--output`, the `PLOOF_OUTPUT` env var, or `ploof config set output …`.\n\n## For AI agents\n\nThe [copy-paste setup above](#use-it-with-your-coding-agent) is all most agents need. Here's what's happening under the hood: two commands carry the integration.\n\n```\nploof learn          # canonical, version-matched agent reference (prints to stdout)\nploof skill install  # install a bootstrap skill into your agent\n```\n\n`ploof learn` is the source of truth: it documents every command, default, and gotcha for the *exact installed version*, so an agent never works from stale memory. The installed skill is intentionally tiny: it just points back at `ploof learn`, keeping guidance in lockstep with the package. Combined with `--output json` (or `jsonl`), `--fields` selection, and predictable exit codes, ploof is built for hands-off automation.\n\n## Configuration\n\n```\nploof config list\nploof config set output compact\nploof config set defaultParallel 8\nploof config set sidecar false\nploof config reset\n```\n\nConfiguration is stored at `~/.ploof/config.json`, separate from credentials.\n\n| Key | Default | Meaning |\n|---|---|---|\n| `output` | `auto` | Default output format |\n| `defaultParallel` | `4` | Default `run` concurrency |\n| `sidecar` | `true` | Write `<file>.json` metadata |\n| `noColor` | `false` | Disable ANSI color |\n\n## Reference\n\n### Global flags\n\n| Flag | Description |\n|---|---|\n| `-o, --output <format>` | `auto`, `table`, `compact`, `json`, `jsonl` |\n| `-f, --fields <list>` | Comma-separated field selection |\n| `-d, --detail` | Full detail view |\n| `-q, --quiet` | Data only, no hints |\n| `--no-color` | Disable color |\n| `--verbose` | Debug output to stderr |\n| `-y, --yes` | Skip confirmation prompts |\n| `-V, --version` / `-h, --help` | Version / help |\n\nRun `ploof <command> 
--help` for any subcommand.\n\n### Environment variables\n\n| Variable | Purpose |\n|---|---|\n| `PLOOF_OPENAI_API_KEY`, `OPENAI_API_KEY` | OpenAI key |\n| `PLOOF_OPENAI_ORG`, `PLOOF_OPENAI_PROJECT`, `PLOOF_OPENAI_BASE_URL` | OpenAI org / project / base URL |\n| `PLOOF_FAL_KEY`, `FAL_KEY` | fal.ai key |\n| `PLOOF_FAL_KEY_ID` + `PLOOF_FAL_KEY_SECRET` (or `FAL_KEY_ID` + `FAL_KEY_SECRET`) | fal.ai split key |\n| `PLOOF_OUTPUT` | Default output format |\n\n## Contributing\n\n```\nbun install\nbun run dev -- --help     # run locally\nbun test                  # unit + integration (mocked, no API spend)\nbun run typecheck\nbun run lint\nbun run build\n```\n\nThe default suite runs real `ploof` commands against a local OpenAI mock plus fal unit tests, so no credits are spent. Live tests are opt-in:\n\n```\nPLOOF_OPENAI_API_KEY=sk-... bun test tests/e2e\nPLOOF_FAL_KEY=...           bun test tests/e2e/fal-live.test.ts\n```\n\nReleases publish from GitHub Actions on a `v*` tag via npm Trusted Publishing. See [SPEC.md](/miketromba/ploof/blob/main/packages/cli/SPEC.md) for the full specification and release details.\n\n[MIT](/miketromba/ploof/blob/main/LICENSE) © Michael Tromba", "url": "https://wpnews.pro/news/show-hn-ploof-the-agent-native-cli-for-generating-images-video-and-audio", "canonical_source": "https://github.com/miketromba/ploof", "published_at": "2026-06-29 16:01:21+00:00", "updated_at": "2026-06-29 16:21:11.979440+00:00", "lang": "en", "topics": ["ai-tools", "generative-ai", "developer-tools", "ai-agents"], "entities": ["Ploof", "Claude Code", "Cursor", "Codex", "OpenAI", "fal.ai", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/show-hn-ploof-the-agent-native-cli-for-generating-images-video-and-audio", "markdown": "https://wpnews.pro/news/show-hn-ploof-the-agent-native-cli-for-generating-images-video-and-audio.md", "text": "https://wpnews.pro/news/show-hn-ploof-the-agent-native-cli-for-generating-images-video-and-audio.txt", "jsonld": 
"https://wpnews.pro/news/show-hn-ploof-the-agent-native-cli-for-generating-images-video-and-audio.jsonld"}}