# Show HN: Stable Audio 3 – one-shot sample generator (110gb download)

> Source: <https://github.com/shiehn/sas-sample-generator>
> Published: 2026-05-31 13:05:24+00:00

The latest **v3** libraries are hosted publicly on GCP (no auth required). The
[Signals & Sorcery](https://signalsandsorcery.com) app installs these
automatically, but you can grab them directly here:

| Pack | Contents | Download |
|---|---|---|
| **Drums** (v3 large) | 24 roles · 10,359 one-shots (+ prompt sidecars) · ~1.4 GB | |
| **Instruments** (v3 large) | | [sas-instrument-pack-v3-large.zip](https://storage.googleapis.com/docs-assets/sas-instrument-pack-v3-large.zip) |

Each zip contains a `_pack-version.json` marker plus the payload tree (drums: `<role>/*.wav`; instruments: `<category>/<id>/manifest.json` + `zones/`). The instrument pack is zones-only: the 24-bit generation `sources/` are omitted because they aren't used at playback.

Generate large batches of audio samples with
[Stable Audio 3](https://huggingface.co/stabilityai/stable-audio-3-medium)
on a rented [RunPod](https://www.runpod.io) GPU. Two pipelines ship side by side:

- **Drums / one-shots** (`run_all.sh`): 24 unpitched categories (kicks, snares, hats, claps, 808s, risers, impacts, textures…). Generate → quality-gate → trim/normalize → flat `processed/<role>/` folders.
- **Pitched instruments** (`run_pitched.sh`): 28 tuned categories (pianos, basses, pads, leads, strings…). Generate → pitch/quality-gate → multi-source pitch-correct + pre-render playable zones → `instruments/<cat>/<id>/manifest.json`.

Both run the same **retry-to-target** loop (re-roll failed prompts until ~150
samples per category survive the gate) and **batched** generation on one model load.
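The retry-to-target loop can be sketched roughly as follows. This is a simplified, hypothetical model, not the code in `run_retry.py`: `generate_and_gate` stands in for batched generation plus the gate and returns one winner (or `None`) per prompt, and the `initial` split between first-pass lines and held-back top-up lines is an assumption.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for "batched generation + gate": given prompts and a
# retry round (used to derive fresh seeds), return {prompt: winner or None}.
GenerateAndGate = Callable[[List[str], int], Dict[str, Optional[str]]]


def retry_to_target(
    prompts: List[str],
    generate_and_gate: GenerateAndGate,
    target: int = 150,
    max_retries: int = 2,
    initial: int = 150,
) -> Dict[str, str]:
    """Sketch of retry-to-target: re-roll all-fail prompts, then top up with unused lines."""
    winners: Dict[str, str] = {}
    active, reserve = prompts[:initial], prompts[initial:]
    while active:
        pending = active
        for round_no in range(max_retries + 1):  # round 0 is the first pass
            results = generate_and_gate(pending, round_no)
            winners.update({p: w for p, w in results.items() if w is not None})
            # Prompts whose candidates ALL failed the gate get re-rolled next round.
            pending = [p for p, w in results.items() if w is None]
            if not pending:
                break
        if len(winners) >= target or not reserve:
            break
        # Still short of the target: pull just enough unused prompt lines to close the gap.
        need = target - len(winners)
        active, reserve = reserve[:need], reserve[need:]
    return winners
```

Setting `max_retries=0` in this sketch gives the "one pass, no retry" behavior, while the top-up step still runs until the target is met or the prompt file is exhausted.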

**Read top to bottom, copy-paste each command block.** A single-category test
slice takes ~5 minutes and costs a few cents; the full ~50-category v3 campaign takes up to
about a day of big-GPU pod time (still well under $50; see [Cost](#cost-recap)).

Assumes an **Apple Silicon Mac** as the control machine.

For the rationale (why these settings, prompt-design tips, deep cost math),
see [stable_audio_open_batch_oneshot_guide.md](/shiehn/sas-sample-generator/blob/main/stable_audio_open_batch_oneshot_guide.md).

Part of the [Signals & Sorcery](https://signalsandsorcery.com) family. See **Related repos** at the bottom of this README.

Plain-text prompt files (**one description per line**) live under `prompts/`
(drums) and `prompts/pitched/` (instruments). The repo ships **52 categories,
200 prompts each** (~10,400 prompts), pre-generated by
[scripts/gen_prompts.py](/shiehn/sas-sample-generator/blob/main/scripts/gen_prompts.py) in a combinatorial house
style (~58% EDM / ~25% hip-hop & urban / ~17% acoustic-orchestral-world). Run
them as-is, edit them, or subset which categories generate.

Each non-comment line becomes one generation job. Example from
[prompts/kick.txt](/shiehn/sas-sample-generator/blob/main/prompts/kick.txt):

```
# 909-style
tight 909-style kick drum one shot, hard click transient, short punchy body, dry
punchy 909 kick drum one shot, sharp transient, controlled low end, clean studio sample

# 808-style
deep 808 kick one shot, long sub bass decay, smooth sine low end, dry
warm 808 kick one shot, saturated low end, medium decay, dry, no melody, no loop
```

Blank lines and lines starting with `#` are ignored (handy for grouping).
Aim for ~10 words per line. For drums, always include phrases like
`one shot, no loop` so the model doesn't render a rhythmic loop.
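The parsing rule above, plus a check for the recommended drum phrases, can be sketched like this. It is a hypothetical helper, not part of the repo (the pipeline's own parsing lives in `list_to_jsonl.py`); the 15-word warning threshold is an arbitrary assumption layered on the ~10-word guideline.

```python
from typing import List

REQUIRED_PHRASES = ("one shot", "no loop")  # recommended for drum prompts


def parse_prompts(text: str) -> List[str]:
    """One prompt per line; blank lines and lines starting with '#' are skipped."""
    prompts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        prompts.append(line)
    return prompts


def lint_drum_prompts(prompts: List[str], max_words: int = 15) -> List[str]:
    """Return warnings for prompts missing a recommended phrase or running long."""
    warnings = []
    for p in prompts:
        missing = [ph for ph in REQUIRED_PHRASES if ph not in p]
        if missing:
            warnings.append(f"missing {missing}: {p}")
        n_words = len(p.split())
        if n_words > max_words:
            warnings.append(f"{n_words} words: {p}")
    return warnings
```

Running the lint before a GPU run is cheap and catches wording you would otherwise only notice after the content hashes are already baked into filenames.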

**Editing prompts changes content hashes.** Output filenames are content-addressed (`{category}-{hash}.wav`), so re-wording a line orphans the WAV it used to produce. Finalize wording *before* a GPU run. Re-running with `--skip-existing` then only generates the new/changed lines.
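To see why a reworded line orphans its old WAV, here is a sketch of content-addressed naming with a skip-existing filter. The repo's exact hash inputs and length are not documented here; this assumes the first 8 hex characters of a SHA-256 over the prompt and seed, which matches the shape of names like `kick-c1da23da.wav` but is otherwise hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List, Tuple


def output_name(category: str, prompt: str, seed: int) -> str:
    # Assumed hash scheme: sha256("<prompt>|<seed>") truncated to 8 hex chars.
    digest = hashlib.sha256(f"{prompt}|{seed}".encode("utf-8")).hexdigest()[:8]
    return f"{category}-{digest}.wav"


def jobs_to_run(category: str, jobs: List[Tuple[str, int]], out_dir: Path) -> List[Tuple[str, int]]:
    """Mimic --skip-existing: keep only jobs whose output file is not on disk yet."""
    return [
        (prompt, seed)
        for prompt, seed in jobs
        if not (out_dir / output_name(category, prompt, seed)).exists()
    ]
```

Changing a single character of the prompt produces a new name, so the old file is never matched again; that is the orphaning described above.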

Flat `processed/<role>/` output; the folder name *is* the drum role.

| Core kit | Dur | EDM / electronic one-shots | Dur |
|---|---|---|---|
| `kick` | 1.5s | `clap` | 0.75s |
| `snare-standard` | 1.0s | `808` (tuned sub one-shot) | 2.0s |
| `snare-rim` | 0.75s | `riser` | 4.0s |
| `hat-closed` | 0.5s | `downlifter` | 3.0s |
| `hat-open` | 1.5s | `impact` | 2.0s |
| `cymbal-ride` | 2.5s | `sub-drop` | 2.0s |
| `cymbal-crash` | 3.0s | `sweep` | 2.5s |
| `cymbal-splash` | 1.5s | `texture` (vinyl/foley/glitch) | 3.0s |
| `tamborine` | 1.0s | `zap` | 0.75s |
| `shaker` | 0.75s | `foley-perc` | 0.75s |
| `tom-hi` / `tom-mid` / `tom-low` | 1.0–1.5s | | |
| `hit` (generic stab) | 1.5s | | |

Each instrument lands in `instruments/<cat>/<id>/` with a `manifest.json`. Source pitches, durations, and
variant counts are in the
[Pitched-instrument pipeline](#pitched-instrument-pipeline) table below.

```
synths  lead-supersaw  lead-fm  lead-acid  pluck-synth  plucks  keys  pianos
organs  basses  808-bass  reese-bass  pads  strings  brass  winds  accordion
bells  mallets  percussion  timpani  guitars  banjos  mandolin  harp  sitar
vocals  choir
```

Output filenames are content-addressed: `{category}-{hash}.wav`. Same prompt +
seed → same filename → safely re-runnable with `--skip-existing`.

To **subset** what generates, edit
[scripts/categories.txt](/shiehn/sas-sample-generator/blob/main/scripts/categories.txt) (drums) or
[scripts/pitched_categories.txt](/shiehn/sas-sample-generator/blob/main/scripts/pitched_categories.txt) (pitched); comment out any line to skip that category.

- Create / sign in at [huggingface.co](https://huggingface.co).
- Visit [stabilityai/stable-audio-3-medium](https://huggingface.co/stabilityai/stable-audio-3-medium) and click **Agree and access repository**. SA3 also requires accepting the Gemma Terms of Use (linked from the same page). If you switch models via `--model`, accept the license on that model's page too; [stable-audio-3-small-sfx](https://huggingface.co/stabilityai/stable-audio-3-small-sfx) is the lighter 0.6B SFX-tuned alternative.
- Create a read-only access token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). Save it in your password manager; you'll paste it once per pod.

- Create / sign in at [runpod.io](https://runpod.io). Add a payment method.
- Add your Mac SSH public key under [Settings → SSH Keys](https://www.runpod.io/console/user/settings). Copy it to the clipboard:

  ```
  pbcopy < ~/.ssh/id_ed25519.pub        # copies key to clipboard
  ```

  Then paste it into the form. (If `~/.ssh/id_ed25519.pub` doesn't exist, run `ssh-keygen -t ed25519` first and accept the defaults.)

[runpod.io/console/pods](https://www.runpod.io/console/pods) → **Deploy → GPU Pod**:

| Setting | Value |
|---|---|
| GPU | RTX A6000 (48 GB VRAM, ~$0.49/hr) |
| Template | most recent RunPod PyTorch with CUDA 12.x |
| Container Disk | 50 GB (default) |
| Volume Disk | 100 GB at `/workspace` |
| Expose | SSH (port 22) — default |

Click **Deploy On-Demand**. Wait ~30 sec until status is `RUNNING`.

On the pod's card click **Connect → SSH over exposed TCP** and copy the SSH
command. It looks like:

```
ssh root@<POD_IP> -p <POD_PORT> -i ~/.ssh/id_ed25519
```

From your Mac terminal, paste the SSH command from step 1. Type `yes` to
accept the host key on first connect.

If you get `Permission denied (publickey)`:

```
ssh-add ~/.ssh/id_ed25519
```

…then retry.

On the pod:

```
cd /workspace && \
git clone https://github.com/shiehn/sas-sample-generator.git && \
cd /workspace/sas-sample-generator && \
./scripts/setup.sh 2>&1 | tee /workspace/setup.log
```

**Why these paths matter** (and the reason this used to be slow): `/workspace`
is a network filesystem (MooseFS), fine for big sequential reads/writes
(model weights, generated audio) but painfully slow for many tiny files (a
Python venv). The script installs the venv at **`/root/.venv`**, which is on
the pod's container-local SSD, and keeps only the HuggingFace cache and
outputs on `/workspace`. Roughly:

```
/root/.venv                   ← Python venv          (fast SSD; ~5 min install)
/workspace/sas-sample-generator   ← cloned repo
/workspace/.cache/huggingface ← model weights        (downloaded once)
/workspace/outputs            ← generated WAVs
```

You're done when you see:

```
[setup] cuda available: True
[setup] device:         NVIDIA RTX A6000
[setup] done.
[setup] next: source /root/.venv/bin/activate
```

Then activate the venv and log in to Hugging Face:

```
source /root/.venv/bin/activate
hf auth login
```

Paste your HF token (One-Time Setup A). Answer `n` to "Add token as git
credential".

The 24 drum prompt files are already in `prompts/<category>.txt`. To run them
as-is, **skip to step 6**.

To customize:

- **Edit content**: `nano prompts/kick.txt` (or scp over your own version, or edit on the Mac → `git push` → `git pull` on the pod).
- **Subset which categories run**: edit `scripts/categories.txt` and comment out the lines you want to skip. Useful for prompt iteration on a single category.

Wrap the run in `tmux` first so an SSH drop doesn't kill the job:

```
tmux new -s sas
# Inside tmux:
./scripts/run_all.sh 2>&1 | tee /workspace/run.log
```

(Detach with `Ctrl-b d`; reattach later with `tmux attach -t sas`.)

The wrapper runs, in order:

1. **Build JSONLs**: `list_to_jsonl.py` turns each `prompts/<cat>.txt` into `prompts/<cat>.jsonl`, stamping a per-category `variants` count from `scripts/drum_gate_config.py` (5–6 candidates per prompt; more for high-value kick/snare/clap/808).
2. **Generate + gate, with retry-to-target**: `run_retry.py` drives `batch_generate.py` (one model load, **batched**: `BATCH_SIZE` generations per call), then `gate_drums.py`, which best-of-N selects one winner per prompt and rejects clipped / silent / off-band / multi-hit / wrong-decay samples (per-category profiles in `drum_gate_config.py`). Any prompt whose candidates *all* fail is re-rolled with fresh seeds (up to `MAX_RETRIES=2`); if a category still has fewer than `TARGET=150` survivors, more prompt lines are topped up. Winners land in `gated_drums/<cat>/`.
3. **Post-process**: `postprocess_oneshots.py` trims, LUFS-normalizes to **-16 LUFS** (-1 dBFS peak ceiling), and mono-downmixes the gated winners into `processed/<cat>/`. Each WAV ships with a sibling `<id>.txt` holding its exact prompt; the prompt is also embedded in the WAV's RIFF INFO comment chunk so Logic / Ableton / Audacity / `ffprobe` / macOS Get-Info show it. Merging runs is a single `rsync`, with no manifest to reconcile.
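For reference, embedding a prompt in a RIFF INFO comment can be done with the standard library alone. This is a minimal sketch, not the repo's `postprocess_oneshots.py`: it appends a `LIST`/`INFO` chunk holding an `ICMT` sub-chunk after the existing chunks and patches the RIFF size field, and it assumes the input WAV has no INFO chunk yet.

```python
import io
import struct
import wave
from typing import Optional


def add_icmt(wav_bytes: bytes, comment: str) -> bytes:
    """Append a LIST/INFO chunk holding an ICMT (comment) sub-chunk to a WAV file."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    text = comment.encode("utf-8") + b"\x00"           # null-terminated string
    pad = b"\x00" if len(text) % 2 else b""            # RIFF chunks are word-aligned
    icmt = b"ICMT" + struct.pack("<I", len(text)) + text + pad
    info = b"INFO" + icmt
    list_chunk = b"LIST" + struct.pack("<I", len(info)) + info
    body = wav_bytes[12:] + list_chunk                 # existing chunks, then the new one
    # The RIFF size field counts everything after itself: "WAVE" + all chunks.
    return b"RIFF" + struct.pack("<I", 4 + len(body)) + b"WAVE" + body


def read_icmt(wav_bytes: bytes) -> Optional[str]:
    """Walk the top-level chunks and return the first LIST/INFO ICMT string, if any."""
    pos = 12
    while pos + 8 <= len(wav_bytes):
        cid = wav_bytes[pos:pos + 4]
        size = struct.unpack("<I", wav_bytes[pos + 4:pos + 8])[0]
        data = wav_bytes[pos + 8:pos + 8 + size]
        if cid == b"LIST" and data[:4] == b"INFO":
            sub = 4
            while sub + 8 <= len(data):
                sid = data[sub:sub + 4]
                ssize = struct.unpack("<I", data[sub + 4:sub + 8])[0]
                if sid == b"ICMT":
                    return data[sub + 8:sub + 8 + ssize].rstrip(b"\x00").decode("utf-8")
                sub += 8 + ssize + (ssize % 2)
        pos += 8 + size + (size % 2)
    return None


if __name__ == "__main__":
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:  # 100 frames of 16-bit mono silence
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(44100)
        w.writeframes(b"\x00\x00" * 100)
    tagged = add_icmt(buf.getvalue(), "deep 808 kick one shot, no loop")
    print(read_icmt(tagged))
```

Placing the chunk after `data` keeps ordinary WAV readers happy, since they stop parsing once they reach the audio.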

First call downloads Stable Audio 3 (~5–8 GB for medium, ~3 min, one-time). SA3
converges in ~8 diffusion steps (vs 120 for SAO 1.0), so generation is fast — with
the gate + retry, the **bottleneck is now the CPU gate**, not generation.

LUFS normalization keeps kicks, hats, and splashes at the same perceived volume
in your sampler. Revert to peak-only with `--normalize peak`, or override with
`--target-lufs -14` (streaming-hot) / `-10` (commercial-hot).
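The gain math behind "-16 LUFS with a -1 dBFS peak ceiling" is simple once a meter has measured integrated loudness (`requirements.txt` includes pyloudnorm, which provides such a meter). A minimal sketch, assuming the ceiling is enforced by backing off gain rather than by a limiter; `postprocess_oneshots.py` may handle it differently:

```python
def normalization_gain(measured_lufs: float, peak: float,
                       target_lufs: float = -16.0, ceiling_dbfs: float = -1.0) -> float:
    """Linear gain that moves loudness to target_lufs without exceeding the peak ceiling.

    measured_lufs: integrated loudness reported by a BS.1770-style meter.
    peak: absolute sample peak of the clip, linear (1.0 == 0 dBFS).
    """
    gain = 10 ** ((target_lufs - measured_lufs) / 20)  # loudness-matching gain
    ceiling = 10 ** (ceiling_dbfs / 20)                # -1 dBFS is about 0.891 linear
    if peak > 0 and peak * gain > ceiling:
        gain = ceiling / peak                          # back off so the peak sits at the ceiling
    return gain
```

With this approach a very quiet, very peaky clip (a sharp hat, say) ends up slightly below the LUFS target, because the ceiling wins over the loudness match.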

**Useful env knobs** (all optional):

```
ONLY="kick clap" LIMIT=10 ./scripts/run_all.sh   # tiny test slice (TARGET auto-0)
GATE=0          ./scripts/run_all.sh             # legacy: keep all raw, no gate
BATCH_SIZE=32   ./scripts/run_all.sh             # bigger batches on an 80GB GPU
TARGET=200 MAX_RETRIES=3 ./scripts/run_all.sh    # chase a higher survivor count
STEPS=4         ./scripts/run_all.sh             # cheaper/faster iteration
```

**Single-category iteration** (tuning prompts):

```
# Edit prompts/kick.txt, then run just that category. --skip-existing means
# only new/changed prompt lines hit the GPU.
ONLY=kick ./scripts/run_all.sh
```

Check what landed:

```
ls /workspace/outputs/processed/
# Should show one subdir per enabled drum role (24 by default)
find /workspace/outputs/processed -name "*.wav" | wc -l
# ~150 per category survive the gate (TARGET) → ~3,600 for all 24
```

Every `<id>.wav` ships with a sibling `<id>.txt` containing its
generation prompt. Spot-check the pairing:

```
test "$(find /workspace/outputs/processed -name '*.wav' | wc -l)" \
   = "$(find /workspace/outputs/processed -name '*.txt' | wc -l)" \
   && echo "wav/txt pairing OK"
```

On the pod:

```
cd /workspace
tar czf run.tar.gz outputs/processed
ls -lh run.tar.gz
```

(We use `tar` rather than `zip` because the stock RunPod PyTorch image
doesn't ship `zip`, while `tar` is preinstalled everywhere. `tar` also recurses
into the per-category subdirs automatically.)

In a **second** Mac terminal (don't close the SSH session yet — you still
need it for step 10):

```
cd ~/Downloads
scp -P <POD_PORT> root@<POD_IP>:/workspace/run.tar.gz .
tar xzf run.tar.gz
open outputs/processed                 # Finder + QuickLook to audition
```

`<POD_PORT>` and `<POD_IP>` are the same ones from your step-1 SSH command.
The unpacked structure is one folder per category:

```
outputs/processed/
  kick/        kick-c1da23da.wav   kick-e5d95885.wav   ...
  snare-standard/   snare-standard-...wav
  hat-closed/  ...
  ...etc
```

This is the step you will forget. The pod bills **$0.49/hr** for as long as
it exists, whether you're using it or not.

- **Idle overnight**: ≈ $12
- **Forgotten for a week**: ≈ $80
- **Forgotten for a month**: ≈ $350
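These figures are just the hourly rate multiplied by the time the pod exists; the ≈ $12 "overnight" figure corresponds to roughly a full 24 hours at $0.49/hr. A trivial helper makes it easy to redo the math for a different GPU rate (the rate constant is the only input):

```python
HOURLY_RATE_USD = 0.49  # RTX A6000 on-demand rate quoted above


def idle_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost of a pod that exists (and bills) for `hours`, rounded to cents."""
    return round(hours * rate, 2)


for label, hours in [("24 hours", 24), ("one week", 24 * 7), ("30 days", 24 * 30)]:
    print(f"{label:>8}: ${idle_cost(hours):,.2f}")
```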

In the [RunPod console](https://www.runpod.io/console/pods), click your pod's
card → **Terminate**. Confirm.

Termination wipes `/workspace`. That's fine: you have the tarball on your Mac.
Next month, you start fresh from step 1.

```
sas-sample-generator/
├── README.md                                   ← you are here
├── stable_audio_open_batch_oneshot_guide.md    ← long-form background
├── requirements.txt
├── docs/SAMPLE_GEN_V3_PLAN.md                  ← v3 design rationale & locked decisions
├── prompts/
│   ├── <drum-cat>.txt        (24 files, 200 prompts each)  kick.txt, clap.txt, 808.txt, …
│   └── pitched/<cat>.txt     (28 files, 200 prompts each)  pianos.txt, basses.txt, pads.txt, …
├── scripts/
│   ├── setup.sh                       ← bootstrap a pod (venv + deps + SA3 tools)
│   ├── gen_prompts.py                 ← (re)generate the 200-prompt corpora
│   │
│   ├── run_all.sh                     ← DRUM pipeline driver
│   ├── categories.txt                 ← which drum roles to run
│   ├── category_config.py             ← per-role negatives + durations
│   ├── drum_gate_config.py            ← per-role gate profiles + variant counts
│   ├── gate_drums.py                  ← drum quality gate + best-of-N
│   ├── list_to_jsonl.py               ← .txt → .jsonl (drums)
│   ├── postprocess_oneshots.py        ← trim / LUFS / mono / tag
│   │
│   ├── run_pitched.sh                 ← PITCHED pipeline driver
│   ├── pitched_categories.txt         ← which instruments to run
│   ├── pitched_category_config.py     ← target pitches, durations, zones, variants
│   ├── gate_pitched.py                ← pitch + quality gate + best-of-N
│   ├── list_to_jsonl_pitched.py       ← .txt → .jsonl (multi-source fan-out)
│   ├── enrich_pitched.py              ← pitch-correct + multi-zone render + manifest
│   ├── pitch_report.py                ← measured-vs-target pitch-accuracy report
│   ├── repair_instrument_pitch.py     ← post-hoc single-source pitch fix
│   │
│   ├── run_retry.py                   ← generate+gate+retry-to-target (both pipelines)
│   ├── batch_generate.py              ← batched SA3 inference
│   ├── build_pack.py                  ← deterministic versioned pack zips
│   ├── README-PACKS.md                ← pack build + publish runbook
│   └── benchmark.py / sync.sh         ← optional helpers
├── tests/                             ← CPU-only suite (see "Tests" below)
└── outputs/                           ← gitignored; all generated audio lands here
    ├── raw/<cat>/<id>_vNN.wav         ← SA3 candidates (N variants/prompt)
    ├── raw/<cat>/_metadata/<id>_vNN.json  ← seed, model, gen params (stays on pod)
    ├── gated_drums/<cat>/<id>.wav     ← drum gate winners
    ├── processed/<cat>/<id>.{wav,txt} ← final drums (trimmed/LUFS/mono; ICMT = prompt)
    ├── gated/<cat>/<id>.{wav,gate.json}   ← pitched gate winners + scores
    ├── _reports/pitch_summary.{json,md}   ← pitch-accuracy report (report stage)
    └── instruments/<cat>/<id>/        ← final instruments (sources/, zones/, manifest.json)
```

Sibling pipeline to the drum one. Same Stable Audio 3 generator, different
downstream: a 5-stage quality+pitch gate (prefilter, onset, sustain-plateau,
CREPE/spectral pitch with a sub-bass octave cross-check, BasicPitch polyphony
when the TF/numpy ABI is happy), **multi-source pitch correction**, and zone
pre-rendering (RubberBand R3, formant-preserving). It emits a per-instrument
`manifest.json` consumed by `sas-instrument-plugin`.

**Multi-source real-pitch sampling (the v3 headline).** Wide-range instruments
are generated at **2–4 real source pitches** spanning their natural register
(e.g. basses at E1/E2/E3). Enrich assigns each playable zone to its *nearest*
real source, so no zone is pitch-shifted more than ~half the inter-source gap:
small, artifact-free shifts instead of stretching one root ±12 semitones. Toggle
with `SAS_MULTI_SOURCE=1` (default; `0` reproduces v1 single-source for an A/B).
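Nearest-source zone assignment can be sketched as follows. The zone spacing and the MIDI numbers are illustrative (basses at E1/E2/E3 are MIDI 28/40/52 under the C4 = 60 convention), and the tie-breaking rule (prefer the lower source) is an assumption, not necessarily what `enrich_pitched.py` does.

```python
from typing import Dict, List, Tuple


def assign_zones(sources: List[int], zone_roots: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each zone root (MIDI) to (nearest real source MIDI, signed shift in semitones)."""
    srcs = sorted(sources)
    plan = {}
    for z in zone_roots:
        nearest = min(srcs, key=lambda s: abs(z - s))  # ties keep the lower source
        plan[z] = (nearest, z - nearest)
    return plan


def worst_shift(plan: Dict[int, Tuple[int, int]]) -> int:
    return max(abs(shift) for _, shift in plan.values())


zones = list(range(22, 59, 3))               # illustrative: a zone every 3 semitones
multi = assign_zones([28, 40, 52], zones)    # basses: E1, E2, E3
single = assign_zones([40], zones)           # v1-style single root at E2
print(worst_shift(multi), worst_shift(single))
```

Over the same key span, three sources cap the worst shift at half the 12-semitone gap (6 semitones), while a single root has to stretch 18 semitones to reach the outermost zones.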

**28 categories ship, 200 prompts each (5,600 prompts):**

| Category | Source pitch(es) | Dur | Var | Notes |
|---|---|---|---|---|
| `synths` | C3 | 5.0s | 6 | analog mono, FM, wavetable, acid |
| `lead-supersaw` | C4·C5 | 5.0s | 6 | multi-source |
| `lead-fm` | C4 | 5.0s | 6 | |
| `lead-acid` | C3 | 4.0s | 6 | 303-style |
| `pluck-synth` | C4 | 3.0s | 6 | |
| `plucks` | C4 | 3.0s | 6 | |
| `keys` | C3 | 5.0s | 6 | Rhodes, Wurli, clav, DX7 |
| `pianos` | C2·C3·C4·C5 | 5.0s | 6 | multi-source, step 2 |
| `organs` | C3 | 8.0s | 8 | open-ended |
| `basses` | E1·E2·E3 | 6.0s | 8 | multi-source, 30 Hz floor |
| `808-bass` | C2·C3 | 6.0s | 8 | multi-source, 25 Hz floor |
| `reese-bass` | E2·E3 | 6.0s | 8 | multi-source |
| `pads` | C3 | 12.0s | 8 | open-ended, step 2 |
| `strings` | A2·A3·A4 | 8.0s | 8 | multi-source, open-ended, step 2 |
| `brass` | A2·A3·A4 | 6.0s | 8 | multi-source, open-ended |
| `winds` | D3·D4·D5 | 5.0s | 8 | multi-source, open-ended, step 2 |
| `accordion` | F#3 | 6.0s | 8 | open-ended |
| `bells` | C5 | 4.0s | 6 | glockenspiel, FM, music box |
| `mallets` | C4 | 3.0s | 6 | marimba, vibes, kalimba |
| `percussion` | C4 | 2.0s | 6 | tonal/tuned only |
| `timpani` | F2·F3 | 4.0s | 8 | multi-source, tuned drum |
| `guitars` | E2·E3·E4 | 4.0s | 6 | multi-source |
| `banjos` | C4·G4 | 3.0s | 6 | multi-source |
| `mandolin` | A4 | 3.0s | 6 | |
| `harp` | C3·C4·C5 | 4.0s | 6 | multi-source |
| `sitar` | C4 | 4.0s | 6 | |
| `vocals` | A3 | 5.0s | 20 | choir, chops, vocoded (SA3 vocals are weak) |
| `choir` | A3 | 10.0s | 18 | open-ended |

"Var" = SA3 candidates per (prompt × source pitch); the gate keeps the best.
Pitches use the C4 = MIDI 60 convention; multi-source counts apply with
`SAS_MULTI_SOURCE=1` (default). Full config in
[scripts/pitched_category_config.py](/shiehn/sas-sample-generator/blob/main/scripts/pitched_category_config.py).
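Because the table uses note names under the C4 = MIDI 60 convention, a small helper for converting names to MIDI numbers (and to frequencies, assuming A4 = 440 Hz) is handy when editing the config or reading the pitch report. These functions are hypothetical and not part of the repo.

```python
import re
from typing import List

_PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
_ACCIDENTAL = {"": 0, "#": 1, "b": -1}


def note_to_midi(name: str) -> int:
    """'C4' -> 60, 'F#3' -> 54, 'E1' -> 28 (C4 = MIDI 60 convention)."""
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name.strip())
    if m is None:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidental, octave = m.groups()
    return 12 * (int(octave) + 1) + _PITCH_CLASS[letter] + _ACCIDENTAL[accidental]


def midi_to_hz(midi: float, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency, assuming A4 (MIDI 69) = 440 Hz."""
    return a4_hz * 2 ** ((midi - 69) / 12)


def parse_sources(cell: str) -> List[int]:
    """Split a table cell like 'E1·E2·E3' into MIDI numbers."""
    return [note_to_midi(n) for n in cell.split("·")]
```

For example, the lowest bass source E1 sits at about 41.2 Hz, comfortably above the 30 Hz floor listed for `basses`.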

To **subset** which categories run, edit
[scripts/pitched_categories.txt](/shiehn/sas-sample-generator/blob/main/scripts/pitched_categories.txt) and
comment out any line with `#` to skip that category.

`scripts/setup.sh` installs **everything** the pitched pipeline needs:

- All Python deps from `requirements.txt` (librosa, torchcrepe, basic-pitch, pyloudnorm, soxr, …)
- All system packages: `rsync` (transfer to Mac), `tmux` (long sessions survive disconnect), `rubberband-cli` (enrich shells out to it directly for formant-preserving pitch shift), `ffmpeg` (audio inspection), `zip`/`unzip` (archives)
- The compiled stable-audio-tools from git main (the PyPI release doesn't support SA3-medium)
- CUDA-12.8 torch wheels (Blackwell-compatible; they also work on older Hopper/Ampere/Ada GPUs)

You should **never** need to apt-get or pip install anything on a pod
after running `setup.sh`. If you do, treat it as a bug in `setup.sh` and
add it there.

```
brew install rubberband                          # pitch-shift backend
cd ~/path/to/sas-sample-generator                # your local repo
pip install -r requirements.txt                  # one-time, ~5 min
```

Designed to be safely repeatable from a cold start. The whole pipeline:
**~15 min setup + ~30 min generate + ~10 min gate + ~30 min enrich + transfer**.

[runpod.io/console/pods](https://www.runpod.io/console/pods) → **Deploy → GPU Pod**:

| Setting | Value | Why |
|---|---|---|
| GPU | RTX A6000 / 4090 / 5090 / L40S / A100 (24+ GB VRAM) | SA3-medium fits in 16 GB; 24 GB gives headroom |
| Template | most recent RunPod PyTorch with CUDA 12.x | matches our cu128 wheels |
| Container Disk | 100 GB | persistent across pod restart; holds venv + HF model cache + outputs |
| Network Volume | None | RunPod's "migrate to new host" flow has been known to attach a tiny 10 GB network volume; don't let it. We use container disk only |
| Expose | SSH (port 22, default) | |

**Critical:** the field is named "Container Disk" — the persistent SSD. Do NOT confuse with "Network Volume" or "Volume Disk".

Click **Deploy On-Demand**. Wait ~30 sec for status `RUNNING`.

Copy the SSH command from **Connect → SSH over exposed TCP**. It looks like:

```
ssh root@<POD_IP> -p <POD_PORT> -i ~/.ssh/id_ed25519
```

Type `yes` on first connect. If you get `Permission denied`, run
`ssh-add ~/.ssh/id_ed25519` and retry.

```
cd /workspace && \
git clone https://github.com/shiehn/sas-sample-generator.git && \
cd /workspace/sas-sample-generator && \
./scripts/setup.sh 2>&1 | tee /root/setup.log
```

Look for these "OK" markers near the end of setup.log:

```
[setup]   rsync:          rsync version 3.x.x ...
[setup]   tmux:           tmux 3.x
[setup]   rubberband:     /usr/bin/rubberband
[setup]   ffmpeg:         ffmpeg version ...
[setup] cuda available: True
[setup] device:         NVIDIA RTX 4090
[setup] done.
```

If you see `cuda available: False`, you deployed onto a CPU template: terminate and redeploy with a PyTorch GPU template.

```
source /root/.venv/bin/activate
hf auth login
```

Paste your HF read token. Answer `n` to "Add token as git credential".

**First time on a new HF account:** in your browser, visit
[stabilityai/stable-audio-3-medium](https://huggingface.co/stabilityai/stable-audio-3-medium)
and accept BOTH the SA3 community license **AND** the underlying Gemma
terms. Without both, the model download fails with `GatedRepoError`.
Your token is tied to your account, so accept while logged into the same HF account
the token belongs to.

Verify access (under 5 seconds):

```
hf download stabilityai/stable-audio-3-medium model_config.json --local-dir /tmp/sa3-test
ls /tmp/sa3-test/
```

If `model_config.json` is listed, you're cleared.

The default ships with **all 28** categories enabled. For a quick test
or a focused run:

```
# Edit scripts/pitched_categories.txt: comment out any category with `#`
nano scripts/pitched_categories.txt
```

After the test, restore with `git checkout scripts/pitched_categories.txt`.

Run **generate + gate + report** on the GPU pod; enrich runs later on your Mac
(it's CPU-bound).

```
tmux new -s pitched

# Inside tmux:
cd /workspace/sas-sample-generator
source /root/.venv/bin/activate
source /workspace/.bash_env

STAGES=generate,gate,report ./scripts/run_pitched.sh 2>&1 | tee outputs/run.log
```

`run_pitched.sh` builds the JSONLs (multi-source fan-out: one job per
prompt × source pitch, with the per-category `variants` count), then
`run_retry.py` drives **batched** generation + `gate_pitched.py` with the same
**retry-to-target** loop as drums (re-roll all-fail prompts up to
`MAX_RETRIES=2`, top up until `TARGET=150` instruments survive). The `report`
stage then writes `outputs/_reports/pitch_summary.{json,md}` so you can read
measured-vs-target pitch accuracy *before* transferring.

**Detach with Ctrl-b d.** The run keeps going even if SSH drops.

**Reattach later** (from any new SSH session — possibly a new IP/port if migrated):

```
tmux attach -t pitched
```

Monitor from outside tmux:

```
tail -f outputs/run.log
nvidia-smi
```

**Throughput:** batched generation takes ~1–3 h; the **CPU gate is the bottleneck**
(torchcrepe + basic-pitch + librosa per variant). For all 28 categories at full
variant counts, budget up to about a day of pod time. Dial variant counts down in
`pitched_category_config.py`, or run a subset, if wall-clock time matters.

Useful knobs: `ONLY=pianos LIMIT=5` (tiny slice), `BATCH_SIZE=32` (80 GB GPU),
`MAX_RETRIES=0` (one pass, no retry), `INIT_ANCHOR=1` (experimental init_audio
pitch anchoring, default off), `SAS_MULTI_SOURCE=0` (single-source A/B).

When `STAGES=generate,gate` finishes, check pass rates before transferring:

```
# Per-category pass rates
for d in outputs/gated/*/; do
  cat=$(basename "$d")
  [[ "$cat" == "_failures" ]] && continue
  passed=$(ls "$d"*.wav 2>/dev/null | wc -l)
  failed=$(ls "$d/_failures"/*.json 2>/dev/null | wc -l)
  total=$((passed + failed))
  if [[ $total -gt 0 ]]; then
    rate=$((passed * 100 / total))
    printf "  %-18s passed=%3d  failed=%3d  pass-rate=%d%%\n" "$cat" "$passed" "$failed" "$rate"
  fi
done

echo "Total gated: $(find outputs/gated -name '*.wav' -not -path '*_failures*' | wc -l)"
du -sh outputs/gated
```

Expected (with the current thresholds, 2026-05-22): **80–100% pass rate per category**. If a category is below 50%, look in `outputs/gated/<cat>/_failures/<id>.json` to see why prompts are failing.

The pod has `rsync` installed by `setup.sh`. On your Mac:

```
mkdir -p ~/sas-pitched-out
rsync -avzP -e "ssh -p <POD_PORT> -i ~/.ssh/id_ed25519" \
  root@<POD_IP>:/workspace/sas-sample-generator/outputs/gated/ \
  ~/sas-pitched-out/gated/
```

For ~4 GB at typical RunPod / home upload speeds, expect 10–20 min.
`rsync` resumes on interruption; just re-run the same command if SSH drops.

Verify locally:

```
find ~/sas-pitched-out/gated -name '*.wav' -not -path '*_failures*' | wc -l   # should match step 7
du -sh ~/sas-pitched-out/gated
```

Then run enrich on your Mac:

```
cd ~/path/to/sas-sample-generator
git pull                                # pick up any threshold updates
pip install -r requirements.txt         # idempotent

export SAS_OUTPUTS_DIR=~/sas-pitched-out
STAGES=enrich ./scripts/run_pitched.sh
```

Enrich groups the surviving source pitches of each prompt into **one
multi-source instrument** under `~/sas-pitched-out/instruments/<cat>/<id>/`:

- `sources/<midi>.wav`: **24-bit** real source samples (1 per source pitch: 2–4 for multi-source categories), pitch-corrected + normalized to -20 LUFS
- `zones/<midi>.wav`: **16-bit WAV** pre-rendered playable zones (every 2–3 semitones), each rendered from its nearest real source (was 24-bit FLAC pre-v3; WAV is memory-mapped by the Tracktion sampler with no decode stall)
- `manifest.json`: `schema_version: 1`, disjoint ordered zones
- `prompt.txt`: original positive prompt
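As an illustration of what "disjoint ordered zones" means, here is a sketch that splits the keyboard between zone roots at the midpoints and checks the result. The inclusive `(lo, hi)` tuple representation and the midpoint split are assumptions; the actual `manifest.json` field names are not shown in this README.

```python
from typing import List, Optional, Tuple

KeyRange = Tuple[int, int]  # inclusive (lo, hi) MIDI keys; hypothetical representation


def split_keyboard(roots: List[int], lowest: int, highest: int) -> List[KeyRange]:
    """Give each zone root the keys closest to it, splitting at midpoints between roots."""
    roots = sorted(roots)
    ranges = []
    lo = lowest
    for here, nxt in zip(roots, roots[1:]):
        hi = (here + nxt) // 2       # keys up to the midpoint belong to `here`
        ranges.append((lo, hi))
        lo = hi + 1
    ranges.append((lo, highest))     # the top root takes everything above
    return ranges


def disjoint_and_ordered(ranges: List[KeyRange]) -> bool:
    """True if every range is non-empty, ranges ascend, and no two share a key."""
    prev_hi: Optional[int] = None
    for lo, hi in ranges:
        if lo > hi or (prev_hi is not None and lo <= prev_hi):
            return False
        prev_hi = hi
    return True
```

A check like this is a cheap sanity test to run over a finished library before loading it into a sampler, since overlapping zones would make a key trigger two samples.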

It parallelizes across instruments (`ProcessPoolExecutor`) and shells out to the
`rubberband` CLI for pitch shifts (`brew install rubberband` on the Mac).
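The pattern described above (one worker per instrument, each shelling out to `rubberband`) can be sketched like this. It is not the repo's `enrich_pitched.py`. `-p` (pitch shift in semitones) and `-F` (formant preservation) are standard `rubberband` options; `--fine` is assumed to select the R3 engine in Rubber Band 3.x, so confirm it with `rubberband --help` on your install.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Sequence, Tuple

ZoneJob = Tuple[str, str, float]  # (source wav, zone wav, shift in semitones)


def rubberband_cmd(src: str, dst: str, semitones: float) -> List[str]:
    # -p: pitch shift in semitones; -F: preserve formants.
    # --fine: assumed R3 engine switch (Rubber Band 3.x); verify for your version.
    return ["rubberband", "--fine", "-F", "-p", f"{semitones:g}", src, dst]


def render_instrument(jobs: Sequence[ZoneJob]) -> int:
    """Render one instrument's zones sequentially; raises if rubberband fails."""
    for src, dst, semitones in jobs:
        subprocess.run(rubberband_cmd(src, dst, semitones), check=True, capture_output=True)
    return len(jobs)


def render_all(instruments: Sequence[Sequence[ZoneJob]], workers: Optional[int] = None) -> int:
    """One task per instrument, spread across CPU cores; returns zones rendered."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(render_instrument, instruments))
```

Parallelizing per instrument rather than per zone keeps each worker's files together and avoids scheduling thousands of tiny tasks.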

[runpod.io/console/pods](https://www.runpod.io/console/pods) → pod card → **Terminate** (NOT Stop). Compute billing stops immediately. Volume billing (if any auto-created Network Volume snuck in) stops only on Terminate.

Then [runpod.io/console/user/storage](https://www.runpod.io/console/user/storage) → **Network Volumes** → check for any `outside_*` orphan from a migration → Delete.

RunPod sometimes moves your pod to a different physical host mid-run. Symptoms:

- SSH connection drops mid-session
- `Connection refused` when reconnecting on the same IP/port
- Pod shows "Stopped" briefly, then "Running" again at a new address

**The pod, the venv, the HF cache, and all outputs/ data persist on the container disk** as long as the pod isn't terminated. You just need fresh connection info.

- Open the RunPod console → click your pod card → check the **Connect → SSH over exposed TCP** panel for the new IP and port (both can change).
- Clear the old SSH host key on your Mac:

  ```
  ssh-keygen -R '[<NEW_IP>]:<NEW_PORT>'
  ```

- SSH back in with the new details and run `tmux attach -t pitched`; your run is still going.
- If you were mid-rsync, just re-run the rsync command with the new `-p <NEW_PORT>` and `root@<NEW_IP>`; it picks up where it stopped.

This bit us twice this session (May 2026). Symptoms are unambiguous; recovery takes 30 seconds.

SA3 doesn't reliably hit a target pitch from a text prompt — that's a known limitation of text-to-audio diffusion models. Enrich now compensates intelligently:

| If measured pitch is… | Enrich does… | Result |
|---|---|---|
| within `max_correction_semitones` of target (default 3) | shifts all the way to the original target | Sample is at exactly the prompted MIDI note; preserves prompt semantics |
| further away than that | snaps to the nearest integer semitone | Sample is at the closest "logical" MIDI note (always ≤50 cent shift, no audible artifacts) |

Either way: every output sample lands on an **exact MIDI semitone** with the smallest possible pitch shift. The zone rendering loop centers on that effective root, so the sampler always has a clean zone at the sample's actual pitch.

`max_correction_semitones` is per-category in `scripts/pitched_category_config.py`.
Set it to `0` to always snap to the nearest semitone (never shift to target), or to a large value (24+) to always shift to target.
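The decision in the table above fits in a few lines. This sketch assumes "within" is inclusive and that the measured pitch has already been converted to a fractional MIDI note (A4 = 440 Hz); both are assumptions about how enrich implements it.

```python
import math
from typing import Tuple


def hz_to_midi(freq_hz: float) -> float:
    """Fractional MIDI note for a frequency, assuming A4 (MIDI 69) = 440 Hz."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def correction_plan(measured_midi: float, target_midi: int,
                    max_correction_semitones: float = 3) -> Tuple[int, float]:
    """Return (effective_root, shift_semitones) for one source sample."""
    if abs(measured_midi - target_midi) <= max_correction_semitones:
        root = target_midi           # close enough: shift all the way to the prompted note
    else:
        root = round(measured_midi)  # far off: snap to the nearest semitone (|shift| <= 0.5)
    return root, root - measured_midi
```

For example, a sample prompted at C4 (60) that measures 61.3 is pulled down 1.3 semitones to 60, while one measuring 67.2 is left at G4 (67) with only a 0.2-semitone correction.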

| Stage | What it checks | What rejection means |
|---|---|---|
| `prefilter` | Clipping, dead channels, all-silent buffers | Sample is broken at the file level |
| `onset` | Time from buffer start to first transient | `slow_onset` → SA3 added a fade-in / silence preamble (>300 ms) |
| `sustain` | Longest plateau within 12 dB of peak RMS | `short_stab` → audio decays too fast or has no held region |
| `pitch` | CREPE periodicity + measured-vs-target | `no_voiced_frames` / `unconfident` → unpitched output; `wrong_pitch` is OFF by default (tolerance 9999) so enrich's snap-to-nearest-semitone can do its job |
| `polyphony` | BasicPitch note count after vibrato bypass | Disabled when the TF/numpy ABI mismatches (common on RunPod); the gate prints one warning at start, then runs without it |
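The `onset` and `sustain` checks can be approximated on a frame-level RMS envelope in dB. This is a rough sketch, not `gate_pitched.py`: the -40 dB threshold-crossing onset is a crude stand-in for real transient detection, and the envelope and hop size are made-up example inputs. Only the 300 ms and 12 dB figures come from the table.

```python
from typing import Optional, Sequence


def onset_seconds(env_db: Sequence[float], hop_s: float,
                  threshold_db: float = -40.0) -> Optional[float]:
    """Time of the first frame louder than threshold_db (a crude stand-in for onset detection)."""
    for i, level in enumerate(env_db):
        if level > threshold_db:
            return i * hop_s
    return None


def longest_plateau_seconds(env_db: Sequence[float], hop_s: float,
                            window_db: float = 12.0) -> float:
    """Longest run of consecutive frames within window_db of the loudest frame."""
    floor = max(env_db) - window_db
    best = run = 0
    for level in env_db:
        run = run + 1 if level >= floor else 0
        best = max(best, run)
    return best * hop_s


env = [-80.0] * 5 + [-6.0, -3.0, -4.0, -8.0, -12.0, -20.0, -30.0]  # frame RMS in dBFS
onset = onset_seconds(env, hop_s=0.1)
print(onset, onset is not None and onset > 0.3)  # True here would mean slow_onset
print(longest_plateau_seconds(env, hop_s=0.1))
```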

The gate scores winners by `confidence² × exp(-|cents|/50) × sus_quality`.
With `wrong_pitch` disabled, the pitch term collapses to ~0 for far-off samples, so all variants of a prompt can tie at score=0.000; in that case the picker just grabs v00 by default. Acceptable for now.
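The scoring formula and the v00 fallback can be reproduced in a short sketch. Rounding scores to 3 decimals before comparing is an assumption chosen to match the "score=0.000" tie described above, and `cents_off` is taken to be the measured-vs-target deviation in cents.

```python
import math
from typing import List, Tuple

Variant = Tuple[str, float, float, float]  # (name, confidence, cents_off, sus_quality)


def gate_score(confidence: float, cents_off: float, sus_quality: float) -> float:
    """confidence² × exp(-|cents|/50) × sus_quality."""
    return confidence ** 2 * math.exp(-abs(cents_off) / 50) * sus_quality


def pick_winner(variants: List[Variant], decimals: int = 3) -> str:
    """Best-of-N: highest (rounded) score wins; max() keeps the first of any tie, i.e. v00."""
    return max(variants, key=lambda v: round(gate_score(*v[1:]), decimals))[0]
```

A variant a full octave off (1200 cents) gets a pitch term of exp(-24), effectively zero, which is why far-off variants all tie and the first one wins.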

```
outputs/
├── raw/<category>/                                  ← SA3 candidates (N variants per prompt × source pitch)
│   ├── <id>_v00.wav, <id>_v01.wav, ...
│   └── _metadata/<id>_v0N.json                      ← seed, model, generation_seconds, batch_size
├── gated/<category>/                                ← gate winners only
│   ├── <id>.wav                                     ← chosen variant
│   ├── <id>.gate.json                               ← per-gate scores + measured pitch
│   └── _failures/<id>.json                          ← prompts where ALL variants rejected
├── _reports/pitch_summary.{json,md}                 ← measured-vs-target accuracy (report stage)
└── instruments/<category>/<instrument-id>/          ← final library, sampler-consumable
    ├── sources/<midi>.wav                           ← 24-bit real source pitches (1–4)
    ├── zones/<midi>.wav                             ← 16-bit WAV pre-rendered zones
    ├── manifest.json                                ← schema_version 1, disjoint ordered zones
    └── prompt.txt                                   ← original positive prompt
```

The v3 full run is much bigger than a quick slice: 28 categories × 200 prompts ×
1–4 source pitches × 6–20 variants = tens of thousands of candidates. Generation
is cheap and batched; the **CPU gate dominates wall-clock** (per-variant CREPE +
basic-pitch + librosa). On an **A100 80 GB** (~$0.89–1.89/hr) the whole pitched
campaign runs in roughly **half a day to a day** → **~$15–35**, with drums adding
a few dollars. The original $200 budget is never the binding constraint.
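The "tens of thousands" figure can be checked against the category table: summing source pitches × variants over the 28 categories and multiplying by 200 prompts gives the first-pass candidate count (retries and top-ups add more). The cost bounds multiply the quoted A100 rates by half a day and a full day. This is plain arithmetic on the numbers above, not a measured result.

```python
from typing import Dict, Tuple

# (source pitches, variants) per pitched category, copied from the table above.
CATEGORY_PLAN: Dict[str, Tuple[int, int]] = {
    "synths": (1, 6), "lead-supersaw": (2, 6), "lead-fm": (1, 6), "lead-acid": (1, 6),
    "pluck-synth": (1, 6), "plucks": (1, 6), "keys": (1, 6), "pianos": (4, 6),
    "organs": (1, 8), "basses": (3, 8), "808-bass": (2, 8), "reese-bass": (2, 8),
    "pads": (1, 8), "strings": (3, 8), "brass": (3, 8), "winds": (3, 8),
    "accordion": (1, 8), "bells": (1, 6), "mallets": (1, 6), "percussion": (1, 6),
    "timpani": (2, 8), "guitars": (3, 6), "banjos": (2, 6), "mandolin": (1, 6),
    "harp": (3, 6), "sitar": (1, 6), "vocals": (1, 20), "choir": (1, 18),
}
PROMPTS_PER_CATEGORY = 200


def first_pass_candidates(plan: Dict[str, Tuple[int, int]] = CATEGORY_PLAN,
                          prompts: int = PROMPTS_PER_CATEGORY) -> int:
    """Candidates generated before any retry or top-up."""
    return prompts * sum(n_src * n_var for n_src, n_var in plan.values())


def cost_range(hours_lo: float, hours_hi: float, rate_lo: float, rate_hi: float) -> Tuple[float, float]:
    """(cheapest, priciest) bill in USD for a run-time range and an hourly-rate range."""
    return round(hours_lo * rate_lo, 2), round(hours_hi * rate_hi, 2)


print(first_pass_candidates())          # 71200
print(cost_range(12, 24, 0.89, 1.89))   # (10.68, 45.36)
```

The quoted ~$15–35 sits inside these outer bounds; the extremes only occur if a short run lands on the cheapest rate or a long run on the priciest.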

To cut cost/time: run fewer categories (`ONLY=…`), lower variant counts in
`pitched_category_config.py`, or set `MAX_RETRIES=0`. A single-category slice
(`ONLY=pianos LIMIT=5`) takes a few minutes and costs a few cents; use it to dial in prompts
before committing to the full campaign. Enrich runs locally on your Mac → **$0**.

```
prompts/pitched/<category>.txt                # one prompt per line, # comments
scripts/pitched_categories.txt                # which categories to run (comment to skip)
scripts/pitched_category_config.py            # per-category target pitch, duration, sustain thresholds, etc.
```

Fast iteration on a single category:

```
# 1. ONLY=<cat> overrides the enable-list — no need to edit pitched_categories.txt
# 2. Edit prompts/pitched/<cat>.txt
# 3. Re-run generate+gate+report (LIMIT caps prompts; sets TARGET=0)
ONLY=<cat> LIMIT=10 STAGES=generate,gate,report ./scripts/run_pitched.sh
# 4. Listen to outputs/gated/<cat>/*.wav, read outputs/_reports/pitch_summary.md, re-run
```

Re-author / extend the corpora with
[scripts/gen_prompts.py](/shiehn/sas-sample-generator/blob/main/scripts/gen_prompts.py) (deterministic, preserves
existing lines so hashes stay stable, holds the EDM ratio).

`--skip-existing` in `batch_generate.py` means re-running won't regenerate samples you
already have: only new prompt lines hit the GPU.
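Why stable hashes and `--skip-existing` fit together can be sketched like this. It is an assumption-laden illustration, not the repo's code: the README doesn't say how `<id>` is derived, so `prompt_id` hypothetically hashes the prompt text (making the ID independent of line position), and `pending` treats a prompt as done once `raw/<category>/<id>_v00.wav` exists, following the output layout shown earlier.

```python
import hashlib
from pathlib import Path
from typing import List, Tuple


def prompt_id(prompt: str) -> str:
    # Hypothetical ID scheme: a content hash of the stripped prompt, so inserting or
    # reordering lines in prompts/<cat>.txt never changes an existing prompt's ID.
    return hashlib.sha1(prompt.strip().encode("utf-8")).hexdigest()[:12]


def read_prompts(text: str) -> List[str]:
    """One prompt per line; blank lines and '#' comment lines are ignored."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def pending(prompts: List[str], raw_dir: Path) -> List[Tuple[str, str]]:
    """(id, prompt) pairs with no first variant on disk yet: the skip-existing idea."""
    todo = []
    for prompt in prompts:
        pid = prompt_id(prompt)
        if not (raw_dir / f"{pid}_v00.wav").exists():
            todo.append((pid, prompt))
    return todo
```

Only the prompts returned by `pending` would go to the GPU, which is why appending lines to a corpus costs just the new lines.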

- `STAGES=generate,gate,enrich,report`: pitched only; comma-separated subset (default: all four)
- `STEPS=8`: diffusion steps (SA3 converges fast)
- `BATCH_SIZE=16`: generations per model call (32–64 on an 80 GB GPU)
- `TARGET=150`: per-category minimum surviving samples (`0` = no top-up)
- `MAX_RETRIES=2`: re-roll rounds for all-fail prompts before topping up (`0` = one pass)
- `ONLY="a b"` / `LIMIT=N`: run a subset of categories / cap prompts per category (test slice; auto-sets `TARGET=0`)
- `GATE=0`: drums only; skip the quality gate (legacy keep-all)
- `INIT_ANCHOR=1`: pitched only; experimental init_audio pitch anchoring (default off)
- `SAS_MULTI_SOURCE=0`: pitched only; disable multi-source (single root, span 12; for an A/B)
- `SAS_OUTPUTS_DIR=/path`: override outputs location (default `/workspace/outputs` on pod, `./outputs` local)

Per-prompt **variant counts are per-category now** (in `drum_gate_config.py` /
`pitched_category_config.py`), not a global env var.
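The retry-to-target loop that `TARGET` and `MAX_RETRIES` control can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the repo's implementation: `generate_and_gate` is a hypothetical stand-in for "generate all variants of one prompt, gate them, report whether any survived", the top-up phase simply re-rolls existing prompts with fresh seeds, and `max_topup_rounds` is a hypothetical safety cap rather than a documented knob.

```python
from typing import Callable, List


def retry_to_target(
    prompts: List[str],
    generate_and_gate: Callable[[str, int], bool],  # hypothetical: (prompt, seed) -> survived?
    target: int = 150,
    max_retries: int = 2,
    max_topup_rounds: int = 10,  # hypothetical guard so the loop always terminates
) -> int:
    """Return how many samples survive the gate for one category."""
    seed = 0
    survivors = 0
    failed: List[str] = []

    # Pass 1: every prompt gets one attempt (all its variants, gated together).
    for prompt in prompts:
        if generate_and_gate(prompt, seed):
            survivors += 1
        else:
            failed.append(prompt)
        seed += 1

    # Re-roll rounds: only prompts whose variants were ALL rejected (MAX_RETRIES).
    for _ in range(max_retries):
        still_failed = []
        for prompt in failed:
            if generate_and_gate(prompt, seed):
                survivors += 1
            else:
                still_failed.append(prompt)
            seed += 1
        failed = still_failed
        if not failed:
            break

    # Top-up: keep re-rolling with fresh seeds until the category reaches TARGET
    # (TARGET=0 disables this phase entirely).
    rounds = 0
    while target > 0 and survivors < target and rounds < max_topup_rounds and prompts:
        for prompt in prompts:
            if survivors >= target:
                break
            if generate_and_gate(prompt, seed):
                survivors += 1
            seed += 1
        rounds += 1
    return survivors
```

With a fake gate such as `lambda p, s: s % 2 == 0`, four prompts reach 4 survivors after the re-roll rounds, and `target=6` adds exactly one top-up round.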

The `sas-instrument-plugin` walks `outputs/instruments/<cat>/<id>/`, parses
each `manifest.json`, and uses the `zones[]` array to call
`host.setTrackInstrumentSampler` on the chosen track. Disjoint zones +
per-zone `root_midi` mean the engine pitch-shifts the nearest
pre-rendered zone for any played MIDI note, with the smart-corrected
sample as the unshifted root. Since enrich locks every sample to an
integer MIDI semitone, the sampler never has to deal with off-pitch
sources.
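To make the zone lookup concrete, here is a minimal Python sketch of a consumer. It is not the plugin's code (the plugin and `host.setTrackInstrumentSampler` live in the app): it assumes only the directory layout above and a `zones` list whose entries carry `root_midi`, picks the zone with the nearest root, and reports the semitone shift plus the playback rate a plain resampling shift would use. The engine's actual pitch-shifting method may differ.

```python
import json
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_manifests(instruments_root: Path) -> Iterator[Tuple[str, str, dict]]:
    """Yield (category, instrument_id, manifest) for <root>/<cat>/<id>/manifest.json."""
    for path in sorted(instruments_root.glob("*/*/manifest.json")):
        with path.open(encoding="utf-8") as f:
            yield path.parent.parent.name, path.parent.name, json.load(f)


def nearest_zone(zones: List[dict], note: int) -> Tuple[dict, int, float]:
    """Pick the zone whose root_midi is closest to `note`.

    Returns (zone, semitone_shift, playback_rate). On a tie, min() keeps the
    first zone in list order, i.e. the lower root when zones are ascending.
    """
    zone = min(zones, key=lambda z: abs(note - z["root_midi"]))
    shift = note - zone["root_midi"]
    return zone, shift, 2.0 ** (shift / 12.0)


if __name__ == "__main__":
    for cat, inst_id, manifest in iter_manifests(Path("outputs/instruments")):
        zone, shift, rate = nearest_zone(manifest["zones"], 62)
        print(f"{cat}/{inst_id}: root {zone['root_midi']}, shift {shift:+d}, rate {rate:.4f}")
```

Because every root is an exact integer semitone after enrich, the shift is always a whole number of semitones and the unshifted root plays back at rate 1.0.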

Generated audio ships to the [Signals & Sorcery](https://signalsandsorcery.com)
app as **versioned pack zips** built by
[scripts/build_pack.py](/shiehn/sas-sample-generator/blob/main/scripts/build_pack.py), deterministically (fixed mtimes +
sorted entries → byte-identical zip + sha256 from an identical source tree):

| Pack | Source dir | Zip | Approx size (v3) |
|---|---|---|---|
| Drums | `outputs/processed/` | `sas-drum-pack-v{N}.zip` | ~2–3 GB (24 roles) |
| Instruments | `outputs/instruments/` | `sas-instrument-pack-v{N}.zip` | ~20–24 GB (28 cats × ~150) |
| Loops | `outputs/loops/` | `sas-loop-library-v{N}.zip` | external loop library (not generated here) |

```
python scripts/build_pack.py --smoke-test                   # cheap determinism check
python scripts/build_pack.py --pack drums --version 1       # real build → ./dist/
python scripts/build_pack.py --pack instruments --version 1
```
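The determinism trick (fixed mtimes + sorted entries) can be sketched in a few lines of standard-library Python. This is an illustration, not `build_pack.py` itself: the `ZIP_STORED` choice, the fixed permissions, and the `smoke_test` helper are assumptions made here. Storing rather than deflating sidesteps zlib-version differences, and WAV barely compresses anyway.

```python
import hashlib
import os
import tempfile
import zipfile
from pathlib import Path

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp a ZIP entry can hold


def build_deterministic_zip(src: Path, dest: Path) -> str:
    """Zip every file under `src` into `dest`; return the zip's sha256 hex digest."""
    # Sort by archive name so entry order never depends on filesystem listing order.
    files = sorted(
        (p for p in src.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(src).as_posix(),
    )
    with zipfile.ZipFile(dest, "w") as zf:
        for path in files:
            # writestr() with our own ZipInfo ignores the file's real mtime.
            info = zipfile.ZipInfo(path.relative_to(src).as_posix(), date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_STORED  # no zlib-version drift
            info.create_system = 3                   # same "made by" field on every host
            info.external_attr = 0o644 << 16         # fixed rw-r--r-- permissions
            zf.writestr(info, path.read_bytes())
    return hashlib.sha256(dest.read_bytes()).hexdigest()


def smoke_test() -> bool:
    """Build the same tree twice, touching an mtime in between; hashes must match."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "src"
        (src / "kick").mkdir(parents=True)
        (src / "kick" / "a.wav").write_bytes(b"not really audio")
        (src / "_pack-version.json").write_text('{"version": "3"}')
        first = build_deterministic_zip(src, root / "one.zip")
        os.utime(src / "kick" / "a.wav", (0, 0))  # change mtime; must not change the zip
        second = build_deterministic_zip(src, root / "two.zip")
        return first == second
```

Using `zf.write(path)` instead would copy each file's real mtime into the archive, which is exactly what breaks byte-identical rebuilds.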

**Ready-to-consume directories (no zip / download step).** To produce the two
libraries as folders you can drop straight into the app's install location
(`<userData>/sample-packs/{drums,instruments}/`), use `--format dir` or the
`scripts/build_libraries.sh` wrapper. Each emits `dist/<subdir>/` with the
`_pack-version.json` marker at its root: exactly the tree the app expects on
disk, ready out of the box.

```
# after gating + enriching (processed/ + instruments/ are populated):
DRUM_VERSION=3 INSTRUMENT_VERSION=3 ./scripts/build_libraries.sh   # → dist/drums/, dist/instruments/
# drop into your local app (macOS path shown):
rsync -a dist/drums/        ~/Library/Application\ Support/signals-and-sorcery/sample-packs/drums/
rsync -a dist/instruments/  ~/Library/Application\ Support/signals-and-sorcery/sample-packs/instruments/
```

The marker `version` must equal that pack's `expectedVersion` in
`sas-app/src/shared/constants/sample-packs.ts` (plain string match) or the app
treats the folder as a different version. For GCP distribution use `--format zip`
(default) and follow the publish runbook below.
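Before copying a `dist/` folder into the app, a pre-flight check along these lines can catch a marker/`expectedVersion` mismatch. It is a sketch, not the app's logic: it assumes the marker stores `version` as a JSON string and mirrors "plain string match" with Python string equality; the expected value you pass in would be copied by hand from `sample-packs.ts`.

```python
import json
from pathlib import Path

MARKER_NAME = "_pack-version.json"


def version_matches(marker_text: str, expected_version: str) -> bool:
    """Plain string equality on the marker's "version" field.

    Assumption: `version` is stored as a JSON string. Under this sketch a numeric
    3 would NOT match "3", and "3.0" would not match "3" either.
    """
    return json.loads(marker_text).get("version") == expected_version


def pack_dir_matches(pack_dir: Path, expected_version: str) -> bool:
    """True if <pack_dir>/_pack-version.json exists and carries the expected version."""
    marker = pack_dir / MARKER_NAME
    return marker.is_file() and version_matches(marker.read_text(encoding="utf-8"), expected_version)
```

For example, `pack_dir_matches(Path("dist/drums"), "3")` after `DRUM_VERSION=3 ./scripts/build_libraries.sh`.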

The build prints `sizeBytes` + `sha256`; paste those into
`sas-app/src/shared/constants/sample-packs.ts` (bump `expectedVersion` + the
download URL) so the app detects the new version and prompts a re-download.
**Never overwrite a published version; always bump.** Full publish runbook
(GCP upload, version rules, what's in/excluded, the v3 WAV-zone note):
[scripts/README-PACKS.md](/shiehn/sas-sample-generator/blob/main/scripts/README-PACKS.md).
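To double-check a zip (freshly built or downloaded from GCP) against the `sizeBytes` + `sha256` you pasted into the app, something like this works without loading a ~20 GB instrument zip into memory. The helper names are illustrative and not part of the repo.

```python
import hashlib
import os
from typing import Tuple, Union

PathLike = Union[str, "os.PathLike[str]"]


def size_and_sha256(path: PathLike, chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Return (sizeBytes, sha256 hex) by streaming the file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def matches_published(path: PathLike, size_bytes: int, sha256_hex: str) -> bool:
    """Compare a local zip with the values recorded in sample-packs.ts."""
    size, digest = size_and_sha256(path)
    return size == size_bytes and digest == sha256_hex.lower()
```

For example, `size_and_sha256("dist/sas-instrument-pack-v3.zip")` prints the pair you would compare with the build output.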

CPU-only suite: no GPU, no Stable Audio. It covers everything *around* model
generation: pitch detection, both gates, multi-source enrich, retry-to-target
helpers, config/prompt/enable wiring, loudness targets, `list_to_jsonl`, the
pitch report, and the deterministic pack builder.

```
./tests/run_tests.sh
```

Keep it green: run it before and after any pipeline-code change. Deps live in the
project `.venv` (numpy, soundfile, pyloudnorm, librosa); the `rubberband` CLI is
optional (the enrich test degrades gracefully without it).

| Symptom | Most likely cause | Fix |
|---|---|---|
| `Permission denied (publickey)` on ssh | private key not loaded into agent | `ssh-add ~/.ssh/id_ed25519` |
| `setup.sh` hangs at `Installing collected packages:` for >5 min | something redirected the venv onto `/workspace` (MooseFS); the script defaults to `/root/.venv` for a reason | check `echo $VENV_DIR`; it should be `/root/.venv`. If overridden, unset it and re-run |
| `cuda available: False` after `setup.sh` | picked a CPU template | terminate; re-deploy with the PyTorch GPU template |
| `huggingface_hub.utils._errors.GatedRepoError` | didn't accept the SA3 license (or the Gemma terms it inherits) | visit the [model page](https://huggingface.co/stabilityai/stable-audio-3-medium) and accept the terms |
| `batch_generate.py` errors `CUDA out of memory` | | reduce `--default-duration` or pass `--num-waveforms-per-prompt 1` |
| | | add `one shot, no loop, no hi hats, no snare` to every prompt |
| | | add `dry, no reverb, no ambience` to prompts |
| | `negative_prompt` may be excluding the target (bug in `scripts/category_config.py`) | in `scripts/category_config.py`, audit the negative for that category; nothing in it should match the target sound |
| `run_all.sh` skips a category | `prompts/<cat>.txt` is missing or has only comments | make sure `prompts/` has the .txt file and it contains non-comment lines |
| … `rejected/` with `reason=lufs_unmeasurable` | | … `pyloudnorm` and check they cluster near `--target-lufs`. The pipeline log at `/workspace/run.log` (from step 6) records the as-run postprocess flags for each category |
| | | `tmux new -s sas` BEFORE running; reattach with `tmux attach -t sas` |

Stable Audio 3 needs only ~8 diffusion steps (vs 120 for SAO 1.0), so generation
is fast and batched. Since v3 added the quality gate + retry-to-target, the
**CPU gate, not generation, sets the wall-clock**, and the per-category
minimum-survivor target (`TARGET=150`) re-rolls until each category fills.

| Run shape | Where the time goes | Rough cost |
|---|---|---|
| Single-category slice (`ONLY=… LIMIT=…`) | a few minutes, gate-bound | a few cents |
| Full drum run (24 roles, gate + retry) | ~1 h on a big GPU | ~$1–3 |
| Full pitched run (28 cats, multi-source, gate + retry) | ½–1 day, gate-bound | ~$15–35 |

Keep a pod alive between iterations and `--skip-existing` skips anything already
generated, so re-running after a prompt tweak only pays for the new lines.
Terminate as soon as generate + gate finish (enrich is local and free).

This is a **public repo**. Never commit:

- Hugging Face tokens
- RunPod API keys
- B2 / R2 / S3 keys
- SSH private keys
- The generated WAVs (gitignored already)

`.gitignore` covers `.env`, `*.token`, `*.secret`, `outputs/*`. If you ever
`git add` a file containing a secret by mistake: **rotate the secret first**,
then `git rm` + commit + push. Treat anything that hit `main` as compromised.

[stable_audio_open_batch_oneshot_guide.md](/shiehn/sas-sample-generator/blob/main/stable_audio_open_batch_oneshot_guide.md)
covers:

- Why Stable Audio 3 vs alternatives (and what changed from SAO 1.0)
- Prompt-design rules and category-specific templates
- Optional persistent Network Volume layout (for users running multiple times per week)
- Optional rclone push to Backblaze B2 / Cloudflare R2 instead of `scp`
- Optional custom Docker image
- Cost-control deep dive

This is one piece of a larger ecosystem around the
[Signals & Sorcery](https://signalsandsorcery.com) audio app.

**Plugin SDK & templates**

- [sas-plugin-sdk](https://github.com/shiehn/sas-plugin-sdk): types, components, and hooks for building generator plugins
- [sas-plugin-template](https://github.com/shiehn/sas-plugin-template): starter template for new plugins
- [sas-chat-plugin](https://github.com/shiehn/sas-chat-plugin): in-app conversational agent

**Built-in plugins**

- [sas-stems-plugin](https://github.com/shiehn/sas-stems-plugin): default AI audio-from-text + stem-splitting plugin
- [sas-loops-plugin](https://github.com/shiehn/sas-loops-plugin): default audio loop / sample plugin
- [sas-synth-plugin](https://github.com/shiehn/sas-synth-plugin): default synth plugin
- [sas-texture-plugin](https://github.com/shiehn/sas-texture-plugin): texture/ambient plugin
- [sas-recorder-plugin](https://github.com/shiehn/sas-recorder-plugin): line-in recording plugin

**Audio tooling**

- [sas-audio-processor](https://github.com/shiehn/sas-audio-processor): audio processing utilities
- [Signals2Surge](https://github.com/shiehn/Signals2Surge): synth patch transfer to Surge XT

**Infrastructure**

- [signals-and-sorcery-server](https://github.com/shiehn/signals-and-sorcery-server): DAWNet API + WebSocket server
- [signals-and-sorcery-docs](https://github.com/shiehn/signals-and-sorcery-docs): public docs

**Other**

- [signalsandsorcery-game-ui](https://github.com/shiehn/signalsandsorcery-game-ui): LLM-powered RPG frontend
- [SignalsAndSorcery](https://github.com/shiehn/SignalsAndSorcery): earlier VueJS + Web Audio sample arrangement tool
- [Errantry](https://github.com/shiehn/Errantry): E2E testing for agent-facing CLIs (drives this project's CLI too)