The latest v3 libraries are hosted publicly on GCP (no auth required). The Signals & Sorcery app installs these automatically, but you can grab them directly here:
| Pack | Contents | Download |
|---|---|---|
| Drums (v3 large) | 24 roles · 10,359 one-shots (+ prompt sidecars) · ~1.4 GB | |
| Instruments (v3 large) | | sas-instrument-pack-v3-large.zip |

Each zip contains a `_pack-version.json` marker plus the payload tree (drums: `<role>/*.wav`; instruments: `<category>/<id>/manifest.json` + `zones/`). The instrument pack is zones-only (the 24-bit generation `sources/` are omitted because they aren't used at playback).
Generate large batches of audio samples with Stable Audio 3 on a rented RunPod GPU. Two pipelines ship side by side:
- **Drums / one-shots** (`run_all.sh`): 24 unpitched categories (kicks, snares, hats, claps, 808s, risers, impacts, textures…). Generate → quality-gate → trim/normalize → flat `processed/<role>/` folders.
- **Pitched instruments** (`run_pitched.sh`): 28 tuned categories (pianos, basses, pads, leads, strings…). Generate → pitch/quality-gate → multi-source pitch-correct + pre-render playable zones → `instruments/<cat>/<id>/manifest.json`.
Both run the same retry-to-target loop (re-roll failed prompts until ~150 samples per category survive the gate) and batched generation on one model load.
Read top to bottom, copy-paste each command block. A single-category test slice is ~5 minutes and a few cents; the full ~50-category v3 campaign is up to ~a day of big-GPU pod time (still well under $50 — see Cost).
Assumes an Apple Silicon Mac as the control machine.
For the rationale (why these settings, prompt-design tips, deep cost math), see stable_audio_open_batch_oneshot_guide.md.
Part of the [Signals & Sorcery] family. See [Related repos] at the bottom of this README.
Plain-text prompt files, one description per line, live under `prompts/` (drums) and `prompts/pitched/` (instruments). The repo ships 52 categories, 200 prompts each (~10,400 prompts), pre-generated by scripts/gen_prompts.py in a combinatorial house style (~58% EDM / ~25% hip-hop & urban / ~17% acoustic-orchestral-world). Run them as-is, edit them, or subset which categories generate.
Each non-comment line becomes one generation job. Example from prompts/kick.txt:
tight 909-style kick drum one shot, hard click transient, short punchy body, dry
punchy 909 kick drum one shot, sharp transient, controlled low end, clean studio sample
deep 808 kick one shot, long sub bass decay, smooth sine low end, dry
warm 808 kick one shot, saturated low end, medium decay, dry, no melody, no loop
Blank lines and lines starting with `#` are ignored (handy for grouping).
Aim for ~10 words per line. For drums, always include phrases like `one shot, no loop` so the model doesn't render a rhythmic loop.
**Editing prompts changes content-hashes.** Output filenames are content-addressed (`{category}-{hash}.wav`), so re-wording a line orphans the WAV it used to produce. Finalize wording *before* a GPU run. Re-running with `--skip-existing` then only generates the new/changed lines.
Flat `processed/<role>/` output; the folder name is the drum role.

| Core kit | Dur | EDM / electronic one-shots | Dur |
|---|---|---|---|
| `kick` | 1.5s | `clap` | 0.75s |
| `snare-standard` | 1.0s | `808` (tuned sub one-shot) | 2.0s |
| `snare-rim` | 0.75s | `riser` | 4.0s |
| `hat-closed` | 0.5s | `downlifter` | 3.0s |
| `hat-open` | 1.5s | `impact` | 2.0s |
| `cymbal-ride` | 2.5s | `sub-drop` | 2.0s |
| `cymbal-crash` | 3.0s | `sweep` | 2.5s |
| `cymbal-splash` | 1.5s | `texture` (vinyl/foley/glitch) | 3.0s |
| `tamborine` | 1.0s | `zap` | 0.75s |
| `shaker` | 0.75s | `foley-perc` | 0.75s |
| `tom-hi` / `tom-mid` / `tom-low` | 1.0–1.5s | | |
| `hit` (generic stab) | 1.5s | | |

`instruments/<cat>/<id>/` with a `manifest.json`. Source pitches, durations, and variant counts are in the Pitched-instrument pipeline table below.
synths lead-supersaw lead-fm lead-acid pluck-synth plucks keys pianos
organs basses 808-bass reese-bass pads strings brass winds accordion
bells mallets percussion timpani guitars banjos mandolin harp sitar
vocals choir
Output filenames are content-addressed: `{category}-{hash}.wav`. Same prompt + seed → same filename → safely re-runnable with `--skip-existing`.
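The naming rule above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the choice of SHA-1, the way prompt and seed are joined, and the 8-character digest length are all assumptions (the digest length is only suggested by example names such as `kick-c1da23da.wav`), and `output_filename` is a hypothetical helper.

```python
import hashlib


def output_filename(category: str, prompt: str, seed: int, digest_len: int = 8) -> str:
    """Build a `{category}-{hash}.wav` name from the prompt text and the seed."""
    # Assumption: hash the exact prompt text plus the seed and keep a short
    # hex prefix. The real pipeline may hash different fields.
    key = f"{prompt}\n{seed}".encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()[:digest_len]
    return f"{category}-{digest}.wav"


original = output_filename("kick", "deep 808 kick one shot, dry", seed=0)
rerun = output_filename("kick", "deep 808 kick one shot, dry", seed=0)
reworded = output_filename("kick", "deep 808 kick one shot, very dry", seed=0)

print(original == rerun)     # same prompt + seed: same name, so a re-run can skip it
print(original == reworded)  # edited wording: new name, and the old WAV is orphaned
```

Under any scheme of this shape, even a one-word edit to a prompt line produces a different filename, which is why wording should be final before a GPU run.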
To subset what generates, edit scripts/categories.txt (drums) or `scripts/pitched_categories.txt` (pitched); comment out any line to skip that category.
- Create / sign in at huggingface.co.
- Visit stabilityai/stable-audio-3-medium and click **Agree and access repository**. SA3 also requires accepting the Gemma Terms of Use (linked from the same page). If you switch models via `--model`, accept the license on that model's page too; `stable-audio-3-small-sfx` is the lighter 0.6B SFX-tuned alternative.
- Create a read-only access token at huggingface.co/settings/tokens. Save it in your password manager; you'll paste it once per pod.
- Create / sign in at runpod.io. Add a payment method.
- Add your Mac SSH public key under Settings → SSH Keys. Copy it with the command below, then paste it into the form. (If `~/.ssh/id_ed25519.pub` doesn't exist, run `ssh-keygen -t ed25519` first and accept the defaults.)
      pbcopy < ~/.ssh/id_ed25519.pub # copies key to clipboard
runpod.io/console/pods → Deploy → GPU Pod:
| Setting | Value |
|---|---|
| GPU | RTX A6000 (48 GB VRAM, ~$0.49/hr) |
| Template | most recent RunPod PyTorch with CUDA 12.x |
| Container Disk | 50 GB (default) |
| Volume Disk | 100 GB at /workspace |
| Expose | SSH (port 22) — default |
Click Deploy On-Demand. Wait ~30 sec until status is `RUNNING`.
On the pod's card click Connect → SSH over exposed TCP and copy the SSH command. It looks like:
ssh root@<POD_IP> -p <POD_PORT> -i ~/.ssh/id_ed25519
From your Mac terminal, paste the SSH command from step 1. Type `yes` to accept the host key on first connect.
If you get `Permission denied (publickey)`:
ssh-add ~/.ssh/id_ed25519
…then retry.
On the pod:
cd /workspace && \
git clone https://github.com/shiehn/sas-sample-generator.git && \
cd /workspace/sas-sample-generator && \
./scripts/setup.sh 2>&1 | tee /workspace/setup.log
Why these paths matter (and the reason this used to be slow): `/workspace` is a network filesystem (MooseFS), fine for big sequential reads/writes (model weights, generated audio) but painfully slow for many tiny files (a Python venv). The script installs the venv at **`/root/.venv`**, which is on the pod's container-local SSD, and only keeps the HuggingFace cache and outputs on `/workspace`. Roughly:
/root/.venv ← Python venv (fast SSD; ~5 min install)
/workspace/sas-sample-generator ← cloned repo
/workspace/.cache/huggingface ← model weights (downloaded once)
/workspace/outputs ← generated WAVs
You're done when you see:
[setup] cuda available: True
[setup] device: NVIDIA RTX A6000
[setup] done.
[setup] next: source /root/.venv/bin/activate
source /root/.venv/bin/activate
hf auth login
Paste your HF token (One-Time Setup A). Answer `n` to "Add token as git credential".
The 24 drum prompt files are already in `prompts/<category>.txt`. To run them as-is, skip to step 6.
To customize:
- **Edit content:** `nano prompts/kick.txt` (or scp over your own version, or edit on Mac → `git push` → `git pull` on the pod).
- **Subset which categories run:** edit `scripts/categories.txt` and comment out the lines you want to skip. Useful for prompt iteration on a single category.
Wrap the run in `tmux` first so an SSH drop doesn't kill the job:
tmux new -s sas
./scripts/run_all.sh 2>&1 | tee /workspace/run.log
(Detach with `Ctrl-b d`; reattach later with `tmux attach -t sas`.)
The wrapper runs, in order:
1. **Build JSONLs**: `list_to_jsonl.py` turns each `prompts/<cat>.txt` into `prompts/<cat>.jsonl`, stamping a per-category `variants` count from `scripts/drum_gate_config.py` (5–6 candidates per prompt; more for high-value kick/snare/clap/808).
2. **Generate + gate, with retry-to-target**: `run_retry.py` drives `batch_generate.py` (one model load, batched: `BATCH_SIZE` generations per call) then `gate_drums.py`, which best-of-N selects one winner per prompt and rejects clipped / silent / off-band / multi-hit / wrong-decay samples (per-category profiles in `drum_gate_config.py`). Any prompt whose candidates *all* fail is re-rolled with fresh seeds (up to `MAX_RETRIES=2`); if a category still has fewer than `TARGET=150` survivors, more prompt lines are topped up. Winners land in `gated_drums/<cat>/`.
3. **Post-process**: `postprocess_oneshots.py` trims, LUFS-normalizes to **-16 LUFS** (-1 dBFS peak ceiling), and mono-downmixes the gated winners into `processed/<cat>/`. Each WAV ships with a sibling `<id>.txt` holding its exact prompt; the prompt is also embedded in the WAV's RIFF INFO comment chunk so Logic / Ableton / Audacity / `ffprobe` / macOS Get-Info show it. Merging runs is a single `rsync`, with no manifest to reconcile.
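The retry-to-target behaviour described above can be sketched as a small control loop. This is a simplified model, not `run_retry.py` itself: `retry_to_target` and the `gate_passes` callback are hypothetical names, and the assumption that a run starts with the first `target` prompt lines and tops up from the remaining lines is a guess at how "topping up" works.

```python
import random


def retry_to_target(prompts, gate_passes, target=150, max_retries=2, seed=0):
    """Return {prompt: winning_seed} for every prompt that survived the gate.

    `gate_passes(prompt, seed)` stands in for "generate the candidates for this
    prompt with this seed and run the quality gate"; it returns True if any
    candidate passed.
    """
    rng = random.Random(seed)
    winners = {}
    pool = list(prompts)  # prompt lines not yet attempted (top-up reserve)
    pending = [pool.pop(0) for _ in range(min(target, len(pool)))]

    while pending:
        # One initial pass plus up to `max_retries` re-rolls with fresh seeds.
        for _ in range(max_retries + 1):
            failed = []
            for prompt in pending:
                fresh_seed = rng.randrange(2**31)
                if gate_passes(prompt, fresh_seed):
                    winners[prompt] = fresh_seed
                else:
                    failed.append(prompt)
            pending = failed
            if not pending:
                break
        # Top up: pull just enough unused prompt lines to reach the target.
        shortfall = max(target - len(winners), 0)
        pending = [pool.pop(0) for _ in range(min(shortfall, len(pool)))]

    return winners


# Toy gate: any prompt starting with "bad" never passes.
demo = retry_to_target(
    ["bad kick", "kick a", "kick b", "kick c", "kick d"],
    gate_passes=lambda prompt, seed: not prompt.startswith("bad"),
    target=3,
)
print(sorted(demo))
```

In the toy run, `bad kick` fails all three rounds, so one extra prompt line is pulled in to reach the target of three survivors. The loop always terminates because each top-up consumes unused prompt lines.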
First call downloads Stable Audio 3 (~5–8 GB for medium, ~3 min, one-time). SA3 converges in ~8 diffusion steps (vs 120 for SAO 1.0), so generation is fast — with the gate + retry, the bottleneck is now the CPU gate, not generation.
LUFS normalization keeps kicks, hats, and splashes at the same perceived volume in your sampler. Revert to peak-only with `--normalize peak`, or override with `--target-lufs -14` (streaming-hot) / `-10` (commercial-hot).
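As a rough illustration of how a loudness target and a peak ceiling interact, the sketch below computes the gain for one sample. It assumes the ceiling simply caps the gain (no limiter) and that the loudness and peak have already been measured; `normalization_gain_db` and `db_to_linear` are hypothetical helpers, not functions from `postprocess_oneshots.py`.

```python
def normalization_gain_db(measured_lufs, peak_dbfs, target_lufs=-16.0, ceiling_dbfs=-1.0):
    """Gain (in dB) that moves a sample toward the target loudness.

    Assumption: if reaching the target would push the peak above the ceiling,
    the gain is reduced so the peak lands exactly on the ceiling.
    """
    wanted = target_lufs - measured_lufs   # gain needed to hit the loudness target
    headroom = ceiling_dbfs - peak_dbfs    # largest gain the peak ceiling allows
    return min(wanted, headroom)


def db_to_linear(gain_db):
    """Convert a dB gain to the linear factor applied to the samples."""
    return 10 ** (gain_db / 20)


# A quiet sample with plenty of headroom reaches the loudness target.
print(normalization_gain_db(measured_lufs=-22.0, peak_dbfs=-10.0))
# A very transient sample is held back by the peak ceiling instead.
print(normalization_gain_db(measured_lufs=-30.0, peak_dbfs=-3.0))
```

Short, spiky one-shots (closed hats, rim shots) are the ones most likely to be limited by the ceiling rather than the LUFS target.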
Useful env knobs (all optional):
ONLY="kick clap" LIMIT=10 ./scripts/run_all.sh # tiny test slice (TARGET auto-0)
GATE=0 ./scripts/run_all.sh # legacy: keep all raw, no gate
BATCH_SIZE=32 ./scripts/run_all.sh # bigger batches on an 80GB GPU
TARGET=200 MAX_RETRIES=3 ./scripts/run_all.sh # chase a higher survivor count
STEPS=4 ./scripts/run_all.sh # cheaper/faster iteration
Single-category iteration (tuning prompts):
ONLY=kick ./scripts/run_all.sh
ls /workspace/outputs/processed/
find /workspace/outputs/processed -name "*.wav" | wc -l
Every `<id>.wav` ships with a sibling `<id>.txt` containing its generation prompt. Spot-check the pairing:
test "$(find /workspace/outputs/processed -name '*.wav' | wc -l)" \
= "$(find /workspace/outputs/processed -name '*.txt' | wc -l)" \
&& echo "wav/txt pairing OK"
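The count comparison above only says whether the totals match. If they don't, a short script can name the offending files. The sketch below is a hypothetical helper (`unpaired_files` is not part of the repo) that pairs files by path stem; the demo tree at the end uses made-up filenames.

```python
import tempfile
from pathlib import Path


def unpaired_files(root):
    """Return (wavs without a .txt sidecar, .txt files without a .wav)."""
    root = Path(root)
    wav_stems = {p.with_suffix("") for p in root.rglob("*.wav")}
    txt_stems = {p.with_suffix("") for p in root.rglob("*.txt")}
    wavs_without_txt = sorted(f"{stem}.wav" for stem in wav_stems - txt_stems)
    txts_without_wav = sorted(f"{stem}.txt" for stem in txt_stems - wav_stems)
    return wavs_without_txt, txts_without_wav


# Self-contained demo on a throwaway tree (filenames are made up).
with tempfile.TemporaryDirectory() as tmp:
    kick_dir = Path(tmp) / "kick"
    kick_dir.mkdir()
    (kick_dir / "kick-aaaa1111.wav").touch()
    (kick_dir / "kick-aaaa1111.txt").touch()
    (kick_dir / "kick-bbbb2222.wav").touch()  # deliberately missing its sidecar
    demo_orphan_wavs, demo_stray_txts = unpaired_files(tmp)
    demo_orphan_names = [Path(p).name for p in demo_orphan_wavs]
    print(demo_orphan_names, demo_stray_txts)
```

On the pod you would call `unpaired_files("/workspace/outputs/processed")` and expect two empty lists.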
On the pod:
cd /workspace
tar czf run.tar.gz outputs/processed
ls -lh run.tar.gz
(We use `tar` rather than `zip` because the stock RunPod PyTorch image doesn't ship `zip`. `tar` is preinstalled everywhere. `tar` also recurses into the per-category subdirs automatically.)
In a second Mac terminal (don't close the SSH session yet — you still need it for step 10):
cd ~/Downloads
scp -P <POD_PORT> root@<POD_IP>:/workspace/run.tar.gz .
tar xzf run.tar.gz
open outputs/processed # Finder + QuickLook to audition
`<POD_PORT>` and `<POD_IP>` are the same ones from your step-1 SSH command. The unpacked structure is one folder per category:
outputs/processed/
kick/ kick-c1da23da.wav kick-e5d95885.wav ...
snare-standard/ snare-standard-...wav
hat-closed/ ...
...etc
This is the step you will forget. The pod bills $0.49/hr for as long as it exists, whether you're using it or not.
- **Idle overnight** ≈ $12
- **Forgotten for a week** ≈ $80
- **Forgotten for a month** ≈ $350
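The arithmetic behind those figures is just hours × hourly rate. A quick check, using the $0.49/hr A6000 rate quoted above (the ≈ $12 figure corresponds to roughly a full 24 hours of idling at that rate):

```python
HOURLY_RATE_USD = 0.49  # RTX A6000 on-demand rate quoted in this README


def idle_cost(hours, rate=HOURLY_RATE_USD):
    """Cost in USD of leaving a pod running for `hours`."""
    return round(hours * rate, 2)


for label, hours in [("24 hours", 24), ("one week", 24 * 7), ("30 days", 24 * 30)]:
    print(f"{label:>9}: ${idle_cost(hours):.2f}")
```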
In the RunPod console, click your pod's card → Terminate. Confirm.
Termination wipes `/workspace`. That's fine: you have the tarball on your Mac. Next month, you start fresh from step 1.
sas-sample-generator/
├── README.md ← you are here
├── stable_audio_open_batch_oneshot_guide.md ← long-form background
├── requirements.txt
├── docs/SAMPLE_GEN_V3_PLAN.md ← v3 design rationale & locked decisions
├── prompts/
│ ├── <drum-cat>.txt (24 files, 200 prompts each) kick.txt, clap.txt, 808.txt, …
│ └── pitched/<cat>.txt (28 files, 200 prompts each) pianos.txt, basses.txt, pads.txt, …
├── scripts/
│ ├── setup.sh ← bootstrap a pod (venv + deps + SA3 tools)
│ ├── gen_prompts.py ← (re)generate the 200-prompt corpora
│ │
│ ├── run_all.sh ← DRUM pipeline driver
│ ├── categories.txt ← which drum roles to run
│ ├── category_config.py ← per-role negatives + durations
│ ├── drum_gate_config.py ← per-role gate profiles + variant counts
│ ├── gate_drums.py ← drum quality gate + best-of-N
│ ├── list_to_jsonl.py ← .txt → .jsonl (drums)
│ ├── postprocess_oneshots.py ← trim / LUFS / mono / tag
│ │
│ ├── run_pitched.sh ← PITCHED pipeline driver
│ ├── pitched_categories.txt ← which instruments to run
│ ├── pitched_category_config.py ← target pitches, durations, zones, variants
│ ├── gate_pitched.py ← pitch + quality gate + best-of-N
│ ├── list_to_jsonl_pitched.py ← .txt → .jsonl (multi-source fan-out)
│ ├── enrich_pitched.py ← pitch-correct + multi-zone render + manifest
│ ├── pitch_report.py ← measured-vs-target pitch-accuracy report
│ ├── repair_instrument_pitch.py ← post-hoc single-source pitch fix
│ │
│ ├── run_retry.py ← generate+gate+retry-to-target (both pipelines)
│ ├── batch_generate.py ← batched SA3 inference
│ ├── build_pack.py ← deterministic versioned pack zips
│ ├── README-PACKS.md ← pack build + publish runbook
│ └── benchmark.py / sync.sh ← optional helpers
├── tests/ ← CPU-only suite (see "Tests" below)
└── outputs/ ← gitignored; all generated audio lands here
├── raw/<cat>/<id>_vNN.wav ← SA3 candidates (N variants/prompt)
├── raw/<cat>/_metadata/<id>_vNN.json ← seed, model, gen params (stays on pod)
├── gated_drums/<cat>/<id>.wav ← drum gate winners
├── processed/<cat>/<id>.{wav,txt} ← final drums (trimmed/LUFS/mono; ICMT = prompt)
├── gated/<cat>/<id>.{wav,gate.json} ← pitched gate winners + scores
├── _reports/pitch_summary.{json,md} ← pitch-accuracy report (report stage)
└── instruments/<cat>/<id>/ ← final instruments (sources/, zones/, manifest.json)
Sibling pipeline to the drum one. Same Stable Audio 3 generator, different
downstream: a 5-stage quality+pitch gate (prefilter, onset, sustain-plateau,
CREPE/spectral pitch with a sub-bass octave cross-check, BasicPitch polyphony
when the TF/numpy ABI is happy), multi-source pitch correction, and zone
pre-rendering (RubberBand R3, formant-preserving). Emits a per-instrument `manifest.json` consumed by `sas-instrument-plugin`.
Multi-source real-pitch sampling (the v3 headline). Wide-range instruments
are generated at 2–4 real source pitches spanning their natural register
(e.g. basses at E1/E2/E3). Enrich assigns each playable zone to its nearest
real source, so no zone is pitch-shifted more than ~half the inter-source gap —
small, artifact-free shifts instead of stretching one root ±12 semitones. Toggle with `SAS_MULTI_SOURCE=1` (default; `0` reproduces v1 single-source for an A/B).
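The nearest-source rule can be illustrated with a few lines of Python. This is a sketch of the idea only: `nearest_source` and `assign_zones` are hypothetical names, the zone range (MIDI 22–58) and 3-semitone step are made up for the demo, and ties are broken toward the lower source as an arbitrary choice.

```python
def nearest_source(zone_midi, source_midis):
    """Pick the real source pitch closest to a zone (ties go to the lower source)."""
    return min(source_midis, key=lambda source: (abs(source - zone_midi), source))


def assign_zones(source_midis, low, high, step=3):
    """Map each zone root in [low, high] to the source it should be rendered from."""
    return {zone: nearest_source(zone, source_midis) for zone in range(low, high + 1, step)}


BASS_SOURCES = [28, 40, 52]  # E1, E2, E3 under the C4 = MIDI 60 convention

zones = assign_zones(BASS_SOURCES, low=22, high=58, step=3)
worst_shift = max(abs(zone - source) for zone, source in zones.items())

for zone, source in zones.items():
    print(f"zone {zone:>2} <- source {source} (shift {zone - source:+d} st)")
print("largest shift:", worst_shift, "semitones")
```

With sources an octave apart, the largest shift inside the covered range is half the 12-semitone gap, which is the bound the paragraph above describes.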
28 categories ship, 200 prompts each (5,600 prompts):

| Category | Source pitch(es) | Dur | Var | Notes |
|---|---|---|---|---|
| `synths` | C3 | 5.0s | 6 | analog mono, FM, wavetable, acid |
| `lead-supersaw` | C4·C5 | 5.0s | 6 | multi-source |
| `lead-fm` | C4 | 5.0s | 6 | |
| `lead-acid` | C3 | 4.0s | 6 | 303-style |
| `pluck-synth` | C4 | 3.0s | 6 | |
| `plucks` | C4 | 3.0s | 6 | |
| `keys` | C3 | 5.0s | 6 | Rhodes, Wurli, clav, DX7 |
| `pianos` | C2·C3·C4·C5 | 5.0s | 6 | multi-source, step 2 |
| `organs` | C3 | 8.0s | 8 | open-ended |
| `basses` | E1·E2·E3 | 6.0s | 8 | multi-source, 30 Hz floor |
| `808-bass` | C2·C3 | 6.0s | 8 | multi-source, 25 Hz floor |
| `reese-bass` | E2·E3 | 6.0s | 8 | multi-source |
| `pads` | C3 | 12.0s | 8 | open-ended, step 2 |
| `strings` | A2·A3·A4 | 8.0s | 8 | multi-source, open-ended, step 2 |
| `brass` | A2·A3·A4 | 6.0s | 8 | multi-source, open-ended |
| `winds` | D3·D4·D5 | 5.0s | 8 | multi-source, open-ended, step 2 |
| `accordion` | F#3 | 6.0s | 8 | open-ended |
| `bells` | C5 | 4.0s | 6 | glockenspiel, FM, music box |
| `mallets` | C4 | 3.0s | 6 | marimba, vibes, kalimba |
| `percussion` | C4 | 2.0s | 6 | tonal/tuned only |
| `timpani` | F2·F3 | 4.0s | 8 | multi-source, tuned drum |
| `guitars` | E2·E3·E4 | 4.0s | 6 | multi-source |
| `banjos` | C4·G4 | 3.0s | 6 | multi-source |
| `mandolin` | A4 | 3.0s | 6 | |
| `harp` | C3·C4·C5 | 4.0s | 6 | multi-source |
| `sitar` | C4 | 4.0s | 6 | |
| `vocals` | A3 | 5.0s | 20 | choir, chops, vocoded (SA3 vocals are weak) |
| `choir` | A3 | 10.0s | 18 | open-ended |

"Var" = SA3 candidates per (prompt × source pitch); the gate keeps the best. Pitches use the C4 = MIDI 60 convention; multi-source counts apply with `SAS_MULTI_SOURCE=1` (default). Full config in scripts/pitched_category_config.py.
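For reference, the C4 = MIDI 60 convention maps the note names in the table to MIDI numbers as sketched below. `note_to_midi` is a hypothetical helper written for this example; the repo's config may store MIDI numbers directly.

```python
import re

_SEMITONES_FROM_C = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
_ACCIDENTALS = {"": 0, "#": 1, "b": -1}


def note_to_midi(name: str) -> int:
    """Convert a note name such as 'F#3' to a MIDI number, with C4 = 60."""
    match = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if match is None:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidental, octave = match.groups()
    # Octave -1 starts at MIDI 0, so octave 4 starts at 12 * 5 = 60.
    return 12 * (int(octave) + 1) + _SEMITONES_FROM_C[letter] + _ACCIDENTALS[accidental]


for category, pitches in [("basses", "E1·E2·E3"), ("accordion", "F#3"), ("pianos", "C2·C3·C4·C5")]:
    print(category, [note_to_midi(p) for p in pitches.split("·")])
```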
To subset which categories run, edit scripts/pitched_categories.txt and comment any line with `#` to skip that category.
`scripts/setup.sh` installs everything the pitched pipeline needs:
- All Python deps from `requirements.txt` (librosa, torchcrepe, basic-pitch, pyloudnorm, soxr, …)
- All system packages: `rsync` (transfer to Mac), `tmux` (long sessions survive disconnect), `rubberband-cli` (enrich shells out to it directly for formant-preserving pitch shift), `ffmpeg` (audio inspection), `zip`/`unzip` (archives)
- The compiled stable-audio-tools from git main (the PyPI release doesn't support SA3-medium)
- CUDA-12.8 torch wheels (Blackwell-compatible, also works on older Hopper/Ampere/Ada)

You should never need to apt-get or pip install anything on a pod after running `setup.sh`. If you do, treat it as a bug in `setup.sh` and add it there.
brew install rubberband # pitch-shift backend
cd ~/path/to/sas-sample-generator # your local repo
pip install -r requirements.txt # one-time, ~5 min
Designed to be safely repeatable from a cold start. The whole pipeline: ~15 min setup + ~30 min generate + ~10 min gate + ~30 min enrich + transfer.
runpod.io/console/pods → Deploy → GPU Pod:
| Setting | Value | Why |
|---|---|---|
| GPU | RTX A6000 / 4090 / 5090 / L40S / A100 (24+ GB VRAM) | SA3-medium fits in 16 GB; 24 GB gives headroom |
| Template | most recent RunPod PyTorch with CUDA 12.x | matches our cu128 wheels |
| Container Disk | 100 GB | persistent across pod restart; holds venv + HF model cache + outputs |
| Network Volume | None | RunPod's "migrate to new host" flow has been known to attach a tiny 10 GB network volume; don't let it. We use container disk only |
| Expose | SSH (port 22, default) | |

Critical: the field is named "Container Disk" — the persistent SSD. Do NOT confuse with "Network Volume" or "Volume Disk".
Click Deploy On-Demand. Wait ~30 sec for status `RUNNING`.
Copy the SSH command from Connect → SSH over exposed TCP. It looks like:
ssh root@<POD_IP> -p <POD_PORT> -i ~/.ssh/id_ed25519
ssh root@<POD_IP> -p <POD_PORT> -i ~/.ssh/id_ed25519
Type `yes` on first connect. If you get `Permission denied`, run `ssh-add ~/.ssh/id_ed25519`.
cd /workspace && \
git clone https://github.com/shiehn/sas-sample-generator.git && \
cd /workspace/sas-sample-generator && \
./scripts/setup.sh 2>&1 | tee /root/setup.log
Look for these "OK" markers near the end of setup.log:
[setup] rsync: rsync version 3.x.x ...
[setup] tmux: tmux 3.x
[setup] rubberband: /usr/bin/rubberband
[setup] ffmpeg: ffmpeg version ...
[setup] cuda available: True
[setup] device: NVIDIA RTX 4090
[setup] done.
If `cuda available: False`, you deployed onto a CPU template; terminate and redeploy with PyTorch GPU.
source /root/.venv/bin/activate
hf auth login
Paste your HF read token. Answer `n` to "Add token as git credential".
First time on a new HF account: in your browser, visit `stabilityai/stable-audio-3-medium` and accept BOTH the SA3 community license AND the underlying Gemma terms. Without both, the model download fails with `GatedRepoError`. The token IS your account, so accept while logged into the same HF account your token belongs to.
Verify access (under 5 seconds):
hf download stabilityai/stable-audio-3-medium model_config.json --local-dir /tmp/sa3-test
ls /tmp/sa3-test/
If `model_config.json` is listed: cleared.
The default ships with all 28 categories enabled. For a quick test or a focused run:
nano scripts/pitched_categories.txt
After the test, restore with `git checkout scripts/pitched_categories.txt`.
Run generate + gate + report on the GPU pod; enrich runs later on your Mac (it's CPU-bound).
tmux new -s pitched
cd /workspace/sas-sample-generator
source /root/.venv/bin/activate
source /workspace/.bash_env
STAGES=generate,gate,report ./scripts/run_pitched.sh 2>&1 | tee outputs/run.log
`run_pitched.sh` builds the JSONLs (multi-source fan-out: one job per prompt × source pitch, with the per-category `variants` count), then `run_retry.py` drives batched generation + `gate_pitched.py` with the same retry-to-target loop as drums (re-roll all-fail prompts up to `MAX_RETRIES=2`, top up until `TARGET=150` instruments survive). The `report` stage then writes `outputs/_reports/pitch_summary.{json,md}` so you can read measured-vs-target pitch accuracy before transferring.
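The multi-source fan-out is a cross product of prompts and source pitches. The sketch below shows the shape of that expansion; the JSONL field names (`category`, `prompt`, `target_midi`, `variants`) and the `fan_out` helper are illustrative assumptions, not the schema `list_to_jsonl_pitched.py` actually writes.

```python
import json
from itertools import product


def fan_out(category, prompts, source_midis, variants):
    """Expand prompts into one job per (prompt, source pitch) pair."""
    return [
        # Field names here are hypothetical; check the real .jsonl for the schema.
        {"category": category, "prompt": prompt, "target_midi": midi, "variants": variants}
        for prompt, midi in product(prompts, source_midis)
    ]


jobs = fan_out(
    "basses",
    prompts=["warm upright bass note", "gritty synth bass note"],
    source_midis=[28, 40, 52],  # E1, E2, E3
    variants=8,
)
jsonl = "\n".join(json.dumps(job) for job in jobs)
total_candidates = sum(job["variants"] for job in jobs)

print(len(jobs), "jobs,", total_candidates, "candidates to generate")
print(jsonl.splitlines()[0])
```

Two prompts across three source pitches at eight variants each already means 48 generations, which is why the full 28-category campaign reaches tens of thousands of candidates.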
Detach with Ctrl-b d. The run keeps going even if SSH drops.
Reattach later (from any new SSH session — possibly a new IP/port if migrated):
tmux attach -t pitched
Monitor from outside tmux:
tail -f outputs/run.log
nvidia-smi
Throughput: batched generation is ~1–3 h; the CPU gate is the bottleneck (torchcrepe + basic-pitch + librosa per variant). For all 28 categories at full variant counts, budget up to ~a day of pod time. Dial variant counts down in `pitched_category_config.py`, or run a subset, if wall-clock matters.
Useful knobs: `ONLY=pianos LIMIT=5` (tiny slice), `BATCH_SIZE=32` (80 GB GPU), `MAX_RETRIES=0` (one pass, no retry), `INIT_ANCHOR=1` (experimental `init_audio` pitch anchoring, default off), `SAS_MULTI_SOURCE=0` (single-source A/B).
When `STAGES=generate,gate` finishes, before transferring:
for d in outputs/gated/*/; do
cat=$(basename "$d")
[[ "$cat" == "_failures" ]] && continue
passed=$(ls "$d"*.wav 2>/dev/null | wc -l)
failed=$(ls "$d/_failures"/*.json 2>/dev/null | wc -l)
total=$((passed + failed))
if [[ $total -gt 0 ]]; then
rate=$((passed * 100 / total))
printf " %-18s passed=%3d failed=%3d pass-rate=%d%%\n" "$cat" "$passed" "$failed" "$rate"
fi
done
echo "Total gated: $(find outputs/gated -name '*.wav' -not -path '*_failures*' | wc -l)"
du -sh outputs/gated
Expected (with the current thresholds, 2026-05-22): 80–100% pass rate per category. If a category is below 50%, look in `outputs/gated/<cat>/_failures/<id>.json` to see why prompts are failing.
The pod has `rsync` installed by `setup.sh`. On your Mac:
mkdir -p ~/sas-pitched-out
rsync -avzP -e "ssh -p <POD_PORT> -i ~/.ssh/id_ed25519" \
root@<POD_IP>:/workspace/sas-sample-generator/outputs/gated/ \
~/sas-pitched-out/gated/
For ~4 GB at typical RunPod / home upload speeds, expect 10–20 min. `rsync` resumes on interruption; just re-run the same command if SSH drops.
Verify locally:
find ~/sas-pitched-out/gated -name '*.wav' -not -path '*_failures*' | wc -l # should match step 7
du -sh ~/sas-pitched-out/gated
cd ~/path/to/sas-sample-generator
git pull # pick up any threshold updates
pip install -r requirements.txt # idempotent
export SAS_OUTPUTS_DIR=~/sas-pitched-out
STAGES=enrich ./scripts/run_pitched.sh
Enrich groups the surviving source pitches of each prompt into one multi-source instrument under `~/sas-pitched-out/instruments/<cat>/<id>/`:
- `sources/<midi>.wav`: 24-bit real source samples (1 per source pitch: 2–4 for multi-source categories), pitch-corrected + normalized to -20 LUFS
- `zones/<midi>.wav`: 16-bit WAV pre-rendered playable zones (every 2–3 semitones), each rendered from its nearest real source (was 24-bit FLAC pre-v3; WAV is memory-mapped by the Tracktion sampler with no decode stall)
- `manifest.json`: `schema_version: 1`, disjoint ordered zones
- `prompt.txt`: original positive prompt

It parallelizes across instruments (`ProcessPoolExecutor`) and shells out to the `rubberband` CLI for pitch shifts (`brew install rubberband` on the Mac).
runpod.io/console/pods → pod card → Terminate (NOT Stop). Compute billing stops immediately. Volume billing (if any auto-created Network Volume snuck in) stops only on Terminate.
Then runpod.io/console/user/storage → Network Volumes → check for any `outside_*` orphan from a migration → Delete.
RunPod sometimes moves your pod to a different physical host mid-run. Symptoms:
- SSH connection drops mid-session
- `Connection refused` when reconnecting on the same IP/port
- Pod shows "Stopped" briefly, then "Running" again at a new address

The pod, the venv, the HF cache, and all outputs/ data persist on the container disk as long as the pod isn't terminated. You just need fresh connection info.
- Open RunPod console → click your pod card → check the Connect → SSH over exposed TCP panel for the new IP and port (both can change).
- Clear the old SSH host key on your Mac:
      ssh-keygen -R '[<NEW_IP>]:<NEW_PORT>'
- SSH back in with the new details. Run `tmux attach -t pitched`; your run is still going.
- If you were mid-rsync, just re-run the rsync command with the new `-p <NEW_PORT>` and `root@<NEW_IP>`; it picks up where it stopped.

This bit us twice this session (May 2026). Symptoms are unambiguous; recovery takes 30 seconds.
SA3 doesn't reliably hit a target pitch from a text prompt — that's a known limitation of text-to-audio diffusion models. Enrich now compensates intelligently:
| If measured pitch is… | Enrich does… | Result |
|---|---|---|
| within `max_correction_semitones` of target (default 3) | shifts all the way to the original target | Sample is at exactly the prompted MIDI note; preserves prompt semantics |
| further away than that | snaps to the nearest integer semitone | Sample is at the closest "logical" MIDI note (always ≤50 cent shift, no audible artifacts) |

Either way: every output sample lands on an exact MIDI semitone with the smallest possible pitch shift. The zone rendering loop centers on that effective root, so the sampler always has a clean zone at the sample's actual pitch.
`max_correction_semitones` is per-category in `scripts/pitched_category_config.py`. Set to `0` to always snap to nearest semitone (never shift to target). Set to a large value (24+) to always shift to target.
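The two-branch rule in the table can be written out as a small function. This is a sketch of the decision only, not the enrich code: `plan_correction` is a hypothetical name, measured pitch is assumed to be a fractional MIDI number, and treating "within" as inclusive (`<=`) is an assumption.

```python
import math


def plan_correction(measured_midi, target_midi, max_correction_semitones=3.0):
    """Return (effective_root, shift_in_semitones) for one source sample."""
    if abs(measured_midi - target_midi) <= max_correction_semitones:
        # Close enough: shift all the way to the prompted note.
        root = target_midi
    else:
        # Too far: snap to the nearest integer semitone (at most a 50-cent shift).
        root = math.floor(measured_midi + 0.5)
    return root, root - measured_midi


for measured in (61.4, 66.3):
    root, shift = plan_correction(measured, target_midi=60)
    print(f"measured {measured}: root {root}, shift {shift:+.2f} st")
```

A sample measured 1.4 semitones sharp of C4 is pulled down to C4, while one measured at 66.3 is left near where it landed and becomes an F#4 source instead of being stretched six semitones.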
| Stage | What it checks | What rejection means |
|---|---|---|
| `prefilter` | Clipping, dead channels, all-silent buffers | Sample is broken at the file level |
| `onset` | Time from buffer start to first transient | `slow_onset` → SA3 added a fade-in / silence preamble (>300ms) |
| `sustain` | Longest plateau within 12 dB of peak RMS | `short_stab` → audio decays too fast or has no held region |
| `pitch` | CREPE periodicity + measured-vs-target | `no_voiced_frames` / `unconfident` → unpitched output; `wrong_pitch` is OFF by default (tolerance 9999) so enrich's snap-to-nearest-semitone can do its job |
| `polyphony` | BasicPitch note count after vibrato bypass | Disabled when TF/numpy ABI mismatches (common on RunPod); gate prints one warning at start, then runs without it |

The gate scores winners by `confidence² × exp(-|cents|/50) × sus_quality`. With `wrong_pitch` disabled, the pitch term collapses to ~0 for far-off samples, so all variants of a prompt can tie at score=0.000; the picker just grabs v00 by default in that case. Acceptable for now.
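To see why far-off variants tie at 0.000, the scoring formula can be evaluated directly. The function below transcribes the formula as stated; `gate_score` and its argument names are chosen for this example and may not match `gate_pitched.py`.

```python
import math


def gate_score(confidence, cents_off, sus_quality):
    """confidence^2 * exp(-|cents| / 50) * sus_quality, as described above."""
    return confidence ** 2 * math.exp(-abs(cents_off) / 50.0) * sus_quality


# In-tune-ish variants are clearly separated...
print(f"{gate_score(0.9, 10, 1.0):.3f}")
print(f"{gate_score(0.9, 50, 1.0):.3f}")
# ...but a variant a full octave off (1200 cents) scores effectively zero.
print(f"{gate_score(0.9, 1200, 1.0):.3f}")
print(f"{gate_score(0.5, -700, 0.8):.3f}")
```

The exponential halves the score roughly every 35 cents, so anything more than a few semitones from the target rounds to 0.000 at three decimals regardless of confidence or sustain quality.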
outputs/
├── raw/<category>/ ← SA3 candidates (N variants per prompt × source pitch)
│ ├── <id>_v00.wav, <id>_v01.wav, ...
│ └── _metadata/<id>_v0N.json ← seed, model, generation_seconds, batch_size
├── gated/<category>/ ← gate winners only
│ ├── <id>.wav ← chosen variant
│ ├── <id>.gate.json ← per-gate scores + measured pitch
│ └── _failures/<id>.json ← prompts where ALL variants rejected
├── _reports/pitch_summary.{json,md} ← measured-vs-target accuracy (report stage)
└── instruments/<category>/<instrument-id>/ ← final library, sampler-consumable
├── sources/<midi>.wav ← 24-bit real source pitches (1–4)
├── zones/<midi>.wav ← 16-bit WAV pre-rendered zones
├── manifest.json ← schema_version 1, disjoint ordered zones
└── prompt.txt ← original positive prompt
The v3 full run is much bigger than a quick slice: 28 categories × 200 prompts × 2–4 source pitches × 6–20 variants = tens of thousands of candidates. Generation is cheap and batched; the CPU gate dominates wall-clock (per-variant CREPE + basic-pitch + librosa). On an A100 80 GB (~$0.89–1.89/hr) the whole pitched campaign runs in roughly half a day to a day → **~$15–35**, with drums adding a few dollars. The original $200 budget is never the binding constraint.
To cut cost/time: run fewer categories (`ONLY=…`), lower variant counts in `pitched_category_config.py`, or set `MAX_RETRIES=0`. A single-category slice (`ONLY=pianos LIMIT=5`) is a few minutes and a few cents; use it to dial prompts before committing the full campaign. Enrich is local on your Mac → $0.
prompts/pitched/<category>.txt # one prompt per line, # comments
scripts/pitched_categories.txt # which categories to run (comment to skip)
scripts/pitched_category_config.py # per-category target pitch, duration, sustain thresholds, etc.
Fast iteration on a single category:
ONLY=<cat> LIMIT=10 STAGES=generate,gate,report ./scripts/run_pitched.sh
Re-author / extend the corpora with scripts/gen_prompts.py (deterministic, preserves existing lines so hashes stay stable, holds the EDM ratio).
`--skip-existing` in `batch_generate.py` means re-running won't regenerate samples you already have; only new prompt lines hit the GPU.
- `STAGES=generate,gate,enrich,report`: pitched only; comma-separated subset (default: all four)
- `STEPS=8`: diffusion steps (SA3 converges fast)
- `BATCH_SIZE=16`: generations per model call (32–64 on an 80 GB GPU)
- `TARGET=150`: per-category minimum surviving samples (`0` = no top-up)
- `MAX_RETRIES=2`: re-roll rounds for all-fail prompts before topping up (`0` = one pass)
- `ONLY="a b"` / `LIMIT=N`: run a subset of categories / cap prompts per category (test slice; auto-sets `TARGET=0`)
- `GATE=0`: drums only; skip the quality gate (legacy keep-all)
- `INIT_ANCHOR=1`: pitched only; experimental init_audio pitch anchoring (default off)
- `SAS_MULTI_SOURCE=0`: pitched only; disable multi-source (single root, span 12; for an A/B)
- `SAS_OUTPUTS_DIR=/path`: override outputs location (default `/workspace/outputs` on pod, `./outputs` local)

Per-prompt variant counts are per-category now (in `drum_gate_config.py` / `pitched_category_config.py`), not a global env var.
The `sas-instrument-plugin` walks `outputs/instruments/<cat>/<id>/`, parses each `manifest.json`, and uses the `zones[]` array to call `host.setTrackInstrumentSampler` on the chosen track. Disjoint zones + per-zone `root_midi` mean the engine pitch-shifts the nearest pre-rendered zone for any played MIDI note, with the smart-corrected sample as the unshifted root. Since enrich locks every sample to an integer MIDI semitone, the sampler never has to deal with off-pitch sources.
Generated audio ships to the Signals & Sorcery app as versioned pack zips built by scripts/build_pack.py — deterministic (fixed mtimes + sorted entries → byte-identical zip + sha256 from an identical source tree):
| Pack | Source dir | Zip | Approx size (v3) |
|---|---|---|---|
| Drums | `outputs/processed/` | `sas-drum-pack-v{N}.zip` | ~2–3 GB (24 roles) |
| Instruments | `outputs/instruments/` | `sas-instrument-pack-v{N}.zip` | ~20–24 GB (28 cats × ~150) |
| Loops | `outputs/loops/` | `sas-loop-library-v{N}.zip` | external loop library (not generated here) |

python scripts/build_pack.py --smoke-test # cheap determinism check
python scripts/build_pack.py --pack drums --version 1 # real build → ./dist/
python scripts/build_pack.py --pack instruments --version 1
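For readers curious how "fixed mtimes + sorted entries" yields a byte-identical archive, here is a minimal standard-library sketch of the technique. It is not `build_pack.py`: `build_zip_bytes`, the fixed timestamp, the permission bits, and the demo marker contents are all assumptions made for illustration.

```python
import hashlib
import io
import tempfile
import zipfile
from pathlib import Path

FIXED_MTIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format can store


def build_zip_bytes(source_dir):
    """Zip a directory so identical trees always produce identical bytes."""
    source_dir = Path(source_dir)
    files = [p for p in source_dir.rglob("*") if p.is_file()]
    # Sort by archive name so filesystem listing order cannot leak in.
    files.sort(key=lambda p: p.relative_to(source_dir).as_posix())

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path in files:
            info = zipfile.ZipInfo(path.relative_to(source_dir).as_posix(), date_time=FIXED_MTIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed permissions, ignore the real ones
            archive.writestr(info, path.read_bytes())
    return buffer.getvalue()


# Demo: build the same throwaway tree twice and compare digests.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "kick").mkdir()
    (root / "kick" / "kick-aaaa1111.wav").write_bytes(b"RIFF-demo-bytes")
    (root / "_pack-version.json").write_text('{"version": "3"}')  # made-up marker body
    first, second = build_zip_bytes(root), build_zip_bytes(root)
    demo_sha = hashlib.sha256(first).hexdigest()
    demo_names = zipfile.ZipFile(io.BytesIO(first)).namelist()

print(first == second, demo_sha[:12], demo_names)
```

Because nothing in the archive depends on when or where it was built, the sha256 can be published alongside the zip and re-derived from the same source tree later.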
Ready-to-consume directories (no zip / download step). To produce the two libraries as folders you can drop straight into the app's install location, `<userData>/sample-packs/{drums,instruments}/`, use `--format dir`, or the wrapper. Each emits `dist/<subdir>/` with the `_pack-version.json` marker at its root: exactly the tree the app expects on disk, ready out of the box.
DRUM_VERSION=3 INSTRUMENT_VERSION=3 ./scripts/build_libraries.sh # → dist/drums/, dist/instruments/
rsync -a dist/drums/ ~/Library/Application\ Support/signals-and-sorcery/sample-packs/drums/
rsync -a dist/instruments/ ~/Library/Application\ Support/signals-and-sorcery/sample-packs/instruments/
The marker version must equal that pack's `expectedVersion` in `sas-app/src/shared/constants/sample-packs.ts` (plain string match) or the app treats the folder as a different version. For GCP distribution use `--format zip` (default) and follow the publish runbook below.
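The plain string match can be pictured with the check below. It is a sketch only: this README does not show the marker's JSON layout, so the `version` key is an assumption, and `marker_matches` is a hypothetical helper rather than the app's code.

```python
import json
import tempfile
from pathlib import Path


def marker_matches(pack_dir, expected_version):
    """True only if the marker exists and its version equals the expected string."""
    marker = Path(pack_dir) / "_pack-version.json"
    if not marker.is_file():
        return False
    data = json.loads(marker.read_text(encoding="utf-8"))
    # ASSUMPTION: the marker stores its version under a "version" key.
    return data.get("version") == expected_version


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "_pack-version.json").write_text(
        json.dumps({"version": "3"}), encoding="utf-8"
    )
    print(marker_matches(tmp, "3"))    # True: identical strings
    print(marker_matches(tmp, "3.0"))  # False: a different string is a different version
```

A check like this is handy right after the `rsync` step, before launching the app.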
The build prints `sizeBytes` and `sha256`; paste those into `sas-app/src/shared/constants/sample-packs.ts` (bump `expectedVersion` and the download URL) so the app detects the new version and prompts a re-download. Never overwrite a published version; always bump. Full publish runbook (GCP upload, version rules, what's in/excluded, the v3 WAV-zone note): scripts/README-PACKS.md.
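If you want to double-check the two values before pasting them, they can be recomputed from the zip itself. This is a minimal sketch: `pack_fingerprint` is a hypothetical helper, and the demo writes a tiny stand-in file instead of a real pack.

```python
import hashlib
import os
import tempfile


def pack_fingerprint(path, chunk_size=1 << 20):
    """Return the file size and sha256, reading in chunks so multi-GB zips fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {"sizeBytes": os.path.getsize(path), "sha256": digest.hexdigest()}


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo-pack.zip")  # stand-in for a file under ./dist/
    with open(demo, "wb") as fh:
        fh.write(b"abc")
    print(pack_fingerprint(demo))
```

The same function works on a downloaded copy, which makes it easy to confirm that an upload matches what the build printed.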
CPU-only suite: no GPU, no Stable Audio. It covers everything around model generation: pitch detection, both gates, multi-source enrich, retry-to-target helpers, config/prompt/enable wiring, loudness targets, `list_to_jsonl`, the pitch report, and the deterministic pack builder.
./tests/run_tests.sh
Keep it green: run before and after any pipeline-code change. Deps live in the project `.venv` (numpy, soundfile, pyloudnorm, librosa); the `rubberband` CLI is optional (the enrich test degrades gracefully without it).
| Symptom | Most likely cause | Fix |
|---|---|---|
| `Permission denied (publickey)` on ssh | private key not loaded into agent | `ssh-add ~/.ssh/id_ed25519` |
| `setup.sh` hangs at `Installing collected packages:` for >5 min | something redirected the venv onto `/workspace` (MooseFS); the script defaults to `/root/.venv` for a reason | check `echo $VENV_DIR` (should be `/root/.venv`); if overridden, unset it and re-run |
| `cuda available: False` after `setup.sh` | picked a CPU template | terminate; re-deploy with PyTorch GPU template |
| `huggingface_hub.utils._errors.GatedRepoError` | didn't accept the SA3 license (or the Gemma terms it inherits) | visit the … |
| `batch_generate.py` errors `CUDA out of memory` | | lower `--default-duration` or use `--num-waveforms-per-prompt 1` |
| | | add `one shot, no loop, no hi hats, no snare` to every prompt |
| | | add `dry, no reverb, no ambience` to prompts |
| | `negative_prompt` may be excluding the target (bug in `scripts/category_config.py`) | in `scripts/category_config.py`, audit the negative for that category; nothing in it should match the target sound |
| `run_all.sh` skips a category | `prompts/<cat>.txt` is missing or has only comments | check that `prompts/` has the .txt file and that it contains non-comment lines |
| files in `rejected/` with `reason=lufs_unmeasurable` | | measure outputs with `pyloudnorm` and check they cluster near `--target-lufs`. The pipeline log at `/workspace/run.log` (from step 6) records the as-run postprocess flags for each category. |
| | | `tmux new -s sas` BEFORE running; reattach with `tmux attach -t sas` |
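For the skipped-category row, a quick local check can tell you whether a prompt file has anything usable in it. This is a sketch under one stated assumption: comment lines are taken to start with `#`, which this README does not confirm. `usable_prompts` is a hypothetical helper, not part of the repo.

```python
import tempfile
from pathlib import Path


def usable_prompts(path):
    """Return the non-blank, non-comment lines of a prompt file ([] if it is missing)."""
    p = Path(path)
    if not p.is_file():
        return []
    lines = (line.strip() for line in p.read_text(encoding="utf-8").splitlines())
    # ASSUMPTION: a comment is any line whose first non-space character is "#".
    return [line for line in lines if line and not line.startswith("#")]


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "kicks.txt"  # stand-in for prompts/<cat>.txt
    demo.write_text("# house kicks\n\npunchy analog kick, one shot\n", encoding="utf-8")
    print(usable_prompts(demo))                       # ['punchy analog kick, one shot']
    print(usable_prompts(Path(tmp) / "missing.txt"))  # []
```

An empty result for a category is a likely reason for it being skipped.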
Stable Audio 3 needs only ~8 diffusion steps (vs 120 for SAO 1.0), so generation is fast and batched. Since v3 added the quality gate + retry-to-target, the CPU gate, not generation, sets the wall-clock, and each category re-rolls until it reaches the per-category minimum-survivor target (`TARGET=150`).
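The retry-to-target idea can be sketched in a few lines. This is an illustration, not the pipeline's code: `fill_category`, the `generate` and `passes_gate` callables, and the `max_rounds` cap are hypothetical, and the fake generator and gate below exist only to make the loop runnable.

```python
def fill_category(prompts, generate, passes_gate, target=150, max_rounds=5):
    """Re-roll failed prompts until `target` samples survive or rounds run out."""
    survivors = {}
    pending = list(prompts)
    for attempt in range(max_rounds):  # ASSUMPTION: a cap on re-roll rounds
        failed = []
        for prompt in pending:
            if len(survivors) >= target:
                return survivors  # category is full, stop paying for generation
            sample = generate(prompt, attempt)
            if passes_gate(sample):
                survivors[prompt] = sample
            else:
                failed.append(prompt)  # re-roll this prompt next round
        pending = failed
        if not pending:
            break
    return survivors


# Fake stand-ins: a "sample" is just a number, and the gate rejects multiples of 4.
prompts = [f"p{i}" for i in range(200)]
fake_generate = lambda prompt, attempt: int(prompt[1:]) + attempt
fake_gate = lambda sample: sample % 4 != 0

print(len(fill_category(prompts, fake_generate, fake_gate)))              # 150
print(len(fill_category(prompts, fake_generate, fake_gate, target=180)))  # 180
```

In the second call the first pass yields only 150 survivors, so the 50 rejected prompts are re-rolled until the count reaches 180. In the real run each of those gate checks is CPU work, which is why the gate dominates wall-clock time.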
| Run shape | Where the time goes | Rough cost |
|---|---|---|
| Single-category slice (`ONLY=… LIMIT=…`) | a few minutes, gate-bound | a few cents |
| Full drum run (24 roles, gate + retry) | ~1 h on a big GPU | ~$1–3 |
| Full pitched run (28 cats, multi-source, gate + retry) | ½–1 day, gate-bound | ~$15–35 |
Keep a pod alive between iterations: `--skip-existing` skips anything already generated, so re-running after a prompt tweak only pays for the new lines. Terminate as soon as generate + gate finish (enrich is local and free).
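Conceptually, skipping existing output is a filter over the prompt list. The sketch below assumes one output file per prompt named by its line index; that naming scheme and the `pending_prompts` helper are hypothetical and may not match how `batch_generate.py` names its files.

```python
import tempfile
from pathlib import Path


def pending_prompts(prompts, out_dir, name_for=lambda i: f"{i:04d}.wav"):
    """Return (index, prompt) pairs whose output file does not exist yet."""
    out = Path(out_dir)
    # ASSUMPTION: outputs are named by prompt index; the real naming may differ.
    return [(i, p) for i, p in enumerate(prompts) if not (out / name_for(i)).exists()]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "0000.wav").write_bytes(b"")  # pretend the first prompt already rendered
    todo = pending_prompts(["punchy kick", "tight snare", "closed hat"], tmp)
    print(todo)  # [(1, 'tight snare'), (2, 'closed hat')]
```

Only the entries in `todo` would cost GPU time on the next run.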
This is a public repo. Never commit:
- Hugging Face tokens
- RunPod API keys
- B2 / R2 / S3 keys
- SSH private keys
- The generated WAVs (gitignored already)
`.gitignore` covers `.env`, `*.token`, `*.secret`, and `outputs/*`. If you ever `git add` a file containing a secret by mistake: rotate the secret first, then `git rm` + commit + push. Treat anything that hit `main` as compromised.
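As an extra guard before committing, a path list (for example, the names of staged files) can be screened against the same filename patterns. This is a sketch: `risky_paths` is a hypothetical helper, it only matches file names, and it cannot detect a secret pasted inside an otherwise ordinary file.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Filename patterns taken from the .gitignore entries listed above.
SECRET_PATTERNS = (".env", "*.token", "*.secret")


def risky_paths(paths):
    """Return the paths whose file name matches a secret-looking pattern."""
    return [
        p for p in paths
        if any(fnmatch(PurePosixPath(p).name, pat) for pat in SECRET_PATTERNS)
    ]


print(risky_paths(["README.md", "cfg/.env", "hf.token", "outputs/kick.wav"]))
# ['cfg/.env', 'hf.token']
```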
stable_audio_open_batch_oneshot_guide.md covers:
- Why Stable Audio 3 vs alternatives (and what changed from SAO 1.0)
- Prompt-design rules and category-specific templates
- Optional persistent Network Volume layout (for users running multiple times per week)
- Optional rclone push to Backblaze B2 / Cloudflare R2 instead of `scp`
- Optional custom Docker image
- Cost-control deep dive
This is one piece of a larger ecosystem around the Signals & Sorcery audio app.
Plugin SDK & templates
- sas-plugin-sdk: types, components, and hooks for building generator plugins
- sas-plugin-template: starter template for new plugins
- sas-chat-plugin: in-app conversational agent

Built-in plugins
- sas-stems-plugin: default AI audio-from-text + stem-splitting plugin
- sas-loops-plugin: default audio loop / sample plugin
- sas-synth-plugin: default synth plugin
- sas-texture-plugin: texture/ambient plugin
- sas-recorder-plugin: line-in recording plugin

Audio tooling
- sas-audio-processor: audio processing utilities
- Signals2Surge: synth patch transfer to Surge XT

Infrastructure
- signals-and-sorcery-server: DAWNet API + WebSocket server
- signals-and-sorcery-docs: public docs

Other
- signalsandsorcery-game-ui: LLM-powered RPG frontend
- SignalsAndSorcery: earlier VueJS + Web Audio sample arrangement tool
- Errantry: E2E testing for agent-facing CLIs (drives this project's CLI too)