# Transcribe.cpp: ggml speech-to-text inference engine

C/C++ speech-to-text inference library. Runs diverse STT model families via [GGUF](https://github.com/ggerganov/gguf) models on the [ggml](https://github.com/ggml-org/ggml) runtime, with Metal, Vulkan, and CUDA backends for fast GPU inference plus a tinyBLAS-accelerated CPU path. 16 model families and 60+ variants, streaming and batch.

Every model we publish under [handy-computer](https://huggingface.co/handy-computer) is numerically verified and WER-tested against its reference implementation.

Supported models:

| Family | Variants | Docs |
|---|---|---|
| Parakeet | 10 variants: TDT, RNN-T, CTC, TDT+CTC (110M–1.1B) | |
| | `canary-1b`, `canary-1b-v2`, `canary-1b-flash`, `canary-180m-flash` | [docs/models/canary.md](/handy-computer/transcribe.cpp/blob/main/docs/models/canary.md) |
| | `canary-qwen-2.5b` (FastConformer + Qwen3-1.7B SALM) | [docs/models/canary-qwen-2.5b.md](/handy-computer/transcribe.cpp/blob/main/docs/models/canary-qwen-2.5b.md) |
| | `tiny` through `large-v3-turbo`, plus `.en` siblings | [docs/models/whisper.md](/handy-computer/transcribe.cpp/blob/main/docs/models/whisper.md) |
| | `gigaam-v3-{e2e-rnnt,e2e-ctc,rnnt,ctc}` | [docs/models/gigaam.md](/handy-computer/transcribe.cpp/blob/main/docs/models/gigaam.md) |
| | `moonshine-tiny`, `moonshine-base` | [docs/models/moonshine.md](/handy-computer/transcribe.cpp/blob/main/docs/models/moonshine.md) |
| | `moonshine-streaming-{tiny,small,medium}` | [docs/models/moonshine-streaming.md](/handy-computer/transcribe.cpp/blob/main/docs/models/moonshine-streaming.md) |
| | `qwen3-asr-0.6b`, `qwen3-asr-1.7b` | [docs/models/qwen3-asr.md](/handy-computer/transcribe.cpp/blob/main/docs/models/qwen3-asr.md) |
| | `cohere-transcribe-03-2026` | [docs/models/cohere-transcribe-03-2026.md](/handy-computer/transcribe.cpp/blob/main/docs/models/cohere-transcribe-03-2026.md) |
| | `sensevoice-small` | [docs/models/sensevoice-small.md](/handy-computer/transcribe.cpp/blob/main/docs/models/sensevoice-small.md) |
| | `fun-asr-nano-2512`, `fun-asr-mlt-nano-2512` | [docs/models/fun-asr-nano.md](/handy-computer/transcribe.cpp/blob/main/docs/models/fun-asr-nano.md) |
| | `nemotron-speech-streaming-en-0.6b` | [docs/models/nemotron-speech-streaming-en-0.6b.md](/handy-computer/transcribe.cpp/blob/main/docs/models/nemotron-speech-streaming-en-0.6b.md) |
| | `nemotron-3.5-asr-streaming-0.6b` (multilingual, 40 locales) | [docs/models/nemotron-3.5-asr-streaming-0.6b.md](/handy-computer/transcribe.cpp/blob/main/docs/models/nemotron-3.5-asr-streaming-0.6b.md) |
| | `granite-4.0-1b-speech`, `granite-speech-4.1-2b{,-plus,-nar}` | [docs/models/granite-speech.md](/handy-computer/transcribe.cpp/blob/main/docs/models/granite-speech.md) |
| | `voxtral-mini-3b-2507`, `voxtral-small-24b-2507` (audio-LLM; transcription + translation) | [docs/models/voxtral.md](/handy-computer/transcribe.cpp/blob/main/docs/models/voxtral.md) |
| | `voxtral-mini-4b-realtime-2602` (streaming audio-LLM) | [docs/models/voxtral-realtime.md](/handy-computer/transcribe.cpp/blob/main/docs/models/voxtral-realtime.md) |
| | `medasr` (Conformer + CTC, English medical-dictation, gated) | [docs/models/medasr.md](/handy-computer/transcribe.cpp/blob/main/docs/models/medasr.md) |

Per-variant model cards live under [docs/models/](/handy-computer/transcribe.cpp/blob/main/docs/models).

Build with CMake:

```bash
cmake -B build
cmake --build build
```

Metal is enabled automatically on Apple Silicon.

For Vulkan (Linux/Windows):

```bash
# Ubuntu/Debian
sudo apt install build-essential cmake libvulkan-dev glslc libopenblas-dev
cmake -B build -DTRANSCRIBE_VULKAN=ON
cmake --build build
```

For CUDA (Linux + NVIDIA GPU):

```bash
# requires the CUDA toolkit (nvcc on PATH)
cmake -B build -DTRANSCRIBE_CUDA=ON
cmake --build build
```

`libopenblas-dev` is optional but recommended. It accelerates the host-side decoder ~10-15x.
Without it the build falls back to a scalar path automatically. tinyBLAS (Justine Tunney's llamafile sgemm kernels) is on by default.

To build the quantization tool:

```bash
cmake -B build -DTRANSCRIBE_BUILD_TOOLS=ON
cmake --build build
```

Pre-built GGUFs for all supported models are hosted on Hugging Face under [handy-computer](https://huggingface.co/handy-computer). Each per-model doc linked in the table above includes direct download links for every quant. Convert from source only if you need a different dtype or a checkpoint that isn't pre-built.

The converter loads directly from NVIDIA's NeMo checkpoints via `ASRModel.from_pretrained`. Requires [uv](https://docs.astral.sh/uv/); the parakeet env ships NeMo and its deps.

```bash
uv run --project scripts/envs/parakeet \
  scripts/convert-parakeet.py nvidia/parakeet-tdt-0.6b-v2
```

This writes `models/parakeet-tdt-0.6b-v2/parakeet-tdt-0.6b-v2-F32.gguf` following the llama.cpp-style