Transcribe.cpp – ggml speech-to-text inference engine

Transcribe.cpp, a C/C++ speech-to-text inference library, has been released supporting 16 model families and 60+ variants via GGUF models on the ggml runtime. It offers Metal, Vulkan, and CUDA backends for GPU acceleration and a tinyBLAS-accelerated CPU path, with all models numerically verified and WER-tested.

C/C++ speech-to-text inference library. Runs diverse STT model families via [GGUF](https://github.com/ggerganov/gguf) models on the [ggml](https://github.com/ggml-org/ggml) runtime, with Metal, Vulkan, and CUDA backends for fast GPU inference plus a tinyBLAS-accelerated CPU path.

16 model families and 60+ variants, streaming and batch. Every model we publish under [handy-computer](https://huggingface.co/handy-computer) is numerically verified and WER-tested against its reference implementation.

Supported models:

| Family | Variants | Docs |
|---|---|---|
| Parakeet | 10 variants: TDT, RNN-T, CTC, TDT+CTC (110M–1.1B) | |
| Canary | `canary-1b`, `canary-1b-v2`, `canary-1b-flash`, `canary-180m-flash` | [docs/models/canary.md](/handy-computer/transcribe.cpp/blob/main/docs/models/canary.md) |
| Canary-Qwen | `canary-qwen-2.5b` (FastConformer + Qwen3-1.7B SALM) | [docs/models/canary-qwen-2.5b.md](/handy-computer/transcribe.cpp/blob/main/docs/models/canary-qwen-2.5b.md) |
| Whisper | `tiny` through `large-v3-turbo`, plus `.en` siblings | [docs/models/whisper.md](/handy-computer/transcribe.cpp/blob/main/docs/models/whisper.md) |
| GigaAM | `gigaam-v3-{e2e-rnnt,e2e-ctc,rnnt,ctc}` | [docs/models/gigaam.md](/handy-computer/transcribe.cpp/blob/main/docs/models/gigaam.md) |
| Moonshine | `moonshine-tiny`, `moonshine-base` | [docs/models/moonshine.md](/handy-computer/transcribe.cpp/blob/main/docs/models/moonshine.md) |
| Moonshine Streaming | `moonshine-streaming-{tiny,small,medium}` | [docs/models/moonshine-streaming.md](/handy-computer/transcribe.cpp/blob/main/docs/models/moonshine-streaming.md) |
| Qwen3-ASR | `qwen3-asr-0.6b`, `qwen3-asr-1.7b` | [docs/models/qwen3-asr.md](/handy-computer/transcribe.cpp/blob/main/docs/models/qwen3-asr.md) |
| Cohere Transcribe | `cohere-transcribe-03-2026` | [docs/models/cohere-transcribe-03-2026.md](/handy-computer/transcribe.cpp/blob/main/docs/models/cohere-transcribe-03-2026.md) |
| SenseVoice | `sensevoice-small` | [docs/models/sensevoice-small.md](/handy-computer/transcribe.cpp/blob/main/docs/models/sensevoice-small.md) |
| Fun-ASR-Nano | `fun-asr-nano-2512`, `fun-asr-mlt-nano-2512` | [docs/models/fun-asr-nano.md](/handy-computer/transcribe.cpp/blob/main/docs/models/fun-asr-nano.md) |
| Nemotron Speech Streaming | `nemotron-speech-streaming-en-0.6b` | [docs/models/nemotron-speech-streaming-en-0.6b.md](/handy-computer/transcribe.cpp/blob/main/docs/models/nemotron-speech-streaming-en-0.6b.md) |
| Nemotron 3.5 ASR Streaming | `nemotron-3.5-asr-streaming-0.6b` (multilingual, 40 locales) | [docs/models/nemotron-3.5-asr-streaming-0.6b.md](/handy-computer/transcribe.cpp/blob/main/docs/models/nemotron-3.5-asr-streaming-0.6b.md) |
| Granite Speech | `granite-4.0-1b-speech`, `granite-speech-4.1-2b{,-plus,-nar}` | [docs/models/granite-speech.md](/handy-computer/transcribe.cpp/blob/main/docs/models/granite-speech.md) |
| Voxtral | `voxtral-mini-3b-2507`, `voxtral-small-24b-2507` (audio-LLM; transcription + translation) | [docs/models/voxtral.md](/handy-computer/transcribe.cpp/blob/main/docs/models/voxtral.md) |
| Voxtral Realtime | `voxtral-mini-4b-realtime-2602` (streaming audio-LLM) | [docs/models/voxtral-realtime.md](/handy-computer/transcribe.cpp/blob/main/docs/models/voxtral-realtime.md) |
| MedASR | `medasr` (Conformer + CTC, English medical-dictation, gated) | [docs/models/medasr.md](/handy-computer/transcribe.cpp/blob/main/docs/models/medasr.md) |

Per-variant model cards live under [docs/models/](/handy-computer/transcribe.cpp/blob/main/docs/models).

To build:

    cmake -B build
    cmake --build build

Metal is enabled automatically on Apple Silicon.

For Vulkan (Linux/Windows):

    # Ubuntu/Debian
    sudo apt install build-essential cmake libvulkan-dev glslc libopenblas-dev
    cmake -B build -DTRANSCRIBE_VULKAN=ON
    cmake --build build

For CUDA (Linux + NVIDIA GPU):

    # requires the CUDA toolkit (nvcc on PATH)
    cmake -B build -DTRANSCRIBE_CUDA=ON
    cmake --build build

`libopenblas-dev` is optional but recommended. It accelerates the host-side decoder ~10-15x. Without it the build falls back to a scalar path automatically. tinyBLAS (Justine Tunney's llamafile sgemm kernels) is on by default.

To build the quantization tool:

    cmake -B build -DTRANSCRIBE_BUILD_TOOLS=ON
    cmake --build build

Pre-built GGUFs for all supported models are hosted on Hugging Face under [handy-computer](https://huggingface.co/handy-computer).
Each per-model doc linked in the table above includes direct download links for every quant. Convert from source only if you need a different dtype or a checkpoint that isn't pre-built.

The converter loads directly from NVIDIA's NeMo checkpoints via `ASRModel.from_pretrained`. Requires [uv](https://docs.astral.sh/uv/); the parakeet env ships NeMo and its deps.

    uv run --project scripts/envs/parakeet \
        scripts/convert-parakeet.py nvidia/parakeet-tdt-0.6b-v2

This writes `models/parakeet-tdt-0.6b-v2/parakeet-tdt-0.6b-v2-F32.gguf`, following the llama.cpp-style `<slug>-<QUANT>.gguf` naming convention. Pass a local `.nemo` path or extracted directory for offline conversion.

The `transcribe-quantize` tool produces smaller models from the reference GGUF. Available presets: `F16`, `Q8_0`, `Q6_K`, `Q5_K_M`, `Q4_K_M`.

    build/bin/transcribe-quantize \
        models/parakeet-tdt-0.6b-v2/parakeet-tdt-0.6b-v2-F32.gguf \
        models/parakeet-tdt-0.6b-v2/parakeet-tdt-0.6b-v2-Q4_K_M.gguf \
        --quant Q4_K_M

To transcribe a file with the CLI:

    build/bin/transcribe-cli -m models/parakeet-tdt-0.6b-v2/parakeet-tdt-0.6b-v2-F32.gguf samples/jfk.wav

Input must be 16 kHz mono WAV. Use `ffmpeg` or `sox` to convert other formats:

    ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav

Official bindings wrap the C API for other languages:

| Language | Path |
|---|---|
| Python | |
| TypeScript | [bindings/typescript](/handy-computer/transcribe.cpp/blob/main/bindings/typescript) |
| Rust | [bindings/rust/transcribe-cpp](/handy-computer/transcribe.cpp/blob/main/bindings/rust/transcribe-cpp) |
| Swift | [bindings/swift](/handy-computer/transcribe.cpp/blob/main/bindings/swift) |

See [docs/bindings.md](/handy-computer/transcribe.cpp/blob/main/docs/bindings.md) for how the bindings are generated and kept in sync with the header.

To run the tests:

    cd build && ctest

Some tests require a real model file.
Enable them with:

    cmake -B build -DTRANSCRIBE_BUILD_REAL_MODEL_TESTS=ON
    cmake --build build
    TRANSCRIBE_PARAKEET_GGUF=path/to/model.gguf ctest --test-dir build

For the model-family smoke-test, numerical-validation, and benchmark pattern expected of new ports, see [docs/model-family-testing.md](/handy-computer/transcribe.cpp/blob/main/docs/model-family-testing.md).

A huge thanks to [Mozilla AI](https://www.mozilla.ai/) and their [BiR Program](https://www.mozilla.ai/company/bir). This whole project started out as an idea, not even an implementation direction. It was a research project on how to accelerate transcription models across all platforms as easily as possible. The BiR program and Davide helped support the research and my eventual decision to implement an inference engine backed by ggml, as well as the experiments with automated model porting using agentic programming tools.

[Hugging Face](https://huggingface.co/) provided the project extra storage so we can host all of the models we support. We want to provide canonical references for as many models as reasonably possible, and the support from Hugging Face helps enable this.

[Modal](https://modal.com/) provided GPU credits so the project can test and validate that its implementations match the transformers or NeMo reference source. This is critical to getting as close as possible to a production-grade inference engine that works everywhere. We believe accurate transcriptions are critical, and the only way to ensure them is through long-running WER checks, which Modal helps to provide. Every model published under [handy-computer](https://huggingface.co/handy-computer) on Hugging Face has had its WER checked, so you can trust the results. And if there are any regressions, you bet we will be fixing them.

[Blacksmith](https://www.blacksmith.sh/) provides many of the CI runners for this project. That helps keep transcribe.cpp well tested and our releases as smooth as possible.
The CI is quick and a drop-in replacement for the standard GitHub Actions runners. I ran into limits with the standard runners very fast, and I'm super happy that, when I reached out, Blacksmith was able to provide runners for the project.

Repository layout:

    include/transcribe.h         Public C API (single header)
    src/                         Library internals (C++17)
    src/arch/parakeet/           Parakeet family implementation
    src/arch/cohere/             Cohere Transcribe family implementation
    examples/cli/                CLI binary source
    tools/transcribe-quantize/   Quantization tool source
    bindings/                    Python, TypeScript, Rust, and Swift bindings
    docs/                        Porting and validation guidance
    scripts/                     Python converter + test tooling
    ggml/                        Vendored ggml (see ggml/UPSTREAM for pinned SHA)
    src/third_party/miniz/       Vendored miniz deflate codec (see its UPSTREAM file)
    samples/                     Test audio files
    tests/                       Unit and smoke tests

transcribe.cpp is MIT-licensed. See [LICENSE](/handy-computer/transcribe.cpp/blob/main/LICENSE) for details. Vendored third-party components (ggml and miniz, both MIT) are attributed in [THIRD-PARTY-LICENSES.md](/handy-computer/transcribe.cpp/blob/main/THIRD-PARTY-LICENSES.md).
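
The CLI expects 16 kHz mono WAV input, as noted earlier in this README. As a minimal sketch (not part of transcribe.cpp), the header of a file can be checked before transcription using only Python's standard `wave` module. The function name `check_wav` is hypothetical and introduced here purely for illustration.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems; an empty list means the header matches.

    Hypothetical helper, not part of transcribe.cpp.
    """
    problems = []
    # wave.open only reads uncompressed PCM WAV and raises wave.Error otherwise.
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

Calling `check_wav("samples/jfk.wav")` returns a list of mismatches. If the list is non-empty, the `ffmpeg -ar 16000 -ac 1` conversion shown earlier is the fix.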