Source: discuss.huggingface.co · Topic: machine-learning

HoLo-ToLk: tokenizer-free speech (STT + TTS) on the 0-parameter HSL byte substrate

A researcher built speech-to-text (STT) and text-to-speech (TTS) models, which the author calls the HoLo-ToLk line, on a tokenizer-free byte substrate. With a small spectral lens added, the STT model reaches a character error rate (CER) of 0.194, beating a mel-spectrogram baseline (0.213) in the same setup, while TTS remains a feasibility demo with unstable free-run synthesis.

Published Jun 28, 2026 · 1 min read

Follow-up to my earlier post on the 0-parameter input layer.

I took the HSL byte substrate (no tokenizer, no learned input embedding) and built two small speech models on top, to see whether “bytes as signal” carries through to audio. I’m calling the line HoLo-ToLk.
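The post doesn’t describe how HSL actually maps bytes to signal, so the sketch below uses a hypothetical stand-in just to make “no tokenizer, no learned input embedding” concrete: every UTF-8 byte becomes a fixed 8-dimensional vector of its bits, scaled to ±1. The function name `bytes_to_features` and the bit encoding are assumptions for illustration, not the hsl-embedding-zero API.

```python
def bytes_to_features(text: str) -> list[list[float]]:
    """Hypothetical parameter-free byte encoder (a stand-in, not HSL's real encoding).

    Each UTF-8 byte becomes its 8 bits, most significant bit first, mapped to
    +1.0 (bit set) or -1.0 (bit clear). No vocabulary, no embedding table,
    nothing learned.
    """
    features = []
    for byte in text.encode("utf-8"):
        bits = [1.0 if (byte >> (7 - i)) & 1 else -1.0 for i in range(8)]
        features.append(bits)
    return features


frames = bytes_to_features("héllo")
print(len(frames))  # 6: "é" is two UTF-8 bytes, so 5 characters give 6 frames
print(frames[0])    # [-1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0, -1.0]  ("h" = 0x68)
```

What matters in this kind of design is that the input layer has zero trainable parameters and no vocabulary: any string, in any script, maps deterministically to a sequence of feature frames, one per byte.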

STT (speech → text): the result I’m most confident about.

A char-CTC baseline fed only the raw HSL substrate is weak on its own (CER ~0.67). Adding a small model-side spectral lens (log-mel plus a learnable gated fusion over the frozen substrate) flips the result: CER 0.194, beating a mel-spectrogram baseline (0.213) in the same setup, confirmed across 4 seeds. So the honest takeaway is a controlled comparison (substrate + lens beats mel in the same setup), not a SOTA number: 8 kHz audio, char-CTC, no LM, and output that is readable but rough.

TTS (text → speech): here the byte substrate is even more natural.

UTF-8 text bytes go straight in as HSL features, with no tokenizer or vocabulary. A small autoregressive (AR) transformer with guided attention, followed by a HiFi-GAN vocoder, produces a single-speaker voice. Held-out teacher-forced mel-L1 is 0.296 (multi-seed), and some samples sound genuinely natural, but free-run synthesis on arbitrary sentences is still rough and unstable. So I’m framing TTS as a feasibility demo, not a usable TTS system.
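The STT lens is described only as “log-mel + a learnable gated fusion over the frozen substrate”. One common way to build such a gate is sketched below in NumPy, assuming the substrate and log-mel streams are already time-aligned at the same frame rate. The class name, projection sizes, 80-bin log-mel width, and sigmoid gate are illustrative assumptions, not the author’s implementation.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GatedFusion:
    """Hypothetical gated fusion of frozen substrate frames with log-mel frames.

    Both streams are projected to a shared width d, then a per-frame,
    per-channel gate g in (0, 1) mixes them:
        fused = g * mel_proj + (1 - g) * sub_proj
    In a real model these weights would be learned; training is not shown.
    """

    def __init__(self, d_sub: int, d_mel: int, d: int, rng: np.random.Generator):
        self.W_sub = rng.normal(0.0, 0.1, (d_sub, d))   # substrate projection
        self.W_mel = rng.normal(0.0, 0.1, (d_mel, d))   # log-mel projection
        self.W_gate = rng.normal(0.0, 0.1, (2 * d, d))  # gate sees both streams
        self.b_gate = np.zeros(d)

    def __call__(self, sub: np.ndarray, mel: np.ndarray) -> np.ndarray:
        # sub: (T, d_sub) frozen substrate features; mel: (T, d_mel) log-mel features.
        # Assumes both streams share the same frame count T.
        s = sub @ self.W_sub
        m = mel @ self.W_mel
        g = sigmoid(np.concatenate([s, m], axis=-1) @ self.W_gate + self.b_gate)
        return g * m + (1.0 - g) * s


rng = np.random.default_rng(0)
fusion = GatedFusion(d_sub=8, d_mel=80, d=64, rng=rng)
sub = rng.normal(size=(100, 8))   # stand-in for 100 frames of substrate features
mel = rng.normal(size=(100, 80))  # stand-in for 100 frames of 80-bin log-mel
out = fusion(sub, mel)
print(out.shape)  # (100, 64)
```

Because the gate is computed per frame and per channel, a model built this way can lean on log-mel where the substrate is uninformative and vice versa, while the substrate itself contributes no trainable parameters.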
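“Guided attention” in TTS usually refers to the penalty from Tachibana et al. (2017), which pushes the text-to-frame attention toward the diagonal so the decoder reads its input roughly left to right: each attention weight is multiplied by W[n, t] = 1 - exp(-(n/N - t/T)^2 / (2g^2)) and the product is averaged. The sketch below shows that standard formulation with the commonly used width g = 0.2; the post doesn’t say which variant or width HoLo-ToLk uses, so treat both as assumptions.

```python
import numpy as np


def guided_attention_weights(n_text: int, n_frames: int, g: float = 0.2) -> np.ndarray:
    """Penalty matrix W[n, t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)).

    W is exactly 0 where n/N == t/T (the diagonal) and approaches 1 far from it.
    """
    n = np.arange(n_text)[:, None] / n_text      # (N, 1) normalized text position
    t = np.arange(n_frames)[None, :] / n_frames  # (1, T) normalized frame position
    return 1.0 - np.exp(-((n - t) ** 2) / (2.0 * g * g))


def guided_attention_loss(attn: np.ndarray, g: float = 0.2) -> float:
    """Mean of attn * W for an (n_text, n_frames) attention matrix.

    Transpose first if the decoder produces (n_frames, n_text) attention.
    """
    W = guided_attention_weights(attn.shape[0], attn.shape[1], g)
    return float(np.mean(attn * W))


N = T = 50
diagonal = np.eye(N)                  # ideal monotonic alignment
anti_diagonal = np.fliplr(np.eye(N))  # reads the text backwards
print(guided_attention_loss(diagonal))                 # 0.0
print(guided_attention_loss(anti_diagonal) > 0.0)      # True
```

A perfectly diagonal alignment costs nothing, while attention that drifts off the diagonal is penalized. Stronger or longer-lasting guidance is a lever people commonly try against free-run alignment failures, but that is a general observation, not a result reported in this post.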

Both are research/devlog results, not production or SOTA. The two models are separate today; the goal is to unify them into one over time.

Try it (combined demo, both tabs):

Substrate: pip install hsl-embedding-zero

Happy to answer questions on the lens design or the byte→signal encoding, and very open to critique, especially on the TTS free-run instability.
