# Vāgdhenu: Open-source meter-aware text-to-speech for Sanskrit

> Source: <https://prathosh.in/vagdhenu/>
> Published: 2026-06-30 05:33:55+00:00

Paste a Sanskrit verse in any Indian script — the meter is detected automatically.

The first chant takes roughly 10–60 s while the model warms up. If the demo doesn't load,
use the [backup demo ↗](https://prathoshap-vagdhenu-demo.hf.space?__theme=light).

Six vṛttas rendered by this system — including verses from the shipped deployments.

Vāgdhenu maps a metrical verse to its chanted *pārāyaṇa* recitation. Its voice is a
flow-matching TTS backbone **retrained on a purpose-recorded, carefully designed
single-speaker Sanskrit chant corpus** (~5 hours), with a further voice-steering retrain;
the neural vocoder is likewise fine-tuned for the chant register. Around the trained model sits
the machinery a faithful Sanskrit chant pipeline needs: a script-aware frontend that routes
Sanskrit through Kannada orthography (avoiding the Hindi schwa-deletion that Devanagari triggers);
visarga sandhi with the jihvāmūlīya and upadhmānīya allophones; the aspiration contrast; the three
sibilants and the full retroflex series kept distinct; homorganic anusvāra and vocalic ṝ; and a
vṛtta-aware mechanism that detects the meter and selects a matched reference under the
*half-reference rule*. The retrained model reaches an expert-rated mean opinion score (MOS) of
about **4.6**, and it renders dense conjuncts correctly, including retroflex aspirates, a class
that earlier architectures could not handle.
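To make the meter-detection step concrete, here is a minimal sketch of classical scansion: each syllable of a pāda is marked guru (G) or laghu (L), and the resulting string is looked up in a table of fixed-pattern meters built from the traditional gaṇas. This is not Vāgdhenu's code. The function names (`tokenize`, `scan`, `detect_meter`), the choice of IAST input, and the small meter table are assumptions made for illustration, and the sketch does not cover Anuṣṭubh or other meters without a fixed syllable pattern.

```python
import unicodedata

def nfc(s):
    """Normalize to NFC so each IAST letter is a single code point."""
    return unicodedata.normalize("NFC", s)

LONG = {nfc(v) for v in ("ā", "ī", "ū", "ṝ", "e", "o", "ai", "au")}
SHORT = {nfc(v) for v in ("a", "i", "u", "ṛ", "ḷ")}
# Two-letter IAST units that count as one sound (diphthongs and aspirates).
DIGRAPHS = {nfc(d) for d in ("ai", "au", "kh", "gh", "ch", "jh",
                             "ṭh", "ḍh", "th", "dh", "ph", "bh")}
MARKS = {nfc("ṃ"), nfc("ḥ")}  # anusvāra and visarga make a syllable heavy

def tokenize(text):
    """Split IAST text into sounds, dropping spaces and punctuation."""
    s = nfc(text.lower())
    tokens, i = [], 0
    while i < len(s):
        if s[i:i + 2] in DIGRAPHS:
            tokens.append(s[i:i + 2])
            i += 2
        elif s[i].isalpha():
            tokens.append(s[i])
            i += 1
        else:
            i += 1
    return tokens

def scan(pada, final_guru=True):
    """Return the G/L weight string for one pāda.

    A syllable is guru if its vowel is long, if it is followed by
    anusvāra or visarga, or if two or more consonants follow it
    (also across word boundaries). By convention the last syllable
    of a pāda may be counted as guru.
    """
    toks = tokenize(pada)
    vowels = [i for i, t in enumerate(toks) if t in LONG or t in SHORT]
    weights = []
    for n, i in enumerate(vowels):
        end = vowels[n + 1] if n + 1 < len(vowels) else len(toks)
        following = toks[i + 1:end]
        heavy = (toks[i] in LONG
                 or len(following) >= 2
                 or any(t in MARKS for t in following))
        weights.append("G" if heavy else "L")
    if weights and final_guru:
        weights[-1] = "G"
    return "".join(weights)

# Traditional three-syllable gaṇas plus single laghu (la) and guru (ga).
GANAS = {"ya": "LGG", "ma": "GGG", "ta": "GGL", "ra": "GLG", "ja": "LGL",
         "bha": "GLL", "na": "LLL", "sa": "LLG", "la": "L", "ga": "G"}

# A small illustrative table of fixed-pattern meters (not exhaustive).
METERS = {
    "Indravajrā": "ta ta ja ga ga",
    "Upendravajrā": "ja ta ja ga ga",
    "Vasantatilakā": "ta bha ja ja ga ga",
    "Mālinī": "na na ma ya ya",
    "Mandākrāntā": "ma bha na ta ta ga ga",
    "Śikhariṇī": "ya ma na sa bha la ga",
    "Śārdūlavikrīḍita": "ma sa ja sa ta ta ga",
}
PATTERNS = {"".join(GANAS[g] for g in spec.split()): name
            for name, spec in METERS.items()}

def detect_meter(pada):
    """Return the meter name for a pāda, or None if no pattern matches."""
    return PATTERNS.get(scan(pada))

line = "kaścit kāntāvirahaguruṇā svādhikārāt pramattaḥ"  # Meghadūta, first pāda
print(scan(line))          # GGGGLLLLLGGLGGLGG
print(detect_meter(line))  # Mandākrāntā
```

Exact matching keeps the sketch short. A production detector would presumably also need to handle input in Indian scripts, pāda segmentation of a full verse, and tolerance for scribal or sandhi variation.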
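Two of the frontend rules named above, the visarga allophones and the homorganic anusvāra, can be sketched as simple rewrites over IAST text. This is an illustration under stated assumptions, not the project's frontend: the function name `apply_allophones` is hypothetical, the symbols ẖ and ḫ are one common transliteration convention for jihvāmūlīya and upadhmānīya, and both substitutions are treated as mandatory here although the tradition allows the plain visarga and anusvāra as well.

```python
import re
import unicodedata

VISARGA = "\u1e25"      # ḥ
ANUSVARA = "\u1e43"     # ṃ
JIHVAMULIYA = "\u1e96"  # ẖ, velar allophone of visarga before k / kh
UPADHMANIYA = "\u1e2b"  # ḫ, labial allophone of visarga before p / ph
RETRO_S = "\u1e63"      # ṣ

# Class nasal for each stop series. Aspirates (kh, gh, ...) begin with
# the same letter as their plain counterparts, so they are covered too.
NASAL_BY_STOP = {
    "k": "\u1e45", "g": "\u1e45",            # velar: ṅ
    "c": "\u00f1", "j": "\u00f1",            # palatal: ñ
    "\u1e6d": "\u1e47", "\u1e0d": "\u1e47",  # retroflex ṭ, ḍ: ṇ
    "t": "n", "d": "n",                      # dental: n
    "p": "m", "b": "m",                      # labial: m
}

def apply_allophones(text):
    """Rewrite visarga and anusvāra according to the following stop."""
    s = unicodedata.normalize("NFC", text)
    # Visarga before k/kh becomes jihvāmūlīya, except before kṣ,
    # where the plain visarga is kept.
    s = re.sub(f"{VISARGA}(?=\\s*k(?!{RETRO_S}))", JIHVAMULIYA, s)
    # Visarga before p/ph becomes upadhmānīya, except before ps.
    s = re.sub(f"{VISARGA}(?=\\s*p(?!s))", UPADHMANIYA, s)
    # Anusvāra before a stop becomes the nasal of that stop's class.
    stops = "".join(NASAL_BY_STOP)
    s = re.sub(
        f"{ANUSVARA}(\\s*)([{stops}])",
        lambda m: NASAL_BY_STOP[m.group(2)] + m.group(1) + m.group(2),
        s,
    )
    return s

print(apply_allophones("rāmaḥ karoti"))  # visarga before k
print(apply_allophones("punaḥ punaḥ"))   # visarga before p; final ḥ is kept
print(apply_allophones("ahaṃ ca"))       # anusvāra before a palatal
```

Before semivowels, sibilants, and h the anusvāra is left untouched in this sketch, which matches its usual pronunciation as a nasal without a fixed place of articulation in those contexts.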

The system has been used to produce two corpora at scale:

- **Mahābhārata Tātparya Nirṇaya**: 32 chapters, 5,183 verses (~17.5 h) ·
  [video series ↗](https://www.youtube.com/playlist?list=PLL1s8qiaGy0IP0G_PhlwaGA5EOfzoKrV_)
- **Śrīmad Bhāgavatam**: ~18,000 verses across 12 books ·
  [karaoke-video series ↗](https://www.youtube.com/playlist?list=PLDiYyVdyo2Sc)
