Paste a Sanskrit verse in any Indian script — the meter is detected automatically.
First chant takes ~10–60s while the model warms up. If the demo doesn't load,
use the backup demo ↗. Six vṛttas rendered by this system — including verses from the shipped deployments.
Vāgdhenu maps a metrical verse to its chanted pārāyaṇa recitation. Its voice is a flow-matching TTS backbone retrained on a purpose-recorded, carefully designed single-speaker Sanskrit chant corpus (~5 hours), with a further voice-steering retrain; the neural vocoder is likewise fine-tuned for the chant register. Around the trained model sits the machinery a faithful Sanskrit chant pipeline needs: a script-aware frontend that routes Sanskrit through Kannada orthography (avoiding the Hindi schwa-deletion that Devanagari triggers); visarga sandhi with the jihvāmūlīya and upadhmānīya allophones; the aspiration contrast; the three sibilants and the full retroflex series kept distinct; homorganic anusvāra and vocalic ṝ; and a vṛtta-aware mechanism that detects the meter and selects a matched reference under the half-reference rule. The retrained model reaches an expert MOS of about 4.6, and dense conjuncts — including retroflex aspirates — render correctly, the class earlier architectures could not crack.
This system produced two corpora at scale.
● Mahābhārata Tātparya Nirṇaya — 32 chapters, 5,183 verses (~17.5h) ·
video series ↗ ● Śrīmad Bhāgavatam — ~18,000 verses across 12 books · karaoke-video series ↗