{"slug": "four-free-neural-tts-options-for-ci-pipelines-edge-tts-kokoro-melotts-bark", "title": "Four free neural TTS options for CI pipelines — edge-tts, Kokoro, MeloTTS, Bark", "summary": "A developer evaluated four free neural TTS options—edge-tts, Kokoro, MeloTTS, and Bark—for use in CI pipelines without a GPU. edge-tts offers broadcast-quality voices via an unofficial Microsoft endpoint, while Kokoro and MeloTTS run locally with slower CPU inference. Bark provides the most expressiveness but requires large model downloads.", "body_md": "Building a [two-host video pipeline](https://dev.to/articles/two-host-video-pipeline-edge-tts-pillow-ffmpeg) put me through most of the free neural TTS options that can run in GitHub Actions without a GPU. The criteria I care about: zero API cost, acceptable voice quality, headless operation in CI, and no CUDA requirement at inference time.\n\nHere's a comparison of the four I tested or seriously evaluated.\n\n**GitHub**: [rany2/edge-tts](https://github.com/rany2/edge-tts) | **License**: MIT (wrapper) | **Voices**: 400+ across 100+ languages\n\nedge-tts is a Python wrapper around Microsoft Edge's read-aloud TTS endpoint, the same one that fires when you right-click text in Edge and select \"Read aloud.\" It streams MP3 output. Quality on the `en-US-GuyNeural` and `en-US-AvaNeural` voices is genuinely broadcast-quality; it's noticeably better than older open-source models and competitive with commercial APIs.\n\nIt's fast because the audio streams from a remote endpoint: a 10-minute audio file generates in 30-60 seconds regardless of CI runner hardware.\n\nThe catch: it calls an unofficial Microsoft endpoint. Microsoft hasn't published a public contract for it and could restrict access without warning. 
I've been running it daily for about a month without issues, but this is a real operational risk.\n\n```\npip install edge-tts\nedge-tts --voice en-US-GuyNeural --text \"Hello world\" --write-media out.mp3\n```\n\n**Best for**: CI pipelines where voice quality matters and you can accept an external unofficial API dependency.\n\n**HuggingFace**: [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) | **License**: Apache 2.0 | **Params**: 82M\n\nKokoro is a small TTS model that runs entirely locally. Voice quality is good for the model size: noticeably better than older models like Tacotron2 and FastSpeech2, though below edge-tts on naturalness for longer passages.\n\nThe main tradeoff for CI: inference on a standard GitHub Actions runner's CPU runs well below real-time speed. A 10-minute audio job could take significantly longer than 10 minutes to render, depending on segment count and text density. For short-form content (under 3 minutes) this is usually fine; for longer videos it's the bottleneck.\n\nFirst run downloads ~320MB of model weights. If you cache these in GitHub Actions, subsequent runs skip the download.\n\n``` python\nfrom kokoro import KPipeline\npipeline = KPipeline(lang_code=\"a\")  # \"a\" = American English\n# Each yielded result unpacks to (graphemes, phonemes, audio)\n_, _, audio = next(pipeline(\"Hello world\", voice=\"af_heart\"))\nsr = 24000  # Kokoro generates 24 kHz audio\n```\n\n**Best for**: fully local inference without external API calls, or projects where you need auditable, offline-capable TTS.\n\n**GitHub**: [myshell-ai/MeloTTS](https://github.com/myshell-ai/MeloTTS) | **License**: MIT | **Languages**: English, Chinese, Japanese, Korean, French, Spanish\n\nMeloTTS from MyShell.ai is a multilingual model with better-than-average English naturalness in my testing. The Python package is `melo-tts` (pip), and the API lets you set speaker ID and speed per utterance without reloading the model between clips, which is useful when you're rendering hundreds of short dialogue segments in a batch.\n\nCPU inference speed is in the same range as Kokoro. 
Model download is around 500MB. The MIT license is a practical advantage if you're building a product on top of it: no Apache license compatibility questions.\n\n``` python\nfrom melo.api import TTS\ntts = TTS(language=\"EN\", device=\"cpu\")\n# spk2id maps speaker names to the numeric IDs that tts_to_file expects\ntts.tts_to_file(\"Hello world\", tts.hps.data.spk2id[\"EN-Default\"], \"out.wav\")\n```\n\n**Best for**: multilingual content pipelines, or when you want MIT-licensed local TTS with solid English quality.\n\n**GitHub**: [suno-ai/bark](https://github.com/suno-ai/bark) | **License**: MIT | **Size**: ~1.7GB (small), ~8GB (large)\n\nBark is the most expressive of the four. You can specify laughter (`[laughs]`), sighs, hesitations, and non-speech sounds inline in the prompt text. Quality on the large model is competitive with commercial TTS APIs.\n\nThe problem for standard CI: the large model needs a GPU with substantial VRAM and takes minutes to render 30 seconds of audio on CPU. The small model fits in RAM but quality drops noticeably. GitHub Actions standard runners have no GPU, making the large model impractical and the small model a significant quality downgrade.\n\n**Best for**: local GPU inference where expressive voice effects justify the hardware requirement. Not practical for standard CPU-only CI runners.\n\n| Tool | Voice quality | CPU speed | External API | CI practical |\n|---|---|---|---|---|\n| edge-tts | excellent | fast (streaming) | yes (unofficial) | yes |\n| Kokoro-82M | good | slow | no | yes (short video) |\n| MeloTTS | good | slow | no | yes (short video) |\n| Bark (large) | excellent | very slow | no | no |\n\nFor automated video pipelines on standard GitHub Actions runners, edge-tts is the practical choice if you accept the unofficial API dependency. If you need fully local inference and your videos stay under 3-4 minutes, Kokoro or MeloTTS both work within a reasonable job time budget. 
Bark belongs on a GPU machine, not a free CI runner.\n\n*Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.*", "url": "https://wpnews.pro/news/four-free-neural-tts-options-for-ci-pipelines-edge-tts-kokoro-melotts-bark", "canonical_source": "https://dev.to/morinaga/four-free-neural-tts-options-for-ci-pipelines-edge-tts-kokoro-melotts-bark-3l2f", "published_at": "2026-06-26 22:12:54+00:00", "updated_at": "2026-06-26 23:04:33.782638+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "natural-language-processing", "developer-tools", "ai-tools"], "entities": ["Microsoft", "HuggingFace", "MyShell.ai", "Suno", "edge-tts", "Kokoro", "MeloTTS", "Bark"], "alternates": {"html": "https://wpnews.pro/news/four-free-neural-tts-options-for-ci-pipelines-edge-tts-kokoro-melotts-bark", "markdown": "https://wpnews.pro/news/four-free-neural-tts-options-for-ci-pipelines-edge-tts-kokoro-melotts-bark.md", "text": "https://wpnews.pro/news/four-free-neural-tts-options-for-ci-pipelines-edge-tts-kokoro-melotts-bark.txt", "jsonld": "https://wpnews.pro/news/four-free-neural-tts-options-for-ci-pipelines-edge-tts-kokoro-melotts-bark.jsonld"}}