Four free neural TTS options for CI pipelines — edge-tts, Kokoro, MeloTTS, Bark

wpnews.pro


Source: dev.to · Topic: artificial-intelligence


A developer evaluated four free neural TTS options—edge-tts, Kokoro, MeloTTS, and Bark—for use in CI pipelines without a GPU. edge-tts offers broadcast-quality voices via an unofficial Microsoft endpoint, while Kokoro and MeloTTS run locally with slower CPU inference. Bark provides the most expressiveness but requires large model downloads.

4 min read · Published Jun 26, 2026

Building a two-host video pipeline put me through most of the free neural TTS options that can run in GitHub Actions without a GPU. The criteria I care about: zero API cost, acceptable voice quality, runs headless in CI, and doesn't require CUDA at inference time.

Here's a comparison of the four I tested or seriously evaluated.

GitHub: rany2/edge-tts | License: MIT (wrapper) | Voices: 400+ across 100+ languages

edge-tts is a Python wrapper around Microsoft Edge's read-aloud TTS endpoint, the same one that fires when you right-click text in Edge and select "Read aloud." It streams MP3 output. Quality on the en-US-GuyNeural and en-US-AvaNeural voices is genuinely broadcast-quality; it's noticeably better than older open-source models and competitive with commercial APIs.

It's fast because synthesis runs on Microsoft's servers and the audio streams back: a 10-minute audio file generates in 30-60 seconds regardless of CI runner hardware.

The catch: it calls an unofficial Microsoft endpoint. Microsoft hasn't published a public contract for it and could restrict access without warning. I've been running it daily for about a month without issues, but this is a real operational risk.

pip install edge-tts
edge-tts --voice en-US-GuyNeural --text "Hello world" --write-media out.mp3
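The CLI call above also has a Python equivalent, which is handier for a two-host script where each line needs its own voice. This is a minimal sketch, assuming a script format of one `SPEAKER: text` line per utterance; `plan_segments`, `render`, `render_script`, the `HOST_A`/`HOST_B` labels and the `seg_000.mp3` naming are all hypothetical. The only edge-tts calls used are `Communicate(text, voice)` and its async `save(path)`.

```python
import asyncio
from typing import Dict, List, Tuple

# Hypothetical host-to-voice mapping for a two-host script.
VOICES: Dict[str, str] = {"HOST_A": "en-US-GuyNeural", "HOST_B": "en-US-AvaNeural"}


def plan_segments(script: str, voices: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Turn 'SPEAKER: line' text into (voice, text, output_path) jobs."""
    jobs = []
    for line in script.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines between utterances
        speaker, _, text = line.partition(":")  # split on the first colon only
        voice = voices[speaker.strip()]  # KeyError for an unknown speaker or missing colon
        jobs.append((voice, text.strip(), f"seg_{len(jobs):03d}.mp3"))
    return jobs


async def render(jobs: List[Tuple[str, str, str]]) -> None:
    import edge_tts  # imported lazily so plan_segments works without the package

    # Sequential on purpose: a conservative choice for an unofficial endpoint.
    for voice, text, path in jobs:
        await edge_tts.Communicate(text, voice).save(path)  # writes the streamed MP3


def render_script(script: str) -> None:
    """Entry point for a CI step: plan the segments, then render them."""
    asyncio.run(render(plan_segments(script, VOICES)))
```

In a CI step you would call `render_script(...)` with the script's contents and then concatenate the numbered segments in order.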

Best for: CI pipelines where voice quality matters and you can accept an external unofficial API dependency.
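Because the endpoint can disappear without notice, one way to contain that risk is to keep a local engine such as Kokoro or MeloTTS behind it as a fallback. The sketch below is a hypothetical helper, not part of edge-tts: `synthesize_with_fallback` tries each backend in order with a few retries and reports which one produced the file. Backends are plain callables, so in a real pipeline you would wire in an edge-tts call and a local-model call; the two backends in the demo are placeholders.

```python
import time
from typing import Callable, Sequence, Tuple

# A backend takes (text, output_path) and writes an audio file, raising on failure.
Backend = Callable[[str, str], None]


def synthesize_with_fallback(
    text: str,
    output_path: str,
    backends: Sequence[Tuple[str, Backend]],
    retries: int = 2,
    delay_s: float = 1.0,
) -> str:
    """Try each (name, backend) in order; return the name of the one that succeeded.

    Each backend gets `retries` attempts, with a fixed delay between attempts,
    before the next backend is tried. Raises RuntimeError if all of them fail.
    """
    errors = []
    for name, backend in backends:
        for attempt in range(1, retries + 1):
            try:
                backend(text, output_path)
                return name
            except Exception as exc:  # network errors, blocked requests, model errors
                errors.append(f"{name} attempt {attempt}: {exc!r}")
                if attempt < retries:
                    time.sleep(delay_s)
    raise RuntimeError("all TTS backends failed:\n" + "\n".join(errors))


if __name__ == "__main__":
    calls = []

    def blocked_remote(text: str, path: str) -> None:
        # Placeholder for an edge-tts call whose endpoint is unavailable.
        calls.append("remote")
        raise ConnectionError("endpoint unavailable")

    def local(text: str, path: str) -> None:
        # Placeholder for a local Kokoro or MeloTTS render.
        calls.append("local")

    used = synthesize_with_fallback(
        "Hello world", "out.wav", [("edge-tts", blocked_remote), ("local", local)], delay_s=0
    )
    print(used, calls)  # local ['remote', 'remote', 'local']
```

The ordering is the design choice: the remote engine has the better voices, so it goes first, and the local engine keeps the pipeline running when the endpoint is blocked. Expect a format difference, since the edge-tts CLI writes MP3 while MeloTTS's `tts_to_file` example writes WAV.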

HuggingFace: hexgrad/Kokoro-82M | License: Apache 2.0 | Params: 82M

Kokoro is a small TTS model that runs entirely locally. Voice quality is good for the model size — noticeably better than older models like Tacotron2 and FastSpeech2, though below edge-tts on naturalness for longer passages.

The main tradeoff for CI: on a standard GitHub Actions runner, CPU inference runs well below real-time speed, so a 10-minute audio job can take significantly longer than 10 minutes to render, depending on segment count and text density. For short-form content (under 3 minutes) this is usually fine; for longer videos it's the bottleneck.
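It helps to express "below real-time" as a real-time factor (RTF): seconds of audio produced per second of compute. The edge-tts figure above, 10 minutes of audio in 30-60 seconds, works out to an RTF of roughly 10 to 20. The helpers below are a hypothetical sketch for checking whether a render fits a job budget; the 0.5 RTF and the 5-minute setup overhead in the examples are illustrative assumptions, not measurements of Kokoro or MeloTTS.

```python
def render_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Wall-clock minutes needed to render `audio_minutes` of speech.

    realtime_factor = seconds of audio produced per second of compute:
    above 1 is faster than real time, below 1 is slower.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_minutes / realtime_factor


def max_audio_minutes(budget_minutes: float, realtime_factor: float,
                      overhead_minutes: float = 0.0) -> float:
    """Longest audio that fits a job budget after fixed overhead (setup, model load)."""
    return max(0.0, (budget_minutes - overhead_minutes) * realtime_factor)


# edge-tts at the slow end of the quoted range: 10 min of audio at RTF 10.
print(render_minutes(10, 10))    # 1.0
# A hypothetical local model running at half real-time speed (RTF 0.5):
print(render_minutes(10, 0.5))   # 20.0
print(max_audio_minutes(30, 0.5, overhead_minutes=5))  # 12.5
```

Measuring RTF once on your own runner, with a representative script, turns the "under 3 minutes is usually fine" rule of thumb into a number you can check before a job times out.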

First run downloads ~320MB of model weights. If you cache these in GitHub Actions, subsequent runs skip the download.
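To cache the weights you need to know where they land. Assuming the kokoro package fetches them through huggingface_hub (worth confirming against your installed version), they end up in the Hugging Face Hub cache. The sketch below approximates how huggingface_hub resolves that directory by default and logs whether the Kokoro repo is already present, which makes cache hits and misses visible in CI logs. The function names are hypothetical; point your CI cache step at the hub directory it reports.

```python
import os
from pathlib import Path


def hf_hub_cache_dir() -> Path:
    """Approximate huggingface_hub's default cache resolution:
    HF_HUB_CACHE (or legacy HUGGINGFACE_HUB_CACHE), else $HF_HOME/hub,
    else $XDG_CACHE_HOME/huggingface/hub, else ~/.cache/huggingface/hub."""
    explicit = os.environ.get("HF_HUB_CACHE") or os.environ.get("HUGGINGFACE_HUB_CACHE")
    if explicit:
        return Path(explicit).expanduser()
    xdg = os.environ.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    hf_home = os.environ.get("HF_HOME") or os.path.join(xdg, "huggingface")
    return Path(hf_home).expanduser() / "hub"


def model_cache_path(repo_id: str) -> Path:
    """Model repos are stored as models--{org}--{name} inside the hub cache."""
    return hf_hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))


def report_cache(repo_id: str = "hexgrad/Kokoro-82M") -> bool:
    """Print and return whether the model is already cached (e.g. restored by CI)."""
    path = model_cache_path(repo_id)
    hit = path.is_dir()
    print(f"{'cache hit' if hit else 'cache miss'}: {path}")
    return hit
```

Setting `HF_HOME` explicitly in the workflow makes the cached path predictable across runners instead of depending on the home directory.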

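import soundfile as sf
from kokoro import KPipeline
pipeline = KPipeline(lang_code="a")  # "a" = American English
# Each result unpacks to (graphemes, phonemes, audio); Kokoro outputs 24 kHz audio
_, _, audio = next(pipeline("Hello world", voice="af_heart"))  # first text chunk only
sf.write("out.wav", audio, 24000)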

Best for: fully local inference without external API calls, projects where you need auditable offline-capable TTS.

GitHub: myshell-ai/MeloTTS | License: MIT | Languages: English, Chinese, Japanese, Korean, French, Spanish

MeloTTS from MyShell.ai is a multilingual model with better-than-average English naturalness in my testing. The Python package is melo-tts (pip), and the API lets you set the speaker ID and speed per utterance without reloading the model between clips, which is useful when you're rendering hundreds of short dialogue segments in a batch.

CPU inference speed is in the same range as Kokoro. Model download is around 500MB. The MIT license is a practical advantage if you're building a product on top of it — no Apache license compatibility questions.

from melo.api import TTS
tts = TTS(language="EN", device="cpu")
tts.tts_to_file("Hello world", tts.hps.data.spk2id["EN-Default"], "out.wav")
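The snippet above renders one clip. For the batch case, the point is to construct `TTS` once and vary only the speaker ID and `speed` per call. This is a minimal sketch, assuming the English speaker keys `EN-US` and `EN-BR` from `tts.hps.data.spk2id`; the `DIALOGUE` list, `output_paths`, and `render_dialogue` are hypothetical names, and the speeds are arbitrary examples.

```python
from typing import List, Tuple

# (text, speaker key, speed) per dialogue line; keys and speeds are examples.
DIALOGUE: List[Tuple[str, str, float]] = [
    ("Welcome back to the show.", "EN-US", 1.0),
    ("Today we're comparing free TTS engines.", "EN-BR", 1.1),
]


def output_paths(n: int, prefix: str = "line") -> List[str]:
    """Zero-padded file names so clips sort in dialogue order."""
    return [f"{prefix}_{i:04d}.wav" for i in range(n)]


def render_dialogue(dialogue: List[Tuple[str, str, float]]) -> List[str]:
    from melo.api import TTS  # lazy import: the model loads only when rendering

    tts = TTS(language="EN", device="cpu")  # load once, reuse for every clip
    speaker_ids = tts.hps.data.spk2id
    paths = output_paths(len(dialogue))
    for (text, speaker, speed), path in zip(dialogue, paths):
        # Speaker and speed are per-call arguments, so nothing is reloaded.
        tts.tts_to_file(text, speaker_ids[speaker], path, speed=speed)
    return paths
```

Loading the model dominates startup time on CPU, so paying that cost once per job rather than once per clip is what keeps a few hundred short segments manageable.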

Best for: multilingual content pipelines, or when you want MIT-licensed local TTS with solid English quality.

GitHub: suno-ai/bark | License: MIT | Size: ~1.7GB (small), ~8GB (large)

Bark is the most capable of the four for voice expressiveness. You can specify laughter ([laughs]), sighs, hesitations, and non-speech sounds inline in the prompt text. Quality on the large model is competitive with commercial TTS APIs.

The problem for standard CI: the large model needs a GPU with substantial VRAM and takes minutes to render 30 seconds of audio on CPU. The small model fits in RAM but quality drops noticeably. GitHub Actions standard runners have no GPU, making the large model impractical and the small model a significant quality downgrade.
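If you do try Bark on a CPU runner, the small models are selected with the `SUNO_USE_SMALL_MODELS` environment variable described in Bark's README. The sketch below pairs that with a cheap pre-flight check that flags bracketed cues outside the ones the README lists, so a typo like `[laugh]` is caught before a multi-minute render. `unknown_cues` and `render_small_on_cpu` are hypothetical helpers; Bark also responds to some undocumented cues, so treat a flag as a warning, not an error. Writing the WAV assumes SciPy is installed.

```python
import os
import re
from typing import List

# Cues and speaker tags listed in the Bark README (lowercased for comparison).
KNOWN_CUES = {
    "[laughter]", "[laughs]", "[sighs]", "[music]", "[gasps]",
    "[clears throat]", "[man]", "[woman]",
}


def unknown_cues(prompt: str) -> List[str]:
    """Return bracketed cues in the prompt that aren't in KNOWN_CUES."""
    return [c for c in re.findall(r"\[[^\]]+\]", prompt) if c.lower() not in KNOWN_CUES]


def render_small_on_cpu(prompt: str, path: str) -> None:
    # bark reads this variable when its module is first imported, so set it first.
    os.environ["SUNO_USE_SMALL_MODELS"] = "True"
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads (first run) and loads the small models
    audio = generate_audio(prompt)  # NumPy array at SAMPLE_RATE
    write_wav(path, SAMPLE_RATE, audio)
```

Even with the small models, expect render times measured in minutes per clip on CPU, which is why the check runs before the model loads.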

Best for: local GPU inference where expressive voice effects justify the hardware requirement. Not practical for standard CPU-only CI runners.

Tool	Voice quality	CPU speed	External API	CI practical
edge-tts	excellent	fast (streaming)	yes (unofficial)	yes
Kokoro-82M	good	slow	no	yes (short video)
MeloTTS	good	slow	no	yes (short video)
Bark (large)	excellent	very slow	no	no

For automated video pipelines on standard GitHub Actions runners, edge-tts is the practical choice if you accept the unofficial API dependency. If you need fully local inference and your videos stay under 3-4 minutes, Kokoro or MeloTTS both work within a reasonable job time budget. Bark belongs on a GPU machine, not a free CI runner.
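The recommendation above boils down to a few branches. As a hypothetical sketch, with the 4-minute default taken from the 3-4 minute guidance in the text rather than from any measurement:

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    allow_external_api: bool  # OK to depend on the unofficial edge-tts endpoint?
    has_gpu: bool
    audio_minutes: float
    need_expressive_cues: bool = False  # laughs, sighs, and similar inline effects


def pick_engine(c: Constraints, local_cpu_limit_minutes: float = 4.0) -> str:
    """Encode this article's rule of thumb; it is not a benchmark."""
    if c.need_expressive_cues and c.has_gpu:
        return "bark"  # only worth it with GPU hardware
    if c.allow_external_api:
        return "edge-tts"  # best quality, fastest, but an unofficial dependency
    if c.audio_minutes <= local_cpu_limit_minutes:
        return "kokoro or melotts"  # fully local, fits a reasonable job budget
    return "kokoro or melotts (expect CPU render time to be the bottleneck)"


print(pick_engine(Constraints(allow_external_api=False, has_gpu=False, audio_minutes=3)))
```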

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.

Read original on dev.to → dev.to/morinaga/four-free-neural-tts-options-for…
