Self-hosted video creation is coming

A developer is migrating video creation to an entirely self-hosted, open-source model, aiming to eliminate external API dependencies. The project currently uses only two APIs—11labs for voice and Vertex AI for embeddings—while running the LLM locally with Qwen 3.6 27B for scriptwriting and RAG tasks. The developer plans to switch to Qwen VL embeddings next and is seeking a capable open-source text-to-speech model to complete the fully self-hosted pipeline.

Day 10 of migrating video creation to an entirely self-hosted model. The goal is to open-source video compilation with AI. The video below was made using only two API dependencies: 11labs for voice and Vertex AI for multimodal embeddings. Everything else runs locally. The LLM is Qwen 3.6 27B, which has been impeccable for agentic tasks like scriptwriting and RAG. I'm hoping to switch to Qwen VL embeddings next, so embedding costs go to the local power plant instead, at a significant discount. There's still no capable open-source text-to-speech model that I know of, although if you know one, drop it in a comment. It's the last missing piece.
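
For anyone tracking a similar migration, here is a minimal sketch of one way to write down the split described above and list what still depends on an external API. The `Stage` class, the stage names, and the `external_dependencies` helper are hypothetical, made up for illustration; only the provider names come from the post. It is not the project's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the video pipeline (hypothetical structure)."""
    name: str
    provider: str
    self_hosted: bool


# Providers are the ones named in the post; the stage names are illustrative.
PIPELINE = [
    Stage("scriptwriting and RAG", "Qwen 3.6 27B", self_hosted=True),
    Stage("multimodal embeddings", "Vertex AI", self_hosted=False),
    Stage("voice", "11labs", self_hosted=False),
]


def external_dependencies(pipeline):
    """Return the providers of every stage that is not self-hosted, in order."""
    return [stage.provider for stage in pipeline if not stage.self_hosted]


if __name__ == "__main__":
    print("Still on external APIs:", external_dependencies(PIPELINE))
```

Swapping a stage to a local model would then just mean replacing its entry with `self_hosted=True`; the pipeline is fully self-hosted once the helper returns an empty list.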