# How I Built a Free, Self-Hosted Pipeline That Auto-Generates Faceless YouTube Shorts

> Source: <https://dev.to/nils44344/how-i-built-a-free-self-hosted-pipeline-that-auto-generates-faceless-youtube-shorts-3je4>
> Published: 2026-05-23 13:35:32+00:00

Every "AI YouTube" tutorial ends the same way: sign up for ChatGPT Plus, then ElevenLabs, then Pictory, then n8n Cloud. Add it up and you're paying **$75–100/month** before you've made a single video — let alone a single dollar.

I didn't want a subscription stack. I wanted something that ran on my own machine, used free tiers and local models, and that I actually owned. So I built it, and I just open-sourced it under MIT.

It's called **FreeFaceless**, and it takes one command to go from nothing to an uploaded Short:

```
script → voiceover → captions → b-roll → assembled video → YouTube upload
```

Repo: [https://github.com/nils44344/FreeFaceless](https://github.com/nils44344/FreeFaceless)

Here's how each stage works — and the one bug that cost me an evening.

## The orchestration

The whole thing is a linear pipeline. Here's the heart of it (trimmed):

``` python
def run_once(publish_at=None, upload_to_youtube=True):
    data = script.generate()                          # 1. Groq writes the script
    voice_mp3 = voice.synth(data["full_text"], ...)   # 2. edge-tts voiceover
    words = captions.transcribe_words(voice_mp3)      # 3. local Whisper timing
    scenes = visuals.fetch_for_scenes(data["scenes"]) # 4. Pexels b-roll
    ass = captions.write_ass(words, ...)              # 5. caption file
    final = assemble.build(scenes, voice_mp3, ass, …) # 6. ffmpeg
    if upload_to_youtube:
        upload.upload_video(final, data["title"], …)  # 7. YouTube Data API
```

Every stage is its own module, and everything is driven by a single `config.yaml`, so changing the niche, voice, or caption style is a config edit, not a code change.

## 1. Script generation — Groq (free tier)

Groq's free tier serves Llama 3.3 70B fast, and it's OpenAI-compatible, so the official `openai` SDK works by just pointing the base URL at Groq:

``` python
from openai import OpenAI
client = OpenAI(api_key=GROQ_API_KEY, base_url="https://api.groq.com/openai/v1")

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    response_format={"type": "json_object"},  # forces clean JSON
    messages=[{"role": "system", "content": SYSTEM_PROMPT}, ...],
)
```

The prompt asks for a hook + 4–6 facts + a CTA, returned as JSON with per-scene `visual_query` strings I can feed straight to stock search. JSON mode means no fragile regex parsing.
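JSON mode guarantees syntactically valid JSON, not that the model included every key the later stages read. Here is a minimal validation sketch, not the repo's actual code: the key names `title`, `full_text`, `scenes`, and `visual_query` come from the snippets in this post, while the `parse_script` helper and its error messages are hypothetical.

```python
import json

# Keys the orchestration snippet reads from the script payload.
REQUIRED_TOP_LEVEL = ("title", "full_text", "scenes")


def parse_script(raw: str) -> dict:
    """Parse the model's JSON reply and check the keys later stages rely on."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [key for key in REQUIRED_TOP_LEVEL if key not in data]
    if missing:
        raise ValueError(f"script is missing keys: {missing}")

    scenes = data["scenes"]
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("scenes must be a non-empty list")

    # Every scene needs a search string for the stock-footage stage.
    for index, scene in enumerate(scenes):
        query = scene.get("visual_query") if isinstance(scene, dict) else None
        if not isinstance(query, str) or not query.strip():
            raise ValueError(f"scene {index} has no usable visual_query")
    return data
```

With the `openai` SDK, the raw string would be `resp.choices[0].message.content`. Failing here is cheaper than failing halfway through a render.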

## 2. Voiceover — edge-tts (free, no key)

`edge-tts` exposes Microsoft's neural voices for free, no API key:

``` python
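import asyncio
import edge_tts

async def synth(text, out_path="voice.mp3"):
    communicate = edge_tts.Communicate(text, "en-US-ChristopherNeural", rate="-12%")
    await communicate.save(out_path)  # save() is a coroutine, so it needs an async context

asyncio.run(synth("Octopuses have three hearts."))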
```

The quality is genuinely good enough for faceless content, and there are dozens of voices/accents to match the niche.

## 3. Word-level captions — faster-whisper (local)

This is the part most paid tools charge per minute for. `faster-whisper` runs locally on CPU and gives **word-level timestamps**, which I turn into karaoke-style captions:

``` python
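from faster_whisper import WhisperModel
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, _ = model.transcribe("voice.mp3", word_timestamps=True)
# segments is a lazy generator; each segment's .words has .word, .start, .end
words = [w for seg in segments for w in seg.words]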
```

Then I write an ASS subtitle file, 3 words at a time, in a big bold style — the look every Shorts channel uses. (FreeFaceless ships the open-licensed **Anton** font so it works out of the box.)
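The 3-words-at-a-time grouping can be sketched roughly as below. This is an illustration rather than the repo's `write_ass`: the helper names are hypothetical, words are assumed to be plain `(text, start, end)` tuples, and the `[Script Info]` and `[V4+ Styles]` header sections a full ASS file needs are left out.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp: H:MM:SS.cc (centiseconds)."""
    centis_total = max(0, round(seconds * 100))
    hours, rem = divmod(centis_total, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, centis = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"


def group_words(words, size=3):
    """Chunk (text, start, end) tuples into caption groups of `size` words."""
    groups = []
    for i in range(0, len(words), size):
        chunk = words[i:i + size]
        # Whisper word strings often carry a leading space, so strip each one.
        text = " ".join(word[0].strip() for word in chunk)
        groups.append((text, chunk[0][1], chunk[-1][2]))
    return groups


def dialogue_lines(words, style="Default", size=3):
    """Build ASS 'Dialogue:' event lines, one per word group."""
    return [
        f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,{text}"
        for text, start, end in group_words(words, size)
    ]
```

With `faster-whisper` output, the tuples could be built as `[(w.word, w.start, w.end) for w in words]`. Each group runs from the start of its first word to the end of its last, which is what keeps the captions in step with the voiceover.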

## 4. B-roll — Pexels (free API)

Each scene's `visual_query` becomes a Pexels Videos search, pulling vertical clips. Free API, generous limits.
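As a rough sketch of that lookup, assuming the public Pexels video search endpoint and its `videos` / `video_files` response shape: the helper names and the "smallest file that is at least 1920 px tall" rule are my own illustration, not necessarily what FreeFaceless does.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PEXELS_VIDEO_SEARCH = "https://api.pexels.com/videos/search"


def build_search_request(query, api_key, per_page=5):
    """Build a portrait-only video search request; the key goes in Authorization."""
    params = urlencode({"query": query, "orientation": "portrait", "per_page": per_page})
    return Request(f"{PEXELS_VIDEO_SEARCH}?{params}", headers={"Authorization": api_key})


def pick_vertical_file(video, min_height=1920):
    """Return the download link of a suitable vertical rendition, or None."""
    files = [
        f for f in video.get("video_files", [])
        if (f.get("width") or 0) < (f.get("height") or 0)
    ]
    if not files:
        return None
    tall_enough = [f for f in files if f["height"] >= min_height]
    if tall_enough:
        best = min(tall_enough, key=lambda f: f["height"])  # avoid oversized downloads
    else:
        best = max(files, key=lambda f: f["height"])  # fall back to the tallest we have
    return best["link"]


def search_clip(query, api_key):
    """Run the search and return the first usable vertical clip link, or None."""
    with urlopen(build_search_request(query, api_key), timeout=30) as resp:
        payload = json.load(resp)
    for video in payload.get("videos", []):
        link = pick_vertical_file(video)
        if link:
            return link
    return None
```

Keeping the request builder and the file picker separate from the network call makes both easy to check without spending API quota.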

## 5. Assembly — ffmpeg

ffmpeg crops every clip to 1080×1920, concatenates them to match the voiceover length, overlays the audio, and burns in the captions:

```
"-vf", f"subtitles='{ass_path}':fontsdir='{fonts_dir}'"
```
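For context, here is one way the surrounding command could be put together. It is a simplified sketch, not the repo's `assemble.build`: it assumes the b-roll has already been concatenated into one input file, the function name and codec choices are mine, and Windows drive-letter paths inside the `subtitles` filter need extra escaping that this version does not handle.

```python
def build_ffmpeg_cmd(video_in, audio_in, ass_path, fonts_dir, out_path):
    """Return an ffmpeg argument list: fill 1080x1920, burn captions, mux audio."""
    video_filter = ",".join([
        # Scale up until the frame covers 1080x1920, then crop the overflow.
        "scale=1080:1920:force_original_aspect_ratio=increase",
        "crop=1080:1920",
        # Burn in the ASS captions, loading fonts from a local directory.
        f"subtitles='{ass_path}':fontsdir='{fonts_dir}'",
    ])
    return [
        "ffmpeg", "-y",
        "-i", video_in,
        "-i", audio_in,
        "-vf", video_filter,
        "-map", "0:v:0",   # video from the b-roll input
        "-map", "1:a:0",   # audio from the voiceover input
        "-c:v", "libx264",
        "-c:a", "aac",
        "-shortest",       # stop at the end of the shorter stream
        out_path,
    ]
```

The list can be passed straight to `subprocess.run(cmd, check=True)`. Passing arguments as a list avoids shell quoting problems around the filter string.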

## 6. Upload — YouTube Data API

It uses the OAuth desktop flow: the token is cached after the first browser login, and every later run refreshes it silently. Both immediate and scheduled publishing are supported.
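The difference between immediate and scheduled publishing comes down to the `status` part of the `videos.insert` request body. The sketch below assumes the YouTube Data API v3 rule that `publishAt` only takes effect on a video uploaded as `private`; the `build_upload_body` helper and the default category are illustrative, not taken from the repo.

```python
from datetime import datetime, timezone


def build_upload_body(title, description, publish_at=None, category_id="22"):
    """Build the snippet/status body for a YouTube videos.insert call."""
    status = {"privacyStatus": "public"}
    if publish_at is not None:
        if publish_at.tzinfo is None:
            raise ValueError("publish_at must be timezone-aware")
        # Scheduled videos are uploaded as private and go public at publishAt.
        status["privacyStatus"] = "private"
        status["publishAt"] = publish_at.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
    return {
        "snippet": {
            "title": title,
            "description": description,
            "categoryId": category_id,  # "22" is People & Blogs
        },
        "status": status,
    }
```

With `google-api-python-client`, this dictionary would go in as `body=` to `youtube.videos().insert(part="snippet,status", ...)` alongside the media upload.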

## The bug that cost me an evening: SSL on Windows

On my machine, every HTTPS call died with `CERTIFICATE_VERIFY_FAILED`. The culprit: antivirus doing TLS interception with a custom root cert that the `certifi` bundle (which most Python HTTP clients rely on) doesn't include. The fix is two lines, run before any network client is built:

``` python
import truststore
truststore.inject_into_ssl()  # use the OS cert store instead of certifi
```

If you build anything network-heavy on Windows, keep this in your back pocket.

## Honest limitations

- **Free tiers are rate-limited.** This is built for one channel on a normal schedule, not bulk farms. Push it hard and you'll hit limits.
- **Windows-first.** The Python core runs anywhere; the helper scripts are PowerShell. Cross-platform PRs very welcome.
- **It's a production tool, not a money machine.** It automates *making* videos. Views and revenue depend on your content and the algorithm, and no tool changes that.

## Try it / contribute

The repo has a full setup guide (including the Google OAuth walkthrough, which is the only fiddly part):

[https://github.com/nils44344/FreeFaceless](https://github.com/nils44344/FreeFaceless)

If it's useful, a star helps other people find it — and I'd genuinely love feedback, especially on making the setup smoother for non-developers and getting it running on macOS/Linux.