Drop in a long video — podcast, interview, talk, stream — and VibeClip cuts it into vertical, captioned, ready-to-post shorts. Then you refine every clip by chatting: “make clip 2 punchier,” “bigger captions,” “add a zoom at 0:05,” “undo.”
Quick start · Features · How it works · Bring your own key · Configuration · Contributing
Left: the raw clip. Right: after one sentence — “make it mrbeast style and add gameplay underneath” — captioned, reframed to 9:16, and split-screened. Real pipeline output, not a mockup.
Footage: Andy Dickinson (CC-BY) · gameplay: Orbital - No Copyright Gameplay (CC-BY) · Minecraft © Mojang.
Spin up a private instance in three commands. All you add is one LLM key.
git clone https://github.com/oktaydbk54/vibeclip.git
cd vibeclip
cp .env.example .env # add ONE line: OPENAI_API_KEY=sk-...
docker compose up -d --build
With the defaults (`EMAIL_MODE=console`, `REQUIRE_EMAIL_VERIFICATION=false`) sign-up logs you straight in, with no email provider needed. Bring an OpenAI or DeepSeek key (DeepSeek is the cheap one), or point `LLM_BASE_URL` at any OpenAI-compatible server (Ollama, LM Studio, OpenRouter…). Prefer no Docker? See local install.
| Feature | What it does |
|---|---|
| 🎬 Long → shorts, automatically | Transcribes on-device, scores the strongest moments (hook / flow / value, not a dumb keyword scan), reframes to 9:16 around the speaker, and burns word-synced captions. |
| 💬 Edit by chatting | A tool-calling agent turns plain language into real edits: trims, filler-word removal (“uhh”/“ee”), zooms, styles, music, b-roll, brand overlays. One undo reverts a whole multi-step plan. |
| 🎨 Styles in one shot | `hormozi`, `mrbeast`, `podcast_minimal`, `kinetic`: captions, pace, zoom, music and SFX applied together. Drop in your own preset as a JSON file. |
| 🖥️ A real studio UI | Web app with a live 9:16 preview, clip cards, a CapCut-style timeline, and the chat copilot right beside it. |
| 🔑 Your key, your data | Bring your own LLM key (OpenAI · Gemini · Claude · DeepSeek · any compatible endpoint). Nothing is proxied through us; there is no “us.” |
| 🏠 Self-host first | One Docker command. Speech-to-text and every render run locally via faster-whisper + ffmpeg. AGPL-3.0, no SaaS lock-in. |
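To make the filler-word removal listed above more concrete, here is a minimal sketch of how word-level timestamps could be turned into the spans of a clip worth keeping. It is an illustration only: the `Word` type, the `FILLERS` set, and `keep_segments` are hypothetical names, not VibeClip's actual code, and the 0.3 s merge gap is an assumed value.

```python
from dataclasses import dataclass

# Hypothetical filler list; the set VibeClip really uses is not documented here.
FILLERS = {"uh", "uhh", "um", "ee"}


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def keep_segments(words: list[Word], max_gap: float = 0.3) -> list[tuple[float, float]]:
    """Return (start, end) spans to keep after dropping filler words.

    Neighbouring kept words are merged into one span when the pause between
    them is at most `max_gap` seconds and no filler was cut in between.
    """
    segments: list[tuple[float, float]] = []
    cut_since_last = False
    for word in words:
        token = word.text.strip(" .,!?").lower()
        if token in FILLERS:
            cut_since_last = True  # force a new span after the cut
            continue
        can_merge = (
            bool(segments)
            and not cut_since_last
            and word.start - segments[-1][1] <= max_gap
        )
        if can_merge:
            segments[-1] = (segments[-1][0], word.end)
        else:
            segments.append((word.start, word.end))
        cut_since_last = False
    return segments
```

Each returned span could then be handed to a trim-and-concatenate step. Starting a new span after every cut is what keeps a short filler from being merged back in by the gap rule.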
upload
│
┌──────▼───────┐ faster-whisper (local, no API key)
│ transcribe │
└──────┬───────┘
┌──────▼────────────┐ LLM "brain" (your key) — structure + scored moments
│ analyze structure │
│ find highlights │
└──────┬────────────┘
┌──────▼───────┐ per clip, replayed from cached intermediates (~2–4s/edit)
│ auto edit │ jumpcut → 9:16 reframe → captions → music+ambience (ducked)
│ │ → SFX → fades · then your chat commands layer on top
└──────┬───────┘
export → vertical MP4, publish-ready
Only two things ever hit the network: your chosen LLM (to understand intent and score moments) and, optionally, Pexels (stock b-roll). Speech-to-text and all rendering stay on your machine.
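The diagram's "replayed from cached intermediates" step can be pictured with a small sketch. It assumes each stage's output is cached under a key derived from the source plus every stage and parameter before it, so changing a late stage reuses everything upstream. `ReplayCache`, `stage`, and the string "renders" are hypothetical stand-ins; a real stage would invoke ffmpeg and return a file path.

```python
import hashlib
import json
from typing import Callable

# (stage name, parameters, function that renders the stage)
Step = tuple[str, dict, Callable[[str, dict], str]]


class ReplayCache:
    """Re-run only the stages whose inputs changed since the last render."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}
        self.runs = 0  # number of stages actually executed, for illustration

    def render(self, source: str, steps: list[Step]) -> str:
        current = source
        key = hashlib.sha256(source.encode()).hexdigest()
        for name, params, fn in steps:
            # The key chains the previous key, so it covers the whole prefix.
            payload = json.dumps([key, name, params], sort_keys=True)
            key = hashlib.sha256(payload.encode()).hexdigest()
            if key not in self.store:
                self.store[key] = fn(current, params)
                self.runs += 1
            current = self.store[key]
        return current


def stage(label: str) -> Callable[[str, dict], str]:
    """Build a stand-in stage that just records what it would have done."""
    def run(clip: str, params: dict) -> str:
        return f"{clip}>{label}{json.dumps(params, sort_keys=True)}"
    return run
```

With three stages rendered once, changing only the last stage's parameters (say, a caption size) executes one stage on the next render rather than three. That is the property that would make a chat edit cheap.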
VibeClip never ships with a key and never proxies your prompts anywhere except the provider you choose. Two ways to supply one:
- **Per instance**: set `OPENAI_API_KEY` (or `DEEPSEEK_API_KEY`, or any OpenAI-compatible endpoint via `LLM_BASE_URL`) in `.env`.
- **Per user**: each account pastes its own key on the in-app **Settings** page, with a live **test-connection**. Keys are **encrypted at rest** and never sent back to the browser.
| Provider | Routed via | Notes |
|---|---|---|
| OpenAI | native | Default, best-supported. |
| DeepSeek | native | The budget pick: a typical short costs a few cents. |
| Google Gemini | OpenAI-compat endpoint | `gemini-2.5-flash` / `pro`. |
| Anthropic Claude | OpenAI-compat endpoint | `claude-haiku` / `sonnet`. |
| Anything else | `LLM_BASE_URL` | Ollama, LM Studio, OpenRouter, your own proxy… |
Speech-to-text runs locally and needs no key.
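One plausible way to resolve these settings at startup is sketched below, with OpenAI preferred and DeepSeek as the fallback. Treat it as an assumption-laden illustration: `LLMConfig` and `resolve_llm` are hypothetical names, and letting an explicit `LLM_BASE_URL` take priority over both keys is a guess about precedence, not documented behaviour.

```python
import os
from typing import Mapping, NamedTuple, Optional

# DeepSeek's public OpenAI-compatible endpoint.
DEEPSEEK_BASE_URL = "https://api.deepseek.com"


class LLMConfig(NamedTuple):
    provider: str
    api_key: str
    base_url: Optional[str]  # None means "use the OpenAI default"


def resolve_llm(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Pick a provider from environment-style settings."""
    openai_key = env.get("OPENAI_API_KEY", "").strip()
    deepseek_key = env.get("DEEPSEEK_API_KEY", "").strip()
    base_url = env.get("LLM_BASE_URL", "").strip() or None

    if base_url:
        # Assumption: a custom endpoint wins; local servers may need no key.
        return LLMConfig("custom", openai_key or deepseek_key, base_url)
    if openai_key:
        return LLMConfig("openai", openai_key, None)
    if deepseek_key:
        return LLMConfig("deepseek", deepseek_key, DEEPSEEK_BASE_URL)
    raise RuntimeError("No LLM configured: set OPENAI_API_KEY or DEEPSEEK_API_KEY")
```

Passing a plain dict instead of `os.environ` makes the precedence easy to check, for example `resolve_llm({"DEEPSEEK_API_KEY": "sk-..."})`.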
Everything is driven by `.env` (see `.env.example` for the full, commented list). The ones that matter most:
| Variable | Default | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | (unset) | Your LLM key (preferred). |
| `DEEPSEEK_API_KEY` | (unset) | Cheaper fallback, used if no OpenAI key. |
| `LLM_BASE_URL` | (unset) | Any OpenAI-compatible endpoint (local models, proxies). |
| `EMAIL_MODE` | `console` | `console` prints OTP to the log; `resend` sends real email. |
| `REQUIRE_EMAIL_VERIFICATION` | `false` | `true` enforces email confirmation (public instances). |
| `HOSTED_STUDIO` | `true` | `true` = the landing page offers login/signup (use your own instance). `false` = a public marketing site that points everyone to GitHub to self-host (no login). |
| `GA_MEASUREMENT_ID` | (unset) | Empty = no analytics injected (self-host default). |
| `SITE_URL` | `http://localhost:8765` | Public base URL for blog canonical/OG/sitemap. |
| `VIDEO_ENCODER` | `libx264` | Use `h264_videotoolbox` on Apple Silicon. |
| `VIBECLIP_BIND` | `127.0.0.1` | docker-compose publish address (`0.0.0.0` to expose). |
| `MAX_UPLOAD_SECONDS` | `0` | Longest uploadable video, in seconds. `0` = no limit (self-host). |
| `MAX_PROJECTS_PER_USER` | `0` | Projects per account. `0` = unlimited; cap it on a public instance. |
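The "`0` means no limit" convention for the two caps can be sketched as follows. The helper names (`read_limit`, `upload_allowed`, `can_create_project`) are hypothetical, and the exact comparison rules are assumptions, not VibeClip's actual checks.

```python
from typing import Mapping, Optional


def read_limit(env: Mapping[str, str], name: str) -> Optional[int]:
    """Parse a non-negative integer cap; 0 or unset means no limit (None)."""
    raw = env.get(name, "0").strip() or "0"
    value = int(raw)
    if value < 0:
        raise ValueError(f"{name} must be >= 0, got {value}")
    return value or None


def upload_allowed(duration_s: float, env: Mapping[str, str]) -> bool:
    """Assumed rule: a video exactly at the cap is still accepted."""
    limit = read_limit(env, "MAX_UPLOAD_SECONDS")
    return limit is None or duration_s <= limit


def can_create_project(existing: int, env: Mapping[str, str]) -> bool:
    """Assumed rule: an account may hold at most `limit` projects."""
    limit = read_limit(env, "MAX_PROJECTS_PER_USER")
    return limit is None or existing < limit
```

On a public instance, `MAX_UPLOAD_SECONDS=3600` would reject a two-hour upload under these rules, while the self-host default of `0` accepts any length.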
Requirements: Python 3.12+, ffmpeg, and the DejaVu fonts (for caption rendering).
cp .env.example .env # add your LLM key
uv sync # or: pip install -e .
python -m chat.app # → http://127.0.0.1:8765
First run downloads the Whisper model. Prefer the terminal? `python -m chat.cli <video.mp4>`.
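Before a first local run, a quick preflight check can confirm the requirements listed above. The sketch below is a hypothetical helper, not something shipped with VibeClip; it covers the Python version and ffmpeg only, since checking for the DejaVu fonts depends on the platform.

```python
import shutil
import sys
from typing import Callable, Optional


def missing_requirements(
    which: Callable[[str], Optional[str]] = shutil.which,
    version: tuple[int, int] = sys.version_info[:2],
) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems: list[str] = []
    if version < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(f"missing: {problem}")
```

The `which` and `version` parameters exist only so the check can be exercised without touching the real environment.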
The repo bundles a small library of royalty-free media (music, ambience, SFX, demo footage) for the built-in styles. Some tracks are CC-BY (Kevin MacLeod) and require crediting in your video description; see the `CREDITS` files under `assets/`. VibeClip never bundles or uses copyrighted/branded game footage.
Issues and PRs welcome; start with CONTRIBUTING.md. Security reports: see SECURITY.md. Be excellent to each other.
GNU AGPL-3.0 — see LICENSE. You can self-host and modify VibeClip freely; if you run a modified version as a network service, you must offer that modified source to its users. Copyright © 2026 the VibeClip authors.
Built for people who'd rather
talk to their editor than fight it.