I built an AI faceless video generator in 2 months — here's the stack

The article details the technical stack and development process behind Keyvello, an AI faceless video generator that creates short-form videos from a prompt in 2–5 minutes. The stack includes Next.js, Supabase, GPT-5.5, Fal.ai, ElevenLabs, and FFmpeg, with Modal used for video pipelines to avoid Vercel cold start issues. Key lessons learned include the necessity of implementing row-level security from the start and the discovery that users overwhelmingly prefer using pre-built templates over a blank canvas.

Six months ago I started Keyvello (keyvello.com), an AI video generator that turns a prompt into a complete short-form video in 2–5 minutes. Here's the technical breakdown for fellow builders.

The problem

Faceless creators on TikTok / YouTube Shorts / Reels spend 2–4 hours per video on scripting, voiceovers, B-roll, captions, and editing. Most burn out before they post 10 videos.

The stack

- Frontend: Next.js 16, React 19, TypeScript, Tailwind CSS 4, Radix UI
- Backend: Next.js API Routes (App Router)
- DB: Supabase (Postgres + Auth + RLS)
- AI: GPT-5.5 for scripts, Fal.ai for images, ElevenLabs for voices
- Video: FFmpeg (via fluent-ffmpeg), Sharp for image processing
- Storage: Cloudflare R2 (S3-compatible)
- Payments: Dodo Payments
- Compute: Vercel for the app, Modal for the video pipelines
- State: Zustand

The pipeline

prompt → GPT-4o script → scene splitter → parallel Flux images + ElevenLabs audio → FFmpeg composition (Modal) → R2 upload → status update

What surprised me

- Modal beats running FFmpeg in Vercel. Cold starts on Vercel functions made 60s+ videos impossible. Modal webhooks solved it.
- RLS is non-negotiable from day one. Retrofitting row-level security at 1K users is painful.
- Credit refunds need their own RPC. I hit a silent failure where `increment_user_credits` was blocked by a trigger. Use `add_credits` instead.
- Users want templates, not raw control. I shipped a "blank canvas" mode early. Nobody used it. The 11 named templates (AI Stories, Fake Texts, Stick Animation, etc.) account for 95% of generations.

What's next

- Better lipsync for the talking-avatar templates.
- Tighter cost controls per template tier.
- Affiliate program.

If you're building something in AI video, I'd love to compare notes. Drop a comment.
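For anyone curious what the "parallel images + audio" step of the pipeline can look like, here is a minimal TypeScript sketch. It is not Keyvello's actual code: the `Scene` and `SceneAssets` shapes and the `generateImage` / `generateAudio` callbacks are hypothetical stand-ins for the image and voice provider calls. It only illustrates the fan-out: each scene's image and voiceover are independent, so both can be requested concurrently, and all scenes can run at once.

```typescript
// Hypothetical shapes; the real pipeline's types are not shown in the post.
interface Scene {
  index: number;
  narration: string;   // text sent to the voice provider
  imagePrompt: string; // text sent to the image provider
}

interface SceneAssets {
  index: number;
  imageUrl: string;
  audioUrl: string;
}

// Stand-ins for the image and voice provider calls (assumed signatures).
type ImageGenerator = (prompt: string) => Promise<string>;
type AudioGenerator = (text: string) => Promise<string>;

// Request the image and the audio for every scene concurrently.
// Promise.all preserves input order, so results line up with `scenes`.
async function generateSceneAssets(
  scenes: Scene[],
  generateImage: ImageGenerator,
  generateAudio: AudioGenerator,
): Promise<SceneAssets[]> {
  return Promise.all(
    scenes.map(async (scene) => {
      const [imageUrl, audioUrl] = await Promise.all([
        generateImage(scene.imagePrompt),
        generateAudio(scene.narration),
      ]);
      return { index: scene.index, imageUrl, audioUrl };
    }),
  );
}
```

One design note: `Promise.all` rejects as soon as any single call fails, which fails the whole batch. If retrying individual scenes matters more than failing fast, `Promise.allSettled` is the usual alternative.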
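On the credit-refund lesson: the post doesn't show how the refund call is made, so the following is only a sketch under assumptions. It routes refunds through the `add_credits` RPC and throws when the call reports an error. The `RpcClient` interface is a minimal stand-in that mirrors the `{ data, error }` result shape supabase-js returns from `rpc()`, and the parameter names `p_user_id` and `p_amount` are hypothetical. With that result shape, an error that is never checked is one common way a failure ends up silent.

```typescript
// Minimal stand-in for the part of a Supabase client used here.
interface RpcResult {
  data: unknown;
  error: { message: string } | null;
}

interface RpcClient {
  rpc(fn: string, args: Record<string, unknown>): PromiseLike<RpcResult>;
}

// Refund credits through the dedicated RPC and fail loudly on error.
// `p_user_id` and `p_amount` are hypothetical parameter names.
async function refundCredits(
  client: RpcClient,
  userId: string,
  amount: number,
): Promise<void> {
  // Reject obviously bad input before touching the database.
  if (!Number.isInteger(amount) || amount <= 0) {
    throw new Error(`Invalid refund amount: ${amount}`);
  }

  const { error } = await client.rpc("add_credits", {
    p_user_id: userId,
    p_amount: amount,
  });

  // The result carries the error as a value, so it must be checked
  // explicitly or the failure passes unnoticed.
  if (error) {
    throw new Error(`Credit refund failed for ${userId}: ${error.message}`);
  }
}
```

Called from the failure branch of a generation job, this turns a blocked refund into a thrown error that can be logged or retried.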