OpenMontage, the first open-source agentic video production system, has been released. It lets AI coding assistants handle research, scripting, asset generation, editing, and final composition from plain-language descriptions. The system can produce real videos from free stock footage and open archives, with example productions costing as little as $0.15 to $1.33.

[Providers](/calesthio/OpenMontage/blob/main/docs/PROVIDERS.md) · [Agent Guide](/calesthio/OpenMontage/blob/main/AGENT_GUIDE.md)

Turn your AI coding assistant into a full video production studio. Describe what you want in plain language, and your agent handles research, scripting, asset generation, editing, and final composition.

Important distinction: OpenMontage can make image-based videos, but it can also make real video in free/open-source workflows. The agent builds a corpus from free stock footage and open archives, retrieves actual motion clips, edits them into a timeline, and renders a finished piece. That is not the usual "animate a handful of stills and call it video" trick.

- "SIGNAL FROM TOMORROW": a cinematic sci-fi trailer fully produced through OpenMontage, covering concept, script, scene plan, Veo-generated motion clips, soundtrack, and Remotion composition.
- "THE LAST BANANA": a 60-second Pixar-style animated short about a lonely banana who finds friendship with a kiwi. 6 Kling v3-generated motion clips (via fal.ai), Google Chirp3-HD narration, royalty-free piano music, TikTok-style word-level captions, and Remotion composition. Total cost: $1.33.
- "VOID — Neural Interface": a product ad produced with just one API key (OpenAI). 4 AI-generated images (gpt-image-1), TTS narration, auto-sourced royalty-free music, word-level subtitles via WhisperX, and Remotion data visualizations. Total cost: $0.69. Zero manual asset work.
- "Afternoon in Candyland": a Ghibli-style anime animation. A little girl's whimsical afternoon adventure through candy gates, gumdrop rivers, and lollipop gardens. 12 FLUX-generated images with multi-image crossfade, cinematic camera motion (zoom, pan, Ken Burns), sparkle/petal/firefly particle overlays, and ambient music with auto-detected energy offset. Total cost: $0.15. No video generation, no manual editing.
- "Mori no Seishin": a Ghibli-style anime animation of a forest spirit's journey through ancient woods. 12 FLUX-generated images with parallax crossfade, drift and pan camera motion, firefly and petal particles, cinematic vignette lighting, and ambient forest soundtrack. Total cost: $0.15. Still images brought to life through Remotion's animation engine.
- "Into the Abyss": a deep ocean exploration rendered in anime style. Bioluminescent gardens, coral cathedrals, and creatures of light, built from 12 FLUX-generated images with sparkle and mist particle overlays, light-ray effects, smooth camera motion, and ambient oceanic soundtrack. Total cost: $0.15. Zero video generation APIs needed.

[Subscribe to @OpenMontage on YouTube](https://www.youtube.com/@OpenMontage?sub_confirmation=1) to see new videos as they ship. Every video includes the full prompt, pipeline, tools used, and cost so you can reproduce it yourself.

Starting from a reference video is often faster than starting from a blank prompt.
OpenMontage can start from a YouTube video, Short, Reel, TikTok, or local clip and turn it into a grounded production plan:

1. Paste a reference video.
2. The agent analyzes transcript, pacing, scenes, keyframes, and style.
3. You get 2-3 differentiated concepts, an honest tool path, cost estimates, and a sample before full production.

"Here's a YouTube Short I love. Make me something like this, but about quantum computing."

What you get back is not "best guess prompt spaghetti." You get:

- What it keeps from the reference: pacing, hook style, structure, tone
- What it changes: topic, visual treatment, angle, narration approach
- What it will cost at your target duration, before asset generation starts
- What it will actually look like with your currently available tools

Works with Claude Code, Cursor, Copilot, Windsurf, Codex, or any other AI coding assistant that can read files and run code.

Prerequisites:

- Python 3.10+: [python.org](https://www.python.org/downloads/)
- FFmpeg: `brew install ffmpeg` / `sudo apt install ffmpeg` / [ffmpeg.org](https://ffmpeg.org/download.html)
- Node.js 18+: [nodejs.org](https://nodejs.org/)
- An AI coding assistant: Claude Code, Cursor, Copilot, Windsurf, or Codex

Install:

    git clone https://github.com/calesthio/OpenMontage.git
    cd OpenMontage
    make setup

Open the project in your AI coding assistant and tell it what you want:

"Make a 60-second animated explainer about how neural networks learn"

Or if you want the real-footage path:

"Make a 75-second documentary montage about city life in the rain. Use real footage only, no narration, elegiac tone, with music."

That's it. The agent researches your topic with live web search, generates AI images, writes and narrates the script with voice direction, finds royalty-free background music automatically, burns in word-level subtitles, and renders the final video. Before you see anything, the system runs a multi-point self-review: ffprobe validation, frame sampling, audio level analysis, delivery promise verification, and subtitle checks.
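As an illustration of the ffprobe validation step in that self-review, here is a minimal sketch. It is not OpenMontage's actual implementation: the function names, the 2-second tolerance, and the rule that every render must carry an audio stream are assumptions made for the example. The ffprobe flags themselves are standard FFmpeg options.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list[str]:
    """Build an ffprobe call that prints container and stream metadata as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]


def review_probe(probe: dict, target_seconds: float, tolerance: float = 2.0) -> list[str]:
    """Return the problems found in parsed ffprobe output (an empty list means pass).

    The checks and the tolerance are hypothetical, chosen for illustration only.
    """
    problems = []
    kinds = {stream.get("codec_type") for stream in probe.get("streams", [])}
    if "video" not in kinds:
        problems.append("no video stream")
    if "audio" not in kinds:
        problems.append("no audio stream")
    # ffprobe reports the container duration as a string, e.g. "60.023000".
    duration = float(probe.get("format", {}).get("duration", 0.0))
    if abs(duration - target_seconds) > tolerance:
        problems.append(f"duration {duration:.1f}s is more than {tolerance}s from target {target_seconds}s")
    return problems


def review_file(path: str, target_seconds: float) -> list[str]:
    """Probe a rendered file and review it (requires FFmpeg on PATH)."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return review_probe(json.loads(result.stdout), target_seconds)
```

Keeping the parsing in `review_probe` separate from the subprocess call means the checks can be exercised on saved ffprobe output without rendering a video first.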
Every provider selection is scored across 7 dimensions with an auditable decision log. Every creative decision gets your approval.

No `make`? Run manually:

    pip install -r requirements.txt && cd remotion-composer && npm install && cd .. && pip install piper-tts && cp .env.example .env

Windows: If `npm install` fails with `ERR_INVALID_ARG_TYPE`, use `npx --yes npm install` instead.

This repo is built for agentic operation. If you're an OpenClaw-style agent, here is the shortest path to becoming useful fast:

1. Read the contract first. Start with `AGENT_GUIDE.md`, then `PROJECT_CONTEXT.md`.
2. Do not improvise the production workflow. OpenMontage is pipeline-driven. Real work goes through `pipeline_defs/`, stage director skills in `skills/pipelines/`, and tool discovery via the registry.
3. Check the actual capability envelope. Run:

       python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.support_envelope(), indent=2))"
       python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_menu(), indent=2))"

4. Treat every video request as a pipeline selection problem. Pick the right pipeline first, then read the manifest, then read the stage skill, then use tools.
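The two capability checks in step 3 can also be folded into one helper. This is a sketch under assumptions: it presumes the registry object exposes `discover()`, `support_envelope()`, and `provider_menu()` as the one-liners suggest, and that both results are JSON-serializable. The helper name `capability_report` is hypothetical.

```python
import json


def capability_report(registry) -> str:
    """Run tool discovery once and return both capability views as one JSON document.

    `registry` is expected to behave like `tools.tool_registry.registry`
    (an assumption based on the one-liners in the agent instructions).
    """
    registry.discover()
    report = {
        "support_envelope": registry.support_envelope(),
        "provider_menu": registry.provider_menu(),
    }
    return json.dumps(report, indent=2)
```

From the repository root, it would be used as `from tools.tool_registry import registry` followed by `print(capability_report(registry))`.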
    # .env: every key is optional, add what you have

    # Image + video gateway:
    FAL_KEY=your-key                # FLUX images + Google Veo, Kling, MiniMax video + Recraft images

    # Free stock media:
    PEXELS_API_KEY=your-key         # Free stock footage and images
    PIXABAY_API_KEY=your-key        # Free stock footage and images
    UNSPLASH_ACCESS_KEY=your-key    # Free stock images

    # Music:
    SUNO_API_KEY=your-key           # Full songs, instrumentals, any genre

    # Voice & images:
    ELEVENLABS_API_KEY=your-key     # Premium TTS, AI music, sound effects
    OPENAI_API_KEY=your-key         # OpenAI TTS, DALL-E 3 images
    XAI_API_KEY=your-key            # xAI Grok image edits/generation + Grok video generation
    GOOGLE_API_KEY=your-key         # Google Imagen images, Google TTS (700+ voices)

    # More video providers:
    HEYGEN_API_KEY=your-key         # HeyGen: VEO, Sora, Runway, Kling via single gateway
    RUNWAY_API_KEY=your-key         # Runway Gen-4 direct

Have a GPU? Unlock free local video generation:

    make install-gpu

Then add to `.env`:

    VIDEO_GEN_LOCAL_ENABLED=true
    VIDEO_GEN_LOCAL_MODEL=wan2.1-1.3b    # or wan2.1-14b, hunyuan-1.5, ltx2-local, cogvideo-5b

You don't need paid API keys to make real videos.
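Because every key is optional, it can help to see which ones are actually set before asking for a particular provider. The sketch below is a hypothetical helper, not part of OpenMontage. The key names and descriptions come from the `.env` listing, while the function names, the treatment of the `your-key` placeholder as unset, and the case-insensitive reading of `VIDEO_GEN_LOCAL_ENABLED` are assumptions.

```python
import os
from typing import Mapping

# Key names and descriptions are copied from the .env listing.
OPTIONAL_KEYS = {
    "FAL_KEY": "FLUX images + Google Veo, Kling, MiniMax video + Recraft images",
    "PEXELS_API_KEY": "Free stock footage and images",
    "PIXABAY_API_KEY": "Free stock footage and images",
    "UNSPLASH_ACCESS_KEY": "Free stock images",
    "SUNO_API_KEY": "Full songs, instrumentals, any genre",
    "ELEVENLABS_API_KEY": "Premium TTS, AI music, sound effects",
    "OPENAI_API_KEY": "OpenAI TTS, DALL-E 3 images",
    "XAI_API_KEY": "xAI Grok image edits/generation + Grok video generation",
    "GOOGLE_API_KEY": "Google Imagen images, Google TTS",
    "HEYGEN_API_KEY": "HeyGen gateway: VEO, Sora, Runway, Kling",
    "RUNWAY_API_KEY": "Runway Gen-4 direct",
}

PLACEHOLDER = "your-key"  # assumption: the example placeholder counts as unset


def configured_providers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the optional keys that hold a real (non-empty, non-placeholder) value."""
    found = {}
    for name, description in OPTIONAL_KEYS.items():
        value = env.get(name, "").strip()
        if value and value != PLACEHOLDER:
            found[name] = description
    return found


def local_video_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Assumed reading of VIDEO_GEN_LOCAL_ENABLED: the value 'true', ignoring case."""
    return env.get("VIDEO_GEN_LOCAL_ENABLED", "").strip().lower() == "true"
```

Calling `configured_providers()` with no arguments inspects the current process environment. An empty result is still a valid setup, since the free tools need no keys.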
Out of the box, `make setup` gives you:

| Capability | Free Tool | What It Does |
|---|---|---|
| Narration | Piper TTS | Free offline text-to-speech with real human-sounding narration |
| Open footage | Archive.org + NASA + Wikimedia Commons | Free/open archival footage, educational media, and documentary texture |
| Extra stock | Pexels + Unsplash + Pixabay | Free stock footage/images (developer keys are free to get) |
| Composition (React) | Remotion | React-based rendering: spring-animated image scenes, text cards, stat cards, charts, TikTok-style word-level captions, TalkingHead |
| Composition (HTML/GSAP) | HyperFrames | HTML/CSS/GSAP rendering: kinetic typography, product promos, launch reels, registry blocks, website-to-video, rigged SVG character animation |
| Post-production | FFmpeg | Encoding, subtitle burn-in, audio mixing, color grading |
| Subtitles | Built-in | Auto-generated captions with word-level timing |

OpenMontage picks between Remotion and HyperFrames at proposal time (locked as `render_runtime`). Remotion is the default for data-driven explainers and anything using the existing React scene stack. HyperFrames is the default for motion-graphics-heavy briefs that express naturally as HTML + GSAP, including the character-animation pipeline's SVG/GSAP rig output. See `skills/core/hyperframes.md` for the full decision matrix.

Two free-ish paths:

- Image-based video: Piper narrates your script, images provide the visuals, and Remotion animates them into a polished edit.
- Local character animation: SVG rigs, pose libraries, GSAP timelines, and HyperFrames render cartoon character acting to projects/