{"slug": "demo-video", "title": "demo-video", "summary": "This article provides instructions for creating a narrated demo video by using the `agent-browser` tool to capture screenshots of a web application, then generating synchronized voiceover audio via the Inworld TTS API. It details a sequential workflow for producing an MP4 video with still-image scenes, including critical rate-limit handling for the TTS API and validation steps. Finally, the guide explains how to upload the completed video to an R2 storage bucket using `wrangler` for public sharing.", "body_md": "---\nname: demo-video\ndescription: Generate a narrated demo video from browser screenshots and TTS audio. Captures scenes via agent-browser, generates voice narration via Inworld TTS API, and stitches into an MP4 with ffmpeg.\n---\n\n# Demo Video Generation\n\nGenerate a narrated demo video from browser screenshots and TTS audio. The output is an MP4 with still-image scenes synced to voice narration.\n\n## Prerequisites\n\n- `agent-browser` installed\n- `ffmpeg` installed\n- Inworld API key in `api/.dev.vars` (`INWORLD_API_KEY`)\n\n## Process\n\n### Step 1: Take Screenshots\n\nUse `agent-browser` to navigate the app and capture screenshots. Save all files to `tmp/video/` in the project root.\n\nExample:\n```\nscene1_overview.png — Navigate to main page, full page screenshot\nscene2_feature.png — Interact with a feature, capture result\nscene3_detail.png — Click into detail view, capture\n```\n\n### Step 2: Build the Video (use a subagent)\n\n**IMPORTANT:** Once all screenshots are captured and narration text is written, delegate the TTS generation, ffmpeg stitching, and R2 upload to a **background subagent** using the Task tool. 
This keeps the main conversation responsive and avoids filling the context window with ffmpeg output.\n\nThe subagent prompt should include:\n- The list of screenshot filenames in `tmp/video/`\n- The narration text for each scene\n- Instructions to generate TTS audio, create video segments, concatenate, upload to R2, and `open` the final video\n\nBelow are the details the subagent needs:\n\n#### TTS Audio Generation\n\nCall the Inworld TTS sync API to generate MP3 audio files for each scene.\n\n**API endpoint:** `POST https://api.inworld.ai/tts/v1/voice` (non-streaming, returns complete audio in one response)\n\n**Request:**\n```bash\nINWORLD_API_KEY=$(grep '^INWORLD_API_KEY=' api/.dev.vars | cut -d'=' -f2-)\ncurl -s -X POST \"https://api.inworld.ai/tts/v1/voice\" \\\n  -H \"Authorization: Basic ${INWORLD_API_KEY}\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"Your narration text here.\",\n    \"voiceId\": \"layercode_production__el_5\",\n    \"modelId\": \"inworld-tts-1.5-max\",\n    \"audioConfig\": {\n      \"audioEncoding\": \"MP3\",\n      \"sampleRateHertz\": 22050\n    }\n  }' | jq -r '.audioContent' | base64 --decode > tmp/video/scene1_audio.mp3\n```\n\n**CRITICAL — Inworld rate limit:** The Inworld API rejects concurrent requests with a misleading `SESSION_TOKEN_INVALID` error (gRPC code 16). **Generate TTS files sequentially, not in parallel.** Chain them with `&&` in a single bash command.\n\n**Text limit:** 2,000 characters per request. Split longer narration across multiple calls.\n\n**Validation:** Always check file sizes after generation. Valid MP3 files are 50KB+. Files of 3 bytes or less indicate the API returned an error — check the raw response with `curl ... 
| head -c 500`.\n\n#### ffmpeg Video Segments\n\nRun the commands in this section and the next from `tmp/video/`, since the filenames are relative to that directory. For each scene, create a video that displays the screenshot for the duration of its audio:\n\n```bash\nffmpeg -loop 1 -i scene1_overview.png -i scene1_audio.mp3 \\\n  -c:v libx264 -tune stillimage -c:a aac -ar 22050 -b:a 128k \\\n  -pix_fmt yuv420p -shortest -y scene1.mp4\n```\n\n#### Concatenate into Final Video\n\n```bash\nprintf \"file 'scene1.mp4'\\nfile 'scene2.mp4'\\nfile 'scene3.mp4'\\n\" > concat.txt\nffmpeg -f concat -safe 0 -i concat.txt -c copy -y demo.mp4\n```\n\nFinal output: `tmp/video/demo.mp4`\n\nFrom the project root, run `open tmp/video/demo.mp4` to show the final video to the user.\n\n#### Upload Video to R2\n\nUpload the final video to the `toyo-dev-demo-videos` R2 bucket using wrangler. The bucket has public access enabled so videos can be shared via URL.\n\n```bash\n# Use the current git branch name as the filename\nFILENAME=\"$(git branch --show-current).mp4\"\n\n# Upload to R2 (must run from api/ directory)\ncd api && npx wrangler r2 object put \"toyo-dev-demo-videos/${FILENAME}\" \\\n  --file ../tmp/video/demo.mp4 \\\n  --content-type \"video/mp4\" \\\n  --remote\n```\n\n**Other R2 commands:**\n```bash\ncd api && npx wrangler r2 object list toyo-dev-demo-videos --remote    # List videos\ncd api && npx wrangler r2 object delete \"toyo-dev-demo-videos/${FILENAME}\" --remote  # Delete\n```\n\n**Public URL format:** `https://pub-<hash>.r2.dev/<filename>` (if public access is configured on the bucket).\n", "url": "https://wpnews.pro/news/demo-video", "canonical_source": "https://gist.github.com/dctanner/ee7fe7997bba0efbca49b8c0bdc1936a", "published_at": "2026-03-18 16:45:59+00:00", "updated_at": "2026-05-22 20:35:05.660742+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "open-source"], "entities": ["Inworld", "ffmpeg", "R2"], "alternates": {"html": "https://wpnews.pro/news/demo-video", "markdown": "https://wpnews.pro/news/demo-video.md", "text": "https://wpnews.pro/news/demo-video.txt", 
"jsonld": "https://wpnews.pro/news/demo-video.jsonld"}}