---
name: heygen-biweekly-video
description: "Produce a launch-grade biweekly team-update video: avatar-hosted (HeyGen CLI), built in HyperFrames, with real captured product UI, real preview videos, kinetic captions, smooth camera moves, and music ducked under the voice. Use when: biweekly / sprint recap video, avatar-narrated dev update, changelog-as-video."
---
A repeatable pipeline for an After-Effects-quality team update: a HeyGen avatar narrates over real captured product UI and motion graphics, with every beat synced to the avatar's voice. Output: 1920×1080, 60 fps, ~90 s.
**Prerequisites**

- `heygen` CLI, authenticated (`heygen auth status`): get it at https://github.com/heygen-com/skills
- `hyperframes` CLI (`npx hyperframes`): https://github.com/heygen-com/hyperframes
- `bun`, `ffmpeg`/`ffprobe`, `agent-browser`, `gh`, `jq`
**Principles**

- **Everything REAL, no mockups.** Capture the actual product UI via website-to-html, use real preview videos, and use real numbers from GitHub.
- **The avatar's voice is the single narration spine.** Record one continuous take, transcribe it for word-level timings, and sync every act and caption to those cues.
- **Verify by rendering frames.** Extract a frame (`ffmpeg -ss <t> ... -vframes 1`) and look at it before claiming any scene works.
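A minimal sketch of the cue-then-verify loop, assuming the transcript has been loaded as an array of `{ text, start, end }` words with times in seconds (the real `transcript.json` shape may differ, so check it first). The hypothetical `cueTime` finds where a phrase starts in the voice, and the hypothetical `frameArgs` builds the `ffmpeg` arguments that pull one frame at a given time for a visual check.

```javascript
// Assumed word shape: [{ text, start, end }], times in seconds (check your transcript.json).
// Lowercase and drop punctuation/whitespace so " Big," matches "big".
const norm = (s) => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// Start time (s) of the first occurrence of `phrase` in the voice, or null if absent.
function cueTime(words, phrase) {
  const target = phrase.split(/\s+/).map(norm).filter(Boolean);
  if (target.length === 0) return null;
  for (let i = 0; i + target.length <= words.length; i++) {
    if (target.every((w, k) => norm(words[i + k].text) === w)) return words[i].start;
  }
  return null;
}

// ffmpeg arguments that write one frame at `t` seconds to `outPng`.
// -ss before -i seeks the input directly, which is fast for spot checks.
function frameArgs(video, t, outPng) {
  return ["-y", "-ss", t.toFixed(3), "-i", video, "-vframes", "1", outPng];
}
```

Pass the arguments to `spawnSync("ffmpeg", frameArgs(...))` from `node:child_process`, then open the PNG. Transcript times are relative to the voice file, so if the voice is delayed in the master (the `<voiceStartSec>` argument of `master-audio.sh`), add that offset before extracting.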
| # | Phase | What to do |
|---|---|---|
| 0 | Reference | Study `heygen-com/hyperframes-launches`: it sets the quality bar. Pick the closest launch and reuse its music/SFX. |
| 1 | Count | `gh search prs --author=@me --created=<start>..<end>` → cluster into 3 hero themes |
| 2 | Numbers | Pull public metrics (GitHub stars, PRs, releases, contributors, catalog size) |
| 3 | Avatar | Write a ~250-word script → `heygen video create -d '{"type":"avatar","avatar_id":"<ID>","script":"...","output_format":"webm"}'` → transparent webm → transcribe for cues |
| 4 | Capture UI | Run your app locally → capture via `agent-browser` + CDP MHTML → convert to standalone HTML |
| 5 | Build acts | cold-open → intro → hero sections → stats → CTA/outro |
| 6 | Wire master | Avatar full → PiP → full; music + SFX + voice |
| 7 | Render + master | `hyperframes render --fps 60 --quality high` → duck music under voice → deliver |
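One mechanical starting point for phase 1's clustering, assuming PR titles follow Conventional Commits (`feat(scope): ...`): bucket them by scope and keep the largest buckets. `scopeOf` and `topThemes` are hypothetical helpers, titles without a scope land in `misc`, and the result is a draft to edit by hand, not a final grouping.

```javascript
// Extract the Conventional Commits scope, e.g. "feat(studio): zoom" -> "studio".
function scopeOf(title) {
  const m = title.match(/^\w+(?:\(([^)]+)\))?!?:/);
  return m && m[1] ? m[1].trim().toLowerCase() : "misc";
}

// Return the `n` largest themes as [{ theme, count, titles }], largest first
// (ties broken alphabetically so the output is stable).
function topThemes(titles, n = 3) {
  const groups = new Map();
  for (const t of titles) {
    const key = scopeOf(t);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(t);
  }
  return [...groups]
    .map(([theme, list]) => ({ theme, count: list.length, titles: list }))
    .sort((a, b) => b.count - a.count || a.theme.localeCompare(b.theme))
    .slice(0, n);
}
```

The titles can come from `gh search prs --author=@me --created=<start>..<end> --json title --jq '.[].title'`, one per line.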
**Avatar (phase 3)**

```bash
heygen auth status
heygen avatar looks get <LOOK_ID>   # confirm look + engines; group_id is NOT a 2nd avatar
# Build the request from script.txt; jq -Rs reads the whole file as one JSON string.
jq -Rs '{type:"avatar",avatar_id:"<ID>",script:.,output_format:"webm",aspect_ratio:"9:16",resolution:"1080p"}' \
  script.txt > req.json
heygen video create -d req.json     # → video_id; webm = transparent (alpha_mode=1)
VID=<video_id>
# Poll every 20 s for up to 30 min; stop early on completion or failure.
for i in $(seq 1 90); do
  s=$(heygen video get "$VID" | jq -r .data.status)
  [ "$s" = completed ] && break
  [ "$s" = failed ] && { echo "avatar render failed: $VID" >&2; break; }
  sleep 20
done
[ "$s" = completed ] || echo "avatar video not ready (status: $s)" >&2
heygen video download "$VID" --output-path assets/host.webm --force
ffmpeg -y -i assets/host.webm -vn -c:a aac -b:a 192k -ar 48000 assets/host-voice.m4a
npx hyperframes transcribe assets/host.webm --model base.en   # → word-level transcript.json
```
Fix Whisper mishears (product names like "HeyGen", "HyperFrames") before generating captions.
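A sketch of that cleanup, assuming the same `{ text, start, end }` word shape (verify it against the real `transcript.json`). Multi-word mishears such as "hey gen" are merged into one word that spans both timings, so caption timing stays intact. `FIXES` and `fixMishears` are illustrative names, and the fix list is an example to extend.

```javascript
// Assumed word shape: [{ text, start, end }], times in seconds (check your transcript.json).
// Illustrative fix list: each entry maps a run of normalized words to the correct spelling.
const FIXES = [
  { from: ["hey", "gen"], to: "HeyGen" },
  { from: ["heygen"], to: "HeyGen" },
  { from: ["hyper", "frames"], to: "HyperFrames" },
  { from: ["hyperframes"], to: "HyperFrames" },
];

// Lowercase and drop punctuation/whitespace so " Gen," matches "gen".
const norm = (s) => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// Return a new word list with mishears replaced; the input is not mutated.
function fixMishears(words, fixes = FIXES) {
  const out = [];
  let i = 0;
  while (i < words.length) {
    const fix = fixes.find((f) =>
      f.from.every((w, k) => i + k < words.length && norm(words[i + k].text) === w));
    if (!fix) { out.push({ ...words[i] }); i++; continue; }
    const last = words[i + fix.from.length - 1];
    // Merge the run: first word's start, last word's end, last word's trailing punctuation.
    const trail = last.text.match(/\W*$/)[0].trim();
    out.push({ text: fix.to + trail, start: words[i].start, end: last.end });
    i += fix.from.length;
  }
  return out;
}
```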
**Capture UI (phase 4)**

```bash
agent-browser set viewport 1920 1080
agent-browser open "http://localhost:5190/"
agent-browser wait --load networkidle
agent-browser click @eN                      # drive the UI into the state you want to capture
agent-browser screenshot /tmp/state.png      # reference
agent-browser get cdp-url | grep -o 'ws://[^ ]*' > /tmp/cdp.txt
bun grab-mhtml.mjs "$(cat /tmp/cdp.txt)" /tmp/page.mhtml
bun mhtml-to-html.mjs /tmp/page.mhtml captures/page.html
# Point leftover Chrome frame references (cid:...@mhtml.blink) at about:blank.
perl -i -pe 's/(?:cid:)?[\w.-]*\@mhtml\.blink/about:blank/g' captures/page.html
bun build-studio-bg.mjs captures/page.html compositions/page-bg.html page-bg 15
```
Light-themed captures (Next.js etc.) MUST be iframe-isolated: their global CSS leaks into the composition and turns the whole video white. Embed them with `<iframe src="../captures/page.html">` instead of inlining.
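The iframe works because it gets its own document, so selectors such as `html`, `body`, or `:root` in the capture's stylesheets cannot reach the host composition. A rough heuristic for spotting captures that would leak if inlined is sketched below; `leakySelectors` is a hypothetical helper and regex-based, not a CSS parser, so treat its output as a hint. It reads both `<style>` blocks and the `data:text/css;base64,...` stylesheet links that `mhtml-to-html.mjs` produces.

```javascript
// Rough heuristic, not a CSS parser: may miss selectors (e.g. inside comments) or over-report.
function leakySelectors(html) {
  // Stylesheets appear either as <style> blocks or, after mhtml-to-html.mjs,
  // as <link href="data:text/css;base64,..."> data URLs.
  const sheets = [...html.matchAll(/<style[^>]*>([\s\S]*?)<\/style>/gi)].map((m) => m[1]);
  for (const m of html.matchAll(/href=["']data:text\/css;base64,([A-Za-z0-9+/=]+)["']/g)) {
    sheets.push(Buffer.from(m[1], "base64").toString("utf8"));
  }
  const found = new Set();
  for (const css of sheets) {
    // A selector list sits between the start (or a "{" / "}") and the next "{".
    for (const m of css.matchAll(/(?:^|[{}])\s*([^{}@]+)\{/g)) {
      for (const sel of m[1].split(",")) {
        const s = sel.trim();
        // Document-level selectors: html, body, :root, or the universal selector.
        if (/^(?:html|body|:root|\*)(?![\w-])/.test(s)) found.add(s);
      }
    }
  }
  return [...found];
}
```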
**Audio (phase 7)**

A flat `data-volume` on the music track is NOT enough: the track's body is far louder than its intro. Flatten dynamics first, then duck:

```bash
bash master-audio.sh renders/final.mp4 assets/music.mp3 assets/host-voice.m4a <voiceStartSec> \
  ~/Downloads/output.mp4 0.03 0.22
```
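The last two arguments are linear gains for the music bed: the level once the voice starts (`0.03` here) and the level before it (`0.22`). Converting them with `20 * log10(g)` makes the duck depth easier to judge. The helper names below are hypothetical; `duckExpr` rebuilds the same `volume` expression that `master-audio.sh` hands to ffmpeg. The dB values are relative to the `dynaudnorm`-flattened bed, before the final `loudnorm` pass shifts the whole mix.

```javascript
// Linear gain -> decibels: 20 * log10(g).
const gainDb = (g) => 20 * Math.log10(g);

// Rebuild the ffmpeg `volume` expression used in master-audio.sh:
// INTRO level until the voice starts at `vstart` seconds, DUCK level afterwards.
const duckExpr = (vstart, intro, duck) => `if(lt(t,${vstart}),${intro},${duck})`;

console.log(gainDb(0.03).toFixed(1)); // -30.5 (dB, music under the voice)
console.log(gainDb(0.22).toFixed(1)); // -13.2 (dB, music during the intro)
console.log(duckExpr(4.2, 0.22, 0.03)); // if(lt(t,4.2),0.22,0.03)
```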
**Gotchas**

- **Light captures leak CSS globally**: iframe-isolate them.
- **Use `gsap.fromTo()`, never `gsap.from()`, in sub-compositions**: `immediateRender` breaks.
- **Never combine a CSS `transform` with a GSAP transform on the same element**: use `xPercent`/`yPercent` or flex centering.
- **`<video>`/`<audio>` need an `id`**, or the renderer reports FROZEN/SILENT.
- **One stat per scene** for data acts: three at once overlap.
- **Verify by extracting frames**: collisions are invisible until you look.
- **Avatar look vs. group ID**: they can look like two avatars but be one.
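Two of these gotchas can be caught mechanically before a render. A rough, regex-based sketch (the hypothetical `lintComposition` is not part of HyperFrames) that flags `<video>`/`<audio>` tags without an `id` and any literal `gsap.from(` call; timeline calls such as `tl.from(` are not covered.

```javascript
// Flag composition HTML that hits two of the gotchas above. Regex-based: a hint, not a parser.
function lintComposition(html) {
  const problems = [];
  for (const m of html.matchAll(/<(video|audio)\b([^>]*)>/gi)) {
    // Require a real id attribute; "data-id" does not count.
    if (!/(?:^|\s)id\s*=/i.test(m[2])) problems.push(`<${m[1]}> without id: ${m[0].slice(0, 60)}`);
  }
  // "gsap.from" followed directly by "(" excludes gsap.fromTo(.
  for (const m of html.matchAll(/gsap\.from\s*\(/g)) {
    problems.push(`gsap.from() at offset ${m.index}; use gsap.fromTo()`);
  }
  return problems;
}
```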
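**`grab-mhtml.mjs`**

```js
#!/usr/bin/env bun
// grab-mhtml.mjs <cdp-ws-url> <out.mhtml>
// Attach to the localhost page over the Chrome DevTools Protocol and save an MHTML snapshot.
import { writeFileSync } from "node:fs";

const [url, outPath] = process.argv.slice(2);
if (!url || !outPath) { console.error("usage: grab-mhtml.mjs <cdp-ws-url> <out.mhtml>"); process.exit(1); }

const ws = new WebSocket(url);
const send = (o) => ws.send(JSON.stringify(o));
const fail = (msg) => { console.error(msg); ws.close(); process.exit(1); };

ws.addEventListener("open", () => send({ id: 1, method: "Target.getTargets" }));
ws.addEventListener("message", (ev) => {
  const d = JSON.parse(ev.data);
  if (d.error) fail(`CDP error on request ${d.id}: ${d.error.message}`);
  if (d.id === 1) {
    // Pick the first page target served from localhost (the app being captured).
    const target = d.result.targetInfos.find((t) => t.type === "page" && t.url.includes("localhost"));
    if (!target) fail("no localhost page target");
    send({ id: 2, method: "Target.attachToTarget", params: { targetId: target.targetId, flatten: true } });
  } else if (d.id === 2) {
    // flatten: true lets us address the page session by sessionId on this same socket.
    send({ id: 3, sessionId: d.result.sessionId, method: "Page.captureSnapshot", params: { format: "mhtml" } });
  } else if (d.id === 3) {
    if (!d.result?.data) fail("captureSnapshot returned no data");
    writeFileSync(outPath, d.result.data);
    console.log("MHTML OK:", d.result.data.length, "bytes");
    ws.close();
    process.exit(0); // exit now instead of waiting for the timeout below
  }
});
ws.addEventListener("error", () => fail(`could not connect to ${url}`));

// Give up after 20 s rather than hanging.
setTimeout(() => fail("timed out waiting for CDP"), 20000);
```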
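**`mhtml-to-html.mjs`**

```js
#!/usr/bin/env bun
// mhtml-to-html.mjs <in.mhtml> <out.html>
// Turn a Chrome MHTML snapshot into one standalone HTML file, every part inlined as a data: URL.
import { readFileSync, writeFileSync } from "node:fs";

const [inPath, outPath] = process.argv.slice(2);
// latin1 maps bytes 1:1 to chars, so binary parts survive until they are decoded.
const raw = readFileSync(inPath, "latin1");
const boundaryMatch = raw.match(/boundary="?([^";\r\n]+)"?/);
if (!boundaryMatch) { console.error(`no MIME boundary in ${inPath}`); process.exit(1); }
const boundary = "--" + boundaryMatch[1];
// Drop the top-level headers (before the first boundary) and the closing "--" marker.
const chunks = raw.split(boundary).slice(1).filter((c) => c.trim() && !c.trim().startsWith("--"));

// Split one MIME part into lower-cased headers and the raw body.
function parsePart(chunk) {
  const idx = chunk.search(/\r?\n\r?\n/);
  if (idx === -1) return null;
  const headerBlock = chunk.slice(0, idx);
  const sepLen = chunk.slice(idx).match(/^\r?\n\r?\n/)[0].length;
  const body = chunk.slice(idx + sepLen);
  const headers = {};
  for (const line of headerBlock.split(/\r?\n/)) {
    const m = line.match(/^([\w-]+):\s*(.*)$/);
    if (m) headers[m[1].toLowerCase()] = m[2].trim();
  }
  return { headers, body };
}

// Quoted-printable: drop soft line breaks ("=\n"), then turn "=XX" escapes into bytes.
function decodeQuotedPrintable(str) {
  const noSoft = str.replace(/=\r?\n/g, "");
  const bytes = [];
  for (let i = 0; i < noSoft.length; i++) {
    if (noSoft[i] === "=" && i + 2 < noSoft.length && /^[0-9A-Fa-f]{2}$/.test(noSoft.substr(i + 1, 2))) {
      bytes.push(parseInt(noSoft.substr(i + 1, 2), 16)); i += 2;
    } else bytes.push(noSoft.charCodeAt(i) & 0xff);
  }
  return Buffer.from(bytes);
}

function decodeBody(headers, body) {
  const enc = (headers["content-transfer-encoding"] || "").toLowerCase();
  if (enc === "base64") return Buffer.from(body.replace(/\s+/g, ""), "base64");
  if (enc === "quoted-printable") return decodeQuotedPrintable(body);
  return Buffer.from(body, "latin1");
}

const parts = chunks.map(parsePart).filter(Boolean).map((p) => ({
  ctype: (p.headers["content-type"] || "").split(";")[0].trim().toLowerCase(),
  loc: p.headers["content-location"],
  buf: decodeBody(p.headers, p.body),
}));
const htmlPart = parts.find((p) => p.ctype === "text/html"); // the main document
if (!htmlPart) { console.error("no text/html part found"); process.exit(1); }

// Replace each known URL in `text` with its data: URL, longest first so a URL that is a
// prefix of another cannot clobber it. Also try the &amp;-escaped form used in attributes.
function inline(text, map) {
  for (const loc of [...map.keys()].sort((a, b) => b.length - a.length)) {
    for (const form of new Set([loc, loc.replace(/&/g, "&amp;")])) text = text.split(form).join(map.get(loc));
  }
  return text;
}

const dataUrl = (ctype, buf) => `data:${ctype};base64,${buf.toString("base64")}`;
const resourceMap = new Map();
// Pass 1: images, fonts, frames and other non-CSS parts.
for (const p of parts) {
  if (p.loc && p !== htmlPart && p.ctype !== "text/css") resourceMap.set(p.loc, dataUrl(p.ctype, p.buf));
}
// Pass 2: stylesheets, with url(...) references to other parts inlined first,
// so fonts and background images still resolve once the CSS itself is a data: URL.
for (const p of parts) {
  if (p.loc && p.ctype === "text/css") {
    resourceMap.set(p.loc, dataUrl(p.ctype, Buffer.from(inline(p.buf.toString("utf8"), resourceMap), "utf8")));
  }
}

let html = inline(htmlPart.buf.toString("utf8"), resourceMap);
// The DOM is already rendered; drop scripts so the app cannot re-run or re-hydrate over it.
html = html.replace(/<script[\s\S]*?<\/script>/gi, "");
// Preload hints point at URLs that no longer resolve offline.
html = html.replace(/<link[^>]*rel=["']?(?:preload|prefetch|modulepreload)["']?[^>]*>/gi, "");
writeFileSync(outPath, html, "utf8");
console.log(`wrote ${outPath} (${(html.length / 1024).toFixed(0)} KB), inlined ${resourceMap.size} resources`);
```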
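**`master-audio.sh`**

```bash
#!/usr/bin/env bash
# master-audio.sh <video> <music> <voice> <voiceStartSec> <out> [duck=0.05] [intro=0.22]
# Flatten the music's dynamics, hold it at INTRO until the voice starts, then at DUCK,
# lay the voice on top, and normalize the mix to -14 LUFS.
set -euo pipefail
if [ $# -lt 5 ]; then
  echo "usage: $0 <video> <music> <voice> <voiceStartSec> <out> [duck] [intro]" >&2; exit 1
fi
VIDEO=$1; MUSIC=$2; VOICE=$3; VSTART=$4; OUT=$5; DUCK=${6:-0.05}; INTRO=${7:-0.22}
DUR=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO")
DELAY=$(awk "BEGIN{printf \"%d\", $VSTART*1000}")   # adelay takes milliseconds
WORK=$(mktemp -d); trap 'rm -rf "$WORK"' EXIT
TMP=$WORK/master.m4a
# dynaudnorm evens out the quiet intro vs. the loud body, so one duck level fits the whole track.
# loudnorm upsamples to 192 kHz, so -ar 48000 brings the output back to a standard rate.
ffmpeg -y -v error -i "$MUSIC" -i "$VOICE" -filter_complex "\
[0:a]atrim=0:${DUR},dynaudnorm=f=200:g=15,volume='if(lt(t,${VSTART}),${INTRO},${DUCK})':eval=frame[bed]; \
[1:a]adelay=${DELAY}|${DELAY},apad=whole_dur=${DUR},volume=1.0[vox]; \
[bed][vox]amix=inputs=2:duration=longest:normalize=0[mix]; \
[mix]loudnorm=I=-14:TP=-1.5:LRA=11[out]" \
  -map "[out]" -t "$DUR" -ar 48000 -c:a aac -b:a 256k "$TMP"
# Swap the mastered track in; video and the already-encoded AAC are copied untouched.
ffmpeg -y -v error -i "$VIDEO" -i "$TMP" -map 0:v -map 1:a -c:v copy -c:a copy -movflags +faststart "$OUT"
echo "wrote $OUT (music ${INTRO} before voice, ducked to ${DUCK} under voice, -14 LUFS)"
```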
**Install**

Quick way (installs the HeyGen and HyperFrames CLI skills that this workflow uses):

```bash
npx skills add heygen-com/skills
npx skills add heygen-com/hyperframes
```

Then, for this skill: save this file as `~/.claude/skills/heygen-biweekly-video/SKILL.md` and invoke it with `/heygen-biweekly-video` in Claude Code.