A new ModelStudio Skill ships today: feed it any novel or short story and it scaffolds a full React SPA with branching scenes, AI-generated portraits, cinematic cutscenes, optional TTS narration, procedural Web Audio music, and save slots. Then you open localhost and play.
Repo: github.com/modelstudioai/skills
Skill path: skills/novel-game/
By ModelStudioAI
In one sentence: hand your Agent a novel, get back a playable browser game you can share with a friend.
This is not "generate a plot synopsis and call it interactive". It is not "here is a prompt, go paste it into some web UI". The novel-game skill literally produces the media itself by driving the bl CLI: character portraits as 5-second looping videos or "breathing" still images, cutscenes as 1080p video or Ken-Burns-style stills, and narration via TTS. BGM and SFX are procedurally synthesized through Web Audio (no external audio files). Generated files land in public/assets/.
A typical run produces 8–10 scenes, 3–5 branching endings, 6–8 AI portraits, 5–8 AI cutscenes, a procedural soundtrack, and three manual save slots plus autosave.
It is the first true end-to-end demo of the bl multimodal stack. Earlier first-party skills lean on a single modality: docs lookup, prompt studio, financial agent, single-shot short-form video. novel-game is the first that wires video + image + speech + procedural audio + frontend engineering into a single pipeline whose output you can click on.
It pushes "Agent Skill" from tool-call to product delivery. Most skills today expose an API. This one delegates a creative pipeline and hands back a runnable web app. That is a different abstraction altogether.
It is the cleanest "show, don't tell" demo we have. Walking a stakeholder through a live run beats ten slides — the Agent literally turned a novel into a game while we watched.
The full version lives in SKILL.md. Condensed:
1 · Requirements. The Agent fires a single AskUserQuestion with seven decisions: source material (EPUB / TXT / freeform prompt), game type (visual novel / text adventure / text RPG), UI style (pixel / cyberpunk / ink-wash / minimal), narrative POV, asset mode (video / image / hybrid, with hybrid recommended), audio mode (BGM only / +SFX / +TTS narration), target duration (15 / 30 / 60+ minutes).
2 · Story design. Extract 1–3 main lines, 3–5 branching choice points, 3–5 endings, 6–8 characters, 5–8 cutscene moments, plus unlockable codex entries.
3 · Project scaffold. npx create-react-app with the canonical layout: components/, data/, hooks/, styles/, scripts/, public/assets/.
4 · Data model. A single story.js describes the full scene graph: branching choices, flag mutations, cutscene triggers, codex unlocks, ending conditions. A sibling generated-assets.json indexes every AI-generated asset with its local path and type (video / image / mp3).
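To make the idea of a single scene-graph file concrete, here is a minimal sketch of what such a story.js could contain. The field names (`scenes`, `choices`, `setFlags`, `next`, `endings`, `requires`), the scene content, and both helper functions are assumptions made for this example; the actual schema is whatever SKILL.md prescribes.

```javascript
// Hypothetical story.js shape: every field name below is illustrative only.
const story = {
  start: 'crossroads',
  scenes: {
    crossroads: {
      text: 'A signal arrives. Someone has to decide whether to answer.',
      choices: [
        { label: 'Reply', next: 'aftermath', setFlags: { replied: true } },
        { label: 'Stay silent', next: 'aftermath', setFlags: { replied: false } },
      ],
    },
    aftermath: { text: 'Years pass.', choices: [] },
  },
  // Endings are checked in order; the first one whose flags all match wins.
  endings: [
    { id: 'contact', requires: { replied: true } },
    { id: 'silence', requires: { replied: false } },
  ],
};

// Apply a choice: merge its flag mutations and move to the next scene.
function applyChoice(state, choiceIndex) {
  const choice = story.scenes[state.sceneId].choices[choiceIndex];
  if (!choice) throw new Error(`No choice ${choiceIndex} in ${state.sceneId}`);
  return { sceneId: choice.next, flags: { ...state.flags, ...choice.setFlags } };
}

// Return the id of the first ending whose required flags match, or null.
function resolveEnding(flags) {
  const hit = story.endings.find((ending) =>
    Object.entries(ending.requires).every(([key, value]) => flags[key] === value));
  return hit ? hit.id : null;
}

const state = applyChoice({ sceneId: story.start, flags: {} }, 0);
console.log(state.sceneId, resolveEnding(state.flags)); // aftermath contact
```

Keeping the graph as plain data means the React components only need to render the current scene and call something like `applyChoice` when the player picks an option.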
5 · Implementation patterns. Typewriter text via timed setInterval (40–50ms per char), choice panel with hover affordances, hash routing so any chapter is a deep link, localStorage-backed autosave plus three manual slots, portrait component that auto-detects video vs image, cutscene component with Ken Burns for stills, mobile-aware touch targets (≥44px) and touch-action: manipulation, lazy video with explicit memory release on unmount.
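Of the patterns above, the save system is the easiest to get subtly wrong, so here is a minimal sketch of autosave plus three manual slots. The key prefix, the function names, and the in-memory storage stand-in are all hypothetical. In the browser you would pass `window.localStorage`, which exposes the same `getItem` / `setItem` calls.

```javascript
// Sketch of autosave plus three manual slots over a localStorage-like object.
const SLOT_COUNT = 3;
const PREFIX = 'novel-game:'; // hypothetical key prefix

function createSaveStore(storage) {
  const write = (key, state) =>
    storage.setItem(PREFIX + key, JSON.stringify({ savedAt: Date.now(), state }));
  const read = (key) => {
    const raw = storage.getItem(PREFIX + key);
    if (raw === null) return null;
    try {
      return JSON.parse(raw).state;
    } catch {
      return null; // treat a corrupted save as empty instead of crashing
    }
  };
  const checkSlot = (slot) => {
    if (!Number.isInteger(slot) || slot < 1 || slot > SLOT_COUNT) {
      throw new RangeError(`slot must be 1..${SLOT_COUNT}`);
    }
  };
  return {
    autosave: (state) => write('auto', state),
    loadAutosave: () => read('auto'),
    saveSlot: (slot, state) => { checkSlot(slot); write(`slot-${slot}`, state); },
    loadSlot: (slot) => { checkSlot(slot); return read(`slot-${slot}`); },
  };
}

// In-memory stand-in so the sketch also runs outside the browser.
function createMemoryStorage() {
  const map = new Map();
  return {
    getItem: (key) => (map.has(key) ? map.get(key) : null),
    setItem: (key, value) => { map.set(key, String(value)); },
  };
}

const saves = createSaveStore(createMemoryStorage());
saves.autosave({ sceneId: 'chapter-2', flags: { metGuide: true } });
saves.saveSlot(1, { sceneId: 'chapter-1', flags: {} });
console.log(saves.loadAutosave().sceneId, saves.loadSlot(1).sceneId, saves.loadSlot(2));
// chapter-2 chapter-1 null
```

Injecting the storage object rather than reaching for the global keeps the logic testable and lets a corrupted or missing entry degrade to an empty slot.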
6 · Asset generation. scripts/generate-assets.sh drives the bl CLI:
bl video generate --download # portrait or cutscene as 5s 720p video
bl image generate # portrait or background, 768x1024 / 1920x1080
bl speech synthesize --voice longxiaochun # TTS narration
bl video ref # multi-reference video for character consistency
Video jobs take 2–5 minutes each, so the script submits in --async mode with 3–5 concurrent jobs, then batch-downloads. That is roughly a 4× wall-time speedup over running the jobs sequentially.
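The speedup comes from keeping a bounded number of jobs in flight. The skill does this in a shell script, but the idea can be sketched in JavaScript as below. `runWithLimit` and `fakeJob` are hypothetical names, and the jobs are simulated with short timers rather than real calls to bl.

```javascript
// Run async tasks with at most `limit` in flight; results keep input order.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const index = next++; // claim the next task; safe because JS is single-threaded
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Stand-in for one generation job. A real script would submit to the CLI here;
// the 20 ms timer only simulates a job that takes minutes in practice.
let inFlight = 0;
let peak = 0;
function fakeJob(name) {
  return async () => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 20));
    inFlight -= 1;
    return `${name}.mp4`;
  };
}

const jobs = Array.from({ length: 8 }, (_, i) => fakeJob(`cutscene-${i + 1}`));
const done = runWithLimit(jobs, 4).then((files) => {
  console.log(files.length, 'files, peak concurrency', peak);
  return files;
});
```

With eight equal-length jobs and a limit of four, the work finishes in two rounds instead of eight, which is where a figure of about 4× comes from.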
Assuming a 30-minute / 15–18 scene mid-tier run in hybrid asset mode:
| Item | Count | Unit price / command | Subtotal |
|---|---|---|---|
| Character portraits (image) | 8 × 768×1024 | bl image generate | cents range |
| Key cutscenes (720p video) | 5 × 5s | ¥0.9/s × 5s × 5 | ≈ ¥22.5 |
| Scene backgrounds (image) | 8 × 1920×1080 | bl image generate | cents range |
| TTS narration (optional) | 15 × ~30s | bl speech synthesize | single-digit ¥ |
Authoritative pricing is on the ModelStudio console. New accounts get free credits — plenty to walk the demo end-to-end.
The headline: roughly a cup of coffee, and you have your own playable visual novel.
1. Install the skill:
npx skills add modelstudioai/skills
Pick novel-game from the prompt (or pass --all to grab the whole bundle).
2. Wire up bl and an API key:
npm i -g bailian-cli
bl auth login
Grab a key at bailian.console.aliyun.com — new accounts get free credits.
3. Ask your favorite Agent (Claude Code / Qoder / Cursor / Cline / …) in natural language:
Adapt the Ye Wenjie arc from "The Three-Body Problem" into a visual novel,
ink-wash style, 30-minute playtime, hybrid assets.
The Agent owns the rest: it asks the seven decisions, designs the branches, scaffolds the React project, writes the code, generates every asset, and runs npm start.
Prompt safety. Video prompts containing weapons, smoking, or explicit violence get rejected. The skill ships a built-in rewrite table — content lessons distilled from dozens of real runs.
Procedural BGM. Music is synthesized live via Web Audio, not pre-baked MP3s. Fixed MIDI pitch arrays, multi-voice layering, convolution reverb, ADSR envelopes, detuned pad sustains. Sounds composed; ships at zero file size.
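As a rough illustration of the ingredients named above, the sketch below schedules notes from a fixed MIDI pitch array, gives each an ADSR gain envelope, and layers two slightly detuned voices. It uses only standard Web Audio calls (`createOscillator`, `createGain`, `setValueAtTime`, `linearRampToValueAtTime`). The melody, envelope values, and function names are assumptions, and convolution reverb is left out to keep it short.

```javascript
// Convert a MIDI note number to Hz (A4 = MIDI 69 = 440 Hz, equal temperament).
function midiToFrequency(midi) {
  return 440 * Math.pow(2, (midi - 69) / 12);
}

// Schedule one note with an ADSR gain envelope on a Web Audio context.
// `ctx` is an AudioContext; the envelope numbers are illustrative defaults.
function playNote(ctx, midi, startTime, duration, opts = {}) {
  const { attack = 0.05, decay = 0.2, sustain = 0.6, release = 0.4, detune = 0, peak = 0.2 } = opts;
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  osc.type = 'sawtooth';
  osc.frequency.setValueAtTime(midiToFrequency(midi), startTime);
  osc.detune.setValueAtTime(detune, startTime); // detune is measured in cents

  const noteOff = startTime + duration;
  gain.gain.setValueAtTime(0, startTime);
  gain.gain.linearRampToValueAtTime(peak, startTime + attack);                   // attack
  gain.gain.linearRampToValueAtTime(peak * sustain, startTime + attack + decay); // decay
  gain.gain.setValueAtTime(peak * sustain, noteOff);                             // sustain
  gain.gain.linearRampToValueAtTime(0, noteOff + release);                       // release

  osc.connect(gain);
  gain.connect(ctx.destination);
  osc.start(startTime);
  osc.stop(noteOff + release);
}

// Fixed pitch array (a made-up phrase); two detuned voices per note for width.
const MELODY = [57, 60, 64, 69];
function playMelody(ctx, stepSeconds = 0.5) {
  MELODY.forEach((midi, i) => {
    const t = ctx.currentTime + i * stepSeconds;
    playNote(ctx, midi, t, stepSeconds, { detune: -7 });
    playNote(ctx, midi, t, stepSeconds, { detune: 7 });
  });
}

console.log(midiToFrequency(69), midiToFrequency(57)); // 440 220
```

In a browser, `playMelody(new AudioContext())` would need to run from a user gesture such as a click, because autoplay policies keep audio contexts suspended until then.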
Mobile gotchas. iOS Safari auto-zooms inputs under 16px. touch-action: manipulation kills the 300ms click delay. env(safe-area-inset-bottom) leaves room for the home indicator. Dual-binding click and touchend recovers responsiveness on edge browsers. All wired in by default.
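Binding both events naively fires a handler twice on most touch devices, since browsers dispatch a synthetic click after a touch. One possible way to dual-bind safely is sketched below. `bindTap`, the 500 ms suppression window, and the injectable `now` clock are assumptions for illustration, not the skill's actual code.

```javascript
// Bind both touchend and click, but fire the handler once per tap.
function bindTap(element, handler, { windowMs = 500, now = Date.now } = {}) {
  let lastTouch = -Infinity;
  const onTouchEnd = (event) => {
    lastTouch = now();
    handler(event);
  };
  const onClick = (event) => {
    // Skip the synthetic click that follows a touch within the window.
    if (now() - lastTouch < windowMs) return;
    handler(event);
  };
  element.addEventListener('touchend', onTouchEnd);
  element.addEventListener('click', onClick);
  // Return an unbind function, e.g. for a React effect cleanup.
  return () => {
    element.removeEventListener('touchend', onTouchEnd);
    element.removeEventListener('click', onClick);
  };
}

// A bare EventTarget stands in for a DOM button so the sketch runs in Node too.
const button = new EventTarget();
let taps = 0;
const unbind = bindTap(button, () => { taps += 1; });
button.dispatchEvent(new Event('touchend'));
button.dispatchEvent(new Event('click')); // ignored: it follows the touch
console.log(taps); // 1
unbind();
```

Plain mouse clicks still work because no touch precedes them, so the suppression window never applies.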
Video memory release. When leaving a scene, video.pause(); video.removeAttribute('src'); video.load(); is required. Without it the WebView leaks frame buffers and a 30-minute play session ends in jank.
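A small helper makes this release sequence hard to forget. The sketch below assumes the first call in the sequence is `pause()`, which is the usual idiom for tearing down an HTML video element. `releaseVideo` is a hypothetical name, and a fake object stands in for the element so the example runs outside a browser.

```javascript
// Release a <video> element's media resources before it is discarded.
function releaseVideo(video) {
  if (!video) return;           // nothing mounted, nothing to release
  video.pause();                // stop playback (assumed first step)
  video.removeAttribute('src'); // drop the reference to the media file
  video.load();                 // reset the element so its buffers can be freed
}

// In a React component this would typically run from an effect cleanup:
//   useEffect(() => {
//     const el = videoRef.current;
//     return () => releaseVideo(el);
//   }, []);

// Fake element that records calls, standing in for a real <video>.
const calls = [];
const fakeVideo = {
  pause: () => calls.push('pause'),
  removeAttribute: (name) => calls.push(`removeAttribute:${name}`),
  load: () => calls.push('load'),
};
releaseVideo(fakeVideo);
console.log(calls.join(' -> ')); // pause -> removeAttribute:src -> load
```

The order matters: calling `load()` after the source is removed resets the element, which is what lets the browser drop the decoded frames.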
Our bar for first-party skills is one line: someone shipped real output with it, and the design is worth copying.
novel-game clears both bars. Author @lishengzxc used it to produce a complete novel-to-game adaptation: not a screenshot demo, but an actual React project people have played. The pitfall guide in SKILL.md is paid-for-in-real-time wisdom.
If you build content, games, or interactive narratives, this is a low-cost way to spend a weekend.
Share what you build over at Issues, or open a PR if you spot a missing tip in the SKILL.
Repo · github.com/modelstudioai/skills
Skill · skills/novel-game/
Try the model · ModelStudio HappyHorse 1.1 playground
— ModelStudioAI on GitHub