This post is my submission for DEV Education Track: Build Apps with Google AI Studio.
Voilaa! is a full-stack educational playground that transforms any YouTube video into a rich, interactive learning experience (live quizzes, flashcard decks, formula simulators, and data visualizations), all generated on the fly by Google Gemini AI.
The idea is simple: paste a YouTube URL, choose your academic depth, and within seconds Gemini analyzes the video's content and synthesizes a fully functional, self-contained interactive HTML learning app tailored to that exact lesson.
The magic is a two-stage AI chain running entirely on the server side:
The first model acts as a pedagogist. It watches the video and produces a structured JSON payload containing:
- `spec`
- `flashcards`
The prompt I crafted for this stage was the most important piece of the whole project:
You are a pedagogist and product designer with deep expertise in crafting
engaging learning experiences via interactive web apps.
Examine the contents of the attached video. Then, provide the following in JSON:
1. "spec": A detailed spec for an interactive web app designed to complement
the video and reinforce its key ideas. The spec must be thorough and
self-contained (must not mention it is based on a video).
2. "flashcards": A list of at least 5 key terms and concise definitions
extracted from the video.
The goal of the app is to enhance understanding through simple and playful
design. A junior web developer should be able to implement it in a single
HTML file (with all styles and scripts inline). The spec must clearly outline
the core mechanics, and those mechanics must be highly effective in reinforcing
the video's key ideas.
The second model receives the spec and generates a pristine, single-file HTML/CSS/JS application — no frameworks, no external dependencies — ready to run inside a sandboxed iframe.
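The hand-off between the two stages can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `GenerateFn`, `parseLessonPlan`, and `buildLearningApp` are hypothetical names, the `term`/`definition` fields on each flashcard are an assumed shape, the stage-two prompt text is a placeholder, and the video attachment for stage one is left out so the sketch runs without the SDK.

```typescript
// Assumed payload shape: the prompt asks for "spec" and "flashcards", but the
// per-card field names (term, definition) are a guess for illustration.
interface Flashcard {
  term: string;
  definition: string;
}

interface LessonPlan {
  spec: string;
  flashcards: Flashcard[];
}

// Hypothetical abstraction over a model call: prompt in, raw text out.
type GenerateFn = (prompt: string) => Promise<string>;

// Parse and validate the stage-one reply before it reaches stage two.
function parseLessonPlan(raw: string): LessonPlan {
  // Keep only the outermost {...} in case the model wraps the JSON in prose.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("Stage one did not return a JSON object");
  }
  const data = JSON.parse(raw.slice(start, end + 1));
  if (typeof data.spec !== "string" || data.spec.trim() === "") {
    throw new Error('Missing or empty "spec"');
  }
  if (!Array.isArray(data.flashcards) || data.flashcards.length < 5) {
    throw new Error('Expected at least 5 "flashcards"');
  }
  const flashcards: Flashcard[] = data.flashcards.map((card: any) => ({
    term: String(card.term),
    definition: String(card.definition),
  }));
  return { spec: data.spec, flashcards };
}

// Stage one plans the lesson; stage two turns the spec into one HTML file.
async function buildLearningApp(
  planner: GenerateFn,
  coder: GenerateFn,
  plannerPrompt: string,
): Promise<{ plan: LessonPlan; html: string }> {
  const plan = parseLessonPlan(await planner(plannerPrompt));
  // Placeholder stage-two prompt; the real wording is not shown in this post.
  const html = await coder(
    "Implement this spec as a single self-contained HTML file:\n\n" + plan.spec,
  );
  return { plan, html };
}
```

In a real server, `planner` and `coder` would presumably each wrap a call to the `@google/genai` SDK. Validating the JSON between the stages means a malformed plan fails early instead of producing a broken app.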
| Layer | Technology |
|---|---|
| Frontend | React 18 + Vite (SPA) |
| Styling | Tailwind CSS + motion animations |
| Code Editor | Monaco Editor (same engine as VS Code) |
| Charts | Recharts |
| Icons | Lucide React |
| AI | `@google/genai` TypeScript SDK |
| Backend | Node.js + Express 5 |
| Runtime | `tsx` (direct TypeScript execution) |
The Gemini API key lives exclusively on the server — never exposed to the client bundle.
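One way to keep the key on the server is to capture it in a closure inside the request handler, so only generated HTML or a generic error ever goes back to the browser. The sketch below is framework-agnostic so it runs without Express installed. `createGenerateHandler`, `GeminiCaller`, the `url`/`depth` body fields, and the URL pattern are all assumptions for illustration, not the app's real route.

```typescript
// Hypothetical request body for a generate endpoint; field names are assumptions.
interface GenerateBody {
  url: string;
  depth: string;
}

interface HandlerResult {
  status: number;
  body: { html?: string; error?: string };
}

// Stand-in for the server-side Gemini call; the key is passed in explicitly.
type GeminiCaller = (apiKey: string, input: GenerateBody) => Promise<string>;

// Loose check that the input looks like a YouTube link (not exhaustive).
const YOUTUBE_URL = /^https:\/\/(www\.)?(youtube\.com\/watch\?v=|youtu\.be\/)[\w-]{11}/;

function isGenerateBody(value: unknown): value is GenerateBody {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.url === "string" && YOUTUBE_URL.test(v.url) && typeof v.depth === "string";
}

// The key is captured in a closure on the server and never written to a response.
function createGenerateHandler(apiKey: string | undefined, callGemini: GeminiCaller) {
  if (!apiKey) {
    throw new Error("Gemini API key is not configured on the server");
  }
  const key: string = apiKey;
  return async (body: unknown): Promise<HandlerResult> => {
    if (!isGenerateBody(body)) {
      return { status: 400, body: { error: "Expected a YouTube url and a depth" } };
    }
    try {
      const html = await callGemini(key, { url: body.url, depth: body.depth });
      return { status: 200, body: { html } };
    } catch (_err) {
      // Return a generic message so upstream errors cannot echo the key to the client.
      return { status: 502, body: { error: "Generation failed" } };
    }
  };
}
```

In an Express app, the returned function would be called from a POST route with the parsed request body, and the key would typically be read from an environment variable at startup.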
Once a learning app is generated, users get a three-tab workspace:
A live sandboxed `<iframe>` running the generated app, fully interactive with no page reload required.
A full Monaco Editor showing (and letting you edit) the raw generated HTML/JS/CSS. Any saved changes hot-reload the preview instantly.
Inspect or edit the curriculum blueprint produced by the Semantic Analyst — great for prompting a regeneration with tweaks.
There's also a Zen Mode (fades surrounding UI to focus on the lesson) and Fullscreen Mode for distraction-free study.
🔗 [Live App](https://voilaa-498153626537.us-west1.run.app/)
→ Gemini analyzes chord progressions, tension, and resolution
→ Generates an interactive piano simulator with chord-click feedback
→ Flashcard deck covers: Tonic, Dominant, Leading Tone, Cadence, Voice Leading
→ Gemini generates a step-by-step animated bubble sort / merge sort visualizer
→ Flashcards cover time complexity, in-place sorting, stability, etc.
I expected the hardest part to be the frontend sandbox mechanics. It wasn't. The hardest part was prompt engineering the Semantic Analyst.
Early versions of the spec prompt produced specs that were either too vague ("make an interactive quiz") or too ambitious ("build a multi-page React app with a backend"). The breakthrough was adding the constraint:
"A junior web developer should be able to implement it in a single HTML file."
This single sentence dramatically improved output quality — Gemini started producing specs with clearly scoped, concrete mechanics instead of wishful thinking.
Two-model chains unlock quality you can't get from one prompt. Separating "think about what to build" from "write the code" produced dramatically better results. The planning model could focus entirely on pedagogy; the coding model could focus entirely on implementation.
For creative educational content, temperature mattered more than model choice in my testing. A temperature of ~0.75 produced the most varied and playful learning apps while staying coherent.
Keeping the API key server-side is non-negotiable. Even for a hackathon demo, having Express proxy all Gemini calls protects your quota and prevents key leakage.
Sandboxed iframes are underrated. Running user-generated HTML inside `<iframe sandbox="allow-scripts">` meant I could ship AI-generated code directly to the browser without worrying about XSS or DOM pollution.
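As a rough illustration of the sandboxed preview, here is one way to build the iframe markup from a generated document. `escapeSrcdoc` and `buildPreviewFrame` are hypothetical helpers, and using `srcdoc` to load the document is an assumption; the post only confirms the `sandbox="allow-scripts"` attribute.

```typescript
// Escape a document for use inside a double-quoted srcdoc attribute.
function escapeSrcdoc(html: string): string {
  // Ampersands first, so the entities added for quotes are not escaped twice.
  return html.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Build the preview iframe markup. Leaving out allow-same-origin gives the frame
// an opaque origin, so its scripts cannot reach the parent page's DOM or storage.
function buildPreviewFrame(generatedHtml: string): string {
  return (
    '<iframe sandbox="allow-scripts" title="Generated learning app" srcdoc="' +
    escapeSrcdoc(generatedHtml) +
    '"></iframe>'
  );
}
```

In a React component the same idea maps to `<iframe sandbox="allow-scripts" srcDoc={html} />`, where React handles the attribute escaping and updating `html` reloads the preview.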
Voilaa! was a genuinely fun project to build. The combination of Gemini's multimodal understanding and the flexibility of the `@google/genai` SDK made what could have been a complex AI integration feel surprisingly clean. If you've got a YouTube rabbit hole you're currently lost in, try turning it into an interactive lesson instead. 🎬✨