# Voilaa! — Turning Any YouTube Video into an Interactive Learning App with Google Gemini

> Source: <https://dev.to/miii/voilaa-turning-any-youtube-video-into-an-interactive-learning-app-with-google-gemini-2kl5>
> Published: 2026-06-27 11:15:38+00:00

*This post is my submission for DEV Education Track: Build Apps with Google AI Studio.*

**Voilaa!** is a full-stack educational playground that transforms any YouTube video into a rich, interactive learning experience — think live quizzes, flashcard decks, formula simulators, and data visualizations — all generated on-the-fly by Google Gemini AI.

The idea is simple: paste a YouTube URL, choose your academic depth, and within seconds Gemini analyzes the video's content and synthesizes a fully functional, self-contained interactive HTML learning app tailored to *that exact lesson*.

The magic is a two-stage AI chain running entirely on the server side:

The first model, the Semantic Analyst, acts as a *pedagogist*. It watches the video and produces a structured JSON payload containing:

- `spec`
- `flashcards`

The prompt I crafted for this stage was the most important piece of the whole project:

```
You are a pedagogist and product designer with deep expertise in crafting 
engaging learning experiences via interactive web apps.

Examine the contents of the attached video. Then, provide the following in JSON:
1. "spec": A detailed spec for an interactive web app designed to complement 
   the video and reinforce its key ideas. The spec must be thorough and 
   self-contained (must not mention it is based on a video).
2. "flashcards": A list of at least 5 key terms and concise definitions 
   extracted from the video.

The goal of the app is to enhance understanding through simple and playful 
design. A junior web developer should be able to implement it in a single 
HTML file (with all styles and scripts inline). The spec must clearly outline 
the core mechanics, and those mechanics must be highly effective in reinforcing 
the video's key ideas.
```
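The prompt asks for a JSON object with two keys, so the server has to parse and check the reply before the spec can be passed on. Here is a minimal sketch of that step. It assumes each flashcard is an object with `term` and `definition` fields (the post does not show the exact shape) and that the model may wrap its JSON in a Markdown fence. `parseAnalystPayload` and the type names are hypothetical, not taken from the project.

```typescript
// Hypothetical shape of the stage-one payload. The field names inside each
// flashcard are an assumption; the post only says "key terms and definitions".
interface Flashcard {
  term: string;
  definition: string;
}

interface AnalystPayload {
  spec: string;
  flashcards: Flashcard[];
}

// Three backticks, built at runtime so this sample contains no literal fence.
const FENCE = "`".repeat(3);

// Parse the raw model reply, tolerating an optional fenced "json" wrapper.
function parseAnalystPayload(raw: string): AnalystPayload {
  const fenced = raw.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE));
  const jsonText = (fenced?.[1] ?? raw).trim();
  const data: unknown = JSON.parse(jsonText); // throws on malformed JSON

  if (typeof data !== "object" || data === null) {
    throw new Error("Payload is not a JSON object");
  }
  const { spec, flashcards } = data as Record<string, unknown>;
  if (typeof spec !== "string" || spec.trim() === "") {
    throw new Error('Missing or empty "spec"');
  }
  // The prompt asks for at least 5 flashcards, so enforce that here.
  if (!Array.isArray(flashcards) || flashcards.length < 5) {
    throw new Error('Expected at least 5 "flashcards"');
  }
  const cards = flashcards.map((card, i): Flashcard => {
    const c = card as Record<string, unknown> | null;
    if (!c || typeof c.term !== "string" || typeof c.definition !== "string") {
      throw new Error("Flashcard " + i + " is malformed");
    }
    return { term: c.term, definition: c.definition };
  });
  return { spec, flashcards: cards };
}
```

Failing loudly here means a malformed reply can be retried before any code generation is attempted.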

The second model receives the spec and generates a pristine, single-file HTML/CSS/JS application — no frameworks, no external dependencies — ready to run inside a sandboxed iframe.
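The two-stage hand-off can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: `ModelCall` is a hypothetical stand-in for whatever wraps the `@google/genai` SDK on the server, the stage-two prompt wording is invented, and the stage-one reply is assumed to be bare JSON.

```typescript
// Hypothetical abstraction over one model call. In a real server this would
// wrap the SDK; it is injected here so the sketch does not depend on it.
type ModelCall = (request: {
  prompt: string;
  videoUrl?: string; // only the first stage sees the video
  temperature: number;
}) => Promise<string>;

// Pull the HTML document out of a reply that may be wrapped in a Markdown fence.
function extractHtml(reply: string): string {
  const fence = "`".repeat(3); // built at runtime to avoid a literal fence here
  const match = reply.match(new RegExp(fence + "(?:html)?\\s*([\\s\\S]*?)" + fence));
  return (match?.[1] ?? reply).trim();
}

async function generateLearningApp(
  videoUrl: string,
  callModel: ModelCall,
  specPrompt: string,
  temperature = 0.75, // the value the post reports as working well
): Promise<{ spec: string; html: string }> {
  // Stage 1: video -> JSON payload containing the spec (assumed bare JSON).
  const analysis = await callModel({ prompt: specPrompt, videoUrl, temperature });
  const { spec } = JSON.parse(analysis) as { spec: string };

  // Stage 2: spec -> one self-contained HTML file. Prompt text is illustrative.
  const codePrompt =
    "Implement the following spec as one self-contained HTML file with " +
    "inline CSS and JavaScript and no external dependencies:\n\n" + spec;
  const html = extractHtml(await callModel({ prompt: codePrompt, temperature }));
  return { spec, html };
}
```

Because the second call receives only the spec and never the video, each stage can be tuned or regenerated independently.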

| Layer | Technology |
|---|---|
| Frontend | React 18 + Vite (SPA) |
| Styling | Tailwind CSS + motion animations |
| Code Editor | Monaco Editor (same engine as VS Code) |
| Charts | Recharts |
| Icons | Lucide React |
| AI | `@google/genai` TypeScript SDK |
| Backend | Node.js + Express 5 |
| Runtime | `tsx` (direct TypeScript execution) |

The Gemini API key lives **exclusively on the server** — never exposed to the client bundle.

Once a learning app is generated, users get a three-tab workspace:

A live sandboxed `<iframe>` running the generated app: fully interactive, no page reload required.

A full Monaco Editor showing (and letting you edit) the raw generated HTML/JS/CSS. Any saved changes hot-reload the preview instantly.

Inspect or edit the curriculum blueprint produced by the Semantic Analyst — great for prompting a regeneration with tweaks.

There's also a **Zen Mode** (fades surrounding UI to focus on the lesson) and **Fullscreen Mode** for distraction-free study.

🔗 [Live App →](https://voilaa-498153626537.us-west1.run.app/)

→ Gemini analyzes chord progressions, tension, and resolution

→ Generates an interactive piano simulator with chord-click feedback

→ Flashcard deck covers: Tonic, Dominant, Leading Tone, Cadence, Voice Leading

→ Gemini generates a step-by-step animated bubble sort / merge sort visualizer

→ Flashcards cover time complexity, in-place sorting, stability, etc.

I expected the hardest part to be the frontend sandbox mechanics. It wasn't. The hardest part was **prompt engineering the Semantic Analyst**.

Early versions of the spec prompt produced specs that were either too vague ("make an interactive quiz") or too ambitious ("build a multi-page React app with a backend"). The breakthrough was adding the constraint:

> "A junior web developer should be able to implement it in a single HTML file."

This single sentence dramatically improved output quality — Gemini started producing specs with clearly scoped, concrete mechanics instead of wishful thinking.

**Two-model chains unlock quality you can't get from one prompt.** Separating "think about what to build" from "write the code" produced dramatically better results. The planning model could focus entirely on pedagogy; the coding model could focus entirely on implementation.

**Temperature matters more than model choice** for creative educational content. A temperature of ~0.75 produced the most varied and playful learning apps, while staying coherent.

**Keeping the API key server-side is non-negotiable.** Even for a hackathon demo, having Express proxy all Gemini calls protects your quota and prevents key leakage.

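**Sandboxed iframes are underrated.** Running the AI-generated HTML inside `<iframe sandbox="allow-scripts">` meant I could ship generated code directly to the browser without worrying about XSS against the host page or DOM pollution: with no `allow-same-origin` token, the frame runs in an opaque origin and cannot touch the parent's DOM, cookies, or storage.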
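As a rough illustration of the sandboxing idea, the helper below builds iframe markup that embeds generated HTML through the `srcdoc` attribute. The function names are hypothetical and the real app most likely sets these attributes from React rather than by string concatenation, so treat it as a sketch of the attribute choices only.

```typescript
// Escape a string for use inside a double-quoted HTML attribute value.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Build iframe markup that runs `generatedHtml` with scripts enabled but
// without "allow-same-origin", so the content gets an opaque origin.
function buildSandboxedIframe(
  generatedHtml: string,
  title = "Generated learning app",
): string {
  return (
    '<iframe sandbox="allow-scripts" title="' + escapeAttribute(title) +
    '" srcdoc="' + escapeAttribute(generatedHtml) + '"></iframe>'
  );
}
```

Leaving `allow-same-origin` out is the important part: adding it alongside `allow-scripts` would let the framed document act with the host page's origin.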

Voilaa! was a genuinely fun project to build. The combination of Gemini's multimodal understanding and the flexibility of the `@google/genai` SDK made what could have been a complex AI integration feel surprisingly clean. If you've got a YouTube rabbit hole you're currently lost in, try turning it into an interactive lesson instead. 🎬✨
