# I Built an AI-Powered Meeting Platform From Scratch — Here’s How It Actually Works

> Source: <https://dev.to/anupam_kumar/i-built-an-ai-powered-meeting-platform-from-scratch-heres-how-it-actually-works-31p>
> Published: 2026-06-03 19:33:42+00:00

A complete breakdown of Hoovik: WebRTC signaling, distributed Node.js with Redis, real-time emotion AI, RAG on meeting transcripts, and a Python transcription pipeline — all wired together.

👉 GitHub: [https://github.com/AnupamKumar-1/Hoovik](https://github.com/AnupamKumar-1/Hoovik)

🌐 Live Demo: [https://hoovik.onrender.com](https://hoovik.onrender.com)

🎮 Interactive Demo: [https://app.supademo.com/demo/cmpy5ggyv95b0qmy7ccrkd3ms?utm_source=link](https://app.supademo.com/demo/cmpy5ggyv95b0qmy7ccrkd3ms?utm_source=link)

I've previously written about individual parts of Hoovik, including its emotion analysis system and WebRTC signaling architecture.

Those articles focused on specific subsystems. This one focuses on the complete platform.

Hoovik is not a single application. It is a collection of services working together: a React/WebRTC frontend, a distributed Node.js backend, a transcription pipeline, a real-time emotion recognition service, and a retrieval-augmented search system built on meeting transcripts.

This article walks through how those systems interact, the architectural decisions behind them, and the tradeoffs encountered while building each component.

Hoovik is a multi-party video meeting platform that combines real-time communication, AI-assisted analysis, and transcript intelligence.

The platform includes:

The system is composed of four primary services.

The remainder of this article follows the lifecycle of a meeting and explains how each service participates.

The backend is responsible for:

The deployment runs as multiple PM2 processes connected through:

Room state cannot safely live in process memory when multiple Node.js instances are handling requests.

Instead, mutable meeting state is stored in Redis.

Participants are stored in a Redis Hash:

```text
meeting:participants:
```

Each field contains a serialized participant object.

This design allows:

Join order is stored separately and is used for WebRTC role assignment.
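To make the layout concrete, here is a minimal sketch that models the participant hash and the separate join-order list with in-memory stand-ins instead of a real Redis client. The function names, the `meeting:order:` key, and the use of the meeting code as the key suffix are assumptions for illustration, not Hoovik's actual identifiers. With Redis, the two structures would map naturally to a Hash (`HSET`/`HGETALL`) and a List (`RPUSH`/`LRANGE`).

```javascript
// In-memory stand-ins for a Redis Hash and a Redis List (illustration only).
const hashes = new Map(); // key -> Map(field -> serialized participant)
const lists = new Map();  // key -> array of participant ids in join order

// Hypothetical key builders; the suffix after each prefix is an assumption.
const participantsKey = (code) => `meeting:participants:${code}`;
const joinOrderKey = (code) => `meeting:order:${code}`;

function joinRoom(code, participant) {
  const pKey = participantsKey(code);
  const oKey = joinOrderKey(code);
  if (!hashes.has(pKey)) hashes.set(pKey, new Map());
  if (!lists.has(oKey)) lists.set(oKey, []);

  const alreadyJoined = hashes.get(pKey).has(participant.id);
  // Equivalent of HSET: one field per participant, value is serialized JSON.
  hashes.get(pKey).set(participant.id, JSON.stringify(participant));
  // Equivalent of RPUSH, skipped on rejoin so the original position is kept.
  if (!alreadyJoined) lists.get(oKey).push(participant.id);
}

function listParticipants(code) {
  const fields = hashes.get(participantsKey(code)) ?? new Map();
  // Equivalent of HGETALL followed by deserialization.
  return [...fields.values()].map((raw) => JSON.parse(raw));
}

function joinOrder(code) {
  return [...(lists.get(joinOrderKey(code)) ?? [])];
}

joinRoom("abc123", { id: "u1", name: "Ada" });
joinRoom("abc123", { id: "u2", name: "Lin" });
joinRoom("abc123", { id: "u1", name: "Ada (reconnected)" });
console.log(joinOrder("abc123"));               // [ 'u1', 'u2' ]
console.log(listParticipants("abc123").length); // 2
```

Keeping the order in its own structure means a participant can update their stored object (for example after a reconnect) without losing their original position.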

Joining a room modifies shared state.

To prevent race conditions, room joins are serialized using a Redis-backed distributed lock.

```js
await withRoomLock(meetingCode, async () => {
  // join logic
});
```

The lock uses:

This guarantees that only one join operation mutates room state at a time.
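A lock helper of this kind could look roughly like the sketch below. It is not Hoovik's implementation: the store is an in-memory fake so the example runs without Redis, the store is passed in as an extra first argument for that reason, and the key name, TTL, and retry timings are all assumed values. With Redis, `setIfAbsent` would typically be a `SET key token NX PX ttl` call and `deleteIfMatches` a small compare-and-delete script.

```javascript
const { randomUUID } = require("node:crypto");

// In-memory stand-in exposing the two operations the lock needs.
class FakeLockStore {
  constructor() {
    this.entries = new Map(); // key -> { token, expiresAt }
  }
  async setIfAbsent(key, token, ttlMs) {
    const current = this.entries.get(key);
    if (current && current.expiresAt > Date.now()) return false; // still held
    this.entries.set(key, { token, expiresAt: Date.now() + ttlMs });
    return true;
  }
  async deleteIfMatches(key, token) {
    const current = this.entries.get(key);
    if (current && current.token === token) {
      this.entries.delete(key);
      return true;
    }
    return false;
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRoomLock(store, meetingCode, fn, options = {}) {
  const { ttlMs = 5000, retryMs = 10, maxWaitMs = 2000 } = options; // assumed defaults
  const key = `lock:room:${meetingCode}`; // hypothetical key name
  const token = randomUUID();             // identifies this lock holder
  const deadline = Date.now() + maxWaitMs;

  // Poll until the lock is acquired or the wait budget is exhausted.
  while (!(await store.setIfAbsent(key, token, ttlMs))) {
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for lock on ${meetingCode}`);
    }
    await sleep(retryMs);
  }
  try {
    return await fn();
  } finally {
    // Release only if this holder still owns the lock; it may have expired
    // and been acquired by someone else in the meantime.
    await store.deleteIfMatches(key, token);
  }
}

// Three concurrent joins on the same room run one after another.
async function demo() {
  const store = new FakeLockStore();
  const events = [];
  await Promise.all(["a", "b", "c"].map((id) =>
    withRoomLock(store, "room-1", async () => {
      events.push(`${id}:start`);
      await sleep(5);
      events.push(`${id}:end`);
    })
  ));
  return events;
}
```

The TTL keeps a crashed process from holding the room forever, and the per-holder token keeps a slow process from releasing a lock that has already passed to someone else.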

Authentication uses JWT access tokens and refresh token rotation.

Login issues:

Refresh tokens are rotated on every refresh request, reducing replay risk while preserving user sessions.
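The rotation step can be sketched as follows. This is an illustrative version only: it uses opaque random refresh tokens, an in-memory `Map` in place of a persistent store, and an assumed seven-day lifetime, and it leaves out issuing the JWT access token. The function names are hypothetical.

```javascript
const { randomBytes, createHash } = require("node:crypto");

// hash(token) -> { userId, expiresAt }; only hashes are stored, never raw tokens.
const refreshStore = new Map();

const hashToken = (token) => createHash("sha256").update(token).digest("hex");

function issueRefreshToken(userId, ttlMs = 7 * 24 * 60 * 60 * 1000) {
  const token = randomBytes(32).toString("hex");
  refreshStore.set(hashToken(token), { userId, expiresAt: Date.now() + ttlMs });
  return token;
}

function rotateRefreshToken(presentedToken) {
  const key = hashToken(presentedToken);
  const record = refreshStore.get(key);
  if (!record || record.expiresAt <= Date.now()) {
    throw new Error("Invalid or expired refresh token");
  }
  // The presented token is single-use: remove it before issuing a replacement.
  refreshStore.delete(key);
  return { userId: record.userId, refreshToken: issueRefreshToken(record.userId) };
}

const first = issueRefreshToken("user-42");
const rotated = rotateRefreshToken(first);
console.log(rotated.refreshToken !== first); // true
```

Because the old token is deleted on use, a stolen copy that is replayed after the legitimate client has refreshed is rejected.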

The frontend is a React application built around specialized hooks that manage independent subsystems.

Major responsibilities include:

Peer connections are managed through dedicated React hooks and implement the perfect negotiation pattern.
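The core of perfect negotiation is a small decision made when a remote description arrives: if both sides produced an offer at the same time, the impolite peer ignores the incoming offer and the polite peer accepts it (rolling back its own). The sketch below isolates that decision as a pure function so it can run outside a browser. How Hoovik maps join order to roles is not spelled out here, so the rule in `isPolite` (the later joiner is polite) is an assumption.

```javascript
// Returns true when an incoming description should be ignored.
// Mirrors the collision check used in the perfect negotiation pattern.
function shouldIgnoreOffer({ polite, makingOffer, signalingState, descriptionType }) {
  const offerCollision =
    descriptionType === "offer" && (makingOffer || signalingState !== "stable");
  return !polite && offerCollision;
}

// Hypothetical role assignment: the peer that joined later is the polite one.
function isPolite(joinOrder, localId, remoteId) {
  return joinOrder.indexOf(localId) > joinOrder.indexOf(remoteId);
}

const order = ["alice", "bob"];
const bobIsPolite = isPolite(order, "bob", "alice");   // true
const aliceIsPolite = isPolite(order, "alice", "bob"); // false

// Both peers were mid-offer when the other's offer arrived (glare).
const glare = { makingOffer: true, signalingState: "have-local-offer", descriptionType: "offer" };
console.log(shouldIgnoreOffer({ polite: aliceIsPolite, ...glare })); // true
console.log(shouldIgnoreOffer({ polite: bobIsPolite, ...glare }));   // false
```

Whatever rule assigns the roles, the two peers of a connection must end up with opposite values, which is why a shared, ordered join list is a convenient input.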

The application supports:

Two independent detection paths exist.

When available:

```js
RTCRtpReceiver.getSynchronizationSources()
```

is used to obtain RTP audio levels directly.

Browsers without SSRC support use:

The application selects the appropriate method dynamically.
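A simplified version of the selection and the SSRC-based reading might look like this. The receiver objects are plain stand-ins so the snippet runs outside a browser, the `0.05` threshold is an assumed value, and the fallback path itself is not shown. In a browser, each entry returned by `getSynchronizationSources()` may carry an optional `audioLevel` between 0 and 1.

```javascript
// Feature-detect which detection path a given receiver supports.
function pickDetectionMethod(receiver) {
  return typeof receiver.getSynchronizationSources === "function" ? "ssrc" : "fallback";
}

// Read the loudest reported level, or null when no level is available.
function readAudioLevel(receiver) {
  const levels = receiver
    .getSynchronizationSources()
    .map((source) => source.audioLevel)
    .filter((level) => typeof level === "number");
  return levels.length > 0 ? Math.max(...levels) : null;
}

// Assumed threshold; a real value would be tuned against actual audio.
function isSpeaking(level, threshold = 0.05) {
  return level !== null && level >= threshold;
}

// Stand-in receivers for illustration.
const modernReceiver = {
  getSynchronizationSources: () => [{ source: 1234, audioLevel: 0.21 }],
};
const legacyReceiver = {};

console.log(pickDetectionMethod(modernReceiver));        // "ssrc"
console.log(pickDetectionMethod(legacyReceiver));        // "fallback"
console.log(isSpeaking(readAudioLevel(modernReceiver))); // true
```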

The host captures:

Captured media is sent directly to the emotion service using dedicated Socket.IO connections.

Each participant receives an independent emotion-service connection, allowing participant-level media state tracking and backpressure control.

The emotion service can instruct the frontend to adjust capture rates through server status and backpressure events.
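On the client, reacting to those signals can be as simple as widening or narrowing the capture interval. The sketch below shows one possible policy; the signal names (`"overloaded"`, `"healthy"`), the interval bounds, and the multipliers are assumptions and do not reflect the actual event payloads.

```javascript
const MIN_INTERVAL_MS = 500;  // assumed fastest capture rate
const MAX_INTERVAL_MS = 4000; // assumed slowest capture rate

// Back off quickly under load, recover gradually when the server is healthy.
function nextCaptureInterval(currentMs, signal) {
  if (signal === "overloaded") {
    return Math.min(currentMs * 2, MAX_INTERVAL_MS);
  }
  if (signal === "healthy") {
    return Math.max(Math.round(currentMs * 0.75), MIN_INTERVAL_MS);
  }
  return currentMs; // unknown signals leave the rate unchanged
}

console.log(nextCaptureInterval(1000, "overloaded")); // 2000
console.log(nextCaptureInterval(3000, "overloaded")); // 4000
console.log(nextCaptureInterval(1000, "healthy"));    // 750
```

Doubling on overload and easing back in smaller steps keeps the client from oscillating between extremes when the service is near capacity.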

Emotion events collected during a meeting are stored locally and later submitted when generating an AI summary.

The backend combines:

This enables AI summaries to highlight notable discrepancies between spoken content and observed participant emotions.

The transcript service is implemented in FastAPI.

Its responsibilities include:

The service uses:

for transcription and emotion tagging.

Meeting recordings are uploaded after a meeting ends.

The service immediately returns:

```http
202 Accepted
```

and performs processing in a background task.

The processing pipeline is:

1. Audio upload
2. FFmpeg conversion
3. Whisper transcription
4. Segment merging
5. NLP emotion classification (DistilRoBERTa)
6. Transcript callback to the Node.js backend

After processing completes, the transcript service sends structured transcript data back to the Node.js backend.

Retry logic is used to improve reliability during temporary backend failures.
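The retry behaviour can be illustrated with a generic exponential-backoff helper. The transcript service itself is written in Python, so this JavaScript version is only a language-neutral sketch of the idea, kept in the same language as the other examples here. The attempt count and delays are assumed values.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries an async operation, doubling the delay after each failure.
async function withRetry(operation, { attempts = 4, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}

// Example: a callback that fails twice before the backend recovers.
async function demo() {
  let calls = 0;
  const result = await withRetry(async () => {
    calls += 1;
    if (calls < 3) throw new Error("backend unavailable");
    return "delivered";
  }, { attempts: 4, baseDelayMs: 1 });
  return { result, calls };
}
```

A bounded number of attempts matters as much as the backoff: after the last failure the error is surfaced so the job can be marked as failed rather than retried forever.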

The emotion service performs real-time inference on participant media streams.

The frontend sends:

directly to the service.

The service performs inference using:

and emits:

```text
emotion.result
```

events back to the frontend.

Inference continues even when a participant disables one modality.

Examples:

This allows emotion tracking to continue without requiring both media streams.

The service also emits:

events that allow the frontend to dynamically adjust capture rates and reduce load.

After transcripts are stored, they can be indexed for semantic retrieval.

The indexing pipeline consists of:

When speaker segments are available, chunks preserve:

Otherwise, a sliding-window chunking strategy is used.
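For the fallback case, a word-based sliding window is one straightforward way to do it. The sketch below is an assumption about the general shape, not the project's chunker: the window size, the overlap, and the choice to split on whitespace rather than tokens are all illustrative.

```javascript
// Splits text into overlapping word windows so context is not cut at chunk edges.
function slidingWindowChunks(text, { windowSize = 200, overlap = 40 } = {}) {
  if (overlap >= windowSize) {
    throw new RangeError("overlap must be smaller than windowSize");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const step = windowSize - overlap;
  const chunks = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + windowSize).join(" "));
    // Stop once a window has reached the end of the text.
    if (start + windowSize >= words.length) break;
  }
  return chunks;
}

const sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9";
console.log(slidingWindowChunks(sample, { windowSize: 4, overlap: 2 }));
// [ 'w0 w1 w2 w3', 'w2 w3 w4 w5', 'w4 w5 w6 w7', 'w6 w7 w8 w9' ]
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some words twice.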

Embeddings are generated using:

```text
nomic-embed-text-v1.5
```

Embedding results are cached in Redis to avoid redundant computation.
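A content-addressed cache is a simple way to get that behaviour: hash the chunk text, and only call the embedding model when the hash has not been seen. In this sketch a `Map` stands in for Redis, `embedFn` is a placeholder for whatever produces the vector, and the `emb:` key format is an assumption.

```javascript
const { createHash } = require("node:crypto");

const cache = new Map(); // stand-in for Redis GET/SET

async function embedWithCache(text, embedFn, model = "nomic-embed-text-v1.5") {
  // Include the model name so vectors from different models never collide.
  const digest = createHash("sha256").update(text).digest("hex");
  const key = `emb:${model}:${digest}`; // hypothetical key format
  if (cache.has(key)) {
    return JSON.parse(cache.get(key));
  }
  const vector = await embedFn(text);
  cache.set(key, JSON.stringify(vector));
  return vector;
}

// Placeholder embedder that counts how often it is actually invoked.
let embedCalls = 0;
async function fakeEmbed(text) {
  embedCalls += 1;
  return [text.length, 0.5];
}

async function demo() {
  const a = await embedWithCache("hello team", fakeEmbed);
  const b = await embedWithCache("hello team", fakeEmbed); // served from cache
  return { a, b, embedCalls };
}
```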

Transcript indexing runs asynchronously through BullMQ workers.

This prevents long-running embedding operations from blocking API requests.

Retrieval combines:

to balance relevance and diversity.
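One widely used way to trade relevance against diversity is maximal marginal relevance (MMR), which repeatedly picks the candidate that is most similar to the query while being least similar to what has already been selected. The sketch below illustrates that general technique and is an assumption about the approach, not a description of Hoovik's retriever; the `lambda` weight and the toy vectors are made up for the example.

```javascript
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// lambda = 1 ranks purely by relevance; lower values favour diversity.
function mmrSelect(queryVector, candidates, k, lambda = 0.7) {
  const remaining = candidates.map((c) => ({
    ...c,
    relevance: cosine(queryVector, c.vector),
  }));
  const selected = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      // Redundancy is the highest similarity to anything already selected.
      const redundancy = selected.length > 0
        ? Math.max(...selected.map((s) => cosine(candidate.vector, s.vector)))
        : 0;
      const score = lambda * candidate.relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}

const query = [1, 1, 0];
const chunks = [
  { id: "a", vector: [1, 0.9, 0] },
  { id: "b", vector: [1, 0.8, 0] }, // nearly a duplicate of "a"
  { id: "c", vector: [0, 1, 1] },   // less relevant but different
];
console.log(mmrSelect(query, chunks, 2, 0.5).map((c) => c.id)); // [ 'a', 'c' ]
console.log(mmrSelect(query, chunks, 2, 1).map((c) => c.id));   // [ 'a', 'b' ]
```

With transcripts this matters because adjacent chunks often say nearly the same thing; without a diversity term the top results can all come from one stretch of the meeting.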

Retrieved context is passed to Groq-hosted language models to generate answers.

Session history is maintained to support multi-turn conversations over meeting data.

Access control follows the same authorization model as transcript access:

Several known tradeoffs remain in the current architecture.

These decisions were acceptable for the current scale of the platform, but dedicated workers and queue-based processing would be natural next steps.

Hoovik evolved from a simple video meeting application into a distributed platform that combines WebRTC, real-time machine learning, transcript intelligence, and retrieval-augmented search.

The most interesting part of the project was not any single technology. It was designing the boundaries between services and making them work reliably together under real-world constraints.

If you'd like to explore the implementation, try the interactive demo or browse the source code on GitHub.
