Embeddings are all you need

A new in-browser voice-to-action system uses a tiny embedding model (MiniLM-L6-v2) to classify intents via cosine similarity, reaching sub-50 ms latency after warm-up without any server or large language model. The pipeline runs entirely in the browser using the Web Speech API and WASM, enabling fast, private intent classification for tasks like managing a shopping list.
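The classification step can be sketched as a nearest-prototype search: embed the transcript, compute its cosine similarity to a reference vector for each intent, and take the best match. This is a minimal sketch, not the demo's code. The intent names, the 0.5 threshold, the use of a single prototype vector per intent, and the toy 3-dimensional vectors are all hypothetical stand-ins (real MiniLM-L6-v2 sentence embeddings have 384 dimensions).

```typescript
// Hypothetical intent labels for a shopping-list demo.
type Intent = "add_item" | "remove_item" | "clear_list";

interface Classification {
  intent: Intent | null; // null when no intent clears the threshold
  confidence: number; // best cosine similarity found
  scores: Record<Intent, number>; // similarity for every intent
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare a query embedding with one (hypothetical) prototype embedding
// per intent; return the best match, or null if it is below `threshold`.
function classifyIntent(
  query: number[],
  prototypes: Record<Intent, number[]>,
  threshold = 0.5, // hypothetical confidence cut-off
): Classification {
  const scores = {} as Record<Intent, number>;
  let best: Intent | null = null;
  let bestScore = -Infinity;
  for (const intent of Object.keys(prototypes) as Intent[]) {
    const s = cosineSimilarity(query, prototypes[intent]);
    scores[intent] = s;
    if (s > bestScore) {
      bestScore = s;
      best = intent;
    }
  }
  return {
    intent: bestScore >= threshold ? best : null,
    confidence: bestScore,
    scores,
  };
}

// Toy 3-d vectors standing in for 384-d MiniLM embeddings.
const prototypes: Record<Intent, number[]> = {
  add_item: [1, 0, 0],
  remove_item: [0, 1, 0],
  clear_list: [0, 0, 1],
};

const result = classifyIntent([0.9, 0.1, 0], prototypes);
console.log(result.intent, result.confidence.toFixed(3)); // → add_item 0.994
```

If the embeddings are L2-normalized when they are produced, cosine similarity reduces to a plain dot product. The prototype embeddings can also be computed once at startup, so each spoken command costs only one model call plus a handful of vector comparisons.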

The demo page sums up the approach as Voice → Embedding → Action: 100% in-browser, with no server, no LLM, and latency under 50 ms after warm-up. Intents are classified with MiniLM-L6-v2, a tiny 23 MB embedding model running in WASM, using cosine similarity rather than a language model.

You click to speak, and the transcript drives either a shopping list (say "add milk" or "remove bread") or a set of custom actions. For each command the page shows the detected intent, its confidence, the latency, and the cosine similarity to every intent. Example commands can also be clicked to run the same pipeline on their text.

The whole pipeline is local:

Web Speech API → Transcript → MiniLM embedding (WASM) → Cosine similarity → DOM action
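The last hop, from a classified intent to a DOM action, can be kept as a pure state update that the page then renders. This is a minimal sketch under stated assumptions: the `add_item` / `remove_item` intent names are hypothetical, and so is the naive heuristic that treats the words after the leading verb (minus a few filler words) as the item name. The source does not say how the demo extracts the item from the transcript.

```typescript
// Hypothetical intents produced by the classification step.
type ListIntent = "add_item" | "remove_item";

// Naive, hypothetical item extraction: drop the first word (the verb)
// and common filler words, e.g. "add some milk please" -> "milk".
function extractItem(transcript: string): string {
  const filler = new Set(["some", "the", "a", "an", "please"]);
  const words = transcript.toLowerCase().trim().split(/\s+/).slice(1);
  return words.filter((w) => !filler.has(w)).join(" ");
}

// Pure state update: returns a new list instead of mutating the old one,
// so the page can re-render the shopping list from the result.
function applyIntent(list: string[], intent: ListIntent, transcript: string): string[] {
  const item = extractItem(transcript);
  if (item === "") return list; // nothing to act on
  if (intent === "add_item") {
    return list.includes(item) ? list : [...list, item]; // skip duplicates
  }
  return list.filter((x) => x !== item); // remove_item
}

let list: string[] = [];
list = applyIntent(list, "add_item", "add milk");
list = applyIntent(list, "add_item", "Add some bread please");
list = applyIntent(list, "remove_item", "remove bread");
// list is now ["milk"]
```

Keeping the update pure makes this step testable without a microphone or a model; in the browser, the page would re-render the list element after each call.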