Moss is a sub-10 ms semantic search runtime built for conversational AI agents. It offers hybrid retrieval (semantic + keyword search), built-in embeddings, metadata filtering, and a WebAssembly build that runs in the browser, all from a single SDK that embeds in your application.
No network hop on the hot path. No clusters to tune. Point the SDK at Moss Cloud, load your index, and query it in under 10 ms. SDKs are available for Python, TypeScript, Elixir, and C.
Before you start: sign up at moss.dev for a `project_id` and `project_key` - free tier available.
The snippets below need Python 3.10+ or Node.js 20+.
pip install moss
python
import asyncio

from moss import MossClient, QueryOptions

async def main():
    client = MossClient("your_project_id", "your_project_key")

    # Create an index and add documents
    await client.create_index("support-docs", [
        {"id": "1", "text": "Refunds are processed within 3-5 business days."},
        {"id": "2", "text": "You can track your order on the dashboard."},
        {"id": "3", "text": "We offer 24/7 live chat support."},
    ])

    # Load the index into the local runtime, then query it
    await client.load_index("support-docs")
    results = await client.query("support-docs", "how long do refunds take?", QueryOptions(top_k=3))
    for doc in results.docs:
        print(f"[{doc.score:.3f}] {doc.text}")  # Returned in {results.time_taken_ms}ms

asyncio.run(main())
npm install @moss-dev/moss
js
import { MossClient } from "@moss-dev/moss";
const client = new MossClient("your_project_id", "your_project_key");
// Create an index and add documents
await client.createIndex("support-docs", [
  { id: "1", text: "Refunds are processed within 3-5 business days." },
  { id: "2", text: "You can track your order on the dashboard." },
  { id: "3", text: "We offer 24/7 live chat support." },
]);
// Load and query - results in <10 ms
await client.loadIndex("support-docs");
const results = await client.query("support-docs", "how long do refunds take?", { topK: 3 });
results.docs.forEach((doc) => {
  console.log(`[${doc.score.toFixed(3)}] ${doc.text}`); // Returned in ${results.timeTakenInMs}ms
});
Most retrieval stacks call out to a remote vector database. The round trip alone runs 200-500 ms - enough to break a real-time conversation.
Moss runs search and embedding inside your process. There's no network hop on the hot path, so query latency lands in the single digits - fast enough that retrieval disappears from the latency budget. If you're building a voice bot, a copilot, or any agent that talks to humans, that's the difference between a tool that feels alive and one that feels laggy.
End-to-end query latency (embedding + search) on 100,000 documents, 750 measured queries, top_k=5. Tested on a MacBook Pro (M4 Pro, 24 GB).
| System | P50 | P95 | P99 | Mean |
|---|---|---|---|---|
| Moss | 3.1 ms | 4.3 ms | 5.4 ms | 3.3 ms |
| Pinecone | 432.6 ms | 732.1 ms | 934.2 ms | 485.8 ms |
| Qdrant | 597.6 ms | 682.0 ms | 771.4 ms | 596.5 ms |
| ChromaDB | 351.8 ms | 423.5 ms | 538.5 ms | 358.0 ms |
The Moss figures include embedding time; the other systems use an external embedding service (Modal). Pinecone and Qdrant use cloud search.
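If you want to collect comparable numbers on your own hardware, the sketch below shows one common way to reduce per-query timings to P50 / P95 / P99 and a mean. It uses the nearest-rank percentile definition, which is an assumption: the percentile method behind the table above is not stated here. `run_query` is a hypothetical stand-in for whatever call you are timing.

```python
import math
import statistics
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so integer percentages stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Collapse a list of latencies (ms) into the four columns used in the table."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "mean": statistics.fmean(latencies_ms),
    }


def measure(run_query, queries):
    """Time each call to run_query (hypothetical callable) and return milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    # Placeholder workload so the script runs on its own; swap in a real query call.
    samples = measure(lambda q: sum(range(1000)), ["q"] * 750)
    print(summarize(samples))
```

Timing the whole call, embedding included, keeps the measurement end-to-end, which is how the Moss row above is described.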
Moss isn't a database! It's a search runtime. You don't manage clusters, tune HNSW parameters, or worry about sharding. You index documents, load them into the runtime, and query. That's it.
- **Sub-10 ms semantic search** - single-digit-ms p99 in our benchmarks
- **Hybrid search** - semantic + keyword in a single query
- **Built-in embedding models** - no OpenAI key required (or bring your own)
- **Metadata filtering** - `$eq`, `$and`, `$in`, `$near` operators
- **Runs in the browser too** - separate WebAssembly SDK (`@moss-dev/moss-web`) for client-side semantic search with no server
- **Database connectors** - ingest directly from SQLite, MongoDB, MySQL, and Supabase (`packages/moss-data-connector/`)
- **CLI** - manage indexes and query from the terminal (`packages/moss-cli/`)
- **SDKs** - Python (3.10+), TypeScript / Node.js (20+), Elixir, and C (`libmoss`)
- **Framework integrations** - LangChain, DSPy, LlamaIndex, Pipecat, LiveKit, Vapi, ElevenLabs, Strands Agents
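To make the metadata-filtering operators listed above more concrete, here is a small standalone sketch of how `$eq`, `$in`, and `$and` conditions could be evaluated against a document's metadata. It assumes a MongoDB-style filter shape and does not call the Moss SDK; `matches` and the sample documents are hypothetical, and `$near` is left out because its semantics are not described here. See docs.moss.dev for the filter syntax Moss actually accepts.

```python
def matches(metadata, flt):
    """Return True if a metadata dict satisfies a MongoDB-style filter (assumed shape)."""
    for key, cond in flt.items():
        if key == "$and":
            # $and takes a list of sub-filters; every one must match.
            if not all(matches(metadata, sub) for sub in cond):
                return False
            continue
        if key.startswith("$"):
            raise ValueError(f"unsupported top-level operator: {key}")
        value = metadata.get(key)
        # A bare value is treated as shorthand for {"$eq": value}.
        ops = cond if isinstance(cond, dict) else {"$eq": cond}
        for op, operand in ops.items():
            if op == "$eq":
                ok = value == operand
            elif op == "$in":
                ok = value in operand
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


# Hypothetical documents with metadata attached.
docs = [
    {"id": "1", "metadata": {"category": "billing", "lang": "en"}},
    {"id": "2", "metadata": {"category": "orders", "lang": "en"}},
    {"id": "3", "metadata": {"category": "support", "lang": "de"}},
]

# English documents in either the billing or support category.
flt = {"$and": [{"lang": {"$eq": "en"}}, {"category": {"$in": ["billing", "support"]}}]}
selected = [d["id"] for d in docs if matches(d["metadata"], flt)]
print(selected)
```

Filtering on metadata before ranking narrows the candidate set, so the similarity search only scores documents that can appear in the result.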
This repo contains working examples you can copy straight into your project:
examples/
├── python/                      # Python SDK samples
│   ├── load_and_query_sample.py
│   ├── comprehensive_sample.py
│   ├── custom_embedding_sample.py
│   └── metadata_filtering.py
├── python-classification/       # Classification example
├── javascript/                  # TypeScript SDK samples
│   ├── load_and_query_sample.ts
│   ├── comprehensive_sample.ts
│   └── custom_embedding_sample.ts
├── javascript-web/              # Browser / WASM SDK samples
├── c/                           # C SDK samples (libmoss)
├── go/                          # Go SDK samples
├── voice-agents/                # End-to-end voice agents (ambient + multi-agent)
│   ├── airline-pnr/             # Ambient retrieval; per-PNR Moss indexes, swap mid-call
│   └── mortgage-lending/        # Multi-agent flow with shared session state
└── cookbook/                    # Framework integrations
    ├── langchain/               # LangChain retriever
    ├── dspy/                    # DSPy module
    ├── crewai/                  # CrewAI integration
    ├── haystack/                # Haystack retriever
    ├── autogen/                 # AutoGen integration
    ├── mastra/                  # Mastra retriever
    ├── pydantic-ai/             # Pydantic AI integration
    └── daytona/                 # Daytona sandbox example
apps/
├── next-js/                     # Next.js semantic search UI
├── pipecat-moss/                # Pipecat voice agent with Moss retrieval
├── vapi-moss/                   # Vapi voice agent with Moss retrieval
├── elevenlabs-moss/             # ElevenLabs voice agent with Moss retrieval
├── livekit-moss-vercel/         # LiveKit voice agent on Vercel
├── agora-moss/                  # Agora Conversational AI MCP server with Moss retrieval
├── moss-llamaindex/             # LlamaIndex RAG backend + frontend
├── moss-bun/                    # Bun runtime example
└── docker/                      # Dockerized examples (ECS/K8s pattern)
moss-live-labs/                  # Experimental zone: prototypes and community demos
├── python/                      # Minimal Python quickstart + advanced query example
├── typescript/                  # Minimal TypeScript quickstart + advanced query example
├── examples/                    # Larger experiments (image search, voice agents)
│   ├── voice-agent/             # LiveKit + Moss voice assistant
│   ├── advanced-voice-agent/    # Persona impersonator built on a PDF knowledge base
│   └── image-search/            # FastAPI + React image search over COCO
└── community-demos/             # Community-contributed projects
    └── voice-agents/            # bharat-benefits, shoplabs-voice-agent
cd examples/python
pip install -r requirements.txt
cp ../../.env.example .env # Add your credentials
python load_and_query_sample.py
cd examples/javascript
npm install
cp ../../.env.example .env # Add your credentials
npx tsx load_and_query_sample.ts
cd apps/next-js
npm install
cp ../../.env.example .env # Add your credentials
npm run dev # Open http://localhost:3000
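Each of the run steps above copies `.env.example` to `.env` and asks you to add your credentials. As a rough illustration of how a script might pick those up, the sketch below reads simple `KEY=VALUE` lines and falls back to them when the process environment does not already define the variable. The variable names `MOSS_PROJECT_ID` and `MOSS_PROJECT_KEY` and the helper functions are assumptions for illustration; check `.env.example` for the names the samples actually use.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_credentials(env=None, path=".env"):
    """Return (project_id, project_key); real environment variables win over the .env file."""
    env = dict(os.environ if env is None else env)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            for key, value in parse_env_text(fh.read()).items():
                env.setdefault(key, value)
    # Variable names are assumed for illustration only.
    missing = [k for k in ("MOSS_PROJECT_ID", "MOSS_PROJECT_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return env["MOSS_PROJECT_ID"], env["MOSS_PROJECT_KEY"]
```

Failing fast with a clear message when a credential is missing tends to be easier to debug than an authentication error surfacing later from the first SDK call.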
Sub-10 ms retrieval plugged into Pipecat's real-time voice pipeline: a customer support agent that actually keeps up with conversation.
cd apps/pipecat-moss/pipecat-quickstart
A privacy-first voice AI stack: Ollama for LLM inference, Moss for retrieval, Pipecat for real-time audio - the LLM and retrieval both run on your machine.
cd apps/pipecat-moss/ollama-local
docker compose up
Full API reference: docs.moss.dev.
| Framework | Status | Example |
|---|---|---|
| LangChain | | `examples/cookbook/langchain/` |
| DSPy | | `examples/cookbook/dspy/` |
| LlamaIndex | | `apps/moss-llamaindex/` |
| CrewAI | | `examples/cookbook/crewai/` |
| AutoGen | | `examples/cookbook/autogen/` |
| Haystack | | `examples/cookbook/haystack/` |
| Mastra | | `examples/cookbook/mastra/` |
| Pydantic AI | | `examples/cookbook/pydantic-ai/` |
| Pipecat | | `apps/pipecat-moss/` |
| LiveKit | | `apps/livekit-moss-vercel/` |
| Vapi | | `apps/vapi-moss/` |
| ElevenLabs | | `apps/elevenlabs-moss/` |
| Agora | | `apps/agora-moss/` |
| Strands Agents | | `packages/strands-agents-moss/` |
| Next.js | | `apps/next-js/` |
| VitePress | | `packages/vitepress-plugin-moss/` |
| Vercel AI SDK | | `packages/vercel-sdk/` |
Three parts:
- **Moss Cloud** - handles ingestion, document embedding, storage, and distribution. Point the SDK at it with a project ID and key.
- **Index** - your documents and their vectors, packaged as a single artifact that lives on Moss Cloud.
- **Runtime** - embedded in your application. It pulls indexes over HTTPS, holds them in memory, and serves queries locally.
Once an index is loaded, queries don't leave your process - that's where the sub-10 ms latency comes from. Document changes flow through Moss Cloud and the runtime stays in sync.
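The load-once, query-locally pattern described above can be sketched with a toy in-process runtime. This is not Moss's implementation: `ToyRuntime`, the `fetch_index` callable (standing in for the HTTPS download), the index shape, and the brute-force cosine scoring are all assumptions chosen to keep the example self-contained. The point it illustrates is that the fetch happens only at load time, while each query touches memory alone.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyRuntime:
    """Holds loaded indexes in memory and answers queries without further fetches."""

    def __init__(self, fetch_index):
        # fetch_index is a hypothetical callable standing in for the remote download.
        self._fetch_index = fetch_index
        self._indexes = {}

    def load_index(self, name):
        # The only step that leaves the process.
        self._indexes[name] = self._fetch_index(name)

    def query(self, name, query_vector, top_k=3):
        if name not in self._indexes:
            raise KeyError(f"index {name!r} is not loaded")
        scored = [
            (cosine(query_vector, doc["vector"]), doc["id"])
            for doc in self._indexes[name]
        ]
        scored.sort(reverse=True)  # highest similarity first
        return scored[:top_k]


fetch_calls = []


def fake_fetch(name):
    """Pretend download: records the call and returns three tiny precomputed vectors."""
    fetch_calls.append(name)
    return [
        {"id": "1", "vector": [1.0, 0.0]},
        {"id": "2", "vector": [0.0, 1.0]},
        {"id": "3", "vector": [0.7, 0.7]},
    ]


runtime = ToyRuntime(fake_fetch)
runtime.load_index("support-docs")
hits = runtime.query("support-docs", [1.0, 0.1], top_k=2)
print(hits, "fetches:", len(fetch_calls))
```

However many queries follow, `fetch_calls` stays at one entry, which mirrors why the hot path has no network hop once the index is loaded.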
- **Server-side** - `moss` (Python) and `@moss-dev/moss` (Node.js 20+) embed the runtime in your backend. Use this when your agent runs on a server.
- **Browser** - `@moss-dev/moss-web` is a WebAssembly build that downloads the index and runs queries entirely client-side, no server required. Use this for static sites, browser extensions, and offline-first apps. See `examples/javascript-web/`.
Full Python SDK source code is available at sdks/python/.
Here's where the community can have the most impact:
- **New SDK bindings** - Swift, Go, Elixir, ...
- **Framework integrations** - CrewAI, Haystack, AutoGen
- **Reranking support** - plug in cross-encoder rerankers
- **Doc-parsing connectors** - PDF, DOCX, HTML, Markdown ingestion
- **Examples and tutorials** - if you build something with Moss, we'd love to feature it
See our Contributing Guide for setup instructions and our Roadmap for what's planned.
Check out issues labeled good first issue to get started.
- Discord - ask questions, share what you're building
- GitHub Issues - bug reports and feature requests
- Twitter - announcements and updates
BSD 2-Clause License - the SDKs, examples, and integrations in this repo are fully open source.
Built by the team at