{"slug": "i-removed-the-vector-database-from-my-ai-agent-stack", "title": "I removed the vector database from my AI agent stack", "summary": "Moss, a sub-10 ms semantic search runtime, removes the need for a vector database in AI agent stacks by running search and embedding inside the application process, eliminating network hops. Benchmarks show Moss achieves 3.3 ms mean latency on 100,000 documents, compared to 358-596 ms for ChromaDB, Pinecone, and Qdrant. The runtime supports hybrid retrieval, built-in embeddings, and runs in the browser via WebAssembly.", "body_md": "Moss is a sub-10 ms semantic search runtime built for conversational AI agents. It offers hybrid retrieval (semantic + keyword search), built-in embeddings, metadata filtering, and a WebAssembly build that runs in the browser - all from a single SDK that embeds in your application.\n\nNo network hop on the hot path. No clusters to tune. Point the SDK at Moss Cloud, load your index, and query it in **under 10 ms**. SDKs are available for Python, TypeScript, Elixir, and C.\n\n**Before you start:** sign up at [moss.dev](https://moss.dev) for a `project_id` and `project_key` - free tier available.\n\nThe snippets below need Python 3.10+ or Node.js 20+.\n\n```\npip install moss\n```\n\n```python\nimport asyncio\n\nfrom moss import MossClient, QueryOptions\n\n\nasync def main():\n    client = MossClient(\"your_project_id\", \"your_project_key\")\n\n    # Create an index and add documents\n    await client.create_index(\"support-docs\", [\n        {\"id\": \"1\", \"text\": \"Refunds are processed within 3-5 business days.\"},\n        {\"id\": \"2\", \"text\": \"You can track your order on the dashboard.\"},\n        {\"id\": \"3\", \"text\": \"We offer 24/7 live chat support.\"},\n    ])\n\n    # Load and query - results in <10 ms\n    await client.load_index(\"support-docs\")\n    results = await client.query(\"support-docs\", \"how long do refunds take?\", QueryOptions(top_k=3))\n\n    for doc in results.docs:\n        print(f\"[{doc.score:.3f}] {doc.text}\")  # Returned in {results.time_taken_ms}ms\n\n\n# The SDK calls are awaited, so run them inside an event loop\nasyncio.run(main())\n```\n\n```\nnpm install 
@moss-dev/moss\n```\n\n```js\nimport { MossClient } from \"@moss-dev/moss\";\n\nconst client = new MossClient(\"your_project_id\", \"your_project_key\");\n\n// Create an index and add documents\nawait client.createIndex(\"support-docs\", [\n  { id: \"1\", text: \"Refunds are processed within 3-5 business days.\" },\n  { id: \"2\", text: \"You can track your order on the dashboard.\" },\n  { id: \"3\", text: \"We offer 24/7 live chat support.\" },\n]);\n\n// Load and query - results in <10 ms\nawait client.loadIndex(\"support-docs\");\nconst results = await client.query(\"support-docs\", \"how long do refunds take?\", { topK: 3 });\n\nresults.docs.forEach((doc) => {\n  console.log(`[${doc.score.toFixed(3)}] ${doc.text}`); // Returned in ${results.timeTakenInMs}ms\n});\n```\n\n**Most retrieval stacks call out to a remote vector database. The round trip alone runs 200–500 ms - enough to break a real-time conversation.**\n\nMoss runs search and embedding *inside* your process. There's no network hop on the hot path, so query latency lands in the single digits - fast enough that retrieval disappears from the latency budget. If you're building a voice bot, a copilot, or any agent that talks to humans, that's the difference between a tool that feels alive and one that feels laggy.\n\nEnd-to-end query latency (embedding + search) on 100,000 documents, 750 measured queries, top_k=5. Tested on a MacBook Pro (M4 Pro, 24 GB).\n\n| System | P50 | P95 | P99 | Mean |\n|---|---|---|---|---|\n| Moss | 3.1 ms | 4.3 ms | 5.4 ms | 3.3 ms |\n| Pinecone | 432.6 ms | 732.1 ms | 934.2 ms | 485.8 ms |\n| Qdrant | 597.6 ms | 682.0 ms | 771.4 ms | 596.5 ms |\n| ChromaDB | 351.8 ms | 423.5 ms | 538.5 ms | 358.0 ms |\n\nMoss includes embedding in the measurement; the competitors use an external embedding service ([Modal](https://modal.com/docs/examples/text_embeddings_inference)). Pinecone and Qdrant use cloud search.\n\nMoss isn't a database! It's a **search runtime**. 
You don't manage clusters, tune HNSW parameters, or worry about sharding. You index documents, load them into the runtime, and query. That's it.\n\n- **Sub-10 ms semantic search** - single-digit-ms p99 in our [benchmarks](#benchmarks)\n- **Hybrid search** - semantic + keyword in a single query\n- **Built-in embedding models** - no OpenAI key required (or bring your own)\n- **Metadata filtering** - `$eq`, `$and`, `$in`, `$near` operators\n- **Runs in the browser too** - separate WebAssembly SDK (`@moss-dev/moss-web`) for client-side semantic search with no server\n- **Database connectors** - ingest directly from SQLite, MongoDB, MySQL, and Supabase (`packages/moss-data-connector/`)\n- **CLI** - manage indexes and query from the terminal (`packages/moss-cli/`)\n- **SDKs** - Python (3.10+), TypeScript / Node.js (20+), Elixir, and C (`libmoss`)\n- **Framework integrations** - LangChain, DSPy, LlamaIndex, Pipecat, LiveKit, Vapi, ElevenLabs, Strands Agents\n\nThis repo contains working examples you can copy straight into your project:\n\n```\nexamples/\n├── python/                  # Python SDK samples\n│   ├── load_and_query_sample.py\n│   ├── comprehensive_sample.py\n│   ├── custom_embedding_sample.py\n│   └── metadata_filtering.py\n├── python-classification/   # Classification example\n├── javascript/              # TypeScript SDK samples\n│   ├── load_and_query_sample.ts\n│   ├── comprehensive_sample.ts\n│   └── custom_embedding_sample.ts\n├── javascript-web/          # Browser / WASM SDK samples\n├── c/                       # C SDK samples (libmoss)\n├── go/                      # Go SDK samples\n├── voice-agents/            # End-to-end voice agents (ambient + multi-agent)\n│   ├── airline-pnr/         # Ambient retrieval; per-PNR Moss indexes, swap mid-call\n│   └── mortgage-lending/    # Multi-agent flow with shared session state\n└── cookbook/                # Framework integrations\n    ├── langchain/           # LangChain retriever\n    ├── dspy/                # DSPy 
module\n    ├── crewai/              # CrewAI integration\n    ├── haystack/            # Haystack retriever\n    ├── autogen/             # AutoGen integration\n    ├── mastra/              # Mastra retriever\n    ├── pydantic-ai/         # Pydantic AI integration\n    └── daytona/             # Daytona sandbox example\n\napps/\n├── next-js/                 # Next.js semantic search UI\n├── pipecat-moss/            # Pipecat voice agent with Moss retrieval\n├── vapi-moss/               # Vapi voice agent with Moss retrieval\n├── elevenlabs-moss/         # ElevenLabs voice agent with Moss retrieval\n├── livekit-moss-vercel/     # LiveKit voice agent on Vercel\n├── agora-moss/              # Agora Conversational AI MCP server with Moss retrieval\n├── moss-llamaindex/         # LlamaIndex RAG backend + frontend\n├── moss-bun/                # Bun runtime example\n└── docker/                  # Dockerized examples (ECS/K8s pattern)\n\nmoss-live-labs/              # Experimental zone: prototypes and community demos\n├── python/                  # Minimal Python quickstart + advanced query example\n├── typescript/              # Minimal TypeScript quickstart + advanced query example\n├── examples/                # Larger experiments (image search, voice agents)\n│   ├── voice-agent/         # LiveKit + Moss voice assistant\n│   ├── advanced-voice-agent/ # Persona impersonator built on a PDF knowledge base\n│   └── image-search/        # FastAPI + React image search over COCO\n└── community-demos/         # Community-contributed projects\n    └── voice-agents/        # bharat-benefits, shoplabs-voice-agent\n```\n\n```\ncd examples/python\npip install -r requirements.txt\ncp ../../.env.example .env   # Add your credentials\npython load_and_query_sample.py\n```\n\n```\ncd examples/javascript\nnpm install\ncp ../../.env.example .env   # Add your credentials\nnpx tsx load_and_query_sample.ts\n```\n\n```\ncd apps/next-js\nnpm install\ncp ../../.env.example .env   # Add your credentials\nnpm run dev              
    # Open http://localhost:3000\n```\n\nSub-10 ms retrieval plugged into [Pipecat's](https://github.com/pipecat-ai/pipecat) real-time voice pipeline - a customer support agent that actually keeps up with conversation.\n\n```\ncd apps/pipecat-moss/pipecat-quickstart\n# See README for setup and Pipecat Cloud deployment\n```\n\nA privacy-first voice AI stack: **Ollama** for LLM inference, **Moss** for retrieval, **Pipecat** for real-time audio - the LLM and retrieval both run on your machine.\n\n```\ncd apps/pipecat-moss/ollama-local\ndocker compose up\n```\n\nFull API reference: [docs.moss.dev](https://docs.moss.dev).\n\n| Framework | Example |\n|---|---|\n| LangChain | `examples/cookbook/langchain/` |\n| [DSPy](https://github.com/stanfordnlp/dspy) | `examples/cookbook/dspy/` |\n| [LlamaIndex](https://github.com/run-llama/llama_index) | `apps/moss-llamaindex/` |\n| [CrewAI](https://github.com/crewAIInc/crewAI) | `examples/cookbook/crewai/` |\n| [AutoGen](https://github.com/microsoft/autogen) | `examples/cookbook/autogen/` |\n| [Haystack](https://github.com/deepset-ai/haystack) | `examples/cookbook/haystack/` |\n| [Mastra](https://mastra.ai) | `examples/cookbook/mastra/` |\n| [Pydantic AI](https://ai.pydantic.dev) | `examples/cookbook/pydantic-ai/` |\n| [Pipecat](https://github.com/pipecat-ai/pipecat) | `apps/pipecat-moss/` |\n| [LiveKit](https://github.com/livekit/livekit) | `apps/livekit-moss-vercel/` |\n| [Vapi](https://vapi.ai) | `apps/vapi-moss/` |\n| [ElevenLabs](https://elevenlabs.io) | `apps/elevenlabs-moss/` |\n| [Agora](https://www.agora.io/) | `apps/agora-moss/` |\n| [Strands Agents](https://github.com/strands-agents/sdk-python) | `packages/strands-agents-moss/` |\n| [Next.js](https://nextjs.org) | `apps/next-js/` |\n| [VitePress](https://vitepress.dev) | `packages/vitepress-plugin-moss/` |\n| [Vercel AI SDK](https://sdk.vercel.ai) | `packages/vercel-sdk/` |\n\nMoss has three parts:\n\n- **Moss Cloud** - handles ingestion, document embedding, storage, and distribution. 
Point the SDK at it with a project ID and key.\n- **Index** - your documents and their vectors, packaged as a single artifact that lives on Moss Cloud.\n- **Runtime** - embedded in your application. It pulls indexes over HTTPS, holds them in memory, and serves queries locally.\n\nOnce an index is loaded, queries don't leave your process - that's where the sub-10 ms latency comes from. Document changes flow through Moss Cloud and the runtime stays in sync.\n\n- **Server-side** - `moss` (Python) and `@moss-dev/moss` (Node.js 20+) embed the runtime in your backend. Use this when your agent runs on a server.\n- **Browser** - `@moss-dev/moss-web` is a WebAssembly build that downloads the index and runs queries entirely client-side, no server required. Use this for static sites, browser extensions, and offline-first apps. See `examples/javascript-web/`.\n\nFull Python SDK source code is available at [sdks/python/](https://github.com/usemoss/moss/blob/main/sdks/python).\n\nHere's where the community can have the most impact:\n\n- **New SDK bindings** - Swift, Go, Elixir, ...\n- **Framework integrations** - CrewAI, Haystack, AutoGen\n- **Reranking support** - plug in cross-encoder rerankers\n- **Doc-parsing connectors** - PDF, DOCX, HTML, Markdown ingestion\n- **Examples and tutorials** - if you build something with Moss, we'd love to feature it\n\nSee our [Contributing Guide](https://github.com/usemoss/moss/blob/main/CONTRIBUTING.md) for setup instructions and our [Roadmap](https://github.com/usemoss/moss/blob/main/ROADMAP.md) for what's planned.\n\nCheck out issues labeled [good first issue](https://github.com/usemoss/moss/labels/good%20first%20issue) to get started.\n\n- [Discord](https://moss.link/discord) - ask questions, share what you're building\n- [GitHub Issues](https://github.com/usemoss/moss/issues) - bug reports and feature requests\n- [Twitter](https://x.com/usemoss) - announcements and updates\n\n[BSD 2-Clause License](https://github.com/usemoss/moss/blob/main/LICENSE) - the SDKs, examples, and integrations in this repo are fully open source.\n\nBuilt by the team at", 
"url": "https://wpnews.pro/news/i-removed-the-vector-database-from-my-ai-agent-stack", "canonical_source": "https://github.com/usemoss/moss", "published_at": "2026-06-27 00:05:54+00:00", "updated_at": "2026-06-27 00:35:47.611933+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "ai-tools", "machine-learning", "natural-language-processing"], "entities": ["Moss", "Pinecone", "Qdrant", "ChromaDB", "OpenAI", "Moss Cloud", "WebAssembly", "MossClient"], "alternates": {"html": "https://wpnews.pro/news/i-removed-the-vector-database-from-my-ai-agent-stack", "markdown": "https://wpnews.pro/news/i-removed-the-vector-database-from-my-ai-agent-stack.md", "text": "https://wpnews.pro/news/i-removed-the-vector-database-from-my-ai-agent-stack.txt", "jsonld": "https://wpnews.pro/news/i-removed-the-vector-database-from-my-ai-agent-stack.jsonld"}}