Your RAG Is Underperforming Because Your Embeddings Are Too Simple

Cohere's new Compass embedding model addresses the limitations of standard single-vector embeddings in enterprise RAG systems by preserving the structure of multi-aspect data. Instead of compressing documents into single vectors, Compass accepts structured JSON inputs to capture relationships between distinct concepts, reducing the need for brittle post-retrieval filters. The model is currently in private beta with an open-source Python SDK available for testing.

Most production RAG systems are built on a simple premise: convert documents into single vectors and find the ones closest to a query vector. This works for simple documents, but fails on the messy, multi-aspect data that defines enterprise reality. Cohere's Compass is a new embedding model designed for this specific problem, and it suggests a necessary evolution in how we build retrieval systems.

Standard embedding models, including powerful ones like Cohere's own Embed v3, map an entire document to a single point in semantic space. This is lossy compression. If a document contains multiple distinct concepts, such as an invoice with a specific sender, due date, and line items, the resulting vector is an average of all those concepts. The relationships between them are lost.

This leads to retrieval errors that are painfully familiar to anyone who has shipped a RAG product. A search for a "red T-shirt" might return "blue and yellow jeans" because the vector for colors is muddled with the vector for clothing type. In an enterprise context, a query for an invoice from a specific person might fail because the contextual link between the sender and the attached document was severed during the chunking and embedding process.

To compensate, engineers build brittle, complex classification layers and metadata filters on top of the vector search. This is a workaround, not a solution. It treats the symptom (poor retrieval quality) instead of the underlying disease: an embedding model that doesn't understand the structure of your data.

Compass is designed to handle this multi-aspect data directly. Instead of feeding it a raw text chunk, you provide a JSON document that preserves the data's inherent structure. The model then creates a multi-aspect representation that can be stored in any vector database, capturing the relationships between the different concepts.

For example, a traditional RAG pipeline might index an email and its PDF attachment as two separate, unrelated chunks.
The crucial context, that this specific PDF was sent by a particular person at a specific time, is lost. The Compass workflow uses an SDK to parse the email and its attachments into a single, structured JSON object. This JSON is then passed to the embedding model, which generates an output that understands the document's internal relationships.

This approach moves the complexity from post-retrieval filtering into the embedding model itself, where it can be handled more effectively. It allows for more precise, context-aware data retrieval without the need for manual classification layers.

The workflow involves using a dedicated SDK to prepare and index your data. While the full system is in private beta, the open-source Python SDK shows the intended structure. You would use a parser client to convert your raw files into the structured JSON format, and then an index client to handle the embedding and storage.

Here is a conceptual look at how you might use the Python client to index documents:

```python
from cohere_compass.clients.compass import CompassClient
from cohere_compass.clients.parser import CompassParserClient
from cohere_compass.models.config import MetadataConfig, MetadataStrategy

# Configuration for your Compass instance
COMPASS_API_URL = "<YOUR_COMPASS_URL>"
PARSER_API_URL = "<YOUR_PARSER_URL>"
BEARER_TOKEN = "<YOUR_API_TOKEN>"

# 1. Use the parser client to convert raw files into structured JSON.
# This would point to a directory of your raw PDFs, DOCX, etc.
parser_client = CompassParserClient(parser_url=PARSER_API_URL)

# You can define strategies for how metadata is extracted and handled
metadata_config = MetadataConfig(metadata_strategy=MetadataStrategy.AUTO)

parsed_docs = parser_client.parse_folder(
    folder_path="./path/to/your/data",
    metadata_config=metadata_config,
)

# 2. Use the main client to create an index and add the parsed documents
compass_client = CompassClient(index_url=COMPASS_API_URL, bearer_token=BEARER_TOKEN)

index_name = "enterprise-document-index"
compass_client.create_index(index_name)

# The parsed_docs object contains the structured data ready for the
# multi-aspect embedding model.
compass_client.add_documents(index_name, documents=parsed_docs)
```

This structured process ensures the model receives the rich, multi-aspect context that single-vector embeddings would otherwise discard.

The key takeaway is that the foundation of your RAG system, the retrieval model, deserves more attention. Simply using the largest, most powerful generative model won't fix a system that retrieves irrelevant documents. The future of enterprise AI isn't a single, monolithic model but a suite of specialized tools for specific jobs.

For builders working with complex, structured data, this means evaluating and adopting embedding models that are purpose-built for that data. A model like Compass, designed for multi-aspect retrieval, can be the component that elevates a system from a proof-of-concept to a production-grade tool that delivers genuinely relevant results.
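
If you want to see the averaging problem from earlier for yourself, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the one-hot "embeddings", the helper names (`embed_pair`, `mean_pool`, `best_pair_score`), and the best-pair scoring rule, which is only a stand-in for a representation that keeps aspects linked. It does not describe how Compass works internally. It shows two documents that pair colors and items differently collapsing to the same pooled vector, while a representation that keeps each pair intact can still tell them apart.

```python
# Toy one-hot "embeddings" for two aspects; all values are illustrative.
COLOR = {"red": [1.0, 0.0], "blue": [0.0, 1.0]}
ITEM = {"t-shirt": [1.0, 0.0], "jeans": [0.0, 1.0]}


def embed_pair(color, item):
    """Embed one (color, item) pair by concatenating its aspect vectors."""
    return COLOR[color] + ITEM[item]


def dot(u, v):
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pool(doc):
    """Collapse every pair in a document into one averaged vector."""
    vectors = [embed_pair(color, item) for color, item in doc]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def best_pair_score(query, doc):
    """Score a document by its best-matching pair, keeping pairs intact."""
    q = embed_pair(*query)
    return max(dot(q, embed_pair(color, item)) for color, item in doc)


# Same colors and items in both documents, but paired differently.
doc_a = [("red", "t-shirt"), ("blue", "jeans")]
doc_b = [("blue", "t-shirt"), ("red", "jeans")]
query = ("red", "t-shirt")

pooled_a, pooled_b = mean_pool(doc_a), mean_pool(doc_b)
pooled_identical = pooled_a == pooled_b

# Single-vector retrieval cannot rank the two documents apart.
pooled_score_a = dot(embed_pair(*query), pooled_a)
pooled_score_b = dot(embed_pair(*query), pooled_b)

# Keeping the pairs separate recovers the distinction.
paired_score_a = best_pair_score(query, doc_a)
paired_score_b = best_pair_score(query, doc_b)

print("pooled vectors identical:", pooled_identical)
print("pooled scores:", pooled_score_a, pooled_score_b)
print("paired scores:", paired_score_a, paired_score_b)
```

The pooled vectors come out identical, so no amount of tuning on the similarity side can separate the document that actually contains a red T-shirt from the one that only contains red jeans and a blue T-shirt. Real embedding models are far richer than one-hot vectors, but the same pressure applies whenever several linked facts are squeezed into one point.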