Jarvis AI Platform: Implementing Semantic Memory Retrieval with pgvector

Jarvis AI Platform implemented semantic memory retrieval using pgvector and Ollama's nomic-embed-text model. The system converts user queries and stored memories into 768-dimensional embeddings, enabling cosine similarity searches to find semantically related content even when exact words don't match. This allows the AI assistant to recall context from long-term memory without explicit keyword matches.

How we taught a Java AI assistant to find memories by meaning, not just keywords.

In Part 2, I explained the architecture behind Jarvis AI Platform's memory system.

- Working Memory: ✅ Phase 1
- Session Memory: ✅ Phase 1
- Long-Term Memory: 🔨 Phase 2
- Semantic Memory: 🔨 Phase 2

The last two layers are the most interesting. And the hardest to build. This article covers exactly how we implemented them.

Imagine Jarvis stores this memory about you:

    User is building Jarvis AI Platform in Java

Now you ask:

    You: How is my coding project coming along?

A keyword search finds nothing.

    "coding project" ≠ "Jarvis AI Platform"

The words don't match. But the meaning does. That's the problem semantic search solves.

An embedding is a way to represent text as a list of numbers.

    "User is building Jarvis AI Platform"    → [0.23, -0.41, 0.88, 0.12, ...] (768 numbers)
    "How is my coding project coming along?" → [0.21, -0.38, 0.91, 0.09, ...] (768 numbers)

Texts with similar meaning produce vectors that are close together in mathematical space. Texts with different meanings produce vectors that are far apart. This allows us to find semantically related content even when the exact words don't match.

We use Ollama's nomic-embed-text model.

    ollama pull nomic-embed-text

Why this model:

- Runs 100% locally
- 768-dimensional output
- Fast generation (~200 ms per text)
- No API key required
- Excellent quality for English text

Here is how everything connects.

User sends: "How is my coding project?"
        ↓
AiOrchestrator
        ↓
┌───────────────────────────────┐
│ Mono.zip() (ALL IN PARALLEL): │
│ 1. Session history (Redis)    │
│ 2. Long-term memories         │ ← Phase 2
│ 3. RAG document context       │ ← Phase 3
└───────────────────────────────┘
        ↓
EmbeddingService.embed(userQuery)
→ [0.21, -0.38, 0.91, ...]
        ↓
pgvector cosine similarity search
→ "User is building Jarvis AI Platform" (0.87 similarity)
→ "User prefers Java over Python" (0.71 similarity)
        ↓
PromptAssembler
(injects memories into prompt)
        ↓
OllamaProvider
        ↓
"Your Jarvis project sounds exciting! How's the memory system coming along?"

The AI responds with context about your project even though you never mentioned it in this session.

The first building block is generating embeddings. Spring AI provides an EmbeddingModel interface. Ollama implements it automatically when you add the starter dependency.

@Slf4j
@Service
@RequiredArgsConstructor
public class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    /**
     * Generate embedding for a single text.
     * Ollama call is blocking → boundedElastic thread.
     */
    public Mono