MCP + RAG: Why I Stopped Building Complex RAG Systems After MCP Changed Everything


Honestly, I've spent the last four years building increasingly complex RAG systems. Chunking strategies, embedding models, vector databases, rerankers, hybrid search... you name it, I've probably wasted a weekend trying it.

I had this 1,800-hour knowledge base project called Papers — six years of notes, articles, bookmarks, everything. I built RAG version after RAG version, each time thinking "this time it'll be perfect."

Spoiler: It never was.

Then I added MCP (Model Context Protocol) support. And I realized something that completely changed how I think about knowledge retrieval: MCP makes traditional complex RAG obsolete for most use cases.

Let me explain what I learned the hard way.

If you've built a RAG system, you know the drill:

I went through every iteration. At one point, my RAG system was over 2,000 lines of code. I had configurable chunkers, multiple embedding providers, caching layers, hybrid search... it was impressive. It also didn't work that well.

Here's what bothered me the most: I kept throwing more complexity at the problem, but the fundamental issue never went away. I was trying to make my knowledge base smart when the AI had already gotten smart.

Why was I reimplementing all this understanding logic when the AI can already do it better than me?

When I added MCP support to Papers, I started with the simplest possible approach:

Two tools, search_notes and get_note_content, backed by plain text matching (string.contains()).

That's it. 150 lines of code. That's the entire MCP server. Compare that to 2,000 lines of complex RAG.

At first, I thought this was just a stepping stone. I figured I'd gradually add all the fancy RAG stuff back in. But... I never did. Because it works better this way.

Wait, what? How can simple text matching beat a sophisticated RAG system?

Let me show you the actual code so you can see how simple this really is:

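import java.util.List;
import java.util.Locale;
import org.springframework.stereotype.Service;

@Service
public class SimpleMcpKnowledgeService {
    private final NoteRepository noteRepository;

    // Constructor injection, so the final field is always initialized.
    public SimpleMcpKnowledgeService(NoteRepository noteRepository) {
        this.noteRepository = noteRepository;
    }

    public McpSearchResponse searchNotes(String query, int maxResults) {
        // Yes, this is really it. Simple case-insensitive text search.
        // Lower-case the query once instead of once per note.
        String needle = query.toLowerCase(Locale.ROOT);
        List<Note> matching = noteRepository.findAll().stream()
            .filter(note -> matches(note.getTitle(), needle) || matches(note.getContent(), needle))
            .toList();

        return McpSearchResponse.builder()
            // Count every match before the limit, so the total is not capped at maxResults.
            .totalMatches(matching.size())
            .notes(matching.stream()
                .limit(maxResults)
                // getPreview is a helper that is not shown in this article.
                .map(n -> McpNoteSummary.builder()
                    .id(n.getId())
                    .title(n.getTitle())
                    .preview(getPreview(n, query))
                    .createdAt(n.getCreatedAt())
                    .build())
                .toList())
            .build();
    }

    // Null-safe substring check; needle must already be lower-cased.
    private static boolean matches(String text, String needle) {
        return text != null && text.toLowerCase(Locale.ROOT).contains(needle);
    }

    public String getNoteFullContent(String noteId) {
        // Return the whole note, not a chunk; fall back to a readable message.
        return noteRepository.findById(noteId)
            .map(Note::getContent)
            .orElse("Note not found with id: " + noteId);
    }
}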

And the MCP controller:

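import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/mcp")
@CrossOrigin(origins = "*", allowedHeaders = "*")
public class McpController {
    private final SimpleMcpKnowledgeService knowledgeService;
    // Used below but never declared in the original snippet; the AuthService type name is an assumption.
    private final AuthService authService;

    public McpController(SimpleMcpKnowledgeService knowledgeService, AuthService authService) {
        this.knowledgeService = knowledgeService;
        this.authService = authService;
    }

    @PostMapping("/tools/call")
    public ResponseEntity<McpCallResponse> callTool(
            @RequestBody McpCallRequest request,
            @RequestHeader(value = "X-API-Key", required = false) String apiKey) {

        // Reject missing or invalid keys before doing any work.
        if (!authService.validateKey(apiKey)) {
            return ResponseEntity.status(HttpStatus.UNAUTHORIZED).build();
        }

        ToolCall call = request.getCall();
        if ("search_notes".equals(call.getName())) {
            // query is mandatory; without this check a missing parameter throws a NullPointerException.
            if (!call.getParameters().has("query")) {
                return ResponseEntity.badRequest().build();
            }
            String query = call.getParameters().get("query").asText();
            int maxResults = call.getParameters().has("max_results")
                ? call.getParameters().get("max_results").asInt()
                : 10;

            McpSearchResponse result = knowledgeService.searchNotes(query, maxResults);
            return ResponseEntity.ok(McpCallResponse.success(result));
        }

        if ("get_note_content".equals(call.getName())) {
            // note_id is mandatory for the same reason.
            if (!call.getParameters().has("note_id")) {
                return ResponseEntity.badRequest().build();
            }
            String noteId = call.getParameters().get("note_id").asText();
            String content = knowledgeService.getNoteFullContent(noteId);
            return ResponseEntity.ok(McpCallResponse.success(content));
        }

        // Unknown tool name.
        return ResponseEntity.badRequest().build();
    }
}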

That's the core of it. Two tools. Simple search. Full notes instead of chunks. That's it.
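The service above calls getPreview(n, query) without showing it. Here is a minimal sketch of what such a helper might look like, assuming it works on the note's content string and returns a short window of text around the first match. The class name, the 40-character window, and the "..." markers are all assumptions for illustration, not the Papers implementation.

```java
// Hypothetical preview helper; the article references getPreview but does not show it.
final class PreviewSketch {
    // Assumed number of characters kept on each side of the match.
    private static final int CONTEXT_CHARS = 40;

    /** Returns an excerpt around the first case-insensitive match, or the opening text if none. */
    static String preview(String content, String query) {
        if (content == null || content.isEmpty()) {
            return "";
        }
        int hit = (query == null || query.isEmpty()) ? -1 : indexOfIgnoreCase(content, query);
        // No match in the body (for example, only the title matched): show the beginning.
        int start = hit < 0 ? 0 : Math.max(0, hit - CONTEXT_CHARS);
        int end = hit < 0
            ? Math.min(content.length(), 2 * CONTEXT_CHARS)
            : Math.min(content.length(), hit + query.length() + CONTEXT_CHARS);
        String prefix = start > 0 ? "..." : "";
        String suffix = end < content.length() ? "..." : "";
        return prefix + content.substring(start, end) + suffix;
    }

    /** Case-insensitive indexOf that keeps indices valid for the original string. */
    static int indexOfIgnoreCase(String text, String needle) {
        for (int i = 0; i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

The preview only has to be good enough for the AI to decide which note to open; the full text comes from get_note_content.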

So here's the thing I didn't expect: this approach beats my fancy RAG system 9 times out of 10. Here's why:

Traditional RAG chunks your content into small pieces before retrieval. But why chunk when AI can read the whole note and figure out what's relevant?

With MCP:

The AI is better at deciding what's relevant than your pre-chunking ever was. It knows what it's looking for.

Ever had RAG give you a chunk that's just half an answer, and the rest is in the next chunk that didn't get retrieved? It's so frustrating.

With full notes, the AI gets the complete context. It can see how ideas connect. It doesn't miss the other half of the explanation because your chunker split it in the wrong place.
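To make the split-answer problem concrete, here is a toy sketch. It assumes a naive fixed-size chunker and uses keyword matching as a stand-in for embedding similarity; the sample note text and the 28-character chunk size are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy demonstration only; real RAG pipelines rank chunks by embedding similarity.
final class ChunkSplitDemo {
    /** Naive chunker: cuts every `size` characters and ignores sentence boundaries. */
    static List<String> chunk(String text, int size) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < text.length(); i += size) {
            chunks.add(text.substring(i, Math.min(text.length(), i + size)));
        }
        return chunks;
    }

    /** Returns only the chunks that contain the keyword (case-insensitive). */
    static List<String> retrieve(List<String> chunks, String keyword) {
        String k = keyword.toLowerCase();
        return chunks.stream()
            .filter(c -> c.toLowerCase().contains(k))
            .toList();
    }
}
```

With the note "CORS errors in Spring Boot. Fix: allow the OPTIONS preflight without auth." and a chunk size of 28, searching for "CORS" retrieves only the first chunk, which states the problem. The fix lives in the later chunks and never comes back. Returning the whole note avoids this by construction.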

Let's be honest — how many times have you debugged:

With simple text search, what can go wrong? If the keyword exists, it matches. If it doesn't, it doesn't. Predictable. No weird embedding drift. No dimension mismatches. Nothing.

Wait, doesn't this require multiple tool calls? Yes. But modern AI clients handle that automatically. And tokens are cheap.

Compare:

Same token budget, better result because the AI is in control.

Don't get me wrong — I'm not saying RAG is completely useless. There are still cases where it makes sense:

| Scenario | Still need RAG? | Why |
| --- | --- | --- |
| 100k+ large documents | ✅ Yes | You can't load full documents every time; they won't fit in the context window |
| Production high-scale | ✅ Yes | Multiple round trips cost more latency |
| Semantic search is critical | ❌ Not really | AI can do the understanding if you give it the candidates |
| Your notes are all 10k+ words | ❌ It depends | Even 10k words is fine, AI can skip what it doesn't need |
| You need to monetize | ✅ Yes | Complexity is a feature for fundraising |

Honestly, for most personal knowledge bases, side projects, and even internal company tools — you don't need complex RAG. MCP changes everything.

My knowledge base has about 2,800 notes with ~3 million words total. Simple text search works fine. MCP gives the AI the access it needs, and that's all you really need.
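A quick back-of-the-envelope check on those numbers: 3,000,000 words over 2,800 notes is roughly 1,071 words per note. The sketch below also turns that into a token estimate, assuming about 1.3 tokens per English word. That ratio is a common rule of thumb and an assumption here, not a measurement from Papers; real counts depend on the tokenizer.

```java
// Rough sizing arithmetic; the tokens-per-word ratio is an assumption.
final class NoteSizeEstimate {
    /** Average words per note, rounded to the nearest whole word. */
    static long averageWords(long totalWords, long noteCount) {
        return Math.round((double) totalWords / noteCount);
    }

    /** Rough token estimate for a given word count and an assumed tokens-per-word ratio. */
    static long estimateTokens(long words, double tokensPerWord) {
        return Math.round(words * tokensPerWord);
    }
}
```

Under that assumption an average note comes to about 1,392 tokens, so handing the AI a few full notes per question is a small share of a typical context window.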

I promised I'd keep this real, not just marketing fluff. Here's the actual breakdown:

Let me show you a concrete example. Six months ago, I asked my fancy RAG system:

"What did I write about MCP server error handling?"

The RAG system:

Today, with MCP:

The extra 10 seconds is worth it for getting the complete right answer instead of a half-wrong fast answer.

Another example: I was looking for something I wrote about Spring Boot CORS configuration. My old RAG chunked it across three chunks. None got retrieved because the chunk that had the keywords didn't have the actual solution. With MCP, the search finds the note by title, AI reads the whole thing, done.

If you already have a knowledge base, you can add MCP support in an afternoon. Here's the step-by-step:

Create a search_notes tool that takes a query and returns matching note titles + previews. Keep it simple. Start with text matching.

Create a get_note_content tool that returns the complete markdown of any note by ID.

Support multiple authentication methods (as I learned the hard way here).

Make sure OPTIONS requests work correctly and don't require auth (I wrote about that too).
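The last two steps can be sketched as two small checks. This is a framework-free sketch under stated assumptions: X-API-Key comes from the controller above, while accepting "Authorization: Bearer" as a second method is an assumption for illustration, as are the class and method names. Header names are expected in lower case.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helpers; not taken from the Papers codebase.
final class AuthSketch {
    /** Browsers send CORS preflight (OPTIONS) without credentials, so it must pass without auth. */
    static boolean requiresAuth(String httpMethod) {
        return !"OPTIONS".equalsIgnoreCase(httpMethod);
    }

    /** Accepts a key from X-API-Key first, then from an "Authorization: Bearer ..." header. */
    static Optional<String> extractKey(Map<String, String> headers) {
        String apiKey = headers.get("x-api-key");
        if (apiKey != null && !apiKey.isBlank()) {
            return Optional.of(apiKey.trim());
        }
        String auth = headers.get("authorization");
        String scheme = "Bearer ";
        if (auth != null && auth.regionMatches(true, 0, scheme, 0, scheme.length())) {
            String token = auth.substring(scheme.length()).trim();
            return token.isEmpty() ? Optional.empty() : Optional.of(token);
        }
        return Optional.empty();
    }
}
```

A request filter would call requiresAuth first and only then look for a key, so a preflight never gets a 401.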

That's actually it. You don't need anything else. Connect it to your favorite MCP client and start using it.
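For a client to use the two tools, it needs to know their names and parameters. MCP tool definitions carry a name, a description, and a JSON Schema under inputSchema. Here is a sketch of those descriptors as plain Java maps, using the parameter names from the controller above (query, max_results, note_id). The description strings are made up, and how Papers actually exposes its tool list is not shown in this article.

```java
import java.util.List;
import java.util.Map;

// Sketch of tool descriptors; serialize with any JSON library before sending to a client.
final class ToolDescriptorsSketch {
    /** Builds one descriptor with the name / description / inputSchema shape. */
    static Map<String, Object> tool(String name, String description,
                                    Map<String, Object> properties, List<String> required) {
        return Map.of(
            "name", name,
            "description", description,
            "inputSchema", Map.of(
                "type", "object",
                "properties", properties,
                "required", required));
    }

    /** The two tools described in this article. */
    static List<Map<String, Object>> tools() {
        return List.of(
            tool("search_notes",
                "Search notes by keyword; returns ids, titles, and previews.",
                Map.of(
                    "query", Map.of("type", "string"),
                    "max_results", Map.of("type", "integer", "default", 10)),
                List.of("query")),
            tool("get_note_content",
                "Return the full content of one note by id.",
                Map.of("note_id", Map.of("type", "string")),
                List.of("note_id")));
    }
}
```

Clear descriptions matter more than they look: they are what the AI reads when deciding to search first and fetch the full note second.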

If you want to see the complete working example, check out the Papers GitHub repo — everything is there, including the full MCP server implementation.

Wait, what about vector databases? Do I still use one?

Honestly, I still have it in my codebase. I just don't use it. The simple approach works better for my use case. Maybe I'll bring it back if I ever get 100,000 notes. But for now, simple text search is perfect.

And even if you do want vector search, with MCP you can still add it as an option. The beauty is that you can incrementally add complexity — you don't need it day one. Start simple, add it later if you actually need it.
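One way to keep that door open is to hide the search behind a one-method interface, so a vector-backed implementation can be swapped in later without touching the tool. This is a sketch with hypothetical names (NoteSearch, textSearch) and an in-memory map standing in for the note store; it is not how Papers is structured.

```java
import java.util.List;
import java.util.Map;

// Hypothetical seam for swapping search strategies later.
final class PluggableSearchSketch {
    /** One method, so text search and a future vector search are interchangeable. */
    interface NoteSearch {
        List<String> search(String query, int maxResults);
    }

    /** Day-one implementation: case-insensitive substring match over id -> content. */
    static NoteSearch textSearch(Map<String, String> notesById) {
        return (query, maxResults) -> {
            String q = query.toLowerCase();
            return notesById.entrySet().stream()
                .filter(e -> e.getValue().toLowerCase().contains(q))
                .map(Map.Entry::getKey)
                .sorted() // stable order, since map iteration order is not guaranteed
                .limit(maxResults)
                .toList();
        };
    }
}
```

The search_notes tool would depend only on NoteSearch, so the day a vector index is actually needed, it becomes a second implementation rather than a rewrite.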

MCP isn't just another protocol. It's a paradigm shift in how we think about integrating AI with our data.

Before MCP:

After MCP:

I spent six years over-engineering my knowledge base. After MCP, I deleted 1,850 lines of complex RAG code. The system works better now with 150 lines.

That's the power of MCP. It lets you go back to simple.

Are you still building complex RAG systems? Have you tried MCP for your knowledge base? I'd love to hear about your experience in the comments.

Did you find that simpler architectures work better with MCP, or are you still doing full complex RAG inside your MCP server? Let me know!

(Full code for the simple MCP knowledge server is available on GitHub if you want to fork it or steal the code for your own project.)

Source: dev.to (original article)
