Every Claude session starts fresh. You copy context, explain your setup, reintroduce your project, and then do it all over again the next day. I got tired of this and built a solution.

second-brain-cloudflare is a self-hosted MCP server that gives Claude, ChatGPT, Cursor, and any other MCP-compatible client persistent memory across sessions. It runs entirely on Cloudflare's free tier. Here’s how it works.

Everything ships with wrangler deploy, using bge-small-en-v1.5 for embeddings and @cf/meta/llama-4-scout-17b-16e-instruct for web UI synthesis.

One deployment. No external databases. No API keys needed beyond your Cloudflare account token.
Pure vector similarity has a drawback: a memory from three months ago can outrank something you saved yesterday if it’s semantically closer. The solution is to fetch three times as many candidates as needed (topK=5 pulls 15), then score each one using a tag-aware half-life:

adjusted_score = cosine_similarity × e^(-age_in_days / half_life)

Before storing anything, embed the incoming content and query Vectorize for its nearest neighbor; a close match is flagged with a duplicate-candidate tag. Without this step, Claude creates 20–30 nearly identical entries for the same decision.

Long notes are split at sentence boundaries, with a 200-character overlap. Each chunk receives its own vector. Chunk IDs are stored in D1, so forget() reliably removes all related vectors.

Queries now support time limits.

In the web UI, queries flow through @cf/meta/llama-4-scout-17b-16e-instruct before being rendered. Answers stream in real time, with collapsible source memories underneath. You’ll find Append and Forget buttons.

This runs on your own Cloudflare account.
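To make the time-decay scoring concrete, here is a minimal TypeScript sketch of the re-ranking step. It applies the adjusted_score formula exactly as written above. The Candidate shape, the per-tag half-life values, the default, and the rule for picking among several tags are all assumptions for illustration; the post does not give the project's actual values.

```typescript
// Hypothetical candidate shape; field names are illustrative, not the project's schema.
interface Candidate {
  id: string;
  similarity: number; // cosine similarity returned by the vector search
  createdAt: number;  // creation time in ms since epoch
  tags: string[];
}

// Assumed per-tag half-lives in days. The real values are not stated in the post.
const HALF_LIFE_DAYS: Record<string, number> = { decision: 180, task: 14 };
const DEFAULT_HALF_LIFE_DAYS = 60;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Over-fetch three times the requested results so recency can reorder them.
function candidateCount(topK: number): number {
  return topK * 3;
}

// Assumption: when a memory has several known tags, the longest half-life wins.
function halfLifeFor(tags: string[]): number {
  const known = tags
    .map((t) => HALF_LIFE_DAYS[t])
    .filter((d): d is number => d !== undefined);
  return known.length > 0 ? Math.max(...known) : DEFAULT_HALF_LIFE_DAYS;
}

// adjusted_score = cosine_similarity * e^(-age_in_days / half_life)
function adjustedScore(c: Candidate, now: number): number {
  const ageInDays = Math.max(0, (now - c.createdAt) / MS_PER_DAY);
  return c.similarity * Math.exp(-ageInDays / halfLifeFor(c.tags));
}

// Sort a copy of the candidates by adjusted score and keep the best topK.
function rerank(candidates: Candidate[], topK: number, now: number = Date.now()): Candidate[] {
  return [...candidates]
    .sort((a, b) => adjustedScore(b, now) - adjustedScore(a, now))
    .slice(0, topK);
}
```

One detail worth knowing: with this exact formula, a memory's score falls to 1/e (about 37%) of its raw similarity when its age equals half_life. A strict half-life, where the score halves at that age, would use 0.5 as the base instead of e.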
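The duplicate check before storing can be sketched along these lines. This is an illustration under stated assumptions, not the project's code: the 0.9 threshold is made up because the post does not state the cutoff, and NearestNeighborIndex is a locally declared interface that only mirrors the general shape of a vector query result (a list of matches with scores) so the sketch runs without Cloudflare bindings.

```typescript
// Minimal slice of a vector index, declared locally for the sketch.
interface NearestNeighborIndex {
  query(
    vector: number[],
    options: { topK: number }
  ): Promise<{ matches: { id: string; score: number }[] }>;
}

// Assumed similarity cutoff; the actual value is not given in the post.
const DUPLICATE_THRESHOLD = 0.9;

// Pure decision step: add the duplicate-candidate tag when the nearest
// existing memory is at least as similar as the threshold.
function withDuplicateTag(tags: string[], nearestScore: number | undefined): string[] {
  if (nearestScore !== undefined && nearestScore >= DUPLICATE_THRESHOLD) {
    return tags.includes("duplicate-candidate") ? tags : [...tags, "duplicate-candidate"];
  }
  return tags;
}

// Look up the single nearest neighbor of the new embedding, then decide on tags.
async function tagsForNewMemory(
  index: NearestNeighborIndex,
  embedding: number[],
  tags: string[]
): Promise<string[]> {
  const { matches } = await index.query(embedding, { topK: 1 });
  const nearest = matches[0]; // undefined when the index is empty
  return withDuplicateTag(tags, nearest?.score);
}
```

Keeping the decision in a pure function makes the threshold easy to tune and test separately from the index lookup.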
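Sentence-boundary chunking with overlap might look roughly like the sketch below. The 200-character overlap comes from the description above; the 1000-character maximum chunk size, the sentence-splitting regex, and the function names are assumptions chosen for illustration.

```typescript
// Split text into sentences: each piece ends at ., ! or ? (plus trailing
// whitespace), and any unterminated remainder becomes the final piece.
function splitSentences(text: string): string[] {
  return text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
}

// Group sentences into chunks of at most maxChars where possible. Each new
// chunk starts with the last `overlap` characters of the previous one.
// A single sentence longer than maxChars is kept whole, so it can exceed the limit.
function chunkText(text: string, maxChars: number = 1000, overlap: number = 200): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      // Seed the next chunk with the tail of the chunk just emitted.
      // slice(-0) would return the whole string, so handle zero explicitly.
      current = overlap > 0 ? current.slice(-overlap) : "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) {
    chunks.push(current.trim());
  }
  return chunks;
}
```

Because the overlap is measured in characters, the start of a chunk may fall mid-word, while the end of every chunk still lands on a sentence boundary. Each returned chunk would then be embedded and stored as its own vector.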
Deploy: https://thesecondbrain.dev
GitHub: https://github.com/rahilp/second-brain-cloudflare
If this was helpful, please give it a star.