{"slug": "i-built-persistent-ai-memory-for-claude-on-cloudflare-s-free-tier", "title": "I built persistent AI memory for Claude on Cloudflare's free tier", "summary": "\"second-brain-cloudflare,\" a self-hosted MCP server that provides persistent memory for AI assistants like Claude and ChatGPT across sessions, running entirely on Cloudflare's free tier. It uses vector embeddings and a tag-aware half-life scoring system to improve memory retrieval, while preventing duplicate entries and supporting real-time query streaming with a web UI. The solution requires no external databases or API keys beyond a Cloudflare account token.", "body_md": "Every Claude session starts fresh. You copy context, explain your setup, reintroduce your project, and then do it all over again the next day. I got tired of this and built a solution.\nsecond-brain-cloudflare is a self-hosted MCP server that gives Claude, ChatGPT, Cursor, and any other MCP-compatible client persistent memory across sessions. It runs entirely on Cloudflare's free tier. Here’s how it works.\nwrangler deploy\nbge-small-en-v1.5 for embeddings, @cf/meta/llama-4-scout-17b-16e-instruct for web UI synthesis\nOne deployment. No external databases. No API keys needed beyond your Cloudflare account token.\nPure vector similarity has a drawback: a memory from three months ago can outrank something you saved yesterday if it’s semantically closer. The solution is to fetch three times as many candidates as needed (topK=5 pulls 15), then score each using a tag-aware half-life:\nadjusted_score = cosine_similarity × e^(-age_in_days / half_life)\nBefore storing anything, embed the incoming content and query Vectorize for its nearest neighbor:\nduplicate-candidate tag\nWithout this step, Claude creates 20–30 nearly identical entries for the same decision.\nLong notes are split at sentence boundaries, with a 200-character overlap. Each chunk receives its own vector. 
Chunk IDs are stored in D1, so forget() reliably removes all related vectors.\nQueries now support time limits:\nQueries flow through @cf/meta/llama-4-scout-17b-16e-instruct before being rendered. Answers stream in real time, with source memories that can be collapsed underneath. You’ll find Append and Forget buttons. This runs on your own Cloudflare account.\nDeploy: https://thesecondbrain.dev\nGitHub: https://github.com/rahilp/second-brain-cloudflare\nIf this was helpful, please give it a star.", "url": "https://wpnews.pro/news/i-built-persistent-ai-memory-for-claude-on-cloudflare-s-free-tier", "canonical_source": "https://dev.to/rahil_pirani_c48446facc8c/i-built-persistent-ai-memory-for-claude-on-cloudflares-free-tier-12kc", "published_at": "2026-05-20 04:45:51+00:00", "updated_at": "2026-05-20 05:02:52.228068+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools", "cloud-computing", "open-source"], "entities": ["Claude", "Cloudflare", "ChatGPT", "Cursor", "Llama"], "alternates": {"html": "https://wpnews.pro/news/i-built-persistent-ai-memory-for-claude-on-cloudflare-s-free-tier", "markdown": "https://wpnews.pro/news/i-built-persistent-ai-memory-for-claude-on-cloudflare-s-free-tier.md", "text": "https://wpnews.pro/news/i-built-persistent-ai-memory-for-claude-on-cloudflare-s-free-tier.txt", "jsonld": "https://wpnews.pro/news/i-built-persistent-ai-memory-for-claude-on-cloudflare-s-free-tier.jsonld"}}