I built persistent AI memory for Claude on Cloudflare's free tier

wpnews.pro


Source: dev.to · Published 2026-05-20 04:45 UTC · Topic: artificial intelligence

second-brain-cloudflare is a self-hosted MCP server that gives AI assistants such as Claude and ChatGPT persistent memory across sessions, running entirely on Cloudflare's free tier. It uses vector embeddings and a tag-aware half-life scoring system to improve memory retrieval, prevents duplicate entries, and streams query answers in real time through a web UI. It requires no external databases or API keys beyond a Cloudflare account token.

1 min read

Every Claude session starts fresh. You copy context, explain your setup, reintroduce your project, and then do it all over again the next day. I got tired of this and built a fix. second-brain-cloudflare is a self-hosted MCP server that gives Claude, ChatGPT, Cursor, and any other MCP-compatible client persistent memory across sessions. It runs entirely on Cloudflare's free tier.

Here's how it works. The whole stack ships with a single wrangler deploy and uses bge-small-en-v1.5 for embeddings and @cf/meta/llama-4-scout-17b-16e-instruct for web UI synthesis. One deployment. No external databases. No API keys needed beyond your Cloudflare account token.

Pure vector similarity has a drawback: a memory from three months ago can outrank something you saved yesterday if it's semantically closer. The fix is to fetch three times more candidates than needed (topK=5 pulls 15), then re-score each one with a tag-aware half-life:

adjusted_score = cosine_similarity × e^(-age_in_days / half_life)

Before storing anything, the server embeds the incoming content and queries Vectorize for its nearest neighbor. That lookup decides whether the new note gets a duplicate-candidate tag. Without this step, Claude creates 20–30 nearly identical entries for the same decision.

Long notes are split at sentence boundaries with a 200-character overlap, and each chunk gets its own vector. Chunk IDs are stored in D1, so forget() reliably removes every related vector. Queries now support time limits.

In the web UI, queries flow through @cf/meta/llama-4-scout-17b-16e-instruct before the answer is rendered. Answers stream in real time, with the source memories in collapsible sections underneath. The UI also has Append and Forget buttons. All of this runs on your own Cloudflare account.
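To make the re-ranking concrete, here is a minimal TypeScript sketch of the over-fetch and half-life scoring described above. It is not the project's code. The Candidate shape, the per-tag half-life table, the 30-day default, and the rule of taking the longest half-life when a memory has several tags are all assumptions for illustration.

```typescript
// Tag-aware half-life re-ranking (sketch). The Candidate shape, the per-tag
// half-lives, and the 30-day default are hypothetical, not from the project.

interface Candidate {
  id: string;
  cosine: number;    // similarity reported by the vector search
  createdAt: number; // Unix epoch milliseconds
  tags: string[];
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const OVERFETCH_FACTOR = 3;        // from the article: topK=5 pulls 15
const DEFAULT_HALF_LIFE_DAYS = 30; // hypothetical

// Hypothetical half-lives: durable tags decay slowly, status notes quickly.
const HALF_LIFE_DAYS = new Map<string, number>([
  ["preference", 365],
  ["decision", 180],
  ["status", 7],
]);

// How many candidates to request from the vector index for a given topK.
function candidateCount(topK: number): number {
  return topK * OVERFETCH_FACTOR;
}

// Use the longest half-life among the memory's known tags, else the default.
function halfLifeFor(tags: string[]): number {
  const known = tags
    .map((t) => HALF_LIFE_DAYS.get(t))
    .filter((h): h is number => h !== undefined);
  return known.length > 0 ? Math.max(...known) : DEFAULT_HALF_LIFE_DAYS;
}

// adjusted_score = cosine_similarity × e^(-age_in_days / half_life)
function adjustedScore(c: Candidate, now: number): number {
  const ageDays = Math.max(0, (now - c.createdAt) / MS_PER_DAY);
  return c.cosine * Math.exp(-ageDays / halfLifeFor(c.tags));
}

// Re-rank the over-fetched candidates by adjusted score and keep the best topK.
function rerank(candidates: Candidate[], topK: number, now: number): Candidate[] {
  return candidates
    .map((c) => ({ c, score: adjustedScore(c, now) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ c }) => c);
}

// Usage: a close but stale status note loses to a fresher, slightly weaker match.
const now = Date.UTC(2026, 4, 20);
const ranked = rerank(
  [
    { id: "old-status", cosine: 0.9, createdAt: now - 90 * MS_PER_DAY, tags: ["status"] },
    { id: "fresh", cosine: 0.8, createdAt: now - 1 * MS_PER_DAY, tags: [] },
  ],
  1,
  now,
);
console.log(ranked[0].id); // fresh
```

Because the formula uses e as its base, a memory's weight falls to about 37% (1/e) of its cosine score after one half_life, not 50%. Replacing the Math.exp call with Math.pow(0.5, ageDays / halfLife) would make the parameter a literal half-life. Either way, older memories still surface when nothing fresher is semantically close.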
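Here is a minimal TypeScript sketch of the pre-store duplicate check described above. A plain array with brute-force cosine similarity stands in for the Vectorize nearest-neighbor query. DUPLICATE_THRESHOLD (0.95) is a made-up value, and storing the note with a duplicate-candidate tag, rather than merging or rejecting it, is an assumption. The article does not give these details.

```typescript
// Near-duplicate check before storing (sketch). An in-memory array stands in
// for the Vectorize index; DUPLICATE_THRESHOLD is a hypothetical value.

type Vector = number[];

interface StoredMemory {
  id: string;
  vector: Vector;
  tags: string[];
}

const DUPLICATE_THRESHOLD = 0.95; // hypothetical

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Brute-force nearest neighbor; Vectorize answers the same question server-side.
function nearestNeighbor(
  index: StoredMemory[],
  query: Vector,
): { memory: StoredMemory; score: number } | undefined {
  let best: { memory: StoredMemory; score: number } | undefined;
  for (const memory of index) {
    const score = cosineSimilarity(memory.vector, query);
    if (best === undefined || score > best.score) best = { memory, score };
  }
  return best;
}

// Store the memory, tagging it duplicate-candidate when its nearest neighbor is
// too similar. Tag-and-keep is an assumption; merging or rejecting also work.
function storeWithDedup(index: StoredMemory[], id: string, vector: Vector, tags: string[] = []): StoredMemory {
  const nearest = nearestNeighbor(index, vector);
  const isDuplicate = nearest !== undefined && nearest.score >= DUPLICATE_THRESHOLD;
  const memory: StoredMemory = {
    id,
    vector,
    tags: isDuplicate ? [...tags, "duplicate-candidate"] : [...tags],
  };
  index.push(memory);
  return memory;
}
```

Tagging instead of silently dropping keeps the check reversible: a later cleanup pass, or the user, can decide what to do with flagged notes.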
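A sketch of sentence-boundary chunking with the 200-character overlap might look like this. The 1,000-character chunk limit and the sentence-splitting regex are assumptions, not the project's values. A single sentence longer than the limit simply becomes its own oversized chunk.

```typescript
// Sentence-boundary chunking with a character overlap (sketch). The
// 200-character overlap comes from the article; MAX_CHUNK_CHARS is hypothetical.

const MAX_CHUNK_CHARS = 1000; // hypothetical
const OVERLAP_CHARS = 200;

// A sentence runs from a non-space character to ., ! or ? followed by
// whitespace (so "3.14" is not split), or to the end of the text.
function splitSentences(text: string): string[] {
  return text.trim().match(/\S[\s\S]*?(?:[.!?]+(?=\s|$)|$)/g) ?? [];
}

function chunkNote(text: string, maxChars = MAX_CHUNK_CHARS, overlap = OVERLAP_CHARS): string[] {
  if (text.length <= maxChars) return [text];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const candidate = current.length > 0 ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars && current.length > 0) {
      // Close the chunk at a sentence end, then seed the next chunk with the
      // last `overlap` characters so context carries across the boundary.
      chunks.push(current);
      current = `${current.slice(-overlap)} ${sentence}`;
    } else {
      current = candidate;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each returned chunk would then be embedded and stored as its own vector.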
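Here is a sketch of why storing chunk IDs makes forget() reliable. Two in-memory Maps stand in for the D1 table and the Vectorize index, and the memoryId#index ID scheme is hypothetical. In a real Worker, the lookup would be a D1 query and the deletion a call to the Vectorize index's deleteByIds() method.

```typescript
// Chunk bookkeeping for a reliable forget() (sketch). Two Maps stand in for
// the D1 table and the Vectorize index; the ID scheme is hypothetical.

class MemoryStore {
  // Stand-in for a D1 table mapping each memory to its chunk vector IDs.
  private chunkIdsByMemory = new Map<string, string[]>();
  // Stand-in for the vector index, keyed by chunk ID.
  private vectors = new Map<string, number[]>();

  // Store one vector per chunk and record every chunk ID under the memory.
  save(memoryId: string, chunkVectors: number[][]): string[] {
    const ids = chunkVectors.map((vector, i) => {
      const chunkId = `${memoryId}#${i}`; // hypothetical ID scheme
      this.vectors.set(chunkId, vector);
      return chunkId;
    });
    this.chunkIdsByMemory.set(memoryId, ids);
    return ids;
  }

  // Delete every recorded chunk vector, then the bookkeeping entry, so no
  // orphaned chunks remain. Returns the number of vectors removed.
  forget(memoryId: string): number {
    const ids = this.chunkIdsByMemory.get(memoryId) ?? [];
    for (const id of ids) this.vectors.delete(id);
    this.chunkIdsByMemory.delete(memoryId);
    return ids.length;
  }

  vectorCount(): number {
    return this.vectors.size;
  }
}
```

Without the recorded IDs, forget() would have to search the index for chunks that look related, which can miss some of them.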

Deploy: https://thesecondbrain.dev
GitHub: https://github.com/rahilp/second-brain-cloudflare
If this was helpful, please give it a star.

source & further reading



Read original on dev.to → dev.to/rahil_pirani_c48446facc8c/i-built-persist…

