
ARN: Local semantic memory server for AI agents (Pi 5, 22ms recall, 10/10 tests)

Developer Mohamed (MrKali) released ARN, a local semantic memory server for AI agents that runs on a Raspberry Pi 5 with 22ms recall speed and scored 10/10 in tests. The open-source tool stores agent interactions by meaning rather than keywords, allowing agents to retain context across sessions without cloud services or monthly fees. ARN uses three memory tiers modeled on human memory, eight domain-specialized cortical columns, and features like contradiction detection and temporal tagging to manage agent memory locally.

10 min read · published May 27, 2026

AI agents forget everything between sessions. ARN fixes that, locally, with no cloud and no monthly bill.

It runs a small server on your machine. Every time your agent talks to a user, ARN stores what happened. Next session, it pulls back what's relevant, not by keyword match but by meaning. Your agent picks up where it left off.

Runs on a Raspberry Pi 5. Costs $0/month. One command to set up.

Hi, I'm Mohamed (MrKali). I built this because I was tired of re-explaining context to my agents every session. It started as a side project on my Pi 5 and turned into something that actually works.

Prerequisites: Python 3.10+, Mac or Linux (including Raspberry Pi)

git clone https://github.com/tuuhe99-del/ARN-Adaptive-Reasoning-Network.git
cd ARN-Adaptive-Reasoning-Network
./arn-setup

That's it. arn-setup installs dependencies, starts the server, installs a launchd service so it auto-starts on login (Mac), and wires the OpenClaw plugin if you're using it. No manual config.

Verify it's running:

curl http://localhost:8742/v1/health

Store and recall something:

curl -X POST http://localhost:8742/v1/memory/store \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "me", "content": "Mohamed prefers Python for scripting", "importance": 0.8}'

curl -X POST http://localhost:8742/v1/memory/recall \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "me", "query": "what language does the user code in?", "top_k": 3}'
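The same two calls can be made from Python with only the standard library. This is a minimal sketch, assuming the default address from the curl examples; `build_request` and `call` are hypothetical helper names, not part of ARN, and the shape of the JSON reply is not documented here, so the sketch just returns whatever the server sends.

```python
import json
import urllib.request

ARN_URL = "http://localhost:8742"  # default address from the curl examples


def build_request(path, payload):
    """Build a JSON POST request for an ARN endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ARN_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path, payload, timeout=5.0):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same payloads as the curl commands above.
store_req = build_request("/v1/memory/store", {
    "agent_id": "me",
    "content": "Mohamed prefers Python for scripting",
    "importance": 0.8,
})
recall_req = build_request("/v1/memory/recall", {
    "agent_id": "me",
    "query": "what language does the user code in?",
    "top_k": 3,
})
```

With the server running, `call("/v1/memory/recall", {...})` sends the request and returns the decoded reply.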

ARN is a memory server. Your agent stores facts and events, and retrieves them by semantic similarity: meaning, not keyword.
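To make "by meaning" concrete, here is a toy sketch of similarity-based recall. It assumes cosine similarity over embedding vectors, which is the usual choice for sentence-embedding models but is not confirmed here as ARN's exact metric. The three-dimensional vectors are hand-made stand-ins for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query_vec, memories, top_k=3):
    """Rank (text, vector) pairs by similarity to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Hand-made 3-dimensional "embeddings", purely illustrative.
memories = [
    ("Mohamed prefers Python for scripting", [0.9, 0.1, 0.0]),
    ("The Pi 5 has 8GB of RAM", [0.0, 0.2, 0.9]),
    ("Deploys run through a shell script", [0.6, 0.6, 0.1]),
]
query = [0.8, 0.2, 0.1]  # stands in for "what language does the user code in?"
best = recall(query, memories, top_k=1)
```

The query shares no keywords with the stored sentence; the match comes entirely from the vectors pointing in a similar direction.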

Under the hood:

  • Three memory tiers: episodic (recent specific events), semantic (repeated patterns consolidated over time), working (current session context). Loosely modeled on how human memory is structured.
  • 8 domain-specialized cortical columns: code, conversation, facts, procedures, preferences, temporal, errors, general. Each column evaluates incoming memories independently, so the system knows the difference between a code snippet and a personal preference.
  • Calibrated surprise scoring: each domain tracks its own baseline of what's "normal" using Welford's algorithm. Genuinely novel information gets prioritized.
  • Consolidation: runs as a background task. Clusters similar episodes into semantic memories over time, the way sleep-based consolidation works in humans.
  • Contradiction detection: when new info conflicts with stored info, it flags the conflict, keeps both, and timestamps them. Doesn't silently overwrite.
  • Temporal tagging: tag episodes with time_context='past'|'current'|'future'. Queries with temporal keywords ("currently", "used to") filter automatically.
  • Protected memories: episodes stored with source='api' are never superseded, decayed, or evicted. Use this for ground-truth facts about a user.
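The surprise-scoring item mentions Welford's algorithm, which keeps a running mean and variance without storing past values. The sketch below implements that algorithm as it is usually stated. The `surprise` function (distance from the baseline in standard deviations) is an assumption for illustration; how ARN turns the baseline into a score is not described here.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance in a single pass."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until there are two observations.
        if self.n < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.n - 1))


def surprise(stats, x):
    """Hypothetical surprise score: distance from the baseline in std units."""
    s = stats.std()
    if s == 0.0:
        return 0.0
    return abs(x - stats.mean) / s


# One baseline per domain column; this one stands in for the "code" column.
code_column = RunningStats()
for novelty in [0.20, 0.22, 0.18, 0.21, 0.19]:
    code_column.update(novelty)
```

Because each column would keep its own `RunningStats`, a value that is ordinary for one domain can still register as surprising in another.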

Scoring formula:

score = 0.58 × similarity + 0.13 × recency + 0.19 × importance + surprise_bonus − supersession_penalty
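A worked example of the formula, using the weights exactly as given. The sketch assumes similarity, recency, and importance are each in the range 0 to 1; the typical sizes of the bonus and penalty terms are not stated, so the values below are made up.

```python
W_SIMILARITY = 0.58
W_RECENCY = 0.13
W_IMPORTANCE = 0.19


def score(similarity, recency, importance,
          surprise_bonus=0.0, supersession_penalty=0.0):
    """Combine the terms of the scoring formula into one ranking score."""
    return (W_SIMILARITY * similarity
            + W_RECENCY * recency
            + W_IMPORTANCE * importance
            + surprise_bonus
            - supersession_penalty)


# A fresh, important memory versus an older one that has been superseded.
fresh = score(similarity=0.80, recency=0.90, importance=0.80)
stale = score(similarity=0.80, recency=0.10, importance=0.60,
              supersession_penalty=0.20)
```

The three fixed weights sum to 0.90, so similarity dominates: two memories that match the query equally well are separated mainly by the recency, importance, and penalty terms.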

Server runs on http://localhost:8742. Auth is optional: set ARN_API_KEY to require X-Api-Key on all writes.

| Method | Path | Auth | What it does |
|---|---|---|---|
| POST | /v1/memory/store | optional | Store a memory episode |
| POST | /v1/memory/recall | optional | Retrieve relevant memories by semantic similarity |
| POST | /v1/memory/context | optional | Get a formatted context block ready to inject into a prompt |
| POST | /v1/memory/exchange | required | Store a full user + agent exchange in one call |
| POST | /v1/memory/workflow | required | Store a multi-step tool workflow with results |
| POST | /v1/memory/inject | required | Inject relevant memories directly into a prompt string |
| POST | /v1/memory/feedback | required | Send reinforcement signal (thumbs up/down) on a recalled memory |
| POST | /v1/memory/embed_similarity | required | Compute semantic similarity between two texts |
| POST | /v1/memory/link / unlink / links | required | Explicit memory graph: link episodes together |
| POST | /v1/memory/maintain | required | Manually trigger consolidation |
| POST | /v1/memory/edit | required | Edit an existing episode |
| POST | /v1/memory/delete | required | Soft-delete an episode |
| POST | /v1/memory/list | required | List all episodes for an agent |
| GET | /v1/memory/stats/{agent_id} | optional | Episode counts, memory tier sizes, scoring stats |
| GET | /v1/health | none | Health check |
| DELETE | /v1/memory/agent | required | Wipe all data for an agent |
| GET | /dashboard | none | Browser dashboard (HTML) |

Each agent_id gets fully isolated storage. No cross-agent data leakage.

Rate limiting: token bucket, 60 req/s per IP by default.
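For readers unfamiliar with token buckets, here is a minimal sketch of the technique. It is not ARN's code: the class name, the burst capacity of 60, and the injectable clock are assumptions made so the example is deterministic.

```python
import time


class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request costs one."""

    def __init__(self, rate=60.0, capacity=60.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class FakeClock:
    """Manually advanced clock so the demo does not depend on real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
bucket = TokenBucket(rate=60.0, capacity=60.0, clock=clock)
burst = [bucket.allow() for _ in range(61)]       # 61 requests at the same instant
clock.now += 0.5                                  # half a second passes
after_wait = [bucket.allow() for _ in range(31)]  # 31 more requests
```

A per-IP limiter would keep one bucket per client address, for example in a dict keyed by IP.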

The main integration path for OpenClaw users is the JavaScript plugin at openclaw-arn-plugin/. This replaces OpenClaw's markdown memory files (USER.md, MEMORY.md, IDENTITY.md, etc.) with live semantic memory that learns from every interaction.

What it does automatically:

  • Before every agent turn: retrieves relevant memories and injects them into the prompt
  • After every turn: stores user messages, agent replies, tool calls, and tool results
  • Labels everything by source: user, agent, tool:{name}, compaction

  • Deduplicates: won't inject the same memory twice in a session
  • Detects topic shifts: when the conversation changes subject, triggers a fresh recall pass
  • Persists session state across gateway restarts

Install:

./arn-setup --client openclaw --profile redteam  # adjust profile to match yours

Or add manually to your openclaw.json:

{
  "plugins": {
    "entries": {
      "arn-memory": {
        "path": "/path/to/ARN-Adaptive-Reasoning-Network/openclaw-arn-plugin",
        "config": {
          "arnEndpoint": "http://localhost:8742",
          "apiKey": "your-api-key",
          "storeMessages": true,
          "storeTools": true,
          "topK": 5,
          "minScore": 0.35,
          "tokenBudget": 1500,
          "topicShiftThreshold": 0.45
        }
      }
    }
  }
}
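How topK, minScore, and tokenBudget might interact when the plugin picks memories to inject can be sketched as follows. This is a guess at the semantics written in Python for illustration; the real plugin is JavaScript, and the dict keys, the `estimate_tokens` heuristic, and the selection order are all assumptions.

```python
def estimate_tokens(text):
    # Rough heuristic (about 4 characters per token), not a real tokenizer.
    return max(1, len(text) // 4)


def select_memories(candidates, top_k=5, min_score=0.35,
                    token_budget=1500, already_injected=None):
    """Pick the best-scoring memories that fit the budget and are new this session.

    `candidates` is a list of dicts with hypothetical keys 'id', 'text', 'score'.
    """
    seen = set() if already_injected is None else already_injected
    chosen, used = [], 0
    for mem in sorted(candidates, key=lambda m: m["score"], reverse=True):
        if len(chosen) == top_k:
            break
        if mem["score"] < min_score or mem["id"] in seen:
            continue  # too weak a match, or already injected earlier
        cost = estimate_tokens(mem["text"])
        if used + cost > token_budget:
            continue  # would overflow the prompt budget
        chosen.append(mem)
        used += cost
        seen.add(mem["id"])
    return chosen


candidates = [
    {"id": "a", "text": "User prefers Python", "score": 0.82},
    {"id": "b", "text": "User runs a Pi 5", "score": 0.40},
    {"id": "c", "text": "Unrelated note", "score": 0.20},
]
session_seen = set()
first_turn = select_memories(candidates, already_injected=session_seen)
second_turn = select_memories(candidates, already_injected=session_seen)
```

Passing the same `session_seen` set on every turn is what gives the "won't inject the same memory twice in a session" behavior in this sketch.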

| Tier | Model | Disk | Speed | Quality |
|---|---|---|---|---|
| nano (default) | all-MiniLM-L6-v2 | 22MB | ~30ms | Good |
| small | all-mpnet-base-v2 | 420MB | ~60ms | Better |
| base | bge-base-en-v1.5 | 440MB | ~80ms | Best retrieval |
| base-e5 | e5-base-v2 | 440MB | ~80ms | Alternative |

Switch tiers at any time without losing memories:

./arn-switch-model base   # migrates all stored vectors, zero data loss

Set tier at startup:

export ARN_EMBEDDING_TIER=base
python3 -m uvicorn arn_v9.api.server:app --host 0.0.0.0 --port 8742

In stress tests, nano and bge-base both scored 7/7. The bigger model didn't win on any scenario. I'd use nano unless recall quality is specifically a problem for you.

| Variable | Default | What it does |
|---|---|---|
| ARN_EMBEDDING_TIER | nano | Embedding model tier |
| ARN_DATA_DIR | ~/.arn_data | Where episode databases and vectors are stored |
| ARN_API_KEY | (none) | If set, all write endpoints require X-Api-Key header |
| ARN_RATE_LIMIT_RPS | 60 | Max requests per second per IP |
| ARN_DECAY_INTERVAL_SECONDS | 3600 | How often the decay loop runs |
| ARN_CONSOLIDATE_THRESHOLD | 10 | Episodes needed before consolidation triggers |
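If you script around the server, it can help to mirror these variables and defaults in one place. The sketch below reads them from the environment with the defaults listed above. `ArnSettings` and `load_settings` are hypothetical names, and this is not how the server itself parses its configuration.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ArnSettings:
    embedding_tier: str
    data_dir: Path
    api_key: Optional[str]
    rate_limit_rps: float
    decay_interval_seconds: int
    consolidate_threshold: int


def load_settings(env=None):
    """Read ARN_* variables from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ArnSettings(
        embedding_tier=env.get("ARN_EMBEDDING_TIER", "nano"),
        data_dir=Path(env.get("ARN_DATA_DIR", "~/.arn_data")).expanduser(),
        api_key=env.get("ARN_API_KEY") or None,  # unset or empty means no auth
        rate_limit_rps=float(env.get("ARN_RATE_LIMIT_RPS", "60")),
        decay_interval_seconds=int(env.get("ARN_DECAY_INTERVAL_SECONDS", "3600")),
        consolidate_threshold=int(env.get("ARN_CONSOLIDATE_THRESHOLD", "10")),
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = load_settings({"ARN_EMBEDDING_TIER": "base", "ARN_DATA_DIR": "/tmp/arn"})
```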

Files written:

  • ~/.arn_data/{agent_id}/arn_metadata.db: SQLite episode metadata
  • ~/.arn_data/{agent_id}/vectors.npy: memmap vector store
  • ~/.arn_data/.model_fingerprint: detects silent model swaps between restarts
  • ~/.arn_data/session_state.json: OpenClaw plugin session persistence
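The idea behind a model fingerprint file can be sketched like this: record an identifier for the embedding model on first start, then compare on later starts so vectors from one model are never silently mixed with vectors from another. What ARN actually writes into .model_fingerprint is not described here; hashing the model name and vector dimension is an assumption, as are the function names.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(model_name, dim):
    """Hypothetical fingerprint: hash of the model name and vector dimension."""
    return hashlib.sha256(f"{model_name}:{dim}".encode("utf-8")).hexdigest()


def check_fingerprint(data_dir, model_name, dim):
    """Return True if the stored fingerprint matches, creating it on first run."""
    path = Path(data_dir) / ".model_fingerprint"
    current = fingerprint(model_name, dim)
    if not path.exists():
        path.write_text(current)
        return True
    return path.read_text().strip() == current


# Demo in a throwaway directory rather than the real ~/.arn_data.
with tempfile.TemporaryDirectory() as tmp:
    first_run = check_fingerprint(tmp, "all-MiniLM-L6-v2", 384)
    same_model = check_fingerprint(tmp, "all-MiniLM-L6-v2", 384)
    swapped = check_fingerprint(tmp, "bge-base-en-v1.5", 768)
```

A mismatch is the signal to stop and migrate vectors (for example with arn-switch-model) instead of querying old vectors with a new model.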

10/10 on the OpenClaw recall battery (sequential tests across a real running agent session):

| Test | Scenario | Result |
|---|---|---|
| T1 | Identity recall (name, project) | PASS |
| T2 | Tool recall (Ollama, DeepSeek, Gemini) | PASS |
| T3 | ARN description recall | PASS |
| T4 | Language preference (Python) | PASS |
| T5 | Privacy: refuses to hallucinate SSN/bank info | PASS |
| T6 | Hardware recall (Mac, Pi 5, 8GB) | PASS |
| T7 | Cross-session conversation recall | PASS |
| T8 | Project recall from recent sessions | PASS |
| T9 | Workflow memory: store and recall tool steps | PASS |
| T10 | Dynamic recommendation from known setup | PASS |

7/7 on adversarial stress tests (benchmarks/stress_test.py):

| Test | Result |
|---|---|
| Cross-session persistence (4 restarts + noise) | PASS |
| Distractor resistance (5 needles in 500 haystack) | PASS |
| Contradiction handling (most-recent-wins) | PASS |
| Temporal reasoning (with tagging) | PASS |
| Hallucination refusal | PASS |
| Paraphrase robustness | PASS |
| Scale (1K and 3K episodes, ~170ms latency) | PASS |

ARN-Adaptive-Reasoning-Network/
├── arn-setup                  # One-command install
├── arn-switch-model           # One-command model migration
├── install.sh                 # Alternative install script
├── arn_v9/
│   ├── core/
│   │   ├── embeddings.py      # Embedding engine, tier support
│   │   └── cognitive.py       # Memory scoring, cortical columns, consolidation
│   ├── storage/
│   │   └── persistence.py     # SQLite + memmap, protected sources, fingerprinting
│   ├── api/
│   │   └── server.py          # FastAPI REST server, rate limiting
│   ├── plugin.py              # Python API (ARNPlugin class)
│   ├── scripts/
│   │   ├── arn_cli.py         # CLI interface
│   │   └── migrate_to_base_tier.py  # Vector migration tool
│   ├── tests/
│   │   ├── check_env.py       # Pre-flight environment check
│   │   └── test_all.py        # Unit + semantic test suite
│   └── benchmarks/
│       ├── stress_test.py     # Adversarial scenarios
│       └── simulate_agent.py  # 5-day agent simulation
├── openclaw-arn-plugin/       # OpenClaw JS plugin
│   ├── index.js               # Plugin logic (store + inject hooks)
│   └── openclaw.plugin.json   # Plugin manifest
├── scripts/
│   ├── run_arn_battery.sh     # 10-test recall battery
│   └── arn_agent.sh           # OpenClaw agent runner for tests
└── launchd/
    └── com.arn.server.plist   # macOS auto-start service

Python API:

from arn_v9.plugin import ARNPlugin

with ARNPlugin(agent_id="my_agent", data_root="./memory") as p:
    p.store("User used to prefer Java",
            time_context='past', importance=0.6)
    p.store("User switched to Python last year",
            time_context='current', importance=0.8)

    results = p.recall("what does the user currently prefer?")

    for r in results:
        if r['confidence_tier'] == 'low':
            print("Not enough matching info")

Or via the lower-level class:

from arn_v9 import ARNv9

arn = ARNv9(data_dir="./my_agent_memory")
arn.perceive("Deployed on Raspberry Pi 5 with 8GB RAM", importance=0.7)
results = arn.recall("what hardware does the user run?", top_k=3)
arn.close()
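The first example above stores episodes with time_context and then asks what the user "currently" prefers. A sketch of how a temporal keyword in the query could narrow the results is shown below. Only "currently" and "used to" come from the description earlier; the other phrases, the function names, and the episode dict shape are assumptions, and ARN's own keyword handling may differ.

```python
import re

# "currently" and "used to" are named in the feature list; the rest are guesses.
TEMPORAL_HINTS = {
    "past": ("used to", "previously", "formerly"),
    "current": ("currently", "right now", "these days"),
    "future": ("plans to", "going to"),
}


def infer_time_context(query):
    """Return 'past', 'current', 'future', or None if the query has no hint."""
    q = query.lower()
    for context, phrases in TEMPORAL_HINTS.items():
        if any(re.search(r"\b" + re.escape(p) + r"\b", q) for p in phrases):
            return context
    return None


def filter_by_time(episodes, query):
    """Keep only episodes whose tag matches the query's temporal hint, if any."""
    context = infer_time_context(query)
    if context is None:
        return episodes
    return [e for e in episodes if e["time_context"] == context]


episodes = [
    {"content": "User used to prefer Java", "time_context": "past"},
    {"content": "User switched to Python last year", "time_context": "current"},
]
now = filter_by_time(episodes, "what does the user currently prefer?")
before = filter_by_time(episodes, "what did the user used to prefer?")
```

A query with no temporal phrase passes through unfiltered, so untagged questions still see every episode.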

I'm being upfront because I'd rather you hit these on my docs page than mid-project:

  • No inter-agent memory sharing: each agent_id is isolated. If you need two agents to share knowledge, you'd have to build a sync layer on top. I haven't.
  • Contradiction detection is a word-overlap heuristic: real NLI would be better. It works for most cases but will miss semantic contradictions that don't share vocabulary.
  • Temporal reasoning requires explicit tagging: the system can't automatically figure out that a stored fact is outdated. You have to tag it. Auto-inferring this from content is an open problem.
  • Text only: no images, audio, or structured data.
  • English-tuned by default: the default models are English-only. Multilingual support means swapping to paraphrase-multilingual-MiniLM-L12-v2 or similar.
  • workers=1 recommended: the embedding model is ~90MB per process. Running multiple workers multiplies RAM usage. For higher throughput, put a reverse proxy in front and scale horizontally with separate containers.
  • Scoring thresholds are empirically tuned: the weights work well in testing but I'm not certain they're the right defaults for every use case. If you tune them, I'd be interested in what you find.
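To show why a word-overlap heuristic misses contradictions that share no vocabulary, here is a small illustrative version. It is a guess at the general approach, not ARN's implementation: the stopword list, the Jaccard measure, and the 0.4 threshold are all assumptions.

```python
import re

STOPWORDS = {"a", "an", "the", "for", "to", "of", "in", "on", "is", "and"}


def tokens(text):
    """Lowercased content words, with a few common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap(a, b):
    """Jaccard overlap between the word sets of two statements."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def possible_conflict(old, new, threshold=0.4):
    """Flag statements that share most of their words but are not identical."""
    same = old.strip().lower() == new.strip().lower()
    return (not same) and overlap(old, new) >= threshold


# Shared vocabulary: flagged as a candidate conflict.
caught = possible_conflict("User prefers Java for scripting",
                           "User prefers Python for scripting")
# Same kind of disagreement, different words: not flagged.
missed = possible_conflict("User loves dark mode",
                           "The customer hates black themes")
```

The second pair is the case an NLI model would be expected to catch and a vocabulary-based check cannot.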

If you're looking for somewhere to add real value:

  • NLI-based contradiction detection: even a small cross-encoder would beat the word-overlap heuristic
  • Async consolidation: it already runs as a background asyncio task, but batching and priority queue improvements would help high-throughput setups
  • Cross-agent shared semantic layer: read-only organizational knowledge that multiple agents can draw on
  • Multilingual embedding support: swap the default model, ensure the test suite covers non-English recall
  • LangChain / CrewAI adapters: I built the OpenClaw plugin because that's what I use. Other frameworks need their own thin wrappers
  • Mem0/Zep comparison benchmark: head-to-head on published benchmarks would make this more credible

PRs welcome. If you're unsure whether something fits, open an issue first.

PolyForm Small Business 1.0.0. See LICENSE.md and COMMERCIAL.md.

Short version:

  • Free if you're an individual, researcher, hobbyist, or at a company with fewer than 100 people and under $1M revenue
  • Paid license required if you're at a larger company using this commercially

If you fit the free tier, use it: keep the license file in your fork and you're done. If your company is over the threshold and you want to build on this, open an issue titled "Commercial licensing inquiry."

I picked this over MIT because this project took real work. If it's useful to you personally, I want you to have it free. If a corporation is making money off it, I'd like a share of that.

My name is Mohamed Mohamed (MrKali). I built this on a Raspberry Pi 5 I recovered from a corrupted SD card, using OpenClaw as my agent framework.

If you want to reach out, open an issue or reach me through the contacts on my GitHub profile. If you find bugs or have ideas, say so.

Thanks for looking at this.

- Mohamed
