{"slug": "sawtooth-an-async-multi-tiered-memory-framework-for-llm-agents", "title": "Sawtooth – An async, multi-tiered memory framework for LLM agents", "summary": "Sawtooth Memory, a new async hierarchical memory framework for LLM agents, eliminates the blocking latency and data loss of standard memory systems by offloading summarization to background workers. The framework stores user messages in milliseconds and preserves critical facts like UUIDs and names in an immutable ledger, preventing the \"Lost in the Middle\" hallucination effect. Benchmarks show an 11.3x reduction in main thread latency and 100% fact recall accuracy compared to traditional summary memory.", "body_md": "**A high-performance, non-blocking hierarchical memory framework for LLM Agents.**\n\nStandard LLM memory systems (like LangChain's `ConversationSummaryMemory`) process conversation history sequentially on the main application thread. Every time a user sends a message, the entire application freezes while the system waits for an LLM to generate a new historical summary. Furthermore, these summaries suffer from the \"Lost in the Middle\" hallucination effect, frequently deleting specific UUIDs, names, or rules to save tokens.\n\n**Sawtooth Memory** eliminates this latency and data loss. It immediately stores the user's message and returns control to the application in milliseconds, offloading the heavy summarization to an asynchronous background worker. 
To prevent hallucinations, it extracts critical facts into an immutable ledger before summarizing.\n\nFor architectural deep-dives, comprehensive API specifications, and advanced lifecycle configurations, please refer to the official documentation:\n\n[View Detailed Architecture & API Reference (DOCUMENTATION.md)](/HtooTayZa/sawtooth-memory/blob/main/DOCUMENTATION.md)\n\n```\n  Standard Memory (Blocking)            Sawtooth Memory (Async)\n  ──────────────────────────            ───────────────────────\n\n  [ Application ]                       [ Application ]\n         │                                     │\n         ▼                                     ▼\n  [ Save Context ]                      [ ContextManager ]\n         │                                     │\n         ▼                                     ├───────────────────┐ (Instant Return)\n  [ LLM Summarizes ]                           ▼                   ▼\n  (App freezes for 5-10s)               [ Next User Turn ]  [ Background Worker ]\n         │                                                         │\n         ▼                                                         ▼\n  [ Next User Turn ]                                        [ LLM Summarizes ]\n```\n\nWhen your agent is ready to respond, Sawtooth stitches together an optimized context payload from distinct layers, ensuring critical facts are never summarized away.\n\n```\n    Agent Loop\n        │\n        ▼\n┌─────────────────────┐\n│   ContextManager    │\n│  ┌───────────────┐  │\n│  │ L0 System     │  │  immutable persona + tool schemas\n│  │ L2 Archive    │  │  compressed narrative memory\n│  │ L1.5 Entities │  │  exact IDs, paths, UUIDs\n│  │ L1 Working    │  │  recent raw conversation\n│  └───────────────┘  │\n└──────────┬──────────┘\n           │\n           ▼\n     build_prompt()\n           │\n           ▼\n        LLM API\n```\n\nBy moving compression to the background, Sawtooth achieves massive latency reductions on the main 
thread while maintaining 100% recall accuracy.\n\n**Local GPU Benchmark (NVIDIA RTX 5060 | Model: phi4-mini | 20-Message Conversation)**\n\n| Performance Metric | Standard Summary Memory | Sawtooth Hierarchical | Architectural Advantage |\n|---|---|---|---|\n| Main Thread Latency | 64.15 seconds | 5.70 seconds | 11.3x Faster Execution |\n| Final Prompt Payload | 506 tokens | 454 tokens | 10% Lower Token Cost |\n| UUID / Fact Recall | Variable / Hallucinates | 100% Retained | Guaranteed via L1.5 Ledger |\n\nFor full methodology, cloud comparisons, and reproducibility steps, see the [Performance Benchmarks](/HtooTayZa/sawtooth-memory/blob/main/BENCHMARKS.md).\n\n```\npip install sawtooth-memory\n```\n\n*Optional dependencies for cloud providers:*\n\n```\npip install langchain-openai langchain-anthropic langchain-google-genai\n```\n\nInitialize the `ContextManager` and let the background worker handle the heavy lifting. Sawtooth works with both local, air-gapped models (Ollama) and cloud APIs.\n\n``` python\nimport asyncio\nfrom sawtooth_memory import ContextManager, ContextManagerConfig\nfrom sawtooth_memory.config import OllamaConfig\n\nasync def main():\n    config = ContextManagerConfig(\n        soft_limit_tokens=1000,\n        hard_limit_tokens=2000,\n        ollama=OllamaConfig(base_url=\"http://localhost:11434\", model=\"phi4\")\n    )\n\n    async with ContextManager(system_prompt=\"You are a helpful assistant.\", config=config) as cm:\n\n        # 1. Instantly ingest messages (Main thread is never blocked)\n        await cm.add_message(\"user\", \"My transaction ID is txn_998877_alpha\")\n        await cm.add_message(\"assistant\", \"I have noted your transaction ID.\")\n\n        # 2. 
Build the optimized prompt to send to your main LLM\n        prompt = cm.build_prompt()\n        print(prompt)\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\nSawtooth eliminates the \"black-box\" of agent memory by providing deterministic audit trails. You can query the memory system to see exactly why a fact was retained in the prompt.\n\n``` python\nimport json\n\n# `cm` is the ContextManager instance from the example above\ntrace = cm.explain_prompt()\nprint(json.dumps(trace, indent=2))\n```\n\n**Output:**\n\n```\n{\n  \"system_prompt\": \"You are a helpful assistant.\",\n  \"l2_summary_lineage\": [\n    \"User initiated troubleshooting for router.\",\n    \"User provided MAC address.\"\n  ],\n  \"l1_5_entities\": [\n    {\n      \"key\": \"user_transaction_id\",\n      \"value\": \"txn_998877_alpha\",\n      \"origin\": \"Anchored via L1.5 explicit instruction\"\n    }\n  ],\n  \"l1_active_messages\": 4,\n  \"total_tokens\": 342\n}\n```\n\nSawtooth provides a native `SawtoothMemorySaver` adapter, acting as a drop-in checkpointer replacement for LangGraph architectures.\n\n``` python\nfrom langgraph.graph import StateGraph\nfrom sawtooth_memory.integrations.langgraph import SawtoothMemorySaver\n\n# `State` is your graph's state schema; `cm` is an initialized ContextManager\ngraph_builder = StateGraph(State)\n# ... add nodes and edges ...\n\nmemory_saver = SawtoothMemorySaver(cm)\ngraph = graph_builder.compile(checkpointer=memory_saver)\n```\n\n- **Phase 1: Core Architecture**\n  - L1/L2 Hierarchical Buffer\n  - Asynchronous Background Worker\n  - Local (Ollama) & Cloud compatibility\n- **Phase 2: Observability & Telemetry**\n  - EventBus Subsystem\n  - Explainability Traces\n  - Persistent JSONL Auditing Journal\n  - Performance Benchmarking Harness\n- **Phase 3: Advanced Architectures (Up Next)**\n  - Multi-Agent Memory Pooling (Shared contextual state)\n  - Semantic Vector L3 Archival Memory (RAG integration)\n  - Redis/Postgres Adapter for Distributed Deployments\n\nWe welcome pull requests. 
See `CONTRIBUTING.md` for guidelines on how to run the test suite and ensure code quality.\n\nThis project is licensed under the MIT License - see the [LICENSE.md](/HtooTayZa/sawtooth-memory/blob/main/LICENSE.md) file for details.", "url": "https://wpnews.pro/news/sawtooth-an-async-multi-tiered-memory-framework-for-llm-agents", "canonical_source": "https://github.com/HtooTayZa/sawtooth-memory", "published_at": "2026-06-06 08:53:03+00:00", "updated_at": "2026-06-06 09:18:07.158594+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-tools", "ai-infrastructure", "ai-research"], "entities": ["LangChain", "Sawtooth Memory"], "alternates": {"html": "https://wpnews.pro/news/sawtooth-an-async-multi-tiered-memory-framework-for-llm-agents", "markdown": "https://wpnews.pro/news/sawtooth-an-async-multi-tiered-memory-framework-for-llm-agents.md", "text": "https://wpnews.pro/news/sawtooth-an-async-multi-tiered-memory-framework-for-llm-agents.txt", "jsonld": "https://wpnews.pro/news/sawtooth-an-async-multi-tiered-memory-framework-for-llm-agents.jsonld"}}