Sawtooth – An async, multi-tiered memory framework for LLM agents

wpnews.pro

A high-performance, non-blocking hierarchical memory framework for LLM Agents.

Standard LLM memory systems (like LangChain's ConversationSummaryMemory

) process conversation history sequentially on the main application thread. Every time a user sends a message, the entire application freezes while the system waits for an LLM to generate a new historical summary. Furthermore, these summaries suffer from the "Lost in the Middle" hallucination effect, frequently deleting specific UUIDs, names, or rules to save tokens.

Sawtooth Memory eliminates this latency and data loss. It immediately stores the user's message and returns control to the application in milliseconds, off the heavy summarization to an asynchronous background worker. To prevent hallucinations, it extracts critical facts into an immutable ledger before summarizing.

For deep architectural deep-dives, comprehensive API specifications, and advanced lifecycle configurations, please refer to the official documentation:

View Detailed Architecture & API Reference (DOCUMENTATION.md)

  Standard Memory (Blocking)            Sawtooth Memory (Async)
  ──────────────────────────            ───────────────────────

  [ Application ]                       [ Application ]
         │                                     │
         ▼                                     ▼
  [ Save Context ]                      [ ContextManager ]
         │                                     │
         ▼                                     ├───────────────────┐ (Instant Return)
  [ LLM Summarizes ]                           ▼                   ▼
  (App freezes for 5-10s)               [ Next User Turn ]  [ Background Worker ]
         │                                                         │
         ▼                                                         ▼
  [ Next User Turn ]                                        [ LLM Summarizes ]

When your agent is ready to respond, Sawtooth stitches together an optimized context payload from distinct layers, ensuring critical facts are never summarized away.

    Agent Loop
        │
        ▼
┌─────────────────────┐
│   ContextManager    │
│  ┌───────────────┐  │
│  │ L0 System     │  │  immutable persona + tool schemas
│  │ L2 Archive    │  │  compressed narrative memory
│  │ L1.5 Entities │  │  exact IDs, paths, UUIDs
│  │ L1 Working    │  │  recent raw conversation
│  └───────────────┘  │
└──────────┬──────────┘
           │
           ▼
     build_prompt()
           │
           ▼
        LLM API

By moving compression to the background, Sawtooth achieves massive latency reductions on the main thread while maintaining 100% recall accuracy.

Local GPU Benchmark (NVIDIA RTX 5060 | Model: phi4-mini | 20-Message Conversation)

Performance Metric	Standard Summary Memory	Sawtooth Hierarchical	Architectural Advantage
Main Thread Latency
64.15 seconds	5.70 seconds
11.3x Faster Execution
Final Prompt Payload
506 tokens	454 tokens
10% Lower Token Cost
UUID / Fact Recall
Variable / Hallucinates	100% Retained
Guaranteed via L1.5 Ledger

For full methodology, cloud comparisons, and reproducibility steps, view our Read the Performance Benchmarks.

pip install sawtooth-memory

Optional dependencies for cloud providers:

pip install langchain-openai langchain-anthropic langchain-google-genai

Initialize the ContextManager

and let the background worker handle the heavy lifting. Sawtooth is universally compatible with local air-gapped models (Ollama) and cloud APIs.

import asyncio
from sawtooth_memory import ContextManager, ContextManagerConfig
from sawtooth_memory.config import OllamaConfig

async def main():
    config = ContextManagerConfig(
        soft_limit_tokens=1000,
        hard_limit_tokens=2000,
        ollama=OllamaConfig(base_url="http://localhost:11434", model="phi4")
    )

    async with ContextManager(system_prompt="You are a helpful assistant.", config=config) as cm:

        await cm.add_message("user", "My transaction ID is txn_998877_alpha")
        await cm.add_message("assistant", "I have noted your transaction ID.")

        prompt = cm.build_prompt()
        print(prompt)

if __name__ == "__main__":
    asyncio.run(main())

Sawtooth eliminates the "black-box" of agent memory by providing deterministic audit trails. You can query the memory system to see exactly why a fact was retained in the prompt.

trace = cm.explain_prompt()

import json
print(json.dumps(trace, indent=2))

Output:

{
  "system_prompt": "You are a helpful assistant.",
  "l2_summary_lineage": [
    "User initiated troubleshooting for router.",
    "User provided MAC address."
  ],
  "l1_5_entities": [
    {
      "key": "user_transaction_id",
      "value": "txn_998877_alpha",
      "origin": "Anchored via L1.5 explicit instruction"
    }
  ],
  "l1_active_messages": 4,
  "total_tokens": 342
}

Sawtooth provides a native SawtoothMemorySaver

adapter, acting as a drop-in checkpointer replacement for LangGraph architectures.

from langgraph.graph import StateGraph
from sawtooth_memory.integrations.langgraph import SawtoothMemorySaver

graph_builder = StateGraph(State)

memory_saver = SawtoothMemorySaver(cm)
graph = graph_builder.compile(checkpointer=memory_saver)

Phase 1: Core Architecture - L1/L2 Hierarchical Buffer

Asynchronous Background Worker

Local (Ollama) & Cloud compatibility

Phase 2: Observability & Telemetry - EventBus Subsystem

Explainability Traces

Persistent JSONL Auditing Journal

Performance Benchmarking Harness

Phase 3: Advanced Architectures (Up Next) - Multi-Agent Memory Pooling (Shared contextual state)

Semantic Vector L3 Archival Memory (RAG integration)

Redis/Postgres Adapter for Distributed Deployments

We welcome pull requests. See our CONTRIBUTING.md for guidelines on how to run the test suite and ensure code quality.

This project is licensed under the MIT License - see the LICENSE.md file for details.

source & further reading

github.com — original article

Sawtooth – An async, multi-tiered memory framework for LLM agents

Run your AI side-project on zahid.host