We're still the only one to hit #1 on both LoCoMo and LongMemEval. Here is how to use it.

Backboard has achieved the top position on both the LoCoMo and LongMemEval benchmarks for long-term AI memory, a feat no other system has matched without modifying the original evaluation guidelines. The company attributes its success to message-level memory architecture that builds and retrieves facts during conversations, rather than relying on larger context windows that degrade over long horizons. Backboard's memory feature is available as a single parameter, `memory="Auto"`, which stores and recalls facts across conversations using the same `assistant_id`.

Backboard is #1 on LoCoMo and LongMemEval, the two academic benchmarks for long-term AI memory, without changing the original guidelines. Other companies have gamed the benchmarks by using newer models with bigger context windows. This post explains why the result matters anyway, what it actually measures, and how to use the memory that earned it.

These are not "find a fact in a wall of text" tests. They measure whether a system can build, maintain, and reason over memory across many conversations.

LoCoMo (Long-term Conversational Memory) evaluates very long-term memory over multi-session dialogues that span weeks. It tests single-session recall, cross-session reasoning, temporal reasoning, outside knowledge, and adversarial questions.

LongMemEval scores five distinct abilities: information extraction, multi-session reasoning, temporal reasoning, knowledge updates (noticing when a fact about the user changes), and abstention (knowing when it does not know). Its own paper reports that commercial assistants and long-context models lose around 30% accuracy on sustained memory. That last point is the whole story.

A few honest notes about the result.

We are still #1 on the original academic benchmarks. Other systems have since posted high numbers too, but they got there by pointing a stronger model at the problem and leaning on ever-larger context windows. At the top, everyone is near the ceiling of what these tests can even measure, so the raw number stops being interesting. What is interesting is how you got there.

The difference is where the work happens. We solve memory at the message level. Memory is built as the conversation happens, fact by fact, then retrieved when relevant. We do not stuff a giant context window to paper over a memory architecture that cannot actually remember. A bigger context window is brute force, and the benchmarks already show brute force degrades on long horizons. Message-level memory is the thing the test is supposed to reward.
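
To make "message level" concrete, here is a toy sketch of the idea. It is not how Backboard is implemented: the `ToyMemory` class, the regex patterns, and the Vancouver follow-up message are all invented for illustration, and a real system would use a far more capable extractor and retriever. It only shows the shape of the approach: facts are pulled out of each message as it arrives, a newer fact replaces an older one for the same key, and recall returns nothing when nothing is known.

```python
import re

# Hand-written patterns standing in for a real fact extractor (assumption).
PATTERNS = {
    "name": re.compile(r"my name is (\w+)", re.I),
    "city": re.compile(r"moved (?:from \w+ )?to (\w+)", re.I),
}


class ToyMemory:
    """Illustrative only: keeps one current value per fact key."""

    def __init__(self):
        self.facts = {}  # key -> (value, turn at which it was learned)
        self.turn = 0

    def observe(self, message):
        """Extract facts from a single message as it arrives."""
        self.turn += 1
        for key, pattern in PATTERNS.items():
            match = pattern.search(message)
            if match:
                # A newer fact for the same key supersedes the older one.
                self.facts[key] = (match.group(1), self.turn)

    def recall(self, key):
        """Return the current value, or None when nothing is known."""
        entry = self.facts.get(key)
        return entry[0] if entry else None


memory = ToyMemory()
memory.observe("My name is Sarah. I just moved from Chicago to Toronto.")
memory.observe("Quick update: I moved to Vancouver last week.")

print(memory.recall("city"))      # Vancouver (the update replaced Toronto)
print(memory.recall("name"))      # Sarah
print(memory.recall("employer"))  # None (nothing stored, so nothing invented)
```

The stored state stays a handful of facts no matter how long the conversation runs, which is the contrast with replaying the full history into a context window on every turn.
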
Fixing problems with brute force isn't scalable over months or years, and it steers users toward inflated token usage and higher spend. No thanks.

We did not run these benchmarks ourselves. Third-party organizations did. We do not build for benchmarks and we do not tune to a leaderboard. We build the best memory product for our customers. It just happens to be the best.

One more thing, and we will not name names: several of the top open-source memory projects on GitHub run on Backboard for their paid cloud offering. The thing people benchmark against us is, in some cases, us. We think that is funny.

So we let the score sit quietly and we ship the product.

Here is how to use it.

The memory that tops these benchmarks is one parameter. Store it on the assistant with `memory="Auto"`, reuse the same `assistant_id`, and facts carry across every conversation.

```bash
pip install backboard-sdk
```

```python
import asyncio

from backboard import BackboardClient


async def main():
    client = BackboardClient(api_key="YOUR_API_KEY")

    # Conversation 1: a fact is extracted and stored at the message level
    await client.send_message(
        "My name is Sarah. I just moved from Chicago to Toronto.",
        assistant_id="your-assistant-id",
        memory="Auto",
    )

    # Conversation 2: new thread, same assistant, memory recalled
    reply = await client.send_message(
        "Where do I live now?",
        assistant_id="your-assistant-id",
        memory="Auto",
    )
    print(reply.content)  # Toronto


asyncio.run(main())
```

```js
const send = (body) =>
  fetch("https://app.backboard.io/api/threads/messages", {
    method: "POST",
    headers: {
      "X-API-Key": "YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  }).then((r) => r.json());

await send({
  content: "My name is Sarah. I just moved from Chicago to Toronto.",
  assistant_id: "your-assistant-id",
  memory: "Auto",
});

const reply = await send({
  content: "Where do I live now?",
  assistant_id: "your-assistant-id",
  memory: "Auto",
});
console.log(reply.content);
```

```bash
curl -X POST "https://app.backboard.io/api/threads/messages" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "My name is Sarah. I just moved from Chicago to Toronto.", "assistant_id": "your-assistant-id", "memory": "Auto"}'

curl -X POST "https://app.backboard.io/api/threads/messages" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Where do I live now?", "assistant_id": "your-assistant-id", "memory": "Auto"}'
```

Each benchmark ability is just a memory mode in practice:

- `memory="Auto"` saves the new fact and supersedes the old one, no code from you.
- Reuse the same `assistant_id`.
- Switch `memory="Auto"` to `memory_pro="Auto"` when precision matters more than cost.
- In `Readonly` mode, the assistant recalls what it has and does not invent what it does not.

```python
# Precision retrieval over everything the assistant knows
# (run inside an async function, with `client` created as above)
response = await client.send_message(
    "What were my project deadlines?",
    assistant_id="your-assistant-id",
    memory_pro="Auto",
)
```

The benchmark number says we are first. The architecture says why it will hold: memory at the message level, not a context window stretched to hide a weaker design.

You do not have to take the leaderboard's word for it. Set `memory="Auto"` and feel the difference in your own app.

Grab a key and try it: [app.backboard.io](https://app.backboard.io)

Memory docs: [docs.backboard.io/concepts/memory](https://docs.backboard.io/concepts/memory)