{"slug": "adding-session-memory-without-building-a-preference-engine", "title": "Adding Session Memory Without Building a Preference Engine", "summary": "A developer added short-term conversational memory to Kino, an educational distributed systems project, enabling the LangGraph-based agent to carry forward search constraints and excluded titles across follow-up turns without building a long-term preference engine. The feature allows users to refine movie searches, such as requesting different results after rejecting initial picks, by maintaining structured context within the current thread rather than saving user preferences. Testing with Google's Gemini 3.1 Flash Lite model confirmed the lightweight architecture could reliably handle follow-up turns, reinforcing the design's trustworthiness.", "body_md": "I just added another feature to\n[Kino](https://github.com/gryphon2411/kino/), my educational, cinema-themed distributed systems project. The\nLangGraph-based agent service is one part of that platform.\n\nThis time the feature was **memory**.\n\nBut not the vague, hyped kind.\n\nI did not want Kino to pretend it knows a user's taste forever.\n\nI wanted something much narrower and much more useful:\n\nShort-term conversational memory for follow-up turns that helps the agent continue the current search without pretending to know more than it actually does.\n\nThat decision mattered a lot.\n\nIt kept the feature small enough to trust, but still visible enough to feel like real agent behavior.\n\nThe new capability is simple to describe.\n\nA user can start with:\n\n```\nDiscover exactly 3 comedy movies from 2010 onward from Kino's\ncatalog.\n```\n\nThen follow up with:\n\n`I didn't like them, please discover different ones.`\n\nAnd Kino can continue from the current thread instead of treating the second turn like a brand-new request.\n\nThat means it carries forward the latest search constraints, such as the genre, the title type, and the minimum year.\n\nAnd when the user explicitly rejects the previous picks, 
it excludes those exact title IDs from the next search.\n\nThe result is not “AI memory” in the broad sense.\n\nIt is **short-term conversational memory** used for follow-up\nturns.\n\nIn plain English, the memory works like this:\n\nThat is enough to support useful follow-ups like:\n\n`I didn't like them, please discover different ones.`\n\n`Movie only.`\n\n`Make it older.`\n\nThe two turns look like this:\n\n**Turn 1**\n\n```mermaid\nflowchart LR\n    U1[Turn 1 request] --> L1[LLM]\n    L1 --> S1[Catalog search]\n    S1 --> R1[Grounded titles]\n```\n\n**Turn 2**\n\n```mermaid\nflowchart LR\n    U2[Turn 2 follow-up] --> C[Short-term conversation context]\n    R1[Turn 1 titles] --> C\n    C --> L2[LLM]\n    L2 --> S2[Search + exclusions]\n    S2 --> R2[Different grounded titles]\n```\n\nWithout turning the system into a long-term preference engine.\n\nThat distinction is important.\n\nI did **not** want the project to jump straight into:\n\nKino is still a grounded discovery system that uses a structured catalog.\n\nThe memory feature had to match that architecture.\n\nIn Kino, that short-term memory stays inside the current thread instead of being saved as long-term preference memory.\n\nThe best way to explain the feature is to show the actual flow.\n\nThe first turn asks for:\n\n```\nDiscover exactly 3 comedy movies from 2010 onward from Kino's\ncatalog.\n```\n\nKino returns grounded titles such as:\n\n`The Wandering Soap Opera`\n\n`A Thin Life`\n\n`Joe Finds Grace`\n\nAnd the structured response now exposes an `activeContext` object\nthat shows the effective search context used for that turn.\n\nThat context looked like this:\n\n`genres=[\"Comedy\"]`\n\n`titleType=\"movie\"`\n\n`minYear=2010`\n\n`excludedTitleIds=[]`\n\nThen the user says:\n\n`I didn't like them, please discover different ones.`\n\nThat is where the new behavior becomes useful.\n\nKino carries forward the current discovery context, increases the search\nwindow, and passes `exclude_ids` with the 
titles it already showed.\n\nSo the second search still uses the same structured constraints, but now it looks for fresh candidates inside the same limited search scope.\n\nThe follow-up results included:\n\n`Blood Type`\n\n`Foodfight!`\n\n`Return to Babylon`\n\nThat is a much better user experience than forcing the user to restate the entire request every turn.\n\nThis feature also reinforced something practical for me: the quality did not come only from the model.\n\nI tested this flow with Gemini 3.1 Flash Lite, which Google positioned in March 2026 as its fastest and most cost-efficient Gemini 3 series model for high-volume workloads. That fit this feature well.\n\nIt gave me a few clear advantages:\n\nMore importantly, it was a useful architectural signal.\n\nIf this feature only worked on a much larger model, I would trust the design less. But when a lighter model can handle the follow-up turn, carry forward the right search constraints, and trigger the right tool call, it usually means the system design is doing more of the real work.\n\nThere is a strong temptation to call any multi-turn behavior “memory” and stop there.\n\nI think that is where a lot of agent features become unclear.\n\nUseful memory is not just about remembering more.\n\nIt is about remembering the **right thing at the right scope**.\n\nFor Kino, that scope is the current discovery thread.\n\nThat is why this feature is:\n\nAnd not:\n\nI would rather have a smaller memory feature that is easy to reason about than a bigger one that sounds impressive but is hard to trust.\n\nThe implementation only became useful after I enforced a few correctness rules.\n\nIf a follow-up turn is the newest turn, it has to be the one that counts.\n\nThat includes cases where the newest search returns fewer results, no results, or even an upstream error.\n\nIf Kino reused older successful results after a newer follow-up turn, the memory would feel fake.\n\nIt was not enough to filter repeated titles out at 
the very end.\n\nThe agent had to keep paging through the grounded catalog results until it found unseen candidates or ran out of pages.\n\nOtherwise the result window could still get burned on already seen titles.\n\nIf a user asks a no-tool question like “How does this work?”, the structured response should not pretend that a search failed.\n\nThat sounds obvious, but it is exactly the kind of detail that makes an agent feel either solid or sloppy.\n\n`activeContext` matters\n\nOne of the most useful additions was exposing `activeContext` in\nthe structured response.\n\nThat object is not a magic memory profile.\n\nIt is a simple, inspectable view of the search context Kino used for the current turn.\n\nThat makes the feature easier to debug, easier to demo, and easier to explain.\n\nIt also keeps the project honest.\n\nInstead of saying “the agent remembers,” I can show exactly what the current turn carried forward:\n\nThat is much better than hiding the memory behavior behind vague claims.\n\nThis is useful memory, but it is still intentionally limited.\n\nIt is not:\n\nAnd that is fine.\n\nFor this project, the right first memory feature was not “remember everything.”\n\nIt was “help the next turn make sense.”\n\nThat is a much stronger starting point.\n\nIf you are adding memory to a small agent, I think there is a good lesson here:\n\n**Start with follow-up memory before you start talking about\npreferences**\n\nShort-term memory in one session is often enough to make the system feel much more capable.\n\nAnd if it is grounded, inspectable, and narrow in scope, it is also much easier to trust.\n\nThat is where Kino is now.\n\nNot a giant memory system.\n\nJust a better second turn.", "url": "https://wpnews.pro/news/adding-session-memory-without-building-a-preference-engine", "canonical_source": "https://eido-askayo.blogspot.com/2026/05/adding-session-memory-without-building.html", "published_at": "2026-05-15 11:33:18+00:00", "updated_at": "2026-06-04 
13:18:25.084445+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "artificial-intelligence"], "entities": ["Kino", "LangGraph"], "alternates": {"html": "https://wpnews.pro/news/adding-session-memory-without-building-a-preference-engine", "markdown": "https://wpnews.pro/news/adding-session-memory-without-building-a-preference-engine.md", "text": "https://wpnews.pro/news/adding-session-memory-without-building-a-preference-engine.txt", "jsonld": "https://wpnews.pro/news/adding-session-memory-without-building-a-preference-engine.jsonld"}}