Adding Session Memory Without Building a Preference Engine

wpnews.pro

I just added another feature to Kino, my educational, cinema-themed distributed systems project. The LangGraph-based agent service is one part of that platform.

This time the feature was memory.

But not the vague, hyped kind.

I did not want Kino to pretend it knows a user's taste forever.

I wanted something much narrower and much more useful:

Short-term conversational memory for follow-up turns that helps the agent continue the current search without pretending to know more than it actually does

That decision mattered a lot.

It kept the feature small enough to trust, but still visible enough to feel like real agent behavior.

The new capability is simple to describe.

A user can start with:

Discover exactly 3 comedy movies from 2010 onward from Kino's catalog.

Then follow up with:

I didn't like them, please discover different ones.

And Kino can continue from the current thread instead of treating the second turn like a brand-new request.

That means it carries forward the latest search constraints, such as genre, title type, and minimum release year.

And when the user explicitly rejects the previous picks, it excludes those exact title IDs from the next search.
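A minimal sketch of that carry-forward step, assuming the search context is a plain dict shaped like the activeContext fields shown later in this post. The function name carry_forward and the placeholder title IDs are hypothetical, not Kino's actual code. Constraints the follow-up does not mention survive unchanged, constraints it does mention override the old values, and rejected IDs are added to the exclusions.

```python
def carry_forward(previous: dict, updates: dict, rejected_ids=()) -> dict:
    """Build the next turn's search context from the previous one (hypothetical helper).

    Constraints the follow-up does not mention are carried forward unchanged;
    constraints it does mention override the old values. Rejected title IDs
    are appended to the exclusion list without duplicates.
    """
    merged = {**previous, **updates}
    # Copy the list so the previous turn's context is never mutated.
    excluded = list(merged.get("excludedTitleIds", []))
    for title_id in rejected_ids:
        if title_id not in excluded:
            excluded.append(title_id)
    merged["excludedTitleIds"] = excluded
    return merged


# Turn 1 context, using the field names from Kino's activeContext.
turn1 = {"genres": ["Comedy"], "titleType": "movie", "minYear": 2010, "excludedTitleIds": []}

# "I didn't like them, please discover different ones."
# The IDs below are placeholders, not real catalog IDs.
turn2 = carry_forward(turn1, {}, rejected_ids=["id-1", "id-2", "id-3"])
```

A refinement such as "Movie only." would go through the same function as an update, for example carry_forward(turn2, {"titleType": "movie"}), leaving the genre, year, and exclusions in place.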

The result is not “AI memory” in the broad sense.

It is short-term conversational memory used for follow-up turns.

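In plain English, the memory works like this: within the current thread, Kino keeps the latest search constraints and the IDs of the titles it has already shown.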

That is enough to support useful follow-ups like:

I didn't like them, please discover different ones.

Movie only.

Make it older.

The two turns look like this:

Turn 1

flowchart LR
    U1[Turn 1 request] --> L1[LLM]
    L1 --> S1[Catalog search]
    S1 --> R1[Grounded titles]

Turn 2

flowchart LR
    U2[Turn 2 follow-up] --> C[Short-term conversation context]
    R1[Turn 1 titles] --> C
    C --> L2[LLM]
    L2 --> S2[Search + exclusions]
    S2 --> R2[Different grounded titles]

All of this works without turning the system into a long-term preference engine.

That distinction is important.

I did not want the project to jump straight into long-term preference memory.

Kino is still a grounded discovery system that uses a structured catalog.

The memory feature had to match that architecture.

In Kino, that short-term memory stays inside the current thread instead of being saved as long-term preference memory.
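LangGraph's checkpointers key saved state by a thread ID supplied in the run config, which is one natural way to get this kind of scoping. The sketch below does not use LangGraph at all; it mimics the idea with a plain dictionary, and the ThreadMemory class and its methods are hypothetical. Each thread keeps its own turn history, and a new thread starts with nothing.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ThreadMemory:
    """In-memory, per-thread turn history (hypothetical, for illustration).

    Each thread ID gets its own list of turns. Nothing is shared between
    threads, and nothing is written to a long-term user profile.
    """

    def __init__(self) -> None:
        self._turns: Dict[str, List[dict]] = defaultdict(list)

    def append_turn(self, thread_id: str, turn: dict) -> None:
        self._turns[thread_id].append(turn)

    def latest_turn(self, thread_id: str) -> Optional[dict]:
        # .get() avoids creating an empty entry for unknown threads.
        turns = self._turns.get(thread_id)
        return turns[-1] if turns else None


memory = ThreadMemory()
memory.append_turn("thread-a", {"activeContext": {"genres": ["Comedy"]}})
```

A follow-up in "thread-a" can see the comedy context, while a brand-new "thread-b" starts empty, which is exactly the boundary between short-term conversational memory and a preference profile.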

The best way to explain the feature is to show the actual flow.

The first turn asks for:

Discover exactly 3 comedy movies from 2010 onward from Kino's catalog.

Kino returns grounded titles such as:

The Wandering Soap Opera

A Thin Life

Joe Finds Grace

And the structured response now exposes an activeContext object that shows the effective search context used for that turn.

That context looked like this:

genres=["Comedy"]

titleType="movie"

minYear=2010

excludedTitleIds=[]

Then the user says:

I didn't like them, please discover different ones.

That is where the new behavior becomes useful.

Kino carries forward the current discovery context, increases the search window, and passes exclude_ids with the titles it already showed.

So the second search still uses the same structured constraints, but now it looks for fresh candidates inside the same limited search scope.
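The second turn's tool call can be sketched roughly like this. Only exclude_ids comes from the post; the function name, the other argument names, and the widening rule (request enough extra candidates to cover the excluded titles) are assumptions made for illustration.

```python
from typing import List


def build_follow_up_search(context: dict, shown_ids: List[str], requested: int = 3) -> dict:
    """Hypothetical tool-call arguments for a 'discover different ones' follow-up."""
    # Merge earlier exclusions with the titles just shown, keeping order and dropping duplicates.
    exclude_ids = list(dict.fromkeys([*context.get("excludedTitleIds", []), *shown_ids]))
    return {
        # Same structured constraints as the previous turn.
        "genres": context.get("genres", []),
        "title_type": context.get("titleType"),
        "min_year": context.get("minYear"),
        "exclude_ids": exclude_ids,
        # Assumed widening policy: ask for extra candidates to cover the
        # excluded titles, so the requested count can still be met.
        "limit": requested + len(exclude_ids),
    }


context = {"genres": ["Comedy"], "titleType": "movie", "minYear": 2010, "excludedTitleIds": []}
args = build_follow_up_search(context, shown_ids=["id-1", "id-2", "id-3"])
```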

The follow-up results included:

Blood Type

Foodfight!

Return to Babylon

That is a much better user experience than forcing the user to restate the entire request every turn.

This feature also reinforced something practical for me: the quality did not come only from the model.

I tested this flow with Gemini 3.1 Flash Lite, which Google positioned in March 2026 as its fastest and most cost-efficient Gemini 3 series model for high-volume workloads. That fit this feature well.

It gave me a few clear advantages:

More importantly, it was a useful architectural signal.

If this feature only worked on a much larger model, I would trust the design less. But when a lighter model can handle the follow-up turn, carry forward the right search constraints, and trigger the right tool call, it usually means the system design is doing more of the real work.

There is a strong temptation to call any multi-turn behavior “memory” and stop there.

I think that is where a lot of agent features become unclear.

Useful memory is not just about remembering more.

It is about remembering the right thing at the right scope.

For Kino, that scope is the current discovery thread.

That is why this feature is short-term conversational memory for follow-up turns, and not long-term preference memory.

I would rather have a smaller memory feature that is easy to reason about than a bigger one that sounds impressive but is hard to trust.

The implementation only became useful after I enforced a few correctness rules.

If a follow-up turn is the newest turn, it has to be the one that counts.

That includes cases where the newest search returns fewer results, no results, or even an upstream error.

If Kino reused older successful results after a newer follow-up turn, the memory would feel fake.
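The difference between the correct rule and the tempting bug fits in a few lines. This is a hypothetical sketch: TurnOutcome and both functions are illustrative names, not Kino's code. The correct version always reports the newest turn, even when it came back empty or failed; the buggy version quietly falls back to the last turn that succeeded, which is how stale picks would resurface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnOutcome:
    """What one turn's search produced (hypothetical record)."""
    title_ids: List[str] = field(default_factory=list)
    error: Optional[str] = None


def outcome_to_report(turns: List[TurnOutcome]) -> Optional[TurnOutcome]:
    """Correct rule: the newest turn counts, even if it is empty or failed."""
    return turns[-1] if turns else None


def stale_outcome_to_report(turns: List[TurnOutcome]) -> Optional[TurnOutcome]:
    """The tempting bug: fall back to the last turn that succeeded."""
    for turn in reversed(turns):
        if turn.error is None and turn.title_ids:
            return turn
    return None


history = [
    TurnOutcome(title_ids=["id-1", "id-2", "id-3"]),
    TurnOutcome(error="upstream search error"),
]
```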

It was not enough to filter repeated titles out at the very end.

The agent had to keep paging through the grounded catalog results until it found unseen candidates or ran out of pages.

Otherwise the result window could still get burned on already seen titles.
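Here is a minimal sketch of paging until enough unseen candidates turn up, assuming the catalog search is exposed as a function that returns one page of title IDs per page number and an empty page once results run out. The names collect_unseen and fake_fetch and the toy catalog are hypothetical.

```python
from typing import Callable, List, Set


def collect_unseen(
    fetch_page: Callable[[int], List[str]],
    seen: Set[str],
    wanted: int,
    max_pages: int = 10,
) -> List[str]:
    """Keep paging until `wanted` unseen IDs are found or the pages run out."""
    fresh: List[str] = []
    for page in range(max_pages):
        ids = fetch_page(page)
        if not ids:
            break  # catalog exhausted for this query
        for title_id in ids:
            if title_id not in seen and title_id not in fresh:
                fresh.append(title_id)
                if len(fresh) == wanted:
                    return fresh
    return fresh


# Toy catalog: the first page holds exactly the titles already shown.
CATALOG = ["a", "b", "c", "d", "e", "f", "g"]


def fake_fetch(page: int, size: int = 3) -> List[str]:
    return CATALOG[page * size:(page + 1) * size]
```

With seen={"a", "b", "c"} and wanted=3, filtering only the first page at the end would return nothing, while collect_unseen moves on to the second page and returns ["d", "e", "f"].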

If a user asks a no-tool question like “How does this work?”, the structured response should not pretend that a search failed.

That sounds obvious, but it is exactly the kind of detail that makes an agent feel either solid or sloppy.
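One way to keep those cases apart is an explicit status that separates "no search was attempted" from "the search came back empty" and "the search failed". The enum values and the search_status helper below are hypothetical, not fields from Kino's actual response.

```python
from enum import Enum
from typing import List, Optional


class SearchStatus(str, Enum):
    NOT_ATTEMPTED = "not_attempted"  # e.g. "How does this work?"
    OK = "ok"
    EMPTY = "empty"
    ERROR = "error"


def search_status(tool_called: bool, results: List[str], error: Optional[str] = None) -> SearchStatus:
    """Classify a turn without claiming a search failed when none ran."""
    if not tool_called:
        return SearchStatus.NOT_ATTEMPTED
    if error is not None:
        return SearchStatus.ERROR
    return SearchStatus.OK if results else SearchStatus.EMPTY
```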

activeContext matters

One of the most useful additions was exposing activeContext in the structured response.

That object is not a magic memory profile.

It is a simple, inspectable view of the search context Kino used for the current turn.
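To make "simple, inspectable view" concrete, here is a hypothetical typed model of that object: plain, immutable data with no hidden state, serialized to the same camelCase field names shown earlier in this post. The class and method names are assumptions, not Kino's code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ActiveContext:
    """Effective search constraints for one turn (hypothetical typed view)."""
    genres: Tuple[str, ...] = ()
    title_type: Optional[str] = None
    min_year: Optional[int] = None
    excluded_title_ids: Tuple[str, ...] = ()

    def to_dict(self) -> dict:
        # camelCase keys match the activeContext fields in the structured response.
        return {
            "genres": list(self.genres),
            "titleType": self.title_type,
            "minYear": self.min_year,
            "excludedTitleIds": list(self.excluded_title_ids),
        }


turn1_context = ActiveContext(genres=("Comedy",), title_type="movie", min_year=2010)
```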

That makes the feature easier to debug, easier to demo, and easier to explain.

It also keeps the project honest.

Instead of saying “the agent remembers,” I can show exactly what the current turn carried forward: the genres, title type, minimum year, and excluded title IDs.

That is much better than hiding the memory behavior behind vague claims.

This is useful memory, but it is still intentionally limited.

It is not a long-term preference engine.

And that is fine.

For this project, the right first memory feature was not “remember everything.”

It was “help the next turn make sense.”

That is a much stronger starting point.

If you are adding memory to a small agent, I think there is a good lesson here:

Start with follow-up memory before you start talking about preferences

Short-term memory in one session is often enough to make the system feel much more capable.

And if it is grounded, inspectable, and narrow in scope, it is also much easier to trust.

That is where Kino is now.

Not a giant memory system.

Just a better second turn.

Source: eido-askayo.blogspot.com (original article)
