# Memoria - A Local AI Reading Companion Powered by Gemma 4

> Source: <https://dev.to/santhosh2312/memoria-a-local-ai-reading-companion-powered-by-gemma-4-46l3>
> Published: 2026-05-23 13:09:21+00:00

This is a submission for the Gemma 4 Challenge: Build with Gemma 4
Reading long books can be difficult even for people who love reading.
Readers forget characters, lose track of earlier events, struggle with dense prose, or return to a book after a break and feel disconnected from the story. For readers with ADHD, memory difficulties, cognitive fatigue, or accessibility needs, this becomes even harder.
Memoria is a local AI reading companion powered by Gemma 4 that helps readers stay connected to books through spoiler-safe recaps, contextual Q&A, character memory, speaker attribution, and text simplification — all while running locally on the user’s machine.
The app combines an EPUB reader with AI-powered reading support features including:
Everything runs locally using Gemma 4 through llama.cpp, so readers do not need a paid AI subscription or constant internet access.
GitHub Repository: https://github.com/Santhoshl2312/Gemma_book_reader
Memoria uses Gemma 4 as the core local reasoning engine for the entire reading experience.
I used the Gemma 4 E2B model through a local llama.cpp OpenAI-compatible server, allowing the application to run fully offline without relying on cloud APIs.
I specifically chose Gemma 4 E2B because it was the best fit for a responsive local reading assistant.
The project needed:
Gemma 4 E2B delivered the right balance between speed and capability, making it possible to provide near real-time responses for recaps, contextual Q&A, text simplification, and chapter processing while still running locally through llama.cpp.
This was especially important because the app performs many smaller AI tasks continuously in the background while the user reads.
Gemma summarizes chapter chunks into structured summaries and key events that help readers quickly reconnect with the story.
The model updates persistent character descriptions and remembers important events tied to each character across chapters.
Gemma helps identify ambiguous dialogue speakers when rule-based systems fail.
Readers can ask questions about the story, and Gemma answers using chapter-aware retrieval that avoids future spoilers.
Selected passages can be rewritten into clearer modern English while preserving meaning and tone.
The frontend is a lightweight EPUB reader built with vanilla HTML, CSS, and JavaScript. It handles book uploads, chapter navigation, reading controls, themes, typography settings, and the AI interaction panel.
The backend is built with FastAPI and SQLite. It manages books, chapters, summaries, embeddings, character memory, retrieval, and streaming responses.
The AI stack runs fully locally using llama.cpp:
The app processes books chapter-by-chapter instead of trying to load entire novels into context at once. Intermediate artifacts like summaries, character memory, embeddings, and speaker metadata are stored and reused throughout the reading experience.
This pipeline-first design makes the system faster, more grounded, and more practical for long-form reading.
One of the biggest design goals was preventing accidental spoilers.
When a reader asks a question, Memoria retrieves only information from chapters the user has already completed. The retrieval system filters vector search results using reading progress before sending context to Gemma 4.
This allows the app to help readers remember earlier story details without revealing future events.
Full novels are too large to send directly into a local model context window. I solved this by chunking chapters into smaller sections while carrying forward rolling summaries and character memory.
Local models sometimes wrap JSON outputs in extra formatting or explanations. To make the pipeline reliable, prompts were heavily constrained and the backend extracts valid JSON blocks safely before processing.
Dialogue attribution in fiction is difficult because speakers are often implied instead of explicitly named. I used a hybrid approach where rules handle obvious cases while Gemma handles ambiguous dialogue using broader context.
The project depends on multiple services including Gemma 4, embedding models, Python environments, and vector databases. I automated the setup process using launcher scripts so the app can be started locally with minimal manual configuration.
One of the main goals of this project was accessibility and digital equity.
Readers should not need:
By combining Gemma 4 with llama.cpp and local retrieval, Memoria creates a fully local AI reading companion that respects reader privacy while remaining accessible on consumer hardware.
This makes the project useful not only for individual readers, but also for classrooms, libraries, care settings, and offline learning environments.
Memoria demonstrates how Gemma 4 can power practical, privacy-friendly accessibility tools beyond chatbots.
Instead of replacing reading, the goal is to support readers — helping them stay connected to stories, remember context, and reduce cognitive load while preserving the experience of reading itself.
By combining Gemma 4 E2B, llama.cpp, retrieval, and structured processing pipelines, Memoria turns static EPUB books into adaptive reading experiences that can run entirely offline.