{"slug": "i-pointed-chrome-s-prompt-api-at-a-1-25-million-character-memoir-and-it-got-fast", "title": "I Pointed Chrome's Prompt API at a 1.25 Million Character Memoir, and It Got Interesting Fast", "summary": "Shrijith Venkatramana built Gemini Nano Book Lab, a Chrome extension that uses the browser's built-in Prompt API to answer questions about a 1.25-million-character memoir entirely on-device. The experiment exposes the underlying mechanics of Chrome's on-device language model, including how it handles context overflow and parameter adjustments when processing long-form text. By feeding Richard Wagner's *My Life* into the system, Venkatramana demonstrated that long inputs reveal concrete engineering tradeoffs that short prompts typically hide.", "body_md": "*Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. Star Us to help devs discover the project. Do give it a try and share your feedback for improving the product.*\n\nA straightforward engineering question: what happens when you feed a long book to an on-device language model in Chrome and start adjusting the parameters?\n\nTo explore this, I built a small experiment called **Gemini Nano Book Lab**: a Chrome extension sidepanel that uses Chrome’s built-in **Prompt API** to answer questions about Richard Wagner’s *My Life*, while also exposing some of the underlying mechanics.\n\nThe response is only part of it. The experiment also captures:\n\nIf you’re an engineer interested in systems that have rough edges—and therefore teach you something—this is a useful area to explore.\n\nChrome’s Prompt API is part of the browser’s built-in AI features. 
Instead of sending prompts to a cloud endpoint, a web app or extension can request an on-device language model session and prompt it locally.\n\nResources:\n\nCore capabilities:\n\n`contextoverflow`\n\nThis makes it more than a simple text box; it becomes an environment for experimentation.\n\nLong inputs expose the interesting problems. Short prompts hide a lot; a paragraph‑long demo can make any model look magical. A long corpus forces concrete decisions:\n\nFor the first version, I used Project Gutenberg’s plain text of Richard Wagner’s *My Life*:\n\nThat gave a corpus of about **219,572 words** and **1,251,663 characters** in the run shown below.\n\nThe demo is a **Chrome extension sidepanel** rather than a normal web app. This was a deliberate choice. Extensions provide a more reliable built‑in AI surface in Chrome, and they allow a compact benchmark UI where controls, streamed output, and telemetry live side by side.\n\nThe extension has three tasks:\n\nThe benchmark starts simple. I didn’t begin with embeddings, vector databases, or sophisticated semantic retrieval. I wanted a baseline that is easy to reason about.\n\nThe first‑version controls are:\n\nThis provides enough surface to see the tradeoffs without making the experiment too complex.\n\nThe first question isn’t “What should I prompt?” but “Is the model available here?”\n\nHere’s the availability and session setup wrapper:\n\n``` ts\nfunction getPromptApi(): PromptApi | null {\n    const maybePromptApi = (globalThis as typeof globalThis & {\n        LanguageModel?: PromptApi\n    }).LanguageModel\n    return maybePromptApi ?? null\n}\n\nexport async function inspectPromptApi(): Promise<PromptApiCapabilities> {\n    const promptApi = getPromptApi()\n\n    if (!promptApi) {\n        return {\n            supported: false,\n            availability: 'unavailable',\n            statusMessage:\n                'LanguageModel is unavailable in this browser context. 
Use a recent Chrome build with the Prompt API enabled.',\n            defaultTemperature: null,\n            maxTemperature: null,\n            defaultTopK: null,\n            maxTopK: null,\n        }\n    }\n\n    const availability = await promptApi.availability({\n        expectedInputs: [{ type: 'text', languages: ['en'] }],\n        expectedOutputs: [{ type: 'text', languages: ['en'] }],\n    })\n\n    return {\n        supported: true,\n        availability,\n        statusMessage:\n            availability === 'available'\n                ? 'Prompt API ready.'\n                : 'Model can be downloaded or is unavailable on this device.',\n        defaultTemperature: null,\n        maxTemperature: null,\n        defaultTopK: null,\n        maxTopK: null,\n    }\n}\n```\n\nThis may not look exciting, but it matters. One early lesson with built‑in AI is that **availability is part of your product surface**. Hardware support, model download state, and browser support determine whether your app works at all.\n\nAfter loading the book, I split it into overlapping chunks. The code tries to respect paragraph and sentence boundaries rather than slicing blindly at exactly `N` characters.\n\n```\nexport function buildChunks(\n    text: string,\n    chunkSize: number,\n    overlap: number,\n): CorpusChunk[] {\n    const safeChunkSize = Math.max(600, chunkSize)\n    const safeOverlap = clampOverlap(safeChunkSize, overlap)\n    const chunks: CorpusChunk[] = []\n\n    let startOffset = 0\n    let index = 0\n\n    while (startOffset < text.length) {\n        const desiredEnd = Math.min(text.length, startOffset + safeChunkSize)\n        const endOffset =\n            desiredEnd === text.length\n                ? 
text.length\n                : findBoundary(text, startOffset, desiredEnd)\n\n        const textSlice = text.slice(startOffset, endOffset).trim()\n\n        if (textSlice) {\n            index += 1\n            chunks.push({\n                id: `chunk-${String(index).padStart(3, '0')}`,\n                index,\n                text: textSlice,\n                startOffset,\n                endOffset,\n            })\n        }\n\n        if (endOffset >= text.length) {\n            break\n        }\n\n        startOffset = Math.max(endOffset - safeOverlap, startOffset + 1)\n    }\n\n    return chunks\n}\n```\n\nThis decision changes the system’s behavior. Small chunks improve precision but can break context apart. Large chunks preserve narrative structure but use more context budget. Overlap helps with boundaries but increases repeated text and token pressure. Engineering often comes down to choosing which kind of trade‑off you can accept.\n\nThe first retriever is lexical, not semantic. That keeps the failure modes visible. If retrieval is too smart too early, you skip an educational stage.\n\n```\nexport function rankChunks(\n    chunks: CorpusChunk[],\n    query: string,\n    maxChunks: number,\n): RankedChunk[] {\n    const queryTokens = tokenize(query)\n\n    return chunks\n        .map((chunk) => {\n            const { score, matchedTerms } = scoreChunk(chunk, queryTokens, query)\n            return {\n                ...chunk,\n                score,\n                matchedTerms,\n            }\n        })\n        .filter((chunk) => chunk.score > 0)\n        .sort((left, right) => right.score - left.score)\n        .slice(0, maxChunks)\n}\n```\n\nThis retriever scores term overlap between the question and chunk text. It is fast, explainable, and flawed—exactly what I wanted for a baseline.\n\nThe benchmark records more than whether the model answered correctly. 
It measures:\n\nThis is the core flow:\n\n``` js\nconst corpus = await loadWagnerCorpus()\nconst chunks = buildChunks(corpus.text, config.chunkSize, config.chunkOverlap)\nconst selectedChunks = rankChunks(chunks, query, config.retrievedChunks)\n\nconst session = await createPromptSession({\n    config,\n    onDownloadProgress(progress) {\n        downloadProgress.push(progress)\n    },\n})\n\nconst estimatedInputUsage = await measureContextUsage(session, input)\n\nconst { text, firstChunkMs } = await executePrompt({\n    session,\n    input,\n    streaming: config.streaming,\n    signal,\n    onChunk: callbacks.onChunk,\n})\n```\n\nAt this point the demo becomes less a “chatbot” and more an instrument panel.\n\nIn the run shown in the screenshots, the app reported approximately:\n\nSeveral observations stand out.\n\nLexical retrieval took **8.7 ms**. That is tiny compared to the **17.4 second** prompt time. For early‑stage RAG in the browser, this suggests a useful lesson: before over‑optimizing retrieval, understand your inference costs. In this setup, retrieval is not the bottleneck. Prompting is.\n\nThe first chunk arrived after about **7.2 seconds**. That number changes the perceived feel of the product. If the first token arrives quickly, the experience feels responsive. If it takes several seconds, users may wonder if it has hung or if they asked too much. A good benchmark should capture that moment, not just the final duration.\n\nThe run used about **3417** units of a **9216** context window. That sounds comfortable, but long‑form exploration can consume budget quickly. If you increase chunk size, overlap, or retrieved chunk count, the window fills with evidence before the model answers. That’s why the demo exposes chunk controls prominently.\n\nThe total was about **32.8 seconds**—notably higher than prompt time alone. 
That gap hides real product behavior: corpus loading, chunking, preparation work, model readiness, UI update overhead, and one‑time costs that don’t appear if you only look at `prompt()`. For engineers, this is an important shift: users experience the whole pipeline, not just the API call.\n\nThe Prompt API is interesting not because it’s limitless, but because its limits are visible and teach you something. Here are the main ones I encountered.\n\nYou cannot stuff an entire million‑character book into a prompt. Even when the corpus lives locally, context remains scarce. That pushes you toward retrieval, chunking, and prompt construction strategies sooner than you might expect.\n\nThe retrieved excerpts screenshot shows this clearly. Some selected chunks are relevant to the query “How does Wagner describe his early artistic ambitions?” But some are relevant mostly because they contain overlapping words like “early”, “artistic”, or “ambitions”, not because they are the best narrative evidence. That is a useful failure mode: it shows why better retrieval becomes necessary.\n\nThe Prompt API is not a universal browser primitive yet. It depends on Chrome support, device capability, model management, and the environment. Every serious app needs a plan for unsupported devices, first‑time model download, delayed readiness, and the possibility that the model is unavailable or removed.\n\nStreaming makes the wait feel more humane after generation starts, but it does not remove the wait before generation starts. A slow first‑token experience remains an issue.\n\nIn the current version, I can measure prompt timing and context usage cleanly. What I cannot claim cleanly is exact model memory consumption, the way I might with a dedicated server‑side runtime. Some metrics are authoritative; some are approximate. Good benchmarking should label the difference honestly.\n\nEven with those limits, building on a browser‑native AI surface has clear benefits. 
You ask the browser what is available. You create a session. You stream output. You inspect context pressure. You see download progress. You can build a real experiment around that.\n\nFor an engineer, that means you can learn about product design, retrieval systems, latency, UI feedback, and model constraints all within one project.\n\nObvious and useful extensions:\n\nThis becomes less about whether the model answered, and more about **why this configuration behaved the way it did**.\n\nThe Prompt API made me think less about “AI features” and more about systems behavior under constraints. That is why this experiment felt worth building. The model answered a question about Wagner—fine. But the more interesting outcome was watching the browser become a measurable inference environment with its own quirks, bottlenecks, and product tradeoffs.\n\nIf you are early in your engineering journey, this is the kind of project I would recommend: one that looks like a demo from a distance, but up close turns into a lesson about architecture. And that is usually where the real learning starts.\n\n*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.\n\ngit-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*\n\nAny feedback or contributors are welcome! 
It's online, source-available, and ready for anyone to use.", "url": "https://wpnews.pro/news/i-pointed-chrome-s-prompt-api-at-a-1-25-million-character-memoir-and-it-got-fast", "canonical_source": "https://dev.to/shrsv/i-pointed-chromes-prompt-api-at-a-125-million-character-memoir-and-it-got-interesting-fast-2069", "published_at": "2026-05-29 18:36:12+00:00", "updated_at": "2026-05-29 19:12:08.850307+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "ai-tools", "ai-products", "natural-language-processing"], "entities": ["Shrijith Venkatramana", "Chrome", "Gemini Nano Book Lab", "Prompt API", "Richard Wagner", "My Life", "Project Gutenberg", "git-lrc"], "alternates": {"html": "https://wpnews.pro/news/i-pointed-chrome-s-prompt-api-at-a-1-25-million-character-memoir-and-it-got-fast", "markdown": 
"https://wpnews.pro/news/i-pointed-chrome-s-prompt-api-at-a-1-25-million-character-memoir-and-it-got-fast.md", "text": "https://wpnews.pro/news/i-pointed-chrome-s-prompt-api-at-a-1-25-million-character-memoir-and-it-got-fast.txt", "jsonld": "https://wpnews.pro/news/i-pointed-chrome-s-prompt-api-at-a-1-25-million-character-memoir-and-it-got-fast.jsonld"}}