{"slug": "gemma4-safe-agent-a-tool-using-research-agent-on-gemma-4-e2b", "title": "gemma4-safe-agent: a tool-using research agent on Gemma 4 e2b", "summary": "Tool-using research agent built for the Gemma 4 DEV Challenge, which runs locally on the Gemma 4 e2b model via Ollama using roughly 200 lines of Node.js code. The agent accepts a question, selects between two tools to read a Wikipedia page, and returns a structured JSON answer with sources, requiring only 2 GB of RAM and no API key. The author notes that while the small Gemma 4 e2b model performs well at tool selection, the final answer step requires a \"cast\" loop to reliably produce clean JSON output.", "body_md": "Submission for the [Gemma 4 DEV Challenge], Build track. Companion to my Write-track post on the [five libs behind it].\n\n## What it is\n\nA tool-using research agent that runs locally on **Gemma 4 e2b** via Ollama, in around 200 lines of Node.\n\nYou give it a question. It picks between two tools, reads a Wikipedia page, then returns a structured JSON answer with sources. No API key. No rate limit. Two GB of RAM and an Ollama instance is the whole stack.\n\n```\nollama pull gemma4:e2b\ngit clone https://github.com/MukundaKatta/gemma4-safe-agent\ncd gemma4-safe-agent && npm install\nnpm run demo -- \"What is RLHF?\"\n{\n  \"final\": \"RLHF is a technique that uses human preferences as a reward signal to fine-tune language models.\",\n  \"sources\": [\"https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback\"],\n  \"steps\": 2\n}\n```\n\nRepo: [github.com/MukundaKatta/gemma4-safe-agent](https://github.com/MukundaKatta/gemma4-safe-agent)\n\n## Why Gemma 4 e2b specifically\n\nGemma 4 ships in four sizes: e2b and e4b for edge and mobile, a 26B Mixture-of-Experts model, and a 31B dense model for servers. I picked e2b on purpose.\n\nReasons:\n\n- **Runs anywhere.** Two GB of RAM, no network, no key. The agent works on a CI runner, a Raspberry Pi, an old MacBook. 
The bigger sizes do not.\n- **Hardest reliability case.** A 2B-class model makes more parse mistakes and more arg mistakes than a 26B. If the scaffolding holds at the 2B level, the bigger ones are a drop-in via `GEMMA_MODEL=gemma4:e4b`.\n- **Real product surface.** Cheap, fast, local agents are where on-device AI is going. e2b is the right target for the kind of agent you'd actually ship in a desktop app, a mobile shell, or a browser extension.\n\nThe same agent runs against any of the four Gemma 4 variants with one env var change.\n\n## How it works\n\nThe whole agent is a small loop:\n\n```js\nfor (let step = 0; step < MAX_STEPS; step++) {\n  // Trim the history to the context budget, keeping the system prompt and the last 2 messages.\n  const fitted = fit(messages, { maxTokens: 4096, preserveSystem: true, preserveLastN: 2 });\n  const raw = await ollamaChat(fitted.messages);\n  const action = parseAction(raw);\n\n  if (action.kind === 'tool') {\n    // Run the chosen tool, then feed its result back to the model as the next turn.\n    const result = await TOOLS[action.tool].fn(action.args);\n    messages.push({ role: 'assistant', content: raw });\n    messages.push({ role: 'user', content: `tool_result: ${result}` });\n    continue;\n  }\n\n  // Not a tool call: send the final answer through the validate-and-retry cast loop.\n  return cast({ llm, validate, prompt: 'Restate as JSON: ...' });\n}\n```\n\nThe whole run is wrapped in an `agentguard.firewall` block. Each tool is wrapped with `agentvet.vet` and `agentsnap.traceTool`. 
That gives me:\n\n- **Context budget management** so Gemma 4 e2b never blows its small window\n- **Network egress allowlist** so a prompt injection cannot redirect the agent to fetch an attacker URL\n- **Tool-arg validation** so a hallucinated `fetch_url({ url: 12345 })` never runs\n- **Trace snapshots** so swapping models or tweaking prompts shows up as a CI diff, not a production surprise\n- **Final-answer JSON enforcement** with a validate-and-retry loop, which is the load-bearing piece for getting clean JSON out of a 2B model\n\nI wrote about the scaffolding in detail in the [Write-track companion post](https://dev.to/mukundakatta/making-gemma-4-e2b-production-safe-with-five-tiny-libraries-59k4). Here the focus is the agent and the demo.\n\n## What you can run\n\nThe repo ships three entry points:\n\n- `npm run demo -- \"...\"`: real run against your local Gemma 4 e2b\n- `npm run demo:mock`: same agent, with `fetch_url` returning canned pages (no internet needed)\n- `AGENT_MOCK=1 node examples/run-stub.js`: deterministic stub LLM in place of Gemma 4, so the whole pipeline runs in CI without any model at all\n\nThe third one is the one I use for snapshot regression tests. It proves the agent's tool-use behavior is stable even with an LLM swapped out.\n\n## What surprised me\n\nTwo things.\n\n**Gemma 4 e2b picks the right tool more often than I expected.** The model is small but the tool-selection task is well-bounded (\"you have these two tools, here's the schema, return one JSON\"). When the surrounding scaffolding catches arg mistakes and JSON glitches, the model's reasoning is the part that doesn't need help.\n\n**The final-answer step is where the model really needs the cast loop.** Asking for \"JSON only, no prose\" still produced `Sure here you go: {...}` enough of the time that I would not trust the agent without `agentcast` wrapping that step. 
With it, the post-condition becomes a guarantee.\n\n## Try it\n\nRepo: [github.com/MukundaKatta/gemma4-safe-agent](https://github.com/MukundaKatta/gemma4-safe-agent) (MIT)\n\nIssues and PRs welcome. The five scaffolding libs are all on npm under `@mukundakatta/*` and are zero-dep, so you can pull them into your own Gemma 4 projects one at a time.\n\nIf you build something on top of this, drop me a link.\n\nHave fun with Gemma 4.", "url": "https://wpnews.pro/news/gemma4-safe-agent-a-tool-using-research-agent-on-gemma-4-e2b", "canonical_source": "https://dev.to/mukundakatta/gemma4-safe-agent-a-tool-using-research-agent-on-gemma-4-e2b-hhm", "published_at": "2026-05-19 06:50:55+00:00", "updated_at": "2026-05-19 07:03:19.264211+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "open-source", "developer-tools"], "entities": ["Gemma 4", "Ollama", "GitHub", "MukundaKatta", "Wikipedia", "Node", "RLHF", "e2b"], "alternates": {"html": "https://wpnews.pro/news/gemma4-safe-agent-a-tool-using-research-agent-on-gemma-4-e2b", "markdown": "https://wpnews.pro/news/gemma4-safe-agent-a-tool-using-research-agent-on-gemma-4-e2b.md", "text": "https://wpnews.pro/news/gemma4-safe-agent-a-tool-using-research-agent-on-gemma-4-e2b.txt", "jsonld": "https://wpnews.pro/news/gemma4-safe-agent-a-tool-using-research-agent-on-gemma-4-e2b.jsonld"}}