# gemma4-safe-agent: a tool-using research agent on Gemma 4 e2b

> Source: <https://dev.to/mukundakatta/gemma4-safe-agent-a-tool-using-research-agent-on-gemma-4-e2b-hhm>
> Published: 2026-05-19 06:50:55+00:00

Submission for the [Gemma 4 DEV Challenge], Build track. Companion to my Write-track post on the [five libs behind it].

## What it is

A tool-using research agent that runs locally on **Gemma 4 e2b** via Ollama, in around 200 lines of Node.

You give it a question. It picks between two tools, reads a Wikipedia page, then returns a structured JSON answer with sources. No API key. No rate limit. Two GB of RAM and an Ollama instance is the whole stack.

```
ollama pull gemma4:e2b
git clone https://github.com/MukundaKatta/gemma4-safe-agent
cd gemma4-safe-agent && npm install
npm run demo -- "What is RLHF?"
{
  "final": "RLHF is a technique that uses human preferences as a reward signal to fine-tune language models.",
  "sources": ["https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback"],
  "steps": 2
}
```

Repo: [github.com/MukundaKatta/gemma4-safe-agent](https://github.com/MukundaKatta/gemma4-safe-agent)

## Why Gemma 4 e2b specifically

Gemma 4 ships in four sizes: e2b and e4b for edge and mobile, a 26B Mixture-of-Experts model, and a 31B dense model for servers. I picked e2b on purpose.

Reasons:

- **Runs anywhere.** Two GB of RAM, no network, no key. The agent works on a CI runner, a Raspberry Pi, an old MacBook. The bigger sizes do not.
- **Hardest reliability case.** A 2B-class model makes more parse mistakes and more arg mistakes than a 26B. If the scaffolding holds at the 2B level, the bigger ones are a drop-in via `GEMMA_MODEL=gemma4:e4b`.
- **Real product surface.** Cheap, fast, local agents are where on-device AI is going. e2b is the right target for the kind of agent you'd actually ship in a desktop app, a mobile shell, or a browser extension.

The same agent runs against any of the four Gemma 4 variants with one env var change.
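
Here is a minimal sketch of how that env var switch could be wired up. It is not the repo's actual code: `resolveModel`, the `fetchImpl` option, and the default base URL are my assumptions for illustration. The only parts taken as given are the `GEMMA_MODEL` variable from above and Ollama's `/api/chat` endpoint with `stream: false`.

```js
// Sketch only: the repo's real ollamaChat may look different.
const DEFAULT_MODEL = 'gemma4:e2b';

// Pick the model tag from the environment, falling back to e2b.
function resolveModel(env = process.env) {
  return env.GEMMA_MODEL || DEFAULT_MODEL;
}

// Send a chat request to a local Ollama instance and return the reply text.
// fetchImpl is injectable (hypothetical option) so the call can be faked in tests.
async function ollamaChat(
  messages,
  { model = resolveModel(), baseUrl = 'http://localhost:11434', fetchImpl = fetch } = {}
) {
  const res = await fetchImpl(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, messages, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.message.content;
}
```

With something like this in place, `GEMMA_MODEL=gemma4:e4b npm run demo -- "What is RLHF?"` would be the only change needed to move up a size.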

## How it works

The whole agent is a small loop:

``` js
for (let step = 0; step < MAX_STEPS; step++) {
  // Trim the history to the context budget before every model call.
  const fitted = fit(messages, { maxTokens: 4096, preserveSystem: true, preserveLastN: 2 });
  const raw = await ollamaChat(fitted.messages);
  const action = parseAction(raw);

  if (action.kind === 'tool') {
    // Run the chosen tool, then feed its result back as the next user turn.
    const result = await TOOLS[action.tool].fn(action.args);
    messages.push({ role: 'assistant', content: raw });
    messages.push({ role: 'user', content: `tool_result: ${result}` });
    continue;
  }

  // No tool requested: restate the answer as validated JSON and finish.
  return cast({ llm, validate, prompt: 'Restate as JSON: ...' });
}
```
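
The loop leans on `parseAction` to turn raw model text into either a tool call or a final answer. A rough sketch of what such a parser could look like is below. The reply shapes (`{"tool": ..., "args": ...}` and `{"final": ...}`) and the helper `extractJsonObject` are assumptions of mine, not the repo's actual format.

```js
// Sketch only. Finds the first balanced {...} span, ignoring braces inside strings.
function extractJsonObject(text) {
  const start = text.indexOf('{');
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) return text.slice(start, i + 1);
  }
  return null; // unbalanced braces
}

// Classify a raw model reply as a tool call or a final answer.
// Assumed shapes: {"tool": "...", "args": {...}} or {"final": "..."}.
function parseAction(raw) {
  const fallback = { kind: 'final', text: raw.trim() };
  const json = extractJsonObject(raw);
  if (json === null) return fallback;
  let obj;
  try {
    obj = JSON.parse(json);
  } catch {
    return fallback;
  }
  if (typeof obj.tool === 'string') {
    return { kind: 'tool', tool: obj.tool, args: obj.args ?? {} };
  }
  return typeof obj.final === 'string' ? { kind: 'final', text: obj.final } : fallback;
}
```

The point of scanning for a balanced object instead of calling `JSON.parse` on the whole reply is that a small model tends to wrap its JSON in chatter, and the tool call is still recoverable from that.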

The whole run is wrapped in an `agentguard.firewall` block. Each tool is wrapped with `agentvet.vet` and `agentsnap.traceTool`. That gives me:

- **Context budget management** so Gemma 4 e2b never blows its small window
- **Network egress allowlist** so a prompt injection cannot redirect the agent to fetch an attacker URL
- **Tool-arg validation** so a hallucinated `fetch_url({ url: 12345 })` never runs
- **Trace snapshots** so swapping models or tweaking prompts shows up as a CI diff, not a production surprise
- **Final-answer JSON enforcement** with a validate-and-retry loop, which is the load-bearing piece for getting clean JSON out of a 2B model
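
To make the allowlist and arg-validation ideas concrete, here is a hand-rolled sketch that folds both into one check in front of `fetch_url`. It does not use the `agentguard` or `agentvet` APIs, and `ALLOWED_HOSTS`, `vetFetchUrlArgs`, and `guardTool` are hypothetical names chosen for illustration.

```js
// Hypothetical allowlist; the hosts the repo actually permits may differ.
const ALLOWED_HOSTS = new Set(['en.wikipedia.org']);

// Validate fetch_url args before the tool runs:
// url must be a string, parse as a URL, use https, and target an allowed host.
function vetFetchUrlArgs(args) {
  if (args === null || typeof args !== 'object') {
    throw new TypeError('args must be an object');
  }
  if (typeof args.url !== 'string') {
    throw new TypeError('url must be a string');
  }
  let parsed;
  try {
    parsed = new URL(args.url);
  } catch {
    throw new TypeError('url is not a valid URL');
  }
  if (parsed.protocol !== 'https:') {
    throw new Error('only https is allowed');
  }
  if (!ALLOWED_HOSTS.has(parsed.hostname)) {
    throw new Error(`host not allowed: ${parsed.hostname}`);
  }
  return { url: parsed.href };
}

// Wrap a tool so its implementation only ever sees vetted args.
function guardTool(vet, fn) {
  return async (args) => fn(vet(args));
}
```

With a wrapper like this, `fetch_url({ url: 12345 })` is rejected with a `TypeError` before any request is made, and a URL pointing at a host outside the allowlist is rejected the same way.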

I wrote about the scaffolding in detail in the [Write-track companion post](https://dev.to/mukundakatta/making-gemma-4-e2b-production-safe-with-five-tiny-libraries-59k4). Here the focus is the agent and the demo.
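
For a feel of what the context-budget step does, here is a rough sketch of a `fit`-style trimmer. It is an illustration under stated assumptions, not the library's implementation: `fitSketch` is a hypothetical name, the four-characters-per-token estimate stands in for a real tokenizer, and dropping the oldest unprotected message first is a policy I picked for simplicity.

```js
// Crude token estimate: about 4 characters per token. An assumption, not a tokenizer.
const estimateTokens = (msg) => Math.ceil(msg.content.length / 4);

// Drop the oldest unprotected messages until the estimated total fits maxTokens.
// Protected: system messages (if preserveSystem) and the last preserveLastN messages.
function fitSketch(messages, { maxTokens, preserveSystem = true, preserveLastN = 2 }) {
  const kept = messages.map((m, i) => ({ m, i }));
  const isProtected = ({ m, i }) =>
    (preserveSystem && m.role === 'system') || i >= messages.length - preserveLastN;

  let total = kept.reduce((sum, { m }) => sum + estimateTokens(m), 0);
  while (total > maxTokens) {
    const idx = kept.findIndex((entry) => !isProtected(entry));
    if (idx === -1) break; // nothing left to drop; caller sees tokens > maxTokens
    total -= estimateTokens(kept[idx].m);
    kept.splice(idx, 1);
  }
  return { messages: kept.map(({ m }) => m), tokens: total };
}
```

In an agent loop like this one, the oldest droppable messages are usually early tool results, which are also the largest.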

## What you can run

The repo ships three entry points:

- `npm run demo -- "..."`: real run against your local Gemma 4 e2b
- `npm run demo:mock`: same agent, with `fetch_url` returning canned pages (no internet needed)
- `AGENT_MOCK=1 node examples/run-stub.js`: deterministic stub LLM in place of Gemma 4, so the whole pipeline runs in CI without any model at all

The third one is the one I use for snapshot regression tests. It proves the agent's tool-use behavior is stable even with an LLM swapped out.
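
The idea behind the stub run can be sketched in a few lines. This is not the contents of `examples/run-stub.js` and does not use the `agentsnap` API; `makeStubLlm` and `runWithTrace` are hypothetical names, and the reply shapes are assumed. The stub replays scripted replies, the loop records each tool call, and the recorded trace is what a snapshot test would compare.

```js
// Hypothetical stub: replays scripted replies in order and ignores the prompt.
function makeStubLlm(replies) {
  let i = 0;
  return async (_messages) => {
    if (i >= replies.length) throw new Error('stub LLM ran out of scripted replies');
    return replies[i++];
  };
}

// Minimal loop that records which tools were called with which args.
// Assumed reply shapes: {"tool": "...", "args": {...}} or {"final": "..."}.
async function runWithTrace(llm, tools, question, maxSteps = 4) {
  const trace = [];
  const messages = [{ role: 'user', content: question }];
  for (let step = 0; step < maxSteps; step++) {
    const raw = await llm(messages);
    const action = JSON.parse(raw);
    if (!action.tool) return { final: action.final, trace };
    trace.push({ tool: action.tool, args: action.args });
    const result = await tools[action.tool](action.args);
    messages.push({ role: 'assistant', content: raw });
    messages.push({ role: 'user', content: `tool_result: ${result}` });
  }
  throw new Error('step budget exhausted');
}
```

Because the stub is deterministic, serializing `trace` with `JSON.stringify` and comparing it to a stored copy is enough to catch a change in tool-use behavior.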

## What surprised me

Two things.

**Gemma 4 e2b picks the right tool more often than I expected.** The model is small but the tool-selection task is well-bounded ("you have these two tools, here's the schema, return one JSON"). When the surrounding scaffolding catches arg mistakes and JSON glitches, the model's reasoning is the part that doesn't need help.

**The final-answer step is where the model really needs the cast loop.** Asking for "JSON only, no prose" still produced `Sure here you go: {...}` enough of the time that I would not trust the agent without `agentcast` wrapping that step. With it, the post-condition becomes a guarantee.
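
A validate-and-retry loop of this kind can be sketched as follows. This is not `agentcast`'s implementation: `castSketch`, `extractJson`, `validateAnswer`, and the `maxRetries` option are hypothetical, and I assume `llm` takes a prompt string and returns the reply text. The `{ llm, validate, prompt }` argument shape mirrors the call in the agent loop.

```js
// Pull the outermost {...} span out of a chatty reply and parse it.
function extractJson(text) {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) throw new Error('no JSON object found');
  return JSON.parse(text.slice(start, end + 1));
}

// Hypothetical validator for the final answer shape shown in the demo output.
function validateAnswer(obj) {
  if (typeof obj.final !== 'string' || obj.final.length === 0) {
    throw new Error('final must be a non-empty string');
  }
  if (!Array.isArray(obj.sources) || !obj.sources.every((s) => typeof s === 'string')) {
    throw new Error('sources must be an array of strings');
  }
  return obj;
}

// Ask, validate, and on failure ask again with the error appended to the prompt.
async function castSketch({ llm, validate, prompt, maxRetries = 3 }) {
  let lastError = null;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const retryNote = lastError
      ? `\nPrevious attempt failed: ${lastError.message}. Reply with JSON only.`
      : '';
    const raw = await llm(prompt + retryNote);
    try {
      return validate(extractJson(raw));
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`no valid JSON after ${maxRetries} attempts: ${lastError.message}`);
}
```

The guarantee in a loop like this is narrow: anything it returns has passed validation, and if the retry budget runs out it throws instead of returning something malformed.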

## Try it

Repo: [github.com/MukundaKatta/gemma4-safe-agent](https://github.com/MukundaKatta/gemma4-safe-agent) (MIT)

Issues and PRs welcome. The five scaffolding libs are all on npm under `@mukundakatta/*` and are zero-dep, so you can pull them into your own Gemma 4 projects one at a time.

If you build something on top of this, drop me a link.

Have fun with Gemma 4.
