# A cheaper and safer agentic AI workflow

> Source: <https://danuker.go.ro/a-cheaper-and-safer-agentic-ai-workflow.html>
> Published: 2026-06-21 18:39:21+00:00

I recently tried agentic coding for real. It cost $0.034 and finished in 3 minutes. It made two mistakes. In my personal human attempt, I took an hour, and made four mistakes.

[Cheaper model services](#cheaper-model-services)

I heard about GLM-5.2, and a lot of benchmarks are saying it's on par with the leading proprietary AIs of [just 3 months ago](https://artificialanalysis.ai/models/open-source#open-source-progress). On the same benchmark site I had discovered [GMI Cloud](https://www.gmicloud.ai/en/models), a model service.

I created an account and received $5 in free credits last year. I see the minimum deposit is $10 nowadays. That's fine for me too.

I create [an API key](https://console.gmicloud.ai/user-setting/api-keys) on their service.

I am not too keen on giving a Singaporean model hosted by a US company on data centers scattered throughout the world access to my private data. So I installed Debian in a VirtualBox image, and installed [pi](https://pi.dev/) and the Guest Additions on it. Then I shared a copy of my project as a Shared Folder. Nothing else.

I configured pi and unleashed GLM-5.2 on the folder. 5 minutes and $0.435 later, the agentic sanity test worked. I asked it to look through various data files of various formats and create an index.tsv with information of interest. It did a perfect job.

[Optimizing even further](#optimizing-even-further)

So did `Qwen3.6-35B-A3B-Q4_K_XL`

from Unsloth on my CPU, but it took more than an hour (and my time and interactivity is worth way more than $0.435 per hour). But how cheap could I go? Looking at [what else GMI has to offer](https://artificialanalysis.ai/providers/gmi), DeepSeek V4 Flash catches my eye. It looks like [it's a tiny bit more verbose than GLM-5.2](https://artificialanalysis.ai/models/deepseek-v4-flash-high?cost=cost-per-task#token-use), so the same number of tokens per task, but less than a 10th of the cost. Can it still perform my task?

I replace `zai-org/GLM-5.2-FP8`

with `deepseek-ai/DeepSeek-V4-Flash`

and rerun the test.

Done in 3 minutes and $0.034. It shows a tiny bit of imperfection: it made 2 mistakes. Some irregular data series are shown as "daily" though they've got 5-ish-day and 2-ish-day periods. But other than that it's fine. I also noticed `deepseek-ai/DeepSeek-V4-Pro`

, which is somewhere in the middle. Zero mistakes on my test, but took 2 mins 27s and $0.229. I think this is the one I will keep instead of GLM, but I will mostly use V4-Flash.

My `~/.pi/agent/models.json`

ended up like so:

```
{
  "providers": {
    "ollama": {
      "baseUrl": "https://api.gmi-serving.com/v1",
      "api": "openai-completions",
      "apiKey": "Almost free but not free. Very, very cheap.",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
            {
          "id": "deepseek-ai/DeepSeek-V4-Flash",
          "reasoning": true,
          "contextWindow": 262144
        },
            {
          "id": "deepseek-ai/DeepSeek-V4-Pro",
          "reasoning": true,
          "contextWindow": 262144
        }
      ]
    }
  }
}
```

Especially considering that I made 4 mistakes, and that it took me a bit more than an hour. Curse the mm/dd/yyyy format! It seems I have been thoroughly bested at that task. I feel like adjusting my career path and keeping up with the times.

Bonus: Go even cheaper: Every so often, my models stumbles into a huge one-line JSON, and runs up the token count filling up pi's 50KB `DEFAULT_MAX_BYTES`

limit. I changed that limit to 5KB, significantly reducing input token count. [There is a ticket](https://github.com/earendil-works/pi/issues/5935) to introduce this as a setting, but it was auto-closed. The files to modify (with the pi version as of writing this) are:

```
~/.local/share/pi-node/node-v22.22.3-linux-x64/lib/node_modules/@earendil-works/pi-coding-agent/node_modules/@earendil-works/pi-agent-core/dist/harness/utils/truncate.js
~/.local/share/pi-node/node-v22.22.3-linux-x64/lib/node_modules/@earendil-works/pi-coding-agent/dist/core/tools/truncate.js
```

I modified both (not sure if I needed to). Prompt tokens for DeepSeek-V4-Flash went from 604k to 431k, and total cost went from $0.034 to $0.026 for my particular test.

[The future](#the-future)

My work now changed significantly. No longer do I manually copy paste tiny code segments, instead I ask the agent what to do, then compare the agent's shared directory with the main one. I do this with PyCharm for its good diff directory interface, but you can do it with Meld as well.

So there you have it. I am reaping the AI rewards, while refusing to give in to vendor lock-in. I despise closed ecosystems and enshittification. When Anthropic started pushing for Claude Code exclusivity, I found that anticompetitive. Also, arbitrary and sudden price increases are reckless while open weights models are just a few months behind. They are desperately trying to extract value from rapidly devaluing models. If the breakneck pace slows down, their value evaporates. The moat they are trying to build is actually slowing themselves down.

The hardware is where it's at. Companies like GMI and DeepInfra provide flexible value. And the hardware depreciates in at least a few years, rather than 3 months. I don't know how to monetize models sustainably, however. Maybe crowdfunding? Non-profits? Public utilities like firefighters?

[TL;DR / summary](#tldr-summary)

- Get a model API — Create an account at
[GMI Cloud](https://www.gmicloud.ai/en/models)(or any provider like DeepInfra). - Create an API key at
[their console](https://console.gmicloud.ai/user-setting/api-keys). - Isolate the environment (warmly recommended, for privacy) — Install Debian in VirtualBox (or any other sandbox you prefer).
- Install the agent — Inside the VM, install
[pi](https://pi.dev/). Modify the hardcoded`DEFAULT_MAX_BYTES`

. - Configure the provider — Edit
`~/.pi/agent/models.json`

to add an OpenAI-compatible provider pointing to the API endpoint. - Run a sanity check — Give the agent a task and check results.
- Set up your review workflow - share a copy to the VM, and keep reviewing code between the two copies.
- ???
- Profit!
