A cheaper and safer agentic AI workflow

A developer tested agentic AI coding with DeepSeek V4 Flash on GMI Cloud, completing a data processing task in 3 minutes at $0.034 with two mistakes, compared to a human attempt that took an hour with four mistakes. The workflow used the pi agent framework and Debian in a VirtualBox VM for privacy, highlighting the cost and efficiency advantages of open-source models.

I recently tried agentic coding for real. It cost $0.034 and finished in 3 minutes. It made two mistakes. In my own manual attempt, I took an hour and made four mistakes.

Cheaper model services

I heard about GLM-5.2, and a lot of benchmarks say it is on par with the leading proprietary AIs of just 3 months ago (https://artificialanalysis.ai/models/open-source). On the same benchmark site I had discovered GMI Cloud (https://www.gmicloud.ai/en/models), a model service. I created an account and received $5 in free credits last year. I see the minimum deposit is $10 nowadays. That's fine for me too. I created an API key (https://console.gmicloud.ai/user-setting/api-keys) on their service.

I am not too keen on giving a Singaporean model hosted by a US company on data centers scattered throughout the world access to my private data. So I installed Debian in a VirtualBox VM, and installed pi (https://pi.dev/) and the Guest Additions in it. Then I shared a copy of my project as a Shared Folder. Nothing else.

I configured pi and unleashed GLM-5.2 on the folder. 5 minutes and $0.435 later, the agentic sanity test had passed. I asked it to look through various data files of various formats and create an index.tsv with information of interest. It did a perfect job.

Optimizing even further

So did Qwen3.6-35B-A3B-Q4_K_XL from Unsloth on my CPU, but it took more than an hour, and my time and interactivity are worth way more than $0.435 per hour.

But how cheap could I go? Looking at what else GMI has to offer (https://artificialanalysis.ai/providers/gmi), DeepSeek V4 Flash caught my eye. It looks like it's only a tiny bit more verbose than GLM-5.2 (https://artificialanalysis.ai/models/deepseek-v4-flash-high?cost=cost-per-task), so roughly the same number of tokens per task, but at less than a 10th of the cost. Can it still perform my task?
I replaced zai-org/GLM-5.2-FP8 with deepseek-ai/DeepSeek-V4-Flash and reran the test. Done in 3 minutes and $0.034. It shows a tiny bit of imperfection: it made 2 mistakes. Some irregular data series are shown as "daily" though they've got 5-ish-day and 2-ish-day periods. But other than that it's fine.

I also noticed deepseek-ai/DeepSeek-V4-Pro, which is somewhere in the middle. Zero mistakes on my test; it took 2 mins 27s and cost $0.229. I think this is the one I will keep instead of GLM, but I will mostly use V4-Flash.

My ~/.pi/agent/models.json ended up like so:

    {
      "providers": {
        "ollama": {
          "baseUrl": "https://api.gmi-serving.com/v1",
          "api": "openai-completions",
          "apiKey": "Almost free but not free. Very, very cheap.",
          "compat": {
            "supportsDeveloperRole": false,
            "supportsReasoningEffort": false
          },
          "models": [
            { "id": "deepseek-ai/DeepSeek-V4-Flash", "reasoning": true, "contextWindow": 262144 },
            { "id": "deepseek-ai/DeepSeek-V4-Pro", "reasoning": true, "contextWindow": 262144 }
          ]
        }
      }
    }

Two mistakes from V4-Flash are easy to accept, especially considering that I made 4 mistakes myself, and that it took me a bit more than an hour. Curse the mm/dd/yyyy format! It seems I have been thoroughly bested at that task. I feel like adjusting my career path and keeping up with the times.

Bonus: Go even cheaper

Every so often, my model stumbles into a huge one-line JSON and runs up the token count by filling up pi's 50KB DEFAULT_MAX_BYTES limit. I changed that limit to 5KB, significantly reducing the input token count. There is a ticket (https://github.com/earendil-works/pi/issues/5935) to introduce this as a setting, but it was auto-closed. The files to modify, with the pi version as of writing this, are:

    ~/.local/share/pi-node/node-v22.22.3-linux-x64/lib/node_modules/@earendil-works/pi-coding-agent/node_modules/@earendil-works/pi-agent-core/dist/harness/utils/truncate.js
    ~/.local/share/pi-node/node-v22.22.3-linux-x64/lib/node_modules/@earendil-works/pi-coding-agent/dist/core/tools/truncate.js

I modified both (not sure if I needed to).
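
To make the effect of that byte cap concrete, here is a minimal Python sketch of head truncation on a huge one-line JSON. It is not pi's actual truncate.js logic: the helper name `truncate_output`, the trailing marker text, the choice to keep the start of the output, and reading "KB" as 1024 bytes are all assumptions made for illustration.

```python
import json

# Assumption: "50KB" and "5KB" are read as multiples of 1024 bytes.
DEFAULT_MAX_BYTES = 50 * 1024  # the default limit mentioned in the post
REDUCED_MAX_BYTES = 5 * 1024   # the lowered limit the post switches to


def truncate_output(text: str, max_bytes: int) -> str:
    """Hypothetical helper: keep at most max_bytes of UTF-8 from the start of text."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half.
    kept = data[:max_bytes].decode("utf-8", errors="ignore")
    return kept + f"\n[truncated {len(data) - max_bytes} bytes]"


if __name__ == "__main__":
    # A single-line JSON document far larger than either cap.
    one_line_json = json.dumps({"values": list(range(50_000))})
    print("original bytes:", len(one_line_json.encode("utf-8")))
    for cap in (DEFAULT_MAX_BYTES, REDUCED_MAX_BYTES):
        shown = truncate_output(one_line_json, cap)
        print(f"cap {cap}: {len(shown.encode('utf-8'))} bytes reach the model")
```

Because the whole file is one line, a line-based limit never kicks in and the byte cap is what bounds the input, so a tenfold smaller cap means roughly tenfold fewer bytes sent for each such file.
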
Prompt tokens for DeepSeek-V4-Flash went from 604k to 431k, and total cost went from $0.034 to $0.026 for my particular test.

The future

My work has now changed significantly. I no longer manually copy and paste tiny code segments. Instead I ask the agent what to do, then compare the agent's shared directory with the main one. I do this with PyCharm for its good directory diff interface, but you can do it with Meld as well.

So there you have it. I am reaping the AI rewards, while refusing to give in to vendor lock-in. I despise closed ecosystems and enshittification. When Anthropic started pushing for Claude Code exclusivity, I found that anticompetitive. Also, arbitrary and sudden price increases are reckless while open-weights models are just a few months behind. They are desperately trying to extract value from rapidly devaluing models. If the breakneck pace slows down, their value evaporates. The moat they are trying to build is actually slowing them down.

The hardware is where it's at. Companies like GMI and DeepInfra provide flexible value. And the hardware depreciates over at least a few years, rather than 3 months.

I don't know how to monetize models sustainably, however. Maybe crowdfunding? Non-profits? Public utilities like firefighters?

TL;DR / summary

- Get a model API: Create an account at GMI Cloud (https://www.gmicloud.ai/en/models) or any provider like DeepInfra.
- Create an API key at their console (https://console.gmicloud.ai/user-setting/api-keys).
- Isolate the environment (warmly recommended, for privacy): Install Debian in VirtualBox (or any other sandbox you prefer).
- Install the agent: Inside the VM, install pi (https://pi.dev/). Modify the hardcoded DEFAULT_MAX_BYTES.
- Configure the provider: Edit ~/.pi/agent/models.json to add an OpenAI-compatible provider pointing to the API endpoint.
- Run a sanity check: Give the agent a task and check results.
- Set up your review workflow: Share a copy to the VM, and keep reviewing code between the two copies.
- ???
- Profit
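
The review step in the list above can be partly scripted. As a rough sketch, the following Python uses the standard library's `filecmp.dircmp` to list which files differ between the main project and the copy shared with the VM, so you know what to open in PyCharm or Meld. The function name `changed_files` and the example paths are hypothetical and not part of the original workflow.

```python
import filecmp
from pathlib import Path


def changed_files(main_dir: str, agent_dir: str) -> dict[str, list[str]]:
    """Recursively compare two project copies and group relative paths by status."""
    report = {"modified": [], "only_in_agent": [], "only_in_main": []}

    def walk(cmp: filecmp.dircmp, prefix: Path) -> None:
        # Note: dircmp compares shallowly by default, so two files with the
        # same size and modification time are assumed equal without being read.
        report["modified"] += [str(prefix / name) for name in cmp.diff_files]
        report["only_in_main"] += [str(prefix / name) for name in cmp.left_only]
        report["only_in_agent"] += [str(prefix / name) for name in cmp.right_only]
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(main_dir, agent_dir), Path("."))
    return {status: sorted(paths) for status, paths in report.items()}


if __name__ == "__main__":
    # Hypothetical paths: the real project and the copy shared with the VM.
    result = changed_files("project", "project-agent-copy")
    for status, paths in result.items():
        for path in paths:
            print(f"{status}: {path}")
```

This only tells you where to look; the actual line-by-line review still happens in the diff tool.
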