# Why stop gaming saved my tokens: Building my own local AI Lab

> Source: <https://dev.to/wizsebastian/why-stop-gaming-saved-my-tokens-building-my-own-local-ai-lab-4i06>
> Published: 2026-06-25 03:29:02+00:00

About a year ago, I turned my gaming PC into a local AI Lab. And yes, the most important word in that sentence is **LOCAL**. This is the story of how I sacrificed my gaming hours to build several tools, and of the one I now use every single day.

Day to day, all of us developers who work with Artificial Intelligence share the same headache: tokens and *rate limits*. We're all victims of the high prices that come with constantly running inference with AI agents like Claude Code, Codex, or Gemini CLI (yeah, I love working from the terminal, I LOVE CLIs).

While I was building AI systems (agent orchestration, LLM *fine-tuning*), I was burning through way too many tokens. I tried tweaking the *prompts* and cleaning up the junk in my context, but the real devourer of my quota showed up when I had to learn a new tool.

For one project I was implementing solutions in QGIS, a free, open-source Geographic Information System (GIS) for creating, editing, visualizing, analyzing, and publishing geospatial data on maps, and I didn't know the interface 100%. Like any dev facing something new, I leaned on AI agents: I'd take a screenshot, send it over, and ask for explanations.

**Here's an important fact that hurt my wallet:**

I was eating through my hourly Claude allowance just doing visual queries, leaving me with no quota left to generate the actual code I really needed for my development.

One day, during a forced break thanks to a Claude *rate limit*, I looked over at my Gaming PC. I realized that instead of complaining about cloud costs, I could save tokens by running local models for visual extraction tasks.

My main work machine is my MacBook because it's so easy to move around with. But the Gaming PC had an extra 1 TB SSD and was running Pop!_OS, a distro where the NVIDIA drivers always stayed stable. So I decided to stop gaming and put it to work.

Setting everything up in an AI *homelab* was a challenge.

I needed a service I could send a screenshot to and get the context back. Traditional OCR only extracts the raw text, and that's not enough when you need to understand an interface. The answer was **VLMs (Vision Language Models)**, which thanks to their pre-training don't just read the image, they *understand* it.

I rolled up my sleeves and found the perfect model for my precious 12 GB of VRAM: `qwen2.5vl:7b`. (Yes, with just 7B parameters you can get incredible results.)

I built a small API that queries Ollama. Now I just paste the screenshot, the VLM parses the image, and another agent interprets the context. This whole process hands me back an accurate answer in about **8 seconds**, depending on the image, all private with no data leaving my LOCAL network.
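To make that flow concrete, here is a minimal sketch of the kind of call a small API like this can make to Ollama. It is not the code from my repository: the function names, the prompt, and the timeout are assumptions for illustration. It relies only on Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` list.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; change it if your instance listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag as written in the architecture diagram further down.
MODEL = "qwen2.5vl:7b"


def build_payload(image_bytes: bytes, prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64 strings, without a data-URL prefix.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object instead of a stream of chunks.
        "stream": False,
    }


def describe_screenshot(
    image_bytes: bytes,
    prompt: str = "Extract the visible text and describe this interface.",  # hypothetical prompt
    timeout: float = 120.0,  # assumed; generous enough for a cold model load
) -> str:
    """Send the screenshot to the local VLM and return its answer as text."""
    body = json.dumps(build_payload(image_bytes, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With stream=False the generated text arrives in the "response" field.
        return json.loads(response.read())["response"]
```

With Ollama running and the model pulled, something like `describe_screenshot(open("shot.png", "rb").read())` would return the model's answer; `shot.png` is just a placeholder file name.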

Sacrificing a bit of *gaming* to put together my own *homelab* with pure code has been completely worth it. It's a simple solution, but it represents direct savings in money and technical resources.

This local infrastructure no longer just reads *screenshots*. In fact, I'm currently using this same ecosystem (my homelab) for a plant identification project on a farm, processing images captured from drone flights. *(If you're interested in how to orchestrate and do computer vision by training LLMs to analyze drone images, drop it in the comments and I'll put together the next post).*

*Building all the way from the friction of rate limits to having a local computer vision API is exactly the kind of challenge I enjoy solving.*

Here's the repository where I built the VLM API to get the parsing and context of my screenshots →

Extract text from any *screenshot* using a **Vision Language Model (VLM)** running **100% locally** on your GPU. Paste a capture with `Cmd+V` and get the text back in ~8 seconds, without sending your data to any cloud.

It was born from a real pain point: every screenshot I sent to Claude/Codex/Gemini to explain an interface cost me ~1,500 tokens. Multiplied by dozens of images a month, that devoured my hourly quota and left me without tokens for what really mattered: generating code. The fix was to stop paying to "see" and move that task to my own AI homelab.
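To put rough numbers on that, here is a back-of-the-envelope sketch. The ~1,500 tokens per screenshot comes from the paragraph above; the monthly screenshot counts are hypothetical values picked only to illustrate "dozens", not measurements.

```python
# Approximate cost of one screenshot sent to a cloud agent (figure from the text).
TOKENS_PER_SCREENSHOT = 1_500


def monthly_vision_tokens(
    screenshots_per_month: int,
    tokens_per_screenshot: int = TOKENS_PER_SCREENSHOT,
) -> int:
    """Tokens spent per month just on letting the model look at screenshots."""
    return screenshots_per_month * tokens_per_screenshot


# Hypothetical volumes: two, four, and eight dozen screenshots a month.
for count in (24, 48, 96):
    print(f"{count} screenshots -> {monthly_vision_tokens(count):,} tokens")
```

Even at the low end, that is tens of thousands of tokens a month spent on "seeing" rather than on generating code.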

📖 Full story: "Why stop gaming saved my tokens: Building my own local AI Lab"

```
Browser (paste/drag image)
      │  POST /parse  { image: base64 }
      ▼
server.py  (Flask, port 5000)
      │  POST http://localhost:11434/api/generate
      ▼
Ollama  →  qwen2.5vl:7b  (VLM on your GPU)
      │
      ▼
  Extracted
```
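The `POST /parse` step in the diagram can be sketched as a plain function that validates the request body and hands the decoded image to the model. This is an illustration under assumptions, not the actual `server.py`: the `text` and `error` response keys, the status codes, and the injected `run_vlm` callable are all hypothetical. Keeping the logic free of Flask makes it easy to test without a GPU.

```python
import base64
import binascii
from typing import Callable, Tuple


def handle_parse(body: dict, run_vlm: Callable[[bytes], str]) -> Tuple[dict, int]:
    """Validate a /parse body like {"image": "<base64>"}; return (json, status)."""
    encoded = body.get("image") if isinstance(body, dict) else None
    if not isinstance(encoded, str) or not encoded:
        return {"error": "missing 'image' field"}, 400

    # Browsers often send data URLs ("data:image/png;base64,...").
    # Keep only the base64 part after the first comma.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]

    try:
        image_bytes = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return {"error": "image is not valid base64"}, 400

    try:
        # run_vlm is whatever function talks to Ollama (assumed, injected here).
        text = run_vlm(image_bytes)
    except OSError as exc:  # connection and urllib errors subclass OSError
        return {"error": f"model backend unavailable: {exc}"}, 502

    return {"text": text}, 200


# Wiring it into Flask could look roughly like this (not executed here):
#
#   @app.post("/parse")
#   def parse():
#       payload, status = handle_parse(request.get_json(silent=True), run_vlm)
#       return jsonify(payload), status
```

Injecting `run_vlm` means the same handler works with a fake model in tests and with the real Ollama call in production.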

A big hug from your dev friend, Luis Sebastian Vasquez. Use AI responsibly and safely.