A practical guide to getting real work done with AI coding tools without bleeding money
Last month I watched a friend open his OpenRouter dashboard and wince. He’d burned through $47 in a single afternoon of heavy coding — and honestly, most of those tokens went to routine stuff. Renaming variables. Writing boilerplate tests. Generating commit messages. Frontier-model work on tasks that didn’t need a frontier model.
I have four kids. I notice avoidable expenses. So I’ve spent real time figuring out how to use Kilo Code effectively without the bill creeping up on me. You can go completely free if you want, and even the paid path doesn’t have to hurt.
The Completely Free Setup
Kilo uses AI in three places: agentic chat (the main coding assistant), autocomplete (inline suggestions as you type), and background tasks (session titles, context summarization). Each one can be configured to cost zero dollars.
Auto Free for Agentic Chat
The fastest path to free is Auto Free (kilo-auto/free). Select it from the model dropdown and you’re done. No API key. No configuration. No billing.
Behind the scenes, Kilo routes your requests to the best available free models on OpenRouter — splitting traffic across them and updating the mapping server-side as providers change their promotional periods. You always get whatever’s currently best without tracking it yourself.
A caveat: Auto Free may route to providers that log prompts and outputs for their own improvement purposes. Don’t paste your company’s proprietary code into it. For personal projects, open source work, or learning, it’s fine.
Free Autocomplete via Mistral BYOK
Autocomplete uses credits by default. But Mistral offers a free tier for their Codestral model, and Kilo lets you bring your own key.
The setup takes about two minutes:
Grab a free Codestral API key from Mistral’s platform
Go to BYOK (Bring Your Own Key) on the Kilo Gateway
Select Codestral as the provider
Paste your key and save
After that, autocomplete runs through your Mistral key at zero cost to your Kilo balance. Codestral is genuinely good at code completion; it’s purpose-built for fill-in-the-middle tasks.
Free Background Tasks
Kilo uses a small model for things like generating session titles. By default it’s Auto Small, which consumes credits. Swap it to any free model:
In VS Code: Settings → Models → change the small model to any free model from the picker.
In the CLI: Set small_model in ~/.config/kilo/config.json:
{
  "small_model": "your-preferred-free-model"
}
Type “free” in the model picker to filter — there are always several available.
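If you’d rather script the CLI change than edit the file by hand, here is a minimal Python sketch. It assumes the config file is plain JSON with no comments, and the helper name set_small_model is my own, not something Kilo ships. It keeps whatever other keys are already in the file.

```python
import json
from pathlib import Path


def set_small_model(config_path: Path, model_id: str) -> dict:
    """Set "small_model" in a JSON config file, keeping any other keys.

    Assumption: the file is plain JSON (no comments, no trailing commas).
    """
    config = {}
    if config_path.exists():
        raw = config_path.read_text(encoding="utf-8").strip()
        if raw:
            config = json.loads(raw)
    config["small_model"] = model_id
    # Create ~/.config/kilo (or whatever parent directory) if it is missing.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (the model ID is the article's placeholder; use a real one
# from the model picker):
# set_small_model(Path.home() / ".config" / "kilo" / "config.json",
#                 "your-preferred-free-model")
```

The example call is commented out so that running the snippet does not touch your real config until you fill in a model ID.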
Total Cost: $0
Auto Free for chat. Mistral BYOK for autocomplete. A free model for background tasks. That’s a fully functional AI coding setup that costs nothing.
Auto Efficient for Paid Work
Free is great for learning and personal projects. But if you’re doing professional work and you want good results without frontier pricing, Auto Efficient (kilo-auto/efficient) is the move I’d actually recommend.
Auto Efficient watches your coding session in context, classifies each request by difficulty, and routes to the cheapest model that’s been proven accurate enough for that specific task — based on benchmarks Kilo runs continuously.
In practice this means:
Routine refactoring, boilerplate, and simple Q&A → cheap model
Complex architecture decisions, tricky debugging → more capable model
If it can’t confidently classify → falls back to Balanced tier (so quality never drops below a solid baseline)
You’re not manually switching models. You’re not guessing which tasks need Claude Opus versus which ones are fine with something smaller. The system does it, and it does it based on actual benchmark data rather than guesswork.
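To make the routing idea concrete, here is a toy sketch of classify-then-route with a fallback. This is not Kilo’s implementation: the keyword classifier, the confidence threshold, and the "cheap" and "capable" tier names are all made up for illustration. Only the shape (route by difficulty, fall back to a balanced tier when unsure) comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    difficulty: Optional[str]  # "routine", "complex", or None when unsure
    confidence: float          # 0.0 to 1.0


# Hypothetical tier names and threshold, for illustration only.
ROUTES = {"routine": "cheap", "complex": "capable"}
FALLBACK_TIER = "balanced"
MIN_CONFIDENCE = 0.8


def classify(request: str) -> Classification:
    """Toy keyword classifier standing in for a real, benchmark-driven one."""
    text = request.lower()
    if any(word in text for word in ("rename", "boilerplate", "commit message")):
        return Classification("routine", 0.9)
    if any(word in text for word in ("architecture", "race condition", "debug")):
        return Classification("complex", 0.9)
    return Classification(None, 0.0)


def route(request: str) -> str:
    """Pick a tier; fall back when the classifier is not confident."""
    result = classify(request)
    if result.difficulty is None or result.confidence < MIN_CONFIDENCE:
        return FALLBACK_TIER
    return ROUTES[result.difficulty]
```

The point of the fallback branch is that an unclassifiable request never lands on the cheapest tier by accident.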
My friend who burned $47 in an afternoon was mostly sending frontier-model tokens at tasks Auto Efficient would’ve routed to a $0.10/million-token model. The expensive stuff should be reserved for when it matters. You can even see some results of “one shots” with Auto Efficient here: https://kilo.ai/efficient-vs-frontier.
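Here is the back-of-the-envelope math behind that claim, as a small sketch. Only the $0.10/million-token figure comes from above. The token volume, the share of routine work, and the frontier price are hypothetical numbers I picked to land near a $47 afternoon, so swap in your own.

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def routed_cost(total_tokens: int, routine_share: float,
                cheap_price: float, frontier_price: float) -> float:
    """Cost when routine tokens go to a cheap model and the rest to a frontier one."""
    routine_tokens = round(total_tokens * routine_share)
    hard_tokens = total_tokens - routine_tokens
    return (token_cost(routine_tokens, cheap_price)
            + token_cost(hard_tokens, frontier_price))


TOTAL_TOKENS = 3_000_000   # hypothetical afternoon of heavy use
ROUTINE_SHARE = 0.8        # hypothetical: 80% of tokens were routine work
FRONTIER_PRICE = 15.00     # hypothetical blended $/million tokens
CHEAP_PRICE = 0.10         # the $0.10/million figure mentioned above

all_frontier = token_cost(TOTAL_TOKENS, FRONTIER_PRICE)
routed = routed_cost(TOTAL_TOKENS, ROUTINE_SHARE, CHEAP_PRICE, FRONTIER_PRICE)
print(f"all frontier: ${all_frontier:.2f}")
print(f"routed:       ${routed:.2f}")
```

With these made-up inputs the all-frontier afternoon comes to $45.00 and the routed one to $9.24. Almost all of the remaining cost is the 20% of tokens that still go to the frontier model.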
BYOK Deep Dive
If you want more control — or specific models you trust — Kilo supports bring-your-own-key for several providers. These are the most relevant options:
Google AI Studio
Google offers a generous free tier for Gemini models that doesn’t require billing information. You get access to Gemini 2.5 Flash and Pro models with reasonable rate limits. For a lot of coding tasks, Flash is fast and capable enough. This is probably the best “free with your own key” option right now if you want something beyond what Auto Free routes to.
DeepSeek
DeepSeek’s API rates are remarkably low. Their V4-Flash model is fast and cheap — good for the kind of rapid-fire coding work where you’re iterating quickly and don’t need to overthink model selection. The community has been vocal about this one working well with Kilo.
Ollama / LM Studio (Local)
Zero API cost, full privacy, works offline. The trade-off is you need decent hardware — a good GPU makes a real difference — and local models still can’t match the largest cloud models on complex tasks. But for autocomplete-style work, simple refactoring, and code generation from clear specifications, they’re solid.
Kilo supports Ollama, LM Studio, and Atomic Chat out of the box. Point Kilo at localhost and you’re running.
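Before pointing Kilo at localhost, it helps to confirm that a local server is actually answering. The sketch below is a standalone check, not part of Kilo. It assumes Ollama on its default address (http://localhost:11434) and uses Ollama’s /api/tags endpoint to list installed models. LM Studio and Atomic Chat use different addresses and endpoints, so this covers Ollama only.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def parse_model_names(payload: dict) -> List[str]:
    """Pull model names out of an /api/tags style response body."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def list_ollama_models(base_url: str = OLLAMA_URL,
                       timeout: float = 2.0) -> Optional[List[str]]:
    """Return installed model names, or None if no server answers."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None


# Example:
# models = list_ollama_models()
# print("no local server" if models is None else models)
```

A None result means nothing is listening; an empty list means the server is up but you haven’t pulled a model yet.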
Local models make sense when you’re working on proprietary code you can’t send to the cloud, you have a beefy machine sitting idle, or you just like the idea of everything running on your hardware.
They don’t make sense when you’re on a laptop without a dedicated GPU, you need the best possible results on complex architecture tasks, or you value your time more than the API costs.
Where Kilo Pass Fits
I should mention this briefly: Kilo Pass starts at $19/month for the Starter tier and includes bonus credits. It charges zero markup on model costs — you pay exactly what the providers charge. If you’re already spending $20+ per month on various API keys and want to simplify to one bill with one login, it’s a clean option. But it’s not necessary. Everything above works without it.
Practical Tips from the Community
The Kilo Discord and Reddit communities have figured out some patterns worth stealing:
Be precise with your prompts. One user on r/kilocode put it well: “Try to give more precise restrictions, use ask mode to realize your demand and write markdown doc in agent mode, the last step will be easy to generate code with a cheap model.” Translation: spend your expensive tokens on understanding the problem, then let cheap models do the implementation once the spec is clear.
Match the model to the task manually when it matters. Use a reasoning model for architecture and planning. Switch to something fast and cheap for the actual code generation once you know what you want. Kilo’s modes (Code, Architect, Debug) make this natural.
Don’t underestimate DeepSeek for daily driving. Multiple HN commenters have noted they’re using DeepSeek V4 Pro in Kilo and finding it genuinely good for everyday work. Fast responses, low cost, solid code quality.
Auto Efficient handles this automatically. But if you’re on the free tier and want to be strategic about occasional paid usage, the community consensus is: plan with a smart model, build with a cheap one.
What I’d Choose
Good AI coding assistance does not require $47 afternoons, and the free setup is viable if you’re willing to accept the trade-offs of free models.
The spectrum looks like this:
$0/month: Auto Free + Mistral BYOK + free small model. Fully functional, some quality trade-offs, providers may log your prompts.
~$5-15/month: Auto Efficient or BYOK with cheap providers (DeepSeek, Google AI Studio free tier for lighter work). Professional quality on most tasks without frontier pricing.
$20+/month: Frontier models when you need them, Kilo Pass for simplicity, or heavy usage on paid tiers.
Pick the level that matches your work and your budget. Kilo’s workflow stays the same across tiers; only the selected model changes.