Running Local LLMs for Coding: No API Keys, Full Control

A developer reports that running local LLMs for code completion is now faster and more private than using cloud APIs. Using Ollama with a 7B model on an M3 Max MacBook Pro, they achieved sub-second completions and saved significant time on boilerplate, debugging, and test writing. The setup requires at least 8GB VRAM and works best with a decent GPU.

You've probably noticed the code completion tools getting slower and more rate-limited. You've also probably gotten tired of explaining your entire codebase to an API that costs money per token. What if I told you could run your own LLM locally and get genuinely faster completions? I spent the last month setting up a local LLM workflow, and yeah, it's better than outsourcing to APIs. Here's what I actually use. Six months ago, local models were slow. Now? Not so much. Ollama + a decent GPU gets you sub-second completions for code tasks. That's faster than waiting for an API call half the time. The benefits are real: The downside: You need about 8GB of VRAM minimum. 16GB is comfortable. If you're on older hardware, this won't work. Hardware: MacBook Pro 16" with M3 Max 36GB unified memory . On Linux? Similar story — need a decent GPU or CPU with enough cores. Tool stack: Installation takes 10 minutes: Install Ollama brew install ollama or download from ollama.ai Start the server ollama serve In another terminal, pull a model ollama pull mistral That's it. Ollama runs on localhost:11434 by default. For Continue, I grabbed the VS Code extension and configured it: { "models": { "title": "Mistral 7B Local", "model": "mistral", "apiBase": "http://localhost:11434/api", "provider": "ollama" } } Now I use Ctrl+K or Cmd+K on Mac to trigger inline code generation. It works. Actually works. Example 1: Boilerplate Generation I needed a Redux reducer with a few specific actions. Mistral nailed it on the first try — structured correctly, no hallucinations, just gave me what I asked for. Saved 5 minutes of manual typing. Example 2: Bug Diagnosis Pasted a stack trace, asked what was happening. Got a correct answer with a fix. Not a wild guess — the actual issue was a missing async/await in a parent function. Saved me 20 minutes of debugging. Example 3: Test Writing Asked it to generate tests for a utility function. Generated decent test cases using Jest. Needed minor tweaks but 80% complete. Normal. This isn't a magic tool. Mistral 7B and other 7B models genuinely struggle with: For these, I still use Claude for serious thinking. Local models are for coding speed, not problem solving. On my M3 Max, inference takes 0.5-2 seconds for code completions. That's real-world, not benchmark. Sometimes slower, sometimes faster depending on what's running. Compare that to waiting 3-5 seconds for an API request to round-trip, and the local option wins. If you're: Then absolutely. Set aside an hour, get it running, see if it fits your workflow. If you're: Then stick with what you have. Local models are a productivity tool, not a replacement for serious infrastructure. Also — if you're building your own AI tooling, stay in the loop with LearnAI Weekly for deeper dives on local models, open-source tools, and what's actually worth your time. The future of coding tools is personal. Control yours.