qwen2.5-coder is too slow for Claude Code on a Mac. Here's the fix.

The article describes how to run Claude Code locally using Ollama on a Mac, enabling offline use on flights or in areas with poor connectivity. The author initially chose the recommended qwen2.5-coder:14b model but found it too slow for agentic tasks, with tool calls taking 25–52 seconds. Switching to the larger gemma4:26b model gave acceptable performance: roughly 70% of the author's normal Claude Code workflow worked in offline coding sessions.

Claude Code does not care where the model lives. Point it at a local model and it works with no network. I tested that at 35,000 feet, picked the wrong model first, and swapped mid-flight.

The first pick was qwen2.5-coder:14b. It was too slow for anything agentic. One tool call sat for 25 seconds, the next for 52.

The swap was gemma4:26b. That one carried the session.

Ollama runs an open-weights model on your laptop. Claude Code points at Ollama instead of Anthropic's servers. No network call leaves the machine. The cloud account is irrelevant for that session. The only real decision is which local model you run, and that decision is where I got it wrong the first time.

Before the setup, the three objections I get every time:

Local is the only setup that runs at 35,000 feet. It also runs on a train through a tunnel, in a cafe with broken wifi, and on the morning the OpenAI status page goes red. The flight is just the stress test.

Install Ollama and pull a model:

    brew install ollama
    ollama pull qwen2.5-coder:14b

Do this on home wifi the night before. The pull is around 9 GB. Airport wifi and hotspots will not cooperate, and finding that out at the gate is its own small tragedy.

Confirm it landed:

    ollama list

This was my mistake, so I will be blunt about it: I prepped qwen2.5-coder:14b because it is the model every "local LLM for coding" post recommends. Pull more than one. You will see why in Step 4.

Start the Ollama server in one terminal:

    ollama serve

Then in a new terminal, launch Claude Code against your local model:

    ollama launch claude --model qwen2.5-coder:14b

Wrap that in two shell aliases so the rest of your workflow has named modes. Add these to ~/.zshrc:

    alias claude-local='ollama launch claude --model gemma4:26b'
    alias claude-cloud='claude'

Then run source ~/.zshrc. That is the entire switching layer. claude-local runs offline against Ollama. claude-cloud runs against the real Anthropic API. Two commands, one decision per session.
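
The "confirm it landed" check can be scripted so it covers every model you plan to fly with. What follows is a minimal sketch, not the helper from the bundle mentioned later: the function names has_model and preflight are hypothetical, and it assumes that ollama list prints one header row followed by rows whose first column is the model name, and that the command fails when the server is not running.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper. Function names are illustrative only.

# has_model MODEL
# Reads `ollama list`-style output on stdin and succeeds only if some
# non-header row has MODEL as its first column (exact match, tag included).
has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit found ? 0 : 1 }'
}

# preflight MODEL...
# Prints "ok" or "missing" for each model and returns non-zero if any is missing.
preflight() {
  local listing model missing=0
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama is not installed"
    return 1
  fi
  # Assumption: `ollama list` exits non-zero when the server is not running.
  listing="$(ollama list)" || { echo "could not list models (is ollama serve running?)"; return 1; }
  for model in "$@"; do
    if printf '%s\n' "$listing" | has_model "$model"; then
      echo "ok: $model"
    else
      echo "missing: $model"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it the night before with both models, for example preflight qwen2.5-coder:14b gemma4:26b. The match is on the full name including the tag, so gemma4 alone will not match gemma4:26b.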

Prove the setup works in airplane mode before you board anything. This is non-negotiable. Discovering a missing step at altitude is bad theater with no exits.

Confirm ollama serve is running.
Run claude-local and point it at a real file.

If it loads your project and answers with wifi off, it will work on the plane. The best move I made was running the model without wifi on the ground first and measuring real performance.

Every forum I read pointed at qwen2.5-coder. I trusted them. They were wrong for this job.

File reads were fine. Short explanations were fine. Then the model tried anything agentic, and the wait times stopped being a rounding error. One tool call crunched for 25 seconds. An earlier step had sat at 52. For a single step in a loop that needs five or six of them, that is not a workflow. That is staring at a terminal while the person next to you finishes a movie.

qwen2.5-coder:14b is a fine model for single-shot edits. For the multi-step tool loop that Claude Code actually runs, on this hardware, it could not keep up. The model every post recommends was the wrong call for the job I had.

I had pulled a second model before the flight, exactly because I did not fully trust the first one. So I switched to gemma4:26b. Bigger model, 17 GB on disk, and on this MacBook it was the difference between a demo and a tool. The tool loop ran at a speed I would actually choose. The gap analysis completed. Multi-step reasoning held together instead of stalling halfway.

Honest scorecard for the flight: roughly 70 percent of my normal Claude Code workflow worked on gemma4:26b. The 30 percent that did not was the heavy "go reason across the whole repo" pattern, which is cloud territory anyway. For six hours of focus on a known task, it was a real working setup, not a downgrade.

Because I already had a tight context-engineering setup with optimised token consumption, it ran smoothly.
The Mac started lagging briefly when I had Xcode and Antigravity open alongside, but closing those and cleaning up Chrome tabs sorted it. If you want the context-engineering side, the U-AMOS write-up is "I spent 6 months losing fights with AI in React Native. Then I built U-AMOS."

Practical tip: install the OneTab Chrome extension. Collapse open tabs into a list when you start a focus session. RAM frees up immediately and so does your attention. OneTab is on the Chrome Web Store.

The lesson from the flight changed my default. Here is the short list I keep now:

Notice what is not on that list: qwen2.5-coder. That is not an accident. Pick a model that is RL-trained for tool use, not just code completion. Claude Code lives or dies on the tool loop.

After running both for weeks, the rule is simple.

Reach for claude-local when:

Reach for claude-cloud when:

You do not pick once and live there. The aliases exist so you can switch inside a single session. Draft offline, land, run claude-cloud for the high-stakes execution.

The honest section, because AI-generated tutorials never have one.

claude-cloud is the fix in the moment.

This offline setup is one of three layers in a full AI-coding stack: cloud LLMs for heavy reasoning, local LLMs for offline and private work, and on-device LLMs for the mobile apps you ship to users. The on-device side for React Native is its own problem, covered in the Phi-3 Mini integration walkthrough. All three ship pre-wired in the AI Mobile Launcher AI Pro tier, so you are not assembling this from scratch.

I packaged the rest of this into the Local LLM with Claude Code bundle: the paste-ready zshrc aliases plus a claude-status helper, the Ollama config tuned for Apple Silicon, the model-picker matrix, and a pre-flight checklist so the setup is never a surprise at altitude. Reply to the Code Meet AI newsletter and I will send it.

Can I run Claude itself locally?
No. Claude is closed-weight, so there is no local-runnable Claude.
This setup uses Claude Code, the CLI, with an open-weights model like Gemma 4 or Devstral serving the inference. The CLI is the interface, the model is whatever endpoint you point it at.

What is the best local LLM for coding with Claude Code?
For the agentic tool loop Claude Code runs, pick a model RL-trained for tool use: Devstral Small, Qwen3-Coder, or Gemma 4. Avoid older completion-tuned models like qwen2.5-coder. They handle single edits fine but stall on multi-step work.

Does Claude Code airplane mode actually work with no signal?
Yes. With Claude Code pointed at local Ollama, no request leaves your laptop. I ran a full session at 35,000 feet with wifi off. The only requirement is pulling the model in advance.

Why Ollama and not LM Studio or llama.cpp?
Ollama wraps llama.cpp with a clean HTTP API on a known port. LM Studio works too but is GUI-first. Direct llama.cpp gives more control and more setup pain. Ollama is the path of least resistance for getting this running in under 30 minutes.

Will I get the same code quality as cloud Claude?
No. A good local model is excellent for syntax-level work: refactors, cleanup, rewriting a hook. For plan-heavy or reasoning-heavy tasks the gap is large. Use cloud for design, local for execution, or use local to draft and cloud to polish.

Malik Chohra: 9 yrs software, 7 in React Native. Building Wire RN, AI Mobile Launcher, and Code Meet AI.