Running Claude Code with a local LLM

This guide explains how to run Claude Code against a local large language model (LLM) instead of Anthropic's cloud-hosted models. It recommends downloading one of several quantized Qwen3.6 models in MLX format, in 35B or 27B parameter versions, with ideal RAM ranging from 36GB to 64GB+. The configuration steps include enabling TurboQuant KV Cache, raising the context window and max token limits, and adding an environment variable to the settings file that disables the attribution header.

- Download oMLX from https://github.com/jundot/omlx/releases
- Go to the model downloader. There are multiple options, depending on your RAM:

  35B parameters with 3 billion active:
  - unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit - 17.4 GB (36GB+ RAM ideal)
  - unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit - 21.6 GB (48GB+ RAM ideal)
  - unsloth/Qwen3.6-35B-A3B-MLX-8bit - 37.7 GB (64GB+ RAM ideal)

  27B parameters:
  - unsloth/Qwen3.6-27B-UD-MLX-4bit - 26.2 GB (48GB+ RAM ideal)
  - unsloth/Qwen3.6-27B-UD-MLX-6bit - 30.5 GB (64GB+ RAM ideal)
  - unsloth/Qwen3.6-27B-UD-MLX-8bit - 34.7 GB (64GB+ RAM ideal)

- Go to model settings
- Pin the downloaded model and set it as the default model
- Open the model's settings
- Enable TurboQuant KV Cache in 3.5-bit
- Go to global settings
- Turn on Fallback to Default Model
- Set Hot Cache Limit (In-Memory Cache) to 10%
- Set Cold Cache Limit (SSD Cache) to 10%
- Increase Max Context Window to 256000
- Increase Max Tokens to 64000
- Save
- Add "CLAUDE_CODE_ATTRIBUTION_HEADER": "0" in the env key inside ~/.claude/settings.json

Example:

    {
      "env": {
        "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
      }
    }
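
If ~/.claude/settings.json already holds other settings, editing it by hand risks overwriting them. The following is a minimal Python sketch of the last step, not an official tool: it merges the variable into the existing env object and leaves everything else alone. The function name set_attribution_header is hypothetical, and the variable name is assumed to be spelled with underscores (CLAUDE_CODE_ATTRIBUTION_HEADER), since environment variable names cannot contain spaces.

```python
import json
from pathlib import Path

# Assumed spelling of the variable: underscores, no spaces.
ENV_KEY = "CLAUDE_CODE_ATTRIBUTION_HEADER"


def set_attribution_header(settings_path: Path, value: str = "0") -> dict:
    """Merge ENV_KEY into the "env" object of a JSON settings file.

    Hypothetical helper for illustration. Existing keys are preserved;
    the file and its parent directory are created if they are missing.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)
    if not isinstance(settings, dict):
        raise ValueError(f"{settings_path} does not contain a JSON object")

    # Reuse the existing "env" object if there is one, otherwise create it.
    env = settings.setdefault("env", {})
    if not isinstance(env, dict):
        raise ValueError('"env" in the settings file is not a JSON object')
    env[ENV_KEY] = value

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling set_attribution_header(Path.home() / ".claude" / "settings.json") would apply the change to the file named in the steps above. Back up the file first if it contains settings you care about, because the sketch rewrites the whole file with two-space indentation.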