{"slug": "running-claude-code-with-a-local-llm", "title": "Running Claude Code with a local LLM", "summary": "The article provides instructions for running Claude Code using a local large language model (LLM) instead of Anthropic's cloud-based models. It recommends downloading specific quantized Qwen3.6 MLX models published under the unsloth namespace, such as the 35B or 27B parameter versions, with RAM requirements ranging from 36GB to 64GB+. The guide details configuration steps including enabling TurboQuant KV Cache, adjusting context and token limits, and adding a specific environment variable to disable attribution headers in the settings file.", "body_md": "Download omlx from its releases page: https://github.com/jundot/omlx/releases\n\nGo to the model downloader and pick a model that fits your RAM.\n\n35B parameters, 3B active:\n\n- `unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit`: 17.4 GB (36GB+ RAM ideal)\n- `unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit`: 21.6 GB (48GB+ RAM ideal)\n- `unsloth/Qwen3.6-35B-A3B-MLX-8bit`: 37.7 GB (64GB+ RAM ideal)\n\n27B parameters:\n\n- `unsloth/Qwen3.6-27B-UD-MLX-4bit`: 26.2 GB (48GB+ RAM ideal)\n- `unsloth/Qwen3.6-27B-UD-MLX-6bit`: 30.5 GB (64GB+ RAM ideal)\n- `unsloth/Qwen3.6-27B-UD-MLX-8bit`: 34.7 GB (64GB+ RAM ideal)\n\n- Go to model settings\n- Pin the downloaded model and set it as the default\n- Open the model's settings\n- Enable `TurboQuant KV Cache` in `3.5-bit`\n- Go to global settings\n- Turn on `Fallback to Default Model`\n- Set `Hot Cache Limit (In-Memory Cache)` to 10%\n- Set `Cold Cache Limit (SSD Cache)` to 10%\n- Increase `Max Context Window` to `256000`\n- Increase `Max Tokens` to `64000`\n- Save\n- Add `\"CLAUDE_CODE_ATTRIBUTION_HEADER\": \"0\"` under the `env` key in `~/.claude/settings.json`\n\nExample:\n\n`{ \"env\": { \"CLAUDE_CODE_ATTRIBUTION_HEADER\": \"0\" } }`", "url": "https://wpnews.pro/news/running-claude-code-with-a-local-llm", "canonical_source": "https://gist.github.com/DiegoRBaquero/f53ab22ae978226c86158a60dad8199d", "published_at": "2026-04-21 22:48:43+00:00", "updated_at": "2026-05-22 22:06:36.393415+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "open-source"], "entities": ["Claude Code", "Qwen", "unsloth", "MLX"], "alternates": {"html": "https://wpnews.pro/news/running-claude-code-with-a-local-llm", "markdown": "https://wpnews.pro/news/running-claude-code-with-a-local-llm.md", "text": "https://wpnews.pro/news/running-claude-code-with-a-local-llm.txt", "jsonld": "https://wpnews.pro/news/running-claude-code-with-a-local-llm.jsonld"}}