# Running Claude Code with a local LLM

> Source: <https://gist.github.com/DiegoRBaquero/f53ab22ae978226c86158a60dad8199d>
> Published: 2026-04-21 22:48:43+00:00

- Get oMLX from its releases page: <https://github.com/jundot/omlx/releases>
- Go to the model downloader
- Pick one of the following models, depending on your RAM:
  - 35B parameters with 3 billion active:
    - `unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit`: 17.4 GB (36GB+ RAM ideal)
    - `unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit`: 21.6 GB (48GB+ RAM ideal)
    - `unsloth/Qwen3.6-35B-A3B-MLX-8bit`: 37.7 GB (64GB+ RAM ideal)
  - 27B parameters:
    - `unsloth/Qwen3.6-27B-UD-MLX-4bit`: 26.2 GB (48GB+ RAM ideal)
    - `unsloth/Qwen3.6-27B-UD-MLX-6bit`: 30.5 GB (64GB+ RAM ideal)
    - `unsloth/Qwen3.6-27B-UD-MLX-8bit`: 34.7 GB (64GB+ RAM ideal)
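
To make the RAM guidance above easier to apply, here is a minimal sketch that filters the listed models by available memory. The model ids, sizes, and RAM figures are copied from the list; the `MODELS` table and the `candidates_for` helper are hypothetical names introduced only for illustration and are not part of oMLX.

```python
# Model ids, download sizes, and RAM guidance copied from the list above.
# The table layout and helper name are assumptions made for this sketch.
MODELS = [
    # (model id, download size in GB, suggested minimum RAM in GB)
    ("unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit", 17.4, 36),
    ("unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit", 21.6, 48),
    ("unsloth/Qwen3.6-35B-A3B-MLX-8bit", 37.7, 64),
    ("unsloth/Qwen3.6-27B-UD-MLX-4bit", 26.2, 48),
    ("unsloth/Qwen3.6-27B-UD-MLX-6bit", 30.5, 64),
    ("unsloth/Qwen3.6-27B-UD-MLX-8bit", 34.7, 64),
]


def candidates_for(ram_gb):
    """Return the model ids whose suggested minimum RAM fits within ram_gb."""
    return [name for name, _size_gb, min_ram in MODELS if min_ram <= ram_gb]


if __name__ == "__main__":
    # Example: list the options for a 48 GB machine.
    for model_id in candidates_for(48):
        print(model_id)
```

The thresholds are the "ideal" figures from the list, so treat the result as a starting point rather than a hard limit.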
- Go to model settings
- Pin the downloaded model and set it as the default model
- Open the model's settings
- Enable `TurboQuant KV Cache` in `3.5-bit`
- Go to global settings
- Turn on `Fallback to Default Model`
- Set `Hot Cache Limit (In-Memory Cache)` to 10%
- Set `Cold Cache Limit (SSD Cache)` to 10%
- Increase `Max Context Window` to `256000`
- Increase `Max Tokens` to `64000`
- Save
- Add `"CLAUDE_CODE_ATTRIBUTION_HEADER": "0"` to the `env` key inside `~/.claude/settings.json` (Ref). Example:

  `{ "env": { "CLAUDE_CODE_ATTRIBUTION_HEADER": "0" } }`
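
If `~/.claude/settings.json` already holds other settings, editing it by hand risks overwriting them. The following is a minimal sketch of one way to merge the entry programmatically, assuming the file is plain JSON with an optional top-level `env` object. The function name `set_attribution_header` is hypothetical and not part of Claude Code.

```python
import json
from pathlib import Path


def set_attribution_header(settings_path: Path) -> dict:
    """Merge the env entry into settings_path, keeping any existing keys.

    Assumes the file, if present, contains a JSON object.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)

    # Create the "env" object if it is missing, then set only our key.
    env = settings.setdefault("env", {})
    env["CLAUDE_CODE_ATTRIBUTION_HEADER"] = "0"

    # Create the parent directory if needed and write the merged settings back.
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling `set_attribution_header(Path.home() / ".claude" / "settings.json")` would apply the change to the real file; consider backing it up first.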
