{"slug": "how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private", "title": "How to Set Up a Local AI Coding Assistant in VS Code – Free & Private", "summary": "A developer has published a guide for setting up a local AI coding assistant in VS Code using Continue and Ollama, achieving tab autocomplete and code chat entirely on-device. The setup requires a GPU with 24GB+ VRAM for the 14B model, but smaller GPUs can use Qwen2.5 Coder 7B. The guide emphasizes zero cost, privacy, offline capability, and model flexibility.", "body_md": "Want a Cursor/Copilot-style coding assistant that runs entirely on your machine? Your code never leaves your computer and there's no subscription fee. Here's how to set it up with VS Code, Continue, and Ollama.\n\n## What You'll Build\n\n- Tab autocomplete (like Copilot) that suggests code as you type\n- Chat with your codebase - ask questions, generate functions, write tests\n- 100% local - zero data sent to any cloud service\n\n## Prerequisites\n\n- A GPU with 24GB+ VRAM (RTX 3090/4090 or better)\n- For smaller GPUs (8-12GB), use Qwen2.5 Coder 7B instead\n- Ollama installed (see ollama.com)\n- VS Code (free from code.visualstudio.com)\n\n## Step 1: Pull the Model\n\nOpen a terminal and pull a coding-focused model:\n\nThis takes a few minutes depending on your internet. The model is ~8GB at Q4 quantization.\n\n## Step 2: Install Continue\n\nIn VS Code:\n\n- Open Extensions (Ctrl+Shift+X)\n- Search for \"Continue\"\n- Click Install\n- Reload VS Code when prompted\n\n## Step 3: Configure\n\nCreate or edit `~/.continue/config.yaml`:\n\n## Step 4: Use It\n\n- **Autocomplete**: Start typing. Continue suggests completions in gray. Press Tab to accept.\n- **Chat**: Press Ctrl+L (or Cmd+L on Mac) to open the chat panel. 
Ask questions about your code.\n- **Edit**: Select code and press Ctrl+Shift+L to ask for changes.\n- **Inline**: Highlight code, press Ctrl+I, and describe what you want changed.\n\n## Performance Notes\n\n| GPU | Model | Speed | Quality |\n| --- | --- | --- | --- |\n| RTX 3090 (24GB) | Qwen2.5-Coder 14B | 25-35 tok/s | Excellent |\n| RTX 4090 (24GB) | Qwen2.5-Coder 14B | 40-50 tok/s | Excellent |\n| RTX 3060 (12GB) | Qwen2.5-Coder 7B | 30-40 tok/s | Good |\n| RTX 4060 (8GB) | Qwen2.5-Coder 7B (Q4) | 20-30 tok/s | Good |\n\n## Why Go Local?\n\n- **$0/month** vs $20/seat for Copilot or Cursor\n- **Privacy**: your proprietary code never touches a third-party server\n- **Offline**: works without internet\n- **Model choice**: swap models anytime, no vendor lock-in\n\n*Originally published on* [everylocalai.com](https://everylocalai.com/stack/local-cursor)", "url": "https://wpnews.pro/news/how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private", "canonical_source": "https://dev.to/everylocalai/how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private-2nkk", "published_at": "2026-06-18 09:02:36+00:00", "updated_at": "2026-06-18 09:21:43.286462+00:00", "lang": "en", "topics": ["developer-tools", "large-language-models", "ai-tools", "generative-ai", "ai-products"], "entities": ["VS Code", "Continue", "Ollama", "Qwen2.5 Coder", "RTX 3090", "RTX 4090", "RTX 3060", "RTX 4060"], "alternates": {"html": "https://wpnews.pro/news/how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private", "markdown": "https://wpnews.pro/news/how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private.md", "text": "https://wpnews.pro/news/how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private.txt", "jsonld": "https://wpnews.pro/news/how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private.jsonld"}}