Want a Cursor/Copilot-style coding assistant that runs entirely on your machine? Your code never leaves your computer and there's no subscription fee. Here's how to set it up with VS Code, Continue, and Ollama.
# What You'll Build
- Tab autocomplete (like Copilot) that suggests code as you type
- Chat with your codebase - ask questions, generate functions, write tests
- 100% local - zero data sent to any cloud service
# Prerequisites
- A GPU with 24GB or more of VRAM (RTX 3090, RTX 4090, or better) for the 14B model
  - For smaller GPUs (8-12GB), use Qwen2.5 Coder 7B instead
- Ollama installed (see ollama.com)
- VS Code (free from code.visualstudio.com)
# Step 1: Pull the Model
Open a terminal and pull a coding-focused model:
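
A minimal sketch of that command, assuming the 14B Qwen2.5-Coder tag from the Ollama library (the model the performance table below lists for 24GB cards). The `MODEL` variable and the PATH check are illustrative additions, not part of the original instructions; on an 8-12GB GPU, set the tag to `qwen2.5-coder:7b` instead.

```shell
# Assumed tag: the 14B Qwen2.5-Coder build from the Ollama library.
# On 8-12GB GPUs, use MODEL="qwen2.5-coder:7b" instead.
MODEL="qwen2.5-coder:14b"

if command -v ollama >/dev/null 2>&1; then
  ollama pull "$MODEL"   # download the model weights
  ollama list            # the pulled model should now appear in this list
else
  echo "ollama not found on PATH; install it from ollama.com first"
fi
```
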
The download takes a few minutes, depending on your connection speed. The model is ~8GB at Q4 quantization.
# Step 2: Install Continue
In VS Code:
- Open Extensions (Ctrl+Shift+X)
- Search for "Continue"
- Click Install
- Reload VS Code when prompted
# Step 3: Configure
Create or edit ~/.continue/config.yaml:
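
A hedged sketch of what that file might contain, assuming Continue's `config.yaml` schema (`name`, `version`, `schema`, and a `models` list with `provider`, `model`, and `roles`) and the model tag assumed in Step 1. The assistant name is made up, and field names can change between Continue releases, so check them against the Continue documentation for your version.

```yaml
# Sketch only: verify field names against the Continue docs for your version.
name: Local Assistant          # hypothetical label; any name works
version: 1.0.0
schema: v1
models:
  - name: Qwen2.5-Coder 14B
    provider: ollama           # Continue talks to the local Ollama server
    model: qwen2.5-coder:14b   # assumed tag; must match the model pulled in Step 1
    roles:
      - chat
      - edit
      - apply
      - autocomplete
```

This uses one model for every role because the guide pulls only one. By default Ollama listens on `http://localhost:11434`, so no API key or endpoint should be needed for a standard install.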
# Step 4: Use It
- Autocomplete: Start typing. Continue suggests completions in gray; press Tab to accept.
- Chat: Press Ctrl+L (Cmd+L on Mac) to open the chat panel and ask questions about your code.
- Edit: Select code and press Ctrl+Shift+L to add it to the chat, then ask for changes.
- Inline: Highlight code, press Ctrl+I, and describe what you want changed.
# Performance Notes
| GPU | Model | Speed | Quality |
| --- | --- | --- | --- |
| RTX 3090 (24GB) | Qwen2.5-Coder 14B | 25-35 tok/s | Excellent |
| RTX 4090 (24GB) | Qwen2.5-Coder 14B | 40-50 tok/s | Excellent |
| RTX 3060 (12GB) | Qwen2.5-Coder 7B | 30-40 tok/s | Good |
| RTX 4060 (8GB) | Qwen2.5-Coder 7B (Q4) | 20-30 tok/s | Good |
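
Throughput depends on context length, quantization, and what else is using the GPU, so it can be worth measuring your own setup. One way to do that, assuming the model tag from Step 1 and a made-up prompt: Ollama's `--verbose` flag prints timing statistics after the response, including an eval rate in tokens per second.

```shell
# Assumed tag and an arbitrary prompt; adjust both to your setup.
MODEL="qwen2.5-coder:14b"
PROMPT="Write a Python function that reverses a linked list."

if command -v ollama >/dev/null 2>&1; then
  # --verbose appends timing stats; look for the "eval rate" line.
  ollama run "$MODEL" --verbose "$PROMPT"
else
  echo "ollama not found on PATH; install it from ollama.com first"
fi
```
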
# Why Go Local?
- $0/month vs $20/seat for Copilot or Cursor
- Privacy: your proprietary code never touches a third-party server
- Offline: works without internet
- Model choice: swap models anytime, no vendor lock-in

*Originally published on everylocalai.com*