Using Visual Studio Code’s ‘air-gapped’ AI model mode

Microsoft's Visual Studio Code now supports using locally hosted LLMs for chat, tools, and MCP servers via a new 'air-gapped' mode, enabling offline AI workflows without GitHub sign-in. Users can integrate models like those hosted on LM Studio, though inline autocomplete still requires additional tooling.

Microsoft has been pushing hard to make Visual Studio Code (https://www.infoworld.com/article/2335960/what-is-visual-studio-code-microsofts-extensible-code-editor.html) a major way to consume its AI services, mostly in the form of GitHub Copilot (https://www.infoworld.com/article/3609013/github-copilot-everything-you-need-to-know.html). GitHub Copilot’s deep integration with VS Code brings many conveniences, inline autocomplete among them, but it’s frustrating for those, like me, who would rather use another model provider, or even a locally hosted LLM, for those functions.

Visual Studio Code 1.122 introduced a new feature, “Use BYOK (Bring Your Own Key) without a GitHub sign-in” (https://code.visualstudio.com/updates/v1_122#_use-byok-without-a-github-sign-in), that allows you to “use chat, tools, and MCP servers in air-gapped or restricted environments where GitHub sign-in isn’t possible.” More importantly, it “enables fully offline workflows with local models like Ollama.”

In other words, you can now use locally hosted LLMs for chat, tools, and Model Context Protocol (https://www.infoworld.com/article/4029634/what-is-model-context-protocol-how-mcp-bridges-ai-and-external-services.html) servers inside Visual Studio Code. The one thing you still can’t do is use a local LLM for inline and next-edit suggestions, at least not without additional tooling.

If you want to use a local LLM with VS Code’s bring-your-own-model system, the first thing you need is a way to host the model. VS Code lacks a model-hosting mechanism of its own, although it’s conceivable that a VS Code extension may offer something like that in the future. That said, hosting models is complicated enough that a dedicated app is really needed for the job.

One easy way to host models is via a product like LM Studio (https://www.infoworld.com/article/4127250/first-look-run-llms-locally-with-lm-studio.html), a convenient GUI for standing up, serving, and managing LLMs on one’s own hardware.
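
Once a host such as LM Studio is serving a model, one quick way to confirm it is reachable is to ask its OpenAI-compatible API which models it exposes. The following is a minimal sketch, not an official client. It assumes the server listens at LM Studio's default address of http://127.0.0.1:1234/v1 and answers the OpenAI-style /models route; the function names are made up for illustration.

```python
import json
import urllib.request

# Assumption: LM Studio's default local address. Point this at another
# machine if the model host is not the system running VS Code.
BASE_URL = "http://127.0.0.1:1234/v1"


def models_endpoint(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def extract_model_ids(payload: dict) -> list[str]:
    """Pull the model identifiers out of an OpenAI-style /models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Ask the server which models it is serving (needs a running host)."""
    with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))
```

Calling list_served_models() with the server running should return the identifiers the host knows about. A connection error (raised as an OSError subclass) usually means the server is not started or the address is wrong.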
The model host does not have to be the same system you run VS Code on, either. It can be on a server box you control, or on a cloud instance.

The choice of model is also important. Many models are powerful but won’t run well on commodity hardware because they’re simply too big. A good rule of thumb is to choose a model that fits into existing VRAM, along with the memory needed for a sizable token context (the more, the better). Also, the model should be suited to coding and development work. Some models in this vein fit comfortably into 8GB of VRAM.

Once you have a model up and running, you can integrate it with Visual Studio Code. If you’ve disabled VS Code’s AI features, you will need to turn them on. Make sure the setting chat.disableAIFeatures is turned off. You can find it in Settings | Chat | Miscellaneous.

Third-party language models are managed through Visual Studio Code’s language model list. Press Ctrl-Shift-P and type Manage Language Models to open the list of existing language models. First you will see a list of the built-in models, which are all externally hosted.

To add a new model, select Add Models at the top right and select Custom Endpoint. You’ll then get a series of prompts, one of which offers a choice of Chat Completions, Responses, and Messages. Most of the time you’ll want to use Responses, as it’s the most general-purpose option of the three.

Once you finish providing those answers, you’ll be dropped into a modal editor for a JSON file that holds the details about the endpoint you’re configuring. You’ll need to provide a few more details by typing them into the labeled fields:

id: A text field that uniquely identifies this particular entry. The choice of ID is pretty much arbitrary; if you’re using only a single model, the ID could be the model name.

name: The name of the model that is used to identify it on the model server.
In LM Studio, you can get this name by clicking on My Models in the main interface, then selecting the three-dot icon for the model in question and clicking Copy Default Identifier. For Qwen 2.5, for instance, name might be something like qwen2.5-coder-7b-instruct.

url: The URL to the server’s endpoint. On LM Studio, this defaults to something like http://127.0.0.1:1234/v1. The /v1 at the end is important because that endpoint is used for autodiscovery of models and their capabilities.

The other fields generally don’t need editing. Most models have tool calling functionality. If you know for a fact that the model you’re using doesn’t have vision support, then set vision to false.

Once you have these fields filled in, you can close the modal editor to save the changes. If you reload the Manage Language Models page, you’ll now see your new endpoint listed. You should now be able to launch the chat window and use the defined model for conversation and utilities.

One current, and major, limitation of Visual Studio Code’s BYOK functionality is that it only works for chat and utility tasks. It doesn’t allow you to use a local model for inline suggestions or code completions. The only way to take advantage of local models for expanded functionality with VS Code (https://www.infoworld.com/article/4144487/i-ran-qwen3-5-locally-instead-of-claude-code-heres-what-happened.html) is to use a third-party tool like Continue (https://marketplace.visualstudio.com/items?itemName=Continue.continue).

It isn’t clear if Microsoft will eventually lift this restriction. GitHub Copilot integration in VS Code is a large part of how Copilot as a service reaches its target audience. For the time being, you can certainly use third-party and local models for a significant part of your AI-assisted development work in VS Code, and you can close the functionality gap with additional tooling.
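
The earlier rule of thumb about fitting both the model and its token context into VRAM can be turned into rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement. It counts only quantized weights plus the key/value cache, and ignores activations and runtime buffers. The parameter count, bits per weight, layer count, and attention dimensions are illustrative assumptions for a 7B-class model with grouped-query attention, so take real values from the model card of whatever you plan to run. The 10% headroom factor is likewise an assumption.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory taken by the (possibly quantized) weights."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate key/value cache size for a full context window.

    The leading 2 accounts for storing one key and one value vector
    per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens


def fits_in_vram(total_bytes: float, vram_gib: float, headroom: float = 0.9) -> bool:
    """True if the estimate fits while leaving some VRAM free (assumed 10%)."""
    return total_bytes <= vram_gib * GIB * headroom


# Illustrative assumptions, not published specifications.
weights = weight_bytes(n_params=7.6e9, bits_per_weight=4.5)
cache = kv_cache_bytes(n_layers=28, n_kv_heads=4, head_dim=128,
                       context_tokens=32_768)
total = weights + cache

print(f"weights ~{weights / GIB:.2f} GiB, KV cache ~{cache / GIB:.2f} GiB")
print("fits in 8 GiB:", fits_in_vram(total, vram_gib=8))
```

Raising context_tokens or bits_per_weight shows how quickly a larger context window or a less aggressive quantization eats into the available memory.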