# Using Visual Studio Code’s ‘air-gapped’ AI model mode

> Source: <https://www.infoworld.com/article/4186817/using-visual-studio-codes-air-gapped-ai-model-mode.html>
> Published: 2026-06-24 09:00:00+00:00

Microsoft has been pushing hard to make [Visual Studio Code](https://www.infoworld.com/article/2335960/what-is-visual-studio-code-microsofts-extensible-code-editor.html) a major way to consume its AI services, mostly in the form of [GitHub Copilot](https://www.infoworld.com/article/3609013/github-copilot-everything-you-need-to-know.html). GitHub Copilot’s deep integration with VS Code brings many conveniences — inline autocomplete, for instance — but it’s frustrating for those, like me, who would rather use another model provider, or even a locally hosted LLM, for those functions.

Visual Studio Code 1.122 introduced a new feature, “[Use BYOK [Bring Your Own Key] without a GitHub sign-in](https://code.visualstudio.com/updates/v1_122#_use-byok-without-a-github-sign-in),” that allows you to “use chat, tools, and MCP servers in air-gapped or restricted environments where GitHub sign-in isn’t possible.” More importantly, it “enables fully offline workflows with local models like Ollama.”

In other words, you can now use locally hosted LLMs for chat, tools, and [Model Context Protocol](https://www.infoworld.com/article/4029634/what-is-model-context-protocol-how-mcp-bridges-ai-and-external-services.html) servers inside Visual Studio Code. The one thing you still can’t do is use a local LLM for inline and next-edit suggestions — at least, not without additional tooling.

If you want to use a local LLM with VS Code’s bring-your-own-model system, the first thing you need is a way to host the model. VS Code lacks a model-hosting mechanism of its own, although it’s conceivable that a VS Code extension may offer something like that in the future. That said, hosting models is complicated enough that a dedicated app is really needed for the job.

One easy way to host models is via a product like [LM Studio](https://www.infoworld.com/article/4127250/first-look-run-llms-locally-with-lm-studio.html), a convenient GUI for standing up, serving, and managing LLMs on one’s own hardware. The model host does not have to be the same system you run VS Code on, either. It can be on a server box you control, or on a cloud instance.

The choice of model is also important. Many models are powerful but won’t run well on commodity hardware because they’re simply too big. A good rule of thumb is to choose a model that fits into existing VRAM, along with the memory needed for a sizable token context (the more, the better). Also, the model should be suited to coding and development work. Some models in this vein that fit comfortably into 8GB VRAM include:

Once you have a model up and running, you can integrate it with Visual Studio Code. If you’ve disabled VS Code’s AI features, you will need to turn them on. Make sure the setting `chat.disableAIFeatures`

is turned off. You can find it in `Settings | Chat | Miscellaneous`

.

Third-party language models are managed through Visual Studio Code’s language model list. Press `Ctrl-Shift-P`

and type `Manage Language Models`

to open the list of existing language models.

Foundry

First you will see a list of the built-in models, which are all externally hosted. To add a new model, select `Add Models`

at the top right and select `Custom Endpoint`

.

You’ll then get a series of prompts:

`Chat Completions`

, `Responses`

, and `Messages`

. Most of the time you’ll want to use `Responses`

, as it’s the most general-purpose option of the three.Once you finish providing those answers, you’ll be dropped into a modal editor for a JSON file that holds the details about the endpoint you’re configuring.

Foundry

You’ll need to provide a few more details by typing them into the labeled fields:

`id`

: A text field that uniquely identifies this particular entry. The choice of ID is pretty much arbitrary; if you’re using only a single model, the ID could be the model name.`name`

: The name of the model that is used to identify it on the model server. In LM Studio, you can get this name by clicking on `My Models`

in the main interface, then selecting the three-dot icon for the model in question and clicking `Copy Default Identifier`

. For Qwen 2.5, for instance, `name`

might be something like `qwen2.5-coder-7b-instruct`

.`url`

: The URL to the server’s endpoint. On LM Studio, this defaults to something like `http://127.0.0.1:1234/v1`

. The `/v1`

at the end is important because that endpoint is used for autodiscovery of models and their capabilities.The other fields generally don’t need editing. Most models have tool calling functionality. If you know for a fact that the model you’re using doesn’t have vision support, then set `vision`

to `false`

.

Once you have these fields filled in, you can close the modal editor to save the changes. If you reload the `Manage Language Models`

page, you’ll now see your new endpoint:

Foundry

You should now be able to launch the chat window and use the defined model for conversation and utilities:

Foundry

One current, and major, limitation of Visual Studio Code’s BYOK functionality is that it only works for chat and utility tasks. It doesn’t allow you to use a local model for inline suggestions or code completions. The only way to [take advantage of local models for expanded functionality with VS Code](https://www.infoworld.com/article/4144487/i-ran-qwen3-5-locally-instead-of-claude-code-heres-what-happened.html) is to use a third-party tool like [Continue](https://marketplace.visualstudio.com/items?itemName=Continue.continue).

It isn’t clear if Microsoft will eventually lift this restriction. GitHub Copilot integration in VS Code is a large part of how Copilot as a service reaches its target audience. For the time being, you can certainly use third-party and local models for a significant part of your AI-assisted development work in VS Code, and you can close the functionality gap with additional tooling.
