cd /news/large-language-models/how-to-run-deepseek-locally-on-your-… · home topics large-language-models article
[ARTICLE · art-38120] src=pub.towardsai.net ↗ pub= topic=large-language-models verified=true sentiment=· neutral

How to Run DeepSeek Locally on Your Own Computer, and the Catch Most Guides Skip

DeepSeek's open-source reasoning model, which matched top closed systems on math and coding benchmarks, can be run locally via Ollama, but most users will only access smaller distilled versions, not the full headline model requiring data-center hardware. The distilled models retain significant reasoning ability for ordinary computers, offering a practical trade-off for privacy and cost.

read11 min views1 publishedJun 24, 2026

DeepSeek became famous overnight as the open model that matched the best reasoning systems for a fraction of the cost. You can run it on your own machine, free and private, and watch it think through problems step by step. But there is a catch almost every excited tutorial glosses over, the version that made headlines is not the version you are actually running. Here is how to run it, which model truly fits your hardware, and the honest truth about what you get.

DeepSeek arrived in January 2025 and did something open models were not supposed to be able to do. It matched the best closed reasoning models on hard math and coding, it was released under a permissive license anyone could use commercially, and it had reportedly been trained for a tiny fraction of what its rivals spent. It became a genuine phenomenon, the rare AI story that escaped the tech world and made the evening news. And because it is open, you can download it and run it on your own computer, with no subscription and nothing sent to anyone’s servers.

Here’s the part the excited tutorials tend to skip, though, and it matters. The DeepSeek that made headlines, the full model that matched the frontier, is enormous, and you almost certainly can’t run it on your own machine. What you can run, and what nearly everyone actually runs, is a set of smaller versions that carry DeepSeek’s reasoning in a much lighter package. They are genuinely useful, and for most people they are the right answer, but they’re not the same thing as the model from the headlines. So this guide does two things, it shows you exactly how to run DeepSeek locally with real commands, and it’s honest about which version you are really getting and what that means.

First, the distinction the whole thing hinges on.

DeepSeek’s famous reasoning model comes in two very different forms, and understanding the difference is the key to a setup that does not disappoint you.

The full model is a giant. It uses a mixture-of-experts design with hundreds of billions of parameters, and running it requires data-center hardware, multiple high-end GPUs and hundreds of gigabytes of memory. This is the version that matched the best closed models in the benchmarks you read about. It’s not something you run on a laptop or even a powerful gaming desktop, and any guide implying otherwise is setting you up for frustration.

What you actually run at home are the distilled versions. DeepSeek took its big model’s reasoning behavior and trained it into a range of much smaller models, a process called distillation, where a small model learns to imitate the reasoning patterns of the large one. These distilled models come in sizes from tiny to fairly large, and they run comfortably on ordinary hardware. They keep a real portion of the reasoning ability, enough to be genuinely useful and to clearly outperform normal models of the same size on math and logic, but they’re not the full model and they fall short of it on the hardest problems. That’s the honest trade. You’re getting most of the reasoning behavior in a package you can actually run, not the headline model itself. Keep that framing and you’ll be happy with the result. Forget it, and you’ll wonder why your laptop isn’t matching the benchmarks.

With that settled, here is how to run one.

Running DeepSeek locally is genuinely easy, because a free tool called Ollama handles all the hard parts, the download, the memory management, the GPU acceleration, behind a single command. The whole process is four steps, and the only real decision, which size to download, is covered in the next section, so glance at that first to find your model name, then come back here.

First, install Ollama. Go to ollama.com, download the version for your system, and install it. On a Mac you drag it to your Applications folder, on Windows it installs like any normal program, and on Linux it is a single command in the terminal.

Second, open a terminal and confirm it works. On a Mac, open the Terminal app, found in Applications under Utilities or by searching Spotlight for Terminal. On Windows, open PowerShell from the Start menu. On Linux, open your usual terminal. Then type the command below, and a version number means you are ready.

ollama --version

Third, pull and run your DeepSeek model. Use the size you picked from the next section. As an example, the command below downloads and starts the 8-billion version, which is a good default for most laptops. The first run downloads the model, which takes a few minutes, then drops you straight into a chat where you simply type your question at the prompt.

ollama run deepseek-r1:8b

Fourth, test it and exit. Ask it a reasoning question, a math word problem is a good one, and watch it think through the answer. When you are done, type /bye to leave the chat. To start it again later, you run the same ollama run command, and since the model is already downloaded, it starts instantly.

To use the model from your own apps or scripts rather than the terminal, Ollama also runs a local server at http://localhost:11434 that anything on your machine can send requests to, which is how you would wire it into a code editor or a program.

That’s the whole mechanical process. The only real decision is which size to pull, so here is how to choose.

DeepSeek’s distilled models come in a wide range, and matching the size to your hardware is what separates a smooth experience from a sluggish one. Here’s the honest breakdown, with the command for each.

ollama run deepseek-r1:1.5b   # ~2 GB, runs on almost anything, for testingollama run deepseek-r1:8b     # ~8 GB RAM, the everyday pick for most laptopsollama run deepseek-r1:14b    # 12-16 GB, the value sweet spotollama run deepseek-r1:32b    # ~24 GB VRAM (RTX 3090/4090), best local qualityollama run deepseek-r1:70b    # ~40 GB, needs dual GPUs or a high-memory Mac

On a basic laptop or any modest machine, start with the small distills. The smallest version runs on almost anything, needing only a couple of gigabytes of memory, and it’s useful mostly for testing that your setup works. The 7-billion or 8-billion version is the realistic everyday pick, needing around 8 gigabytes of memory, and it produces reasoning that is meaningfully better than ordinary models its size. For most people on a normal laptop, the 8-billion model is the right starting point.

On a machine with 16 gigabytes of memory or a midrange graphics card, the 14-billion version is the value sweet spot. It needs roughly 12 to 16 gigabytes, produces consistently strong reasoning on math, coding, and logic, and is the size where the reasoning starts to feel genuinely capable rather than just better-than-small. If your hardware can hold it, the 14-billion model is the choice that gives most people the best balance.

On a desktop with a strong graphics card, around 24 gigabytes of video memory like an RTX 3090 or 4090, the 32-billion version is the local sweet spot for quality. It uses roughly 20 gigabytes, runs fast on that hardware, and delivers reasoning close to the full model on many tasks, which is about as good as local reasoning gets on a single consumer card. There are larger distills still, a 70-billion version that needs around 40 gigabytes and therefore dual GPUs or a high-memory Mac, but for almost everyone the 32-billion model is the practical ceiling and a genuinely powerful place to land.

The way to check whether you chose well is to run ollama ps in another terminal while the model is loaded. If it shows a high CPU percentage, the model is spilling out of your graphics memory onto your processor, which works but is slow, and the fix is to drop to the next size down. A fast smaller model beats a crawling larger one for nearly everything.

One thing will stand out the moment you run it, and it’s worth understanding because it’s the whole point of this model. Before DeepSeek gives you an answer, it shows its thinking, a stream of reasoning wrapped in think tags where it works through the problem, considers approaches, catches its own mistakes, and only then arrives at a final answer.

This isn’t a quirk, it’s the feature. Most models hand you a conclusion and hide how they got there. DeepSeek lets you watch the reasoning unfold, which is genuinely useful, because you can see where its logic is solid and where it went astray, and that transparency helps you trust or question the answer rather than taking it on faith. For debugging code or checking a math result, watching the step-by-step work is often more valuable than the answer itself. If you find the thinking output distracting for simple tasks, you can turn it off, in Ollama with the command /set parameter think false typed in the chat, but for anything that benefits from reasoning, leaving it on is the reason you are running this model in the first place.

This model behaves a little differently from a standard chatbot, and a few non-obvious settings make a real difference.

First, raise the context length. Ollama defaults to a fairly short context, and because DeepSeek’s reasoning chains can run long, that default will cut its thinking off mid-thought. The fix is one command typed inside the chat session, which raises the context for that session.

Set it to at least 8192, and 16384 is a safer choice that gives the model room to reason fully. Second, the prompting advice is the opposite of what you might expect. Do not tell DeepSeek to think step by step, because it already reasons internally and the instruction interferes with its native process. Do not lean on system prompts or worked examples either, since this model tends to do better with a direct question and all your instructions placed in the main message. A temperature setting around 0.6 is the recommended sweet spot, low enough to stay coherent and high enough to avoid repetitive output, and you can set it the same way with /set parameter temperature 0.6 inside the chat. And finally, if you are wiring the model into an automated workflow that expects clean structured output, be aware that the visible thinking tags appear before the answer and can confuse a program parsing the response, so you strip them out before passing the result downstream.

Because of where DeepSeek comes from, privacy questions come up often, and the honest answer depends entirely on how you use it.

The privacy concerns you may have read about relate to DeepSeek’s hosted service, its website and its cloud API, where your data is processed on the company’s servers and is subject to the laws and policies that govern them. That’s a legitimate consideration for the hosted product. But running the open model locally through Ollama is a completely different situation. You are using only the downloaded model weights on your own hardware, with no connection to the company’s servers at all, so nothing you type ever leaves your machine. This is, in fact, one of the strongest reasons to run it locally rather than use the app, you get the model’s capability with none of the data leaving home. For sensitive work, local is the answer, and it sidesteps the concern entirely.

Put it together and you have a capable reasoning model running privately on your own computer, for free. You can hand it a hard math problem, a logic puzzle, or a buggy piece of code, watch it reason through the steps, and get an answer you can actually inspect, all without sending anything to anyone. For a lot of people, that is a genuinely powerful tool to own outright.

Just hold on to the honest framing this guide opened with. You’re running a distilled version that carries most of DeepSeek’s reasoning, not the full headline model, and within that reality the experience is excellent, especially at the larger sizes your hardware can support. Pick the biggest distill your machine runs comfortably, give its reasoning room to breathe with a longer context, and resist the urge to over-instruct it. Do that, and you have one of the most capable open reasoning models thinking through your problems on your own machine, in about five minutes and at no cost. That is a striking thing to be able to say, and the only trick is being clear-eyed about which version you are actually getting.

This is one in a set of hands-on guides to running the major open models yourself. If you run DeepSeek locally, drop a comment with your hardware, the distill size you settled on, and how the reasoning held up on your real tasks. The honest experience of people actually running these models is worth more than any benchmark table.

How to Run DeepSeek Locally on Your Own Computer, and the Catch Most Guides Skip was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

── more in #large-language-models 4 stories · sorted by recency
── more on @deepseek 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/how-to-run-deepseek-…] indexed:0 read:11min 2026-06-24 ·