Stop Saying "It Works on My Machine": Docker for AI Engineers

wpnews.pro

You trained the model. The notebook runs. The demo works. You push it to a teammate, and forty minutes later you get the message every engineer dreads:

"Hey, I'm getting a CUDA error. And torch won't import. And what version of Python is this?"

And you say the six words that have haunted software since the dawn of time:

"But it works on my machine."

Here's the uncomfortable truth: "it works on my machine" isn't a defense. It's a confession. It means your code depends on something living on your laptop that you never wrote down: a Python version, a system library, a CUDA toolkit, a stray environment variable, a model file sitting in ~/Downloads.

Docker is how you stop making that confession. Let's fix this.

A typical web app has a handful of dependencies. An AI project has layers of them, and each layer can betray you:

- Python packages: torch, transformers, numpy, and the version conflicts between them.
- System libraries: things like libgl1 or ffmpeg that pip won't install for you.

A requirements.txt captures just one of those layers, the Python packages. Docker captures all of them. That's the whole pitch.
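A requirements.txt can't tell you which Python, OS, CPU architecture, or CUDA build your code actually ran on. One low-tech habit is to print those layers alongside your package versions. The sketch below uses only the standard library plus an optional torch check; the function name environment_report and the default package list are illustrative choices, not part of any library.

```python
import importlib.metadata
import platform


def environment_report(packages=("torch", "transformers", "numpy")):
    """Collect the version info that requirements.txt alone doesn't capture."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),  # e.g. x86_64 vs arm64/aarch64 affects which wheels install
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report["packages"][name] = None  # not installed in this environment
    # CUDA details only exist if torch is importable; skip quietly otherwise.
    try:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
        report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    except (ImportError, OSError):
        report["cuda_available"] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run it on both machines and diff the output; the first mismatched line is a good place to start looking.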

Forget the whale logo and the buzzwords for a second.

A Docker image is a frozen snapshot of a complete computer: the operating system, Python, your packages, your code, and your config, all baked into one file. A container is a running copy of that snapshot.

The mental model that makes it click: a virtual machine simulates an entire computer, including its own operating system kernel, which is heavy and slow. A container shares the host's kernel and packages only everything above it (on macOS and Windows, Docker Desktop runs a lightweight Linux VM to supply that kernel). So a container starts in seconds, not minutes, and a single image runs the same way on your laptop, your teammate's laptop, and a cloud GPU server, as long as they share the same CPU architecture.

You write the recipe once. Everyone gets the exact same kitchen.

A Dockerfile is just that recipe, a plain text file of instructions. Here's a real one for a PyTorch project, with every line explained:

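# Start from a small official Python image with a pinned minor version
FROM python:3.11-slim

# System libraries pip can't install: compilers for packages that build from
# source, and libgl1 (commonly needed by OpenCV). Clearing the apt lists keeps
# this layer small.
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    libgl1 \
    && rm -rf /var/lib/apt/lists/*

# All following paths are relative to /app inside the image
WORKDIR /app

# Copy only the dependency list first, so this install layer stays cached
# until requirements.txt itself changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Now copy the rest of the project (filtered by .dockerignore, if present)
COPY . .

# Default command when the container starts
CMD ["python", "predict.py"]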

Two beginner mistakes this avoids:

1. Pin your versions. Your requirements.txt should look like this, not just bare package names:

torch==2.3.1
transformers==4.41.2
fastapi==0.111.0
uvicorn==0.30.1
numpy==1.26.4

Listing torch without a version is a future outage waiting to happen. The whole point of Docker is reproducibility; don't undermine it by letting versions float.

2. Copy requirements.txt before your code. Docker builds in layers and caches each one. If you copy everything at once, changing a single line of code invalidates the cache for the install step and forces it to reinstall torch (a multi-minute download) on every build. By copying requirements first, Docker reuses the cached install layer as long as requirements.txt hasn't changed, and only re-runs the steps from the first changed layer onward. Your rebuilds go from minutes to seconds.

To build and run it:

docker build -t my-model .
docker run my-model

The -t my-model flag just names (tags) the image. The final . sets the build context to the current folder, which is also where Docker looks for the Dockerfile by default. That's it: you now have a portable, reproducible model.
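To see why the order of the COPY steps matters, here is a toy model of the build cache. It is a simplification, not Docker's actual implementation, but it follows the same rule: a layer is reused only if its instruction and inputs are unchanged and every layer before it was reused too. The file contents are made up for the demo.

```python
import hashlib


def layer_keys(steps, files):
    """Toy build cache: each layer's key chains the parent key, the instruction
    text, and (for COPY steps) the names and contents of the copied files."""
    keys = []
    parent = ""
    for instruction, copied in steps:
        h = hashlib.sha256(parent.encode())
        h.update(instruction.encode())
        for path in copied:  # only COPY steps list input files
            h.update(path.encode())
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps, old_files, new_files):
    """Return the instructions whose cache key changed between two builds."""
    old = layer_keys(steps, old_files)
    new = layer_keys(steps, new_files)
    return [steps[i][0] for i in range(len(steps)) if old[i] != new[i]]


before = {"requirements.txt": "torch==2.3.1", "predict.py": "print('v1')"}
after = {**before, "predict.py": "print('v2')"}  # edit one line of code

copy_all_first = [
    ("COPY . .", ["requirements.txt", "predict.py"]),
    ("RUN pip install -r requirements.txt", []),
]
requirements_first = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "predict.py"]),
]

print(rebuilt_steps(copy_all_first, before, after))
# → ['COPY . .', 'RUN pip install -r requirements.txt']
print(rebuilt_steps(requirements_first, before, after))
# → ['COPY . .']
```

In the first ordering, the code edit changes the very first layer, so the pip install below it is rebuilt too. In the second, the install layer's inputs never changed, so only the final COPY reruns.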

Beginners often COPY model.bin straight into the image. Don't. A 5GB image is painful to build, push, and pull, and you'll rebuild it every time the weights change.

Instead, keep large files outside the image and mount them at runtime with a volume, a shared folder between your machine and the container:

docker run -v $(pwd)/models:/app/models my-model

This maps your local models/ folder to /app/models inside the container. The weights live on disk, the image stays lean, and you can swap models without rebuilding anything.
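Since the weights now arrive through a mount, it helps to fail loudly when the mount is missing instead of crashing deep inside torch.load. A minimal sketch, assuming a hypothetical MODEL_PATH environment variable that defaults to the /app/models/model.pt path used in this article:

```python
import os
from pathlib import Path

# Hypothetical env var; the article's own examples hardcode this default path.
DEFAULT_MODEL_PATH = "/app/models/model.pt"


def resolve_model_path(env_var="MODEL_PATH", default=DEFAULT_MODEL_PATH):
    """Pick the weights file from an env var (falling back to the mounted
    default) and fail fast with a clear message if it isn't there."""
    path = Path(os.environ.get(env_var, default))
    if not path.is_file():
        raise FileNotFoundError(
            f"Model weights not found at {path}. Did you forget "
            "-v $(pwd)/models:/app/models when starting the container?"
        )
    return path
```

Swapping models is then just a runtime flag, for example docker run -v $(pwd)/models:/app/models -e MODEL_PATH=/app/models/other.pt my-model (other.pt being a hypothetical second weights file), still with no rebuild.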

Most of the time you don't just want to run a script; you want a model behind an endpoint your app can call. Here's a minimal FastAPI server, app.py:

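from fastapi import FastAPI
from pydantic import BaseModel
import torch

app = FastAPI()

# Load the model ONCE at startup, from the volume-mounted folder.
# weights_only=False is needed to unpickle a full model object; only do this
# with files you trust, since unpickling can run arbitrary code.
model = torch.load("/app/models/model.pt", map_location="cpu", weights_only=False)
model.eval()  # switch dropout/batch-norm layers to inference behavior


class PredictRequest(BaseModel):
    text: str


@app.post("/predict")
def predict(req: PredictRequest):
    with torch.no_grad():  # no gradient tracking needed for inference
        # Assumes the saved model accepts raw text (e.g. it wraps its own
        # tokenizer); most models need req.text tokenized into tensors first.
        result = model(req.text)
    # Tensors aren't JSON-serializable, so convert to plain Python values.
    if isinstance(result, torch.Tensor):
        result = result.tolist()
    return {"prediction": result}


@app.get("/health")
def health():
    return {"status": "ok"}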

Notice the model loads once when the server boots, not inside predict(). Loading weights on every request will make your API crawl, a mistake that's easy to miss until production traffic hits.
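app.py gets load-once behavior by loading at import time. If you'd rather defer loading until the first request, a cached loader gives the same single load. The sketch below swaps in functools.lru_cache for that; load_weights is a hypothetical stand-in for torch.load so the effect is visible without a real model.

```python
import functools
import time

LOAD_COUNT = 0


def load_weights(path):
    """Hypothetical stand-in for torch.load: slow, so we count how often it runs."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.1)  # pretend this is reading gigabytes from disk
    return {"path": path}


@functools.lru_cache(maxsize=1)
def get_model(path="/app/models/model.pt"):
    # The body runs once per distinct path; later calls return the cached object.
    # Caveat: lru_cache doesn't lock, so two concurrent first calls could both load.
    return load_weights(path)


def predict(text):
    model = get_model()  # cheap after the first call
    return f"{model['path']}:{len(text)}"


for _ in range(5):
    predict("hello")
print(LOAD_COUNT)  # → 1
```

Because lru_cache doesn't serialize the first call, loading at startup, as app.py does, remains the simpler choice for a server that may get concurrent requests right after boot.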

Now adjust the Dockerfile's last line to launch the server instead of a script:

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

That --host 0.0.0.0 matters. Uvicorn's default host, 127.0.0.1, means "only reachable from inside this container", so requests from outside would never reach the server. Binding to 0.0.0.0 listens on all of the container's network interfaces, which makes it reachable through Docker's port mapping. Then map the port when you run it:

docker run -p 8000:8000 -v $(pwd)/models:/app/models my-model

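-p 8000:8000 connects port 8000 on your machine (the first number) to port 8000 in the container (the second). Send a POST request to http://localhost:8000/predict, or open http://localhost:8000/health in a browser, and you're serving a model from a container.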
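The /predict endpoint only accepts POST with a JSON body, so a browser address bar won't work for it. Here's a minimal standard-library client, assuming the container is running with -p 8000:8000 and that the endpoint expects a single text field, as in app.py above.

```python
import json
import urllib.request

# Assumes the container above is running with -p 8000:8000.
BASE_URL = "http://localhost:8000"


def build_predict_request(text, base_url=BASE_URL):
    """Build the POST request /predict expects: a JSON body like {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=10):
    """Send the request and decode the JSON response."""
    req = build_predict_request(text, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container up, predict("Docker makes this reproducible") returns whatever JSON the endpoint sends back, a dict with a prediction key in app.py's case.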

Real AI apps rarely run alone. You've got your model API, plus maybe a Redis cache for results and a vector database for embeddings. Starting three containers by hand, with the right flags and in the right order, gets old fast.

Docker Compose lets you define your whole stack in one docker-compose.yml file:

services:
  model-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    depends_on:  # controls start order only, not readiness
      - redis
      - vector-db

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  vector-db:
    image: qdrant/qdrant:latest  # consider pinning a specific version tag
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_data:/qdrant/storage

Then the entire stack starts with one command:

docker compose up

One command, three services, wired together on a shared network. Because services can reach each other by name, your API connects to Redis at redis:6379, with no IP addresses to chase down. Shut it all down with docker compose down. This is the moment most people fall in love with Docker.
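One catch with depends_on: in the short form used here, it only controls the order in which containers start, so Redis may still be booting when the API first tries to connect. A small retry loop at API startup covers that. This is a sketch; REDIS_HOST and REDIS_PORT are hypothetical environment variable names that default to the Compose service name and Redis's standard port.

```python
import os
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Short-form depends_on only orders container startup; it doesn't wait for
    the service to accept connections, so the API retries on its own.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unresolvable, or timed out: try again
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical env vars; inside Compose the service name doubles as the hostname.
REDIS_HOST = os.environ.get("REDIS_HOST", "redis")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))
```

Call wait_for_port(REDIS_HOST, REDIS_PORT) before creating your Redis client. Compose can also do the waiting itself if you give the redis service a healthcheck and use the long form of depends_on with condition: service_healthy.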

A short list of things worth doing from day one:

- Add a .dockerignore file. It works like .gitignore and keeps junk out of your image. At minimum: __pycache__, .git, venv, *.pt, and data/. Without it, you'll accidentally copy gigabytes of cache and datasets into your build.
- Use -slim or official ML base images. Picking python:3.11-slim over the full python:3.11 image saves hundreds of megabytes. For GPU work, start from an official CUDA-enabled base like pytorch/pytorch so the CUDA libraries come preinstalled. The GPU driver itself stays on the host: install the NVIDIA Container Toolkit there and start containers with --gpus all.
- Keep secrets out of the image. Pass them at runtime as environment variables (-e MY_KEY=... or --env-file .env), never hardcoded into the Dockerfile. Anyone with the image can read what's baked in.

Go back to that teammate who couldn't run your model. With Docker, the entire conversation becomes:

git clone your-repo
docker compose up

Two commands. Same Python, same CUDA, same packages, same everything: on their laptop, on the cloud GPU, on the production server. No "what version are you on?" No "did you install ffmpeg?" No 40-minute debugging session.

You don't need to master Kubernetes or become a DevOps engineer to get this. You just need a Dockerfile, a pinned requirements.txt, and maybe a docker-compose.yml. Start with the PyTorch example above, get one model running in a container today, and build from there.

The next time someone asks if your project works on their machine, you'll already know the answer.

It works on every machine.

Found this useful? Drop a comment with the trickiest "works on my machine" bug you've hit.

Source: dev.to (original article)
