How to Tell if Your Python Mock Is Actually Working

A test can pass for the wrong reason. When you're mocking a third-party API call, the test might look green because the real API happened to return an error, not because your mock did anything at all.

This came up in a recent session in our agentic AI cohort where we were looking at a test to verify that converting to an invalid currency raised an exception. The test passed. But something felt off.

The test that passed for the wrong reason

The code under test calls the ExchangeRate API and raises CurrencyConversionError when the response signals failure:

def convert_currency(amount: Decimal, from_currency: str, to_currency: str) -> Decimal:
    if from_currency == to_currency:
        return amount
    response = requests.get(
        f"https://v6.exchangerate-api.com/v6/{EXCHANGE_RATE_API_KEY}/pair/{from_currency}/{to_currency}"
    )
    data = response.json()
    if data["result"] != "success":
        raise CurrencyConversionError(f"{data['error-type']}")
    # Go through str(): Decimal(float) would carry binary floating-point noise.
    return Decimal(str(data["conversion_rate"])) * amount

The test set up a mock_response, patched requests.get to return it (mock_get.return_value = mock_response), but configured it as a successful response:

mock_response.json.return_value = {
    "result": "success",   # <-- this will never raise CurrencyConversionError
    "conversion_rate": 1.5,
}

If the mock was intercepting, the function would return normally and pytest.raises would fail. But the test was passing. That meant the mock wasn't intercepting at all: the real API was being hit, and it was returning an error for the bogus "CTM" code.

Proving the mock actually intercepted

My instinct was to add print("calling external api") before requests.get. That proves the code reached that line. It does not prove whether the mock intercepted the call or the real network was hit.

At this point you can put a breakpoint() in the actual requests.get code in your venv, but there is a better way: mock_get.assert_called_once():

with pytest.raises(CurrencyConversionError):
    convert_currency(
        amount=Decimal("1.00"),
        from_currency="CAD",
        to_currency="CTM",  # Canadian Tire Money, not a real currency
    )
mock_get.assert_called_once()

If the mock was never called, this assertion fails and tells you directly: your patch didn't intercept the request. If the mock was called, the assertion passes and you know for sure that the test is relying on the mock, not the real API.
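
Both outcomes of that assertion can be reproduced with the standard library alone. This is a minimal sketch using a bare Mock in place of the patched requests.get; the URL is a made-up placeholder and nothing here touches the network.

```python
from unittest.mock import Mock

mock_get = Mock()
never_called_message = None

# Case 1: nothing has called the mock yet, so the assertion fails loudly.
try:
    mock_get.assert_called_once()
except AssertionError as exc:
    never_called_message = str(exc)

# Case 2: exactly one call was recorded, so the assertion passes silently.
mock_get("https://example.invalid/pair/CAD/CTM")  # placeholder URL, never requested
mock_get.assert_called_once()
```

Note that assert_called_once() also fails when the mock was called more than once, which makes it a slightly stricter check than "was this called at all".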

Running the test with this assertion in place settled it. Once the patch targeted the right name (the fix in the next section), the mock intercepted the call and, with the success response still configured, pytest.raises failed with DID NOT RAISE. That flip is the proof: a real call for "CTM" would have raised, so a non-raising run means the mock was in control, and the earlier green had been the real API answering, never the mock. Changing the mocked response to signal an error made the test pass for the right reason, and assert_called_once() then confirmed the call went through the mock and not the network:

mock_get.return_value.json.return_value = {
    "result": "error",
    "error-type": "unknown-code",
}
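
To see why the payload matters, the two responses can be compared side by side. The sketch below is an assumption-laden simplification: rate_from_response is a hypothetical helper that mirrors only the parsing half of convert_currency, and CurrencyConversionError is redefined locally so the snippet runs on its own.

```python
from decimal import Decimal
from unittest.mock import Mock


class CurrencyConversionError(Exception):
    """Local stand-in for the article's exception class."""


def rate_from_response(response, amount: Decimal) -> Decimal:
    """Hypothetical helper: the response-handling part of convert_currency."""
    data = response.json()
    if data["result"] != "success":
        raise CurrencyConversionError(data["error-type"])
    return Decimal(str(data["conversion_rate"])) * amount


# The original mock payload: the function returns, so nothing can raise.
success = Mock()
success.json.return_value = {"result": "success", "conversion_rate": 1.5}
success_result = rate_from_response(success, Decimal("1.00"))

# The corrected payload: the function raises, which is what the test expects.
error = Mock()
error.json.return_value = {"result": "error", "error-type": "unknown-code"}
try:
    rate_from_response(error, Decimal("1.00"))
    raised = False
except CurrencyConversionError as exc:
    raised = True
    error_message = str(exc)
```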

Patch where the name is used, not where it's defined

The currency module does import requests, then calls requests.get(...), so patching expenses_ai_agent.utils.currency.requests.get targets the call site. With this import requests style, patching requests.get happens to work too, since both names point at the same module object. The rule bites when a module does from requests import get: now get is a local name in the currency module, and you must patch expenses_ai_agent.utils.currency.get, not requests.get. Patching the wrong location is a common mistake that leads to the mock not intercepting and the real API being called.
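
The from-import case can be demonstrated without any third-party package. In this sketch, provider and consumer are hypothetical in-memory modules standing in for requests and the currency module; they are built on the fly so the snippet is self-contained.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for a library such as requests.
provider = types.ModuleType("provider")
exec("def get():\n    return 'real'\n", provider.__dict__)
sys.modules["provider"] = provider

# Hypothetical consumer module that binds the name locally via from-import.
consumer = types.ModuleType("consumer")
sys.modules["consumer"] = consumer
exec(
    "from provider import get\n\ndef fetch():\n    return get()\n",
    consumer.__dict__,
)

# Patching where the function is defined: consumer still holds the original.
with patch("provider.get", return_value="mocked"):
    wrong_target = consumer.fetch()

# Patching where the name is looked up: the call is intercepted.
with patch("consumer.get", return_value="mocked"):
    right_target = consumer.fetch()
```

The first patch replaces provider.get, but consumer.fetch looks up get in its own namespace, which still points at the original function.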

The cleaned-up test with pytest-mock

Once the mock response was correct and interception was verified, the test got two more improvements. First, the intermediate mock_response variable is unnecessary: chain directly off mock_get.return_value, as in the snippet above. Second, pytest-mock (added with uv add --dev pytest-mock) replaces the nested with patch(...) context managers with a mocker fixture. The result is flatter and easier to scan. Annotated:

def test_bad_currency_conversion_raises(self, mocker):
    """Converting to a non-existing currency should raise an exception."""
    mock_get = mocker.patch("expenses_ai_agent.utils.currency.requests.get")
    mock_get.return_value.json.return_value = {
        "result": "error",
        "error-type": "unknown-code",
    }

    with pytest.raises(CurrencyConversionError):
        convert_currency(
            amount=Decimal("1.00"),
            from_currency="CAD",
            to_currency="CTM",
        )
    mock_get.assert_called_once()

mocker also handles teardown automatically via the fixture lifecycle, so you don't need with to ensure cleanup.
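
The cleanup that mocker takes over can be pictured with the standard library alone. This is a sketch of the idea, not pytest-mock's actual implementation, and json.dumps is only a convenient stdlib target chosen for illustration.

```python
import json
from unittest.mock import patch

original_dumps = json.dumps

# Setup: start the patch without a with block.
patcher = patch("json.dumps", return_value="mocked")
patcher.start()
during = json.dumps({"a": 1})

# Teardown: undo the patch. A fixture can run this step for you after the test.
patcher.stop()
after = json.dumps({"a": 1})
restored = json.dumps is original_dumps
```

Forgetting the stop() call is exactly the kind of leak that a with block or a fixture prevents.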

Another reason to mock: forcing a collision

So far the mock has stood in for a network call. That's not the only reason to reach for one. Here's a test from my simple CRM that stores contacts as files on disk:

def create_contact(
    name: str, email: str = "", company: str = "", product: str = ""
) -> str:
    contacts_dir().mkdir(parents=True, exist_ok=True)
    code = next_code(name)
    path = contact_path(code)
    if path.exists():
        raise FileExistsError(f"Contact {code} already exists")
    path.write_text(...)
    return code

next_code generates a unique code from the name. To test that creating two contacts with the same code raises FileExistsError, you need both calls to produce the same code. The function is built to avoid exactly that (its output is nondeterministic by design), so you patch next_code to pin it:

@patch("crm.data.next_code")
def test_cannot_create_contact_with_same_code(mock_next_code):
    mock_next_code.return_value = "jd1"
    data.create_contact("Jane Doe")
    with pytest.raises(FileExistsError):
        data.create_contact("Jane Doe")

Note the patch target again: crm.data.next_code, where the function is used. Same rule as before. And note that's the only mock here.

Isolation matters as much as the mock, but it doesn't belong in this test. An autouse fixture already points the data dir at a fresh tmp_path:

@pytest.fixture(autouse=True)
def crm_data(tmp_path, monkeypatch):
    monkeypatch.setenv("CRM_DATA", str(tmp_path))
    (tmp_path / "contacts").mkdir()
    return tmp_path

create_contact calls path.write_text(...), so the first call writes a real jd1 file. Because every test runs against a fresh tmp_path, that file lives only for the test: the collision can only come from the second call, nothing leaks between runs, and the test fails solely when the duplicate guard fires. Without that isolation, a leftover jd1 from a previous run makes the first call raise, pytest.raises still passes, and you've tested nothing.
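
The leftover-file failure mode is easy to reproduce. The sketch below uses a deliberately simplified, hypothetical create_contact that keeps only the duplicate guard, and counts how many calls succeed before FileExistsError is raised in a clean directory versus a dirty one.

```python
import tempfile
from pathlib import Path


def create_contact(contacts: Path, code: str) -> str:
    """Simplified, hypothetical stand-in: only the duplicate guard."""
    path = contacts / code
    if path.exists():
        raise FileExistsError(f"Contact {code} already exists")
    path.write_text("placeholder")
    return code


def calls_before_collision(contacts: Path) -> int:
    """Count the create_contact calls that succeed before the guard fires."""
    succeeded = 0
    try:
        create_contact(contacts, "jd1")
        succeeded += 1
        create_contact(contacts, "jd1")
        succeeded += 1
    except FileExistsError:
        pass
    return succeeded


# Fresh directory: the first call succeeds, the second one collides.
with tempfile.TemporaryDirectory() as fresh:
    fresh_result = calls_before_collision(Path(fresh))

# Dirty directory: a leftover file makes the very first call raise.
with tempfile.TemporaryDirectory() as dirty:
    (Path(dirty) / "jd1").write_text("left over from an earlier run")
    dirty_result = calls_before_collision(Path(dirty))
```

In both cases a FileExistsError is raised, so a pytest.raises block would pass either way; only the fresh directory guarantees the error came from the second call.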

Update: I later dropped this mock for an explicit override parameter. Instead of patching next_code, I gave create_contact an optional code parameter (keyword-only, so it can't be passed by accident):

def create_contact(name: str, *, email: str = "", company: str = "",
                    product: str = "", code: str | None = None) -> str:
    ...
    code = code if code is not None else next_code(name)

The test pins the code through the public surface, no patching:

def test_cannot_create_contact_with_same_code():
    data.create_contact("Jane Doe", code="jd1")  # pin the first code too
    with pytest.raises(FileExistsError):
        data.create_contact("Jane Doe", code="jd1")

One naming caveat, since this post points to Harry Percival's "Stop Using Mocks" below: this isn't dependency injection, tempting as it is to call it that. DI would pass next_code itself in and let the test swap a fake. Here I pass the value the dependency would have produced, so it's really an explicit override parameter, the simpler tool. Real DI, with an injected collaborator, comes up at the end of this post.
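
The difference between the two is easiest to see with both written out. Everything in this sketch is hypothetical: default_next_code is an invented stand-in for next_code, and an in-memory set replaces the files on disk so the snippet runs on its own.

```python
from __future__ import annotations

from typing import Callable


def default_next_code(name: str) -> str:
    """Invented stand-in for next_code: initials plus a fixed suffix."""
    initials = "".join(part[0] for part in name.lower().split())
    return f"{initials}1"


def _store(code: str, existing: set[str]) -> str:
    """Shared duplicate guard, with a set standing in for the contacts dir."""
    if code in existing:
        raise FileExistsError(f"Contact {code} already exists")
    existing.add(code)
    return code


def create_contact_override(
    name: str, existing: set[str], *, code: str | None = None
) -> str:
    """Override parameter: the caller passes the value itself."""
    return _store(code if code is not None else default_next_code(name), existing)


def create_contact_injected(
    name: str,
    existing: set[str],
    *,
    next_code: Callable[[str], str] = default_next_code,
) -> str:
    """Dependency injection: the caller passes the collaborator."""
    return _store(next_code(name), existing)


existing: set[str] = set()
create_contact_override("Jane Doe", existing, code="jd1")
try:
    create_contact_injected("Jane Doe", existing, next_code=lambda name: "jd1")
    collided = False
except FileExistsError:
    collided = True
```

Both variants let a test force the collision without patching; they differ only in whether the test supplies a value or a function.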

The trade-off is worth being honest about: I added a production parameter partly to make the test simpler. That's the "test-induced design damage" critics of mocking warn about: a seam that exists only to serve tests. I think it's justified here because code doubles as a real feature: an explicit-code escape hatch for imports or restoring from backup. The test just happens to use it. If the parameter had been added only for the test, I'd consider leaving the mock.

Unit vs integration: where does this test belong?

All this then led to a related question:

How should you organize tests that hit real external services?

The convention that holds up in practice:

tests/
├── unit/        # fast, fully mocked, no network, no secrets
└── integration/ # slower, hits real DB / LLM / API endpoints

The currency test above belongs in unit/: it mocks requests.get and never touches the network. A test that actually calls the ExchangeRate API to verify end-to-end behavior belongs in integration/.

A @pytest.mark.integration marker is a lighter-weight way to get the same split without moving files. Register it in pyproject.toml, then skip those tests in CI with pytest -m 'not integration'.

Both work, but the directory structure makes the distinction obvious at a glance. Explicit is better than implicit.

The practical rule: if your test needs an environment variable or some external service to do its real work, it's an integration test. Mock that dependency out and it becomes a unit test. Alternatively, push the dependency to the boundary of your code so you can inject a fake in unit tests and the real thing in integration tests (if those are still needed).

For a practical example of test organization, see this video: Python Unit vs. Functional Testing: Understanding the Difference + Practical Example.

When mocks are the wrong tool

There's a broader point underneath all this. Every time you patch requests.get you're writing a test that's tightly coupled to one import path. Change import requests to from requests import get and every patch breaks. The tests test implementation, not behavior.

I highly recommend watching Harry Percival's PyCon talk "Stop Using Mocks". He makes the case for alternatives: build an adapter class that owns the external call, write a fake in-memory implementation of it, and use dependency injection to pass it in. The repository pattern is the same idea: your test passes in a fake, your production code passes in the real thing, and neither needs patching.
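
As a rough illustration of that alternative applied to the currency example: the sketch below is not taken from the talk or from the article's codebase. RateProvider, FakeRateProvider and the extra rates parameter are hypothetical names, and CurrencyConversionError is redefined locally so the snippet is self-contained.

```python
from __future__ import annotations

from decimal import Decimal
from typing import Protocol


class CurrencyConversionError(Exception):
    """Local stand-in for the article's exception class."""


class RateProvider(Protocol):
    """Hypothetical adapter interface that owns the external call."""

    def get_rate(self, from_currency: str, to_currency: str) -> Decimal: ...


class FakeRateProvider:
    """In-memory fake: no network, no patching."""

    def __init__(self, rates: dict[tuple[str, str], Decimal]) -> None:
        self._rates = rates

    def get_rate(self, from_currency: str, to_currency: str) -> Decimal:
        try:
            return self._rates[(from_currency, to_currency)]
        except KeyError:
            raise CurrencyConversionError("unknown-code") from None


def convert_currency(
    amount: Decimal, from_currency: str, to_currency: str, rates: RateProvider
) -> Decimal:
    """Variant of the function under test that receives its dependency."""
    if from_currency == to_currency:
        return amount
    return rates.get_rate(from_currency, to_currency) * amount


fake = FakeRateProvider({("CAD", "USD"): Decimal("0.75")})
converted = convert_currency(Decimal("2.00"), "CAD", "USD", fake)

try:
    convert_currency(Decimal("1.00"), "CAD", "CTM", fake)
    bad_code_raised = False
except CurrencyConversionError:
    bad_code_raised = True
```

A production implementation of the same interface would wrap the real HTTP call, and the test above would not change if that implementation switched HTTP libraries or import styles.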

Mocks are still the right choice here: we want to test one small unit whose only external dependency is well contained.

source & further reading

belderbos.dev (original article)
Building an AI Agent in 6 Weeks (and Understanding How They Work)
AI Doesn't Change What Software Engineering Is
Build the Simplest Thing That Works