A test can pass for the wrong reason. When you're mocking a third-party API call, the test might look green because the real API happened to return an error, not because your mock did anything at all.
This came up in a recent session in our agentic AI cohort where we were looking at a test to verify that converting to an invalid currency raised an exception. The test passed. But something felt off.
The test that passed for the wrong reason
The code under test calls the ExchangeRate API and raises CurrencyConversionError when the response signals failure:
def convert_currency(amount: Decimal, from_currency: str, to_currency: str) -> Decimal:
    if from_currency == to_currency:
        return amount
    response = requests.get(
        f"https://v6.exchangerate-api.com/v6/{EXCHANGE_RATE_API_KEY}/pair/{from_currency}/{to_currency}"
    )
    data = response.json()
    if data["result"] != "success":
        raise CurrencyConversionError(f"{data['error-type']}")
    return Decimal(data["conversion_rate"]) * amount
The test set up a mock_response, patched requests.get to return it (mock_get.return_value = mock_response), but configured it as a successful response:
mock_response.json.return_value = {
    "result": "success",  # <-- this will never raise CurrencyConversionError
    "conversion_rate": 1.5,
}
If the mock had been intercepting, the function would have returned normally and pytest.raises would have failed. But the test was passing. That meant the mock wasn't intercepting at all: the real API was being hit, and it was returning an error for the bogus "CTM" code.
Proving the mock actually intercepted
My instinct was to add print("calling external api") before requests.get. That proves the code reached that line. It does not prove whether the mock intercepted the call or the real network was hit.
At this point you can put a breakpoint() in the actual requests.get code in your venv, but there is a better way: mock_get.assert_called_once():
with pytest.raises(CurrencyConversionError):
    convert_currency(
        amount=Decimal("1.00"),
        from_currency="CAD",
        to_currency="CTM",  # Canadian Tire Money, not a real currency
    )
mock_get.assert_called_once()
If the mock was never called, this assertion fails and tells you directly: your patch didn't intercept the request. If the mock was called, the assertion passes and you know for sure that the test is relying on the mock, not the real API.
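To see why a green run alone proves nothing, here is a minimal offline sketch of the failure mode. Everything in it is a stand-in, not the project's code: `real_get` is a hypothetical fake of the unpatched API call that always reports an unknown code, and `convert` is a simplified version of the function under test that takes `get` as an argument only so the demo runs without patching or a network.

```python
from unittest.mock import MagicMock


class CurrencyConversionError(Exception):
    """Stand-in for the exception used in the post."""


class FakeErrorResponse:
    """Hypothetical response mimicking an API error for an unknown code."""

    def json(self):
        return {"result": "error", "error-type": "unknown-code"}


def real_get(url):
    """Hypothetical stand-in for the unpatched requests.get (no network)."""
    return FakeErrorResponse()


def convert(get, to_currency):
    """Simplified function under test; `get` is a parameter only for this demo."""
    data = get(f"https://example.invalid/pair/CAD/{to_currency}").json()
    if data["result"] != "success":
        raise CurrencyConversionError(data["error-type"])
    return data["conversion_rate"]


# The mock is configured (wrongly) as a success response and never wired in.
mock_get = MagicMock()
mock_get.return_value.json.return_value = {"result": "success", "conversion_rate": 1.5}

# The code still reaches the "real" API, which raises: the test looks green.
try:
    convert(real_get, "CTM")
    raised = False
except CurrencyConversionError:
    raised = True

# The extra assertion exposes that the mock never saw the call.
try:
    mock_get.assert_called_once()
    mock_was_called = True
except AssertionError:
    mock_was_called = False

print(raised, mock_was_called)
```

The exception is raised either way, so only the call check separates "the mock answered" from "something else answered".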
Running the test with this assertion in place settled it. Once the patch targeted the right name (the fix in the next section), the mock intercepted the call, and because it still returned the success response, nothing raised: pytest.raises failed with DID NOT RAISE. That flip is the proof: a real call for "CTM" would have raised, so a non-raising run means the mock was in control. The earlier green had been the real API answering, never the mock. Changing the mocked response to signal an error made the test pass for the right reason, and assert_called_once() then confirmed the call went through the mock and not the network:
mock_get.return_value.json.return_value = {
    "result": "error",
    "error-type": "unknown-code",
}
Patch where the name is used, not where it's defined
The currency module does import requests, then calls requests.get(...), so patching expenses_ai_agent.utils.currency.requests.get targets the call site. With this import requests style, patching requests.get happens to work too, since both names point at the same module object. The rule bites when a module does from requests import get: now get is a local name in the currency module, and you must patch expenses_ai_agent.utils.currency.get, not requests.get. Patching the wrong location is a common mistake that leads to the mock not intercepting and the real API being called.
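The from-import case can be reproduced without any real packages. The sketch below builds two throwaway modules in memory; `fakelib` (standing in for requests) and `fakeclient` (standing in for the currency module) are hypothetical names invented for this demo.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical library module, standing in for `requests`.
fakelib = types.ModuleType("fakelib")
exec("def get(url):\n    return 'real'\n", fakelib.__dict__)
sys.modules["fakelib"] = fakelib

# Hypothetical client module, standing in for the currency module.
# The from-import binds `get` as a name inside fakeclient itself.
fakeclient = types.ModuleType("fakeclient")
exec(
    "from fakelib import get\n"
    "def fetch():\n"
    "    return get('https://example.invalid')\n",
    fakeclient.__dict__,
)
sys.modules["fakeclient"] = fakeclient

# Patching where the name is defined leaves fakeclient's own `get` untouched.
with patch("fakelib.get", return_value="mocked"):
    patched_at_definition = fakeclient.fetch()

# Patching where the name is used replaces the name `fetch` actually looks up.
with patch("fakeclient.get", return_value="mocked"):
    patched_at_use = fakeclient.fetch()

print(patched_at_definition, patched_at_use)
```

The first patch succeeds silently and changes nothing the client sees, which is exactly how a test ends up talking to the real service.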
The cleaned-up test with pytest-mock
Once the mock response was correct and interception was verified, the test got two more improvements. First, the intermediate mock_response variable is unnecessary: chain directly off mock_get.return_value, as in the snippet above. Second, pytest-mock (added with uv add --dev pytest-mock) replaces the nested with patch(...) context managers with a mocker fixture. The result is flatter and easier to scan. Annotated:
def test_bad_currency_conversion_raises(self, mocker):
    """Converting to a non-existing currency should raise an exception."""
    # Patch the name where the currency module looks it up.
    mock_get = mocker.patch("expenses_ai_agent.utils.currency.requests.get")
    # Make the mocked API report an error, chaining off return_value directly.
    mock_get.return_value.json.return_value = {
        "result": "error",
        "error-type": "unknown-code",
    }
    with pytest.raises(CurrencyConversionError):
        convert_currency(
            amount=Decimal("1.00"),
            from_currency="CAD",
            to_currency="CTM",
        )
    # Prove the mock, not the network, handled the call.
    mock_get.assert_called_once()
mocker also handles teardown automatically via the fixture lifecycle, so you don't need with to ensure cleanup.
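The cleanup a fixture takes off your hands can be sketched with plain unittest.mock. This illustrates the idea only and is not pytest-mock's actual implementation; it patches `os.getcwd` simply because that needs no project code, and "/fake/dir" is an arbitrary value.

```python
import os
from unittest.mock import patch

original_getcwd = os.getcwd

# Start the patch by hand, the way a fixture might during setup.
patcher = patch("os.getcwd", return_value="/fake/dir")
mock_getcwd = patcher.start()
try:
    inside = os.getcwd()  # served by the mock while the patch is active
    mock_getcwd.assert_called_once()
finally:
    patcher.stop()  # the cleanup step a `with` block would otherwise perform

# After stop(), the original function is back in place.
restored = os.getcwd is original_getcwd
```

Forgetting the `stop()` call is the failure mode here: the patch would leak into later tests.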
Another reason to mock: forcing a collision
So far the mock has stood in for a network call. That's not the only reason to reach for one. Here's a test from my simple CRM that stores contacts as files on disk:
def create_contact(
    name: str, email: str = "", company: str = "", product: str = ""
) -> str:
    contacts_dir().mkdir(parents=True, exist_ok=True)
    code = next_code(name)
    path = contact_path(code)
    if path.exists():
        raise FileExistsError(f"Contact {code} already exists")
    path.write_text(...)
    return code
next_code generates a unique code from the name. To test that creating two contacts with the same code raises FileExistsError, you need both calls to produce the same code. The generated code is nondeterministic by design, so you patch next_code to pin it:
@patch("crm.data.next_code")
def test_cannot_create_contact_with_same_code(mock_next_code):
    mock_next_code.return_value = "jd1"
    data.create_contact("Jane Doe")
    with pytest.raises(FileExistsError):
        data.create_contact("Jane Doe")
Note the patch target again: crm.data.next_code, where the function is used. Same rule as before. And note that's the only mock here.
Isolation matters as much as the mock, but it doesn't belong in this test. An autouse fixture already points the data dir at a fresh tmp_path:
@pytest.fixture(autouse=True)
def crm_data(tmp_path, monkeypatch):
    monkeypatch.setenv("CRM_DATA", str(tmp_path))
    (tmp_path / "contacts").mkdir()
    return tmp_path
create_contact calls path.write_text(...), so the first call writes a real jd1 file. Because every test runs against a fresh tmp_path, that file lives only for the test: the collision can only come from the second call, nothing leaks between runs, and the test passes only when the duplicate guard fires on that second call. Without that isolation, a leftover jd1 from a previous run makes the first call raise. As written, that first call sits outside pytest.raises, so the test errors out for a reason unrelated to the behavior under test; with both calls inside the block it would pass, and you'd have tested nothing.
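The effect of a leftover file can be reproduced offline with two temporary directories. In this sketch `create_contact` is a simplified stand-in that takes the directory and an already pinned code as parameters (an assumption made for the demo), and `call_raises` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def create_contact(contacts: Path, code: str) -> str:
    """Simplified stand-in for create_contact with the code already pinned."""
    path = contacts / code
    if path.exists():
        raise FileExistsError(f"Contact {code} already exists")
    path.write_text("name: Jane Doe\n")
    return code


def call_raises(contacts: Path) -> bool:
    """Report whether one create_contact call hits the duplicate guard."""
    try:
        create_contact(contacts, "jd1")
    except FileExistsError:
        return True
    return False


with tempfile.TemporaryDirectory() as fresh, tempfile.TemporaryDirectory() as stale:
    # Simulate a leftover file from an earlier run in a shared data dir.
    (Path(stale) / "jd1").write_text("leftover\n")

    fresh_first = call_raises(Path(fresh))   # fresh dir: first call succeeds
    fresh_second = call_raises(Path(fresh))  # second call hits the guard
    stale_first = call_raises(Path(stale))   # stale dir: first call already collides

print(fresh_first, fresh_second, stale_first)
```

In the stale directory the exception comes from old data, so it says nothing about whether the second call is guarded.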
Update: I later dropped this mock for an explicit override parameter. Instead of patching next_code, I gave create_contact an optional code parameter (keyword-only, so it can't be passed by accident):
def create_contact(name: str, *, email: str = "", company: str = "",
                   product: str = "", code: str | None = None) -> str:
    ...
    code = code if code is not None else next_code(name)
The test pins the code through the public surface, no patching:
def test_cannot_create_contact_with_same_code():
    # Pin the code on both calls so the collision is guaranteed.
    data.create_contact("Jane Doe", code="jd1")
    with pytest.raises(FileExistsError):
        data.create_contact("Jane Doe", code="jd1")
One naming caveat, since this post points to Harry Percival's "Stop Using Mocks" below: this isn't dependency injection, tempting as it is to call it that. DI would pass next_code itself in and let the test swap a fake. Here I pass the value the dependency would have produced, so it's really an explicit override parameter, the simpler tool. Real DI, with an injected collaborator, comes up at the end of this post.
The trade-off is worth being honest about: I added a production parameter partly to make the test simpler. That's the "test-induced design damage" critics of mocking warn about: a seam that exists only to serve tests. I think it's justified here because code doubles as a real feature: an explicit-code escape hatch for imports or restoring from backup. The test just happens to use it. If the parameter had been added only for the test, I'd consider leaving the mock.
Unit vs integration: where does this test belong?
All this then led to a related question:
How should you organize tests that hit real external services?
The convention that holds up in practice:
tests/
├── unit/ # fast, fully mocked, no network, no secrets
└── integration/ # slower, hits real DB / LLM / API endpoints
The currency test above belongs in unit/: it mocks requests.get and never touches the network. A test that actually calls the ExchangeRate API to verify end-to-end behavior belongs in integration/.
A @pytest.mark.integration marker is a lighter-weight way to get the same split without moving files. Register it in pyproject.toml, then skip those tests in CI with pytest -m 'not integration'.
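Registering the marker could look like the fragment below. The description text after the colon is my own wording; the `[tool.pytest.ini_options]` table and its `markers` list are pytest's standard configuration keys.

```toml
[tool.pytest.ini_options]
markers = [
    "integration: hits real external services (deselect with -m 'not integration')",
]
```

Running pytest with `--strict-markers` additionally turns a misspelled or unregistered marker into an error instead of a silent no-op.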
Both work, but the directory structure makes the distinction obvious at a glance. Explicit is better than implicit.
The practical rule: if your test needs an environment variable or some external service to do its real work, it's an integration test. Mock that dependency out and it becomes a unit test. Or push the dependency to the boundary of your code so you can inject a fake in unit tests and the real thing in integration tests (if you still need them).
For a practical example of test organization, see this video: Python Unit vs. Functional Testing: Understanding the Difference + Practical Example.
When mocks are the wrong tool
There's a broader point underneath all this. Every time you patch requests.get you're writing a test that's tightly coupled to one import path. Change import requests to from requests import get and every patch breaks. The tests test implementation, not behavior.
I highly recommend watching Harry Percival's PyCon talk "Stop Using Mocks". He makes the case for alternatives: build an adapter class that owns the external call, write a fake in-memory implementation of it, and use dependency injection to pass it in. The repository pattern is the same idea: your test passes in a fake, your production code passes in the real thing, and neither needs patching.
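A minimal sketch of that adapter-plus-fake shape, applied to the currency example. `RateProvider`, `FakeRateProvider`, the `rates` parameter, and the 0.75 rate are all hypothetical, invented here for illustration; this is not code from the talk or from the project.

```python
from decimal import Decimal
from typing import Protocol


class CurrencyConversionError(Exception):
    """Stand-in for the exception used earlier in the post."""


class RateProvider(Protocol):
    """Hypothetical adapter interface that owns the external call."""

    def get_rate(self, from_currency: str, to_currency: str) -> Decimal: ...


class FakeRateProvider:
    """In-memory fake for unit tests; a real provider would wrap the HTTP API."""

    def __init__(self, rates: dict[tuple[str, str], Decimal]) -> None:
        self._rates = rates

    def get_rate(self, from_currency: str, to_currency: str) -> Decimal:
        try:
            return self._rates[(from_currency, to_currency)]
        except KeyError:
            raise CurrencyConversionError("unknown-code") from None


def convert_currency(
    amount: Decimal, from_currency: str, to_currency: str, rates: RateProvider
) -> Decimal:
    """The dependency is passed in, so tests need no patching."""
    if from_currency == to_currency:
        return amount
    return rates.get_rate(from_currency, to_currency) * amount


# The test hands in the fake; production code would hand in the real adapter.
fake = FakeRateProvider({("CAD", "USD"): Decimal("0.75")})  # arbitrary rate
converted = convert_currency(Decimal("2.00"), "CAD", "USD", fake)

try:
    convert_currency(Decimal("1.00"), "CAD", "CTM", fake)
    unknown_raised = False
except CurrencyConversionError:
    unknown_raised = True
```

Nothing here depends on how the production module imports its HTTP client, so changing that import cannot break the test.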
Mocks are still the right choice here: we want to test one small unit whose only external dependency is well contained.