Understanding PyTorch’s Test Infrastructure

TL;DR

PyTorch tests are often generated at import time, so CI failures may show device/dtype-specific names that differ from the source template.
For local debugging, pytest -k and test/run_test.py are usually the fastest ways to reproduce generated test failures.
Device-generic tests, operator metadata through OpInfos, and CI sharding are the key pieces to understand when contributing or debugging PyTorch tests.

PyTorch tests are often generated dynamically across devices and dtypes, which is why test names in CI may look different from the class and method names in the source file. This post explains how device-generic tests, OpInfos, instantiate_device_type_tests(), and CI sharding fit together, and how contributors can run and debug PyTorch tests more effectively.

Why PyTorch Testing Feels Different

If you have ever opened a pull request against PyTorch, watched a generated test like TestLinalgCUDA.test_matmul_cuda_float32 fail in CI, and wondered where that name came from – or tried running a test by its source name and got “no tests collected” – this guide is for you.

PyTorch’s test infrastructure is built for scale. Depending on the decorators used and operator metadata provided through OpInfos, a single test method can expand across multiple devices, dtypes, and operators automatically. That is what lets PyTorch validate thousands of combinations without thousands of handwritten tests. But it also means the test you write in the source file is not always the exact test that CI runs, which can be confusing the first time you encounter it.

Note: Many helpers discussed in this guide live under torch.testing._internal, which is PyTorch’s internal test infrastructure. If you are testing your own project, use public APIs like pytest and torch.testing.assert_close instead.

The Naming Mystery: Why “No Tests Collected”?

One of the first confusing moments for new PyTorch contributors is trying to run a test by the class and method name they see in the source file:

pytest test/test_torch.py::TestTorch::test_matmul

In many PyTorch test files, this may return “no tests collected.” That is usually not because the test is missing. It is because the class in the source file is a template, not the final class that the test runner sees.

When the file is imported, instantiate_device_type_tests() expands the template into concrete device-specific classes such as TestTorchCPU, TestTorchCUDA, or TestTorchMPS. If the test is also parameterized by dtype, the generated method name may include the device and dtype as well, for example test_matmul_cuda_float32. These generated classes are built from the original template class and PyTorch’s device-specific test bases, so they still inherit the shared behavior provided by PyTorch’s internal TestCase.

For local debugging, it is usually easier to filter by the generated test name pattern instead of targeting the original template class directly:

pytest test/test_torch.py -k "test_matmul"
pytest test/test_torch.py -k "test_matmul_cuda_float32"

Once you know that PyTorch generates the runnable test names during import, CI failures become much easier to map back to the source test.
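To make the import-time expansion concrete, here is a toy, standard-library-only model of the idea. It is not PyTorch's implementation: toy_instantiate, DEVICES, and the plain (non-TestCase) template are assumptions made for illustration, and the real instantiate_device_type_tests() does considerably more.

```python
import unittest

DEVICES = ("cpu", "cuda")  # hypothetical device list for this sketch


class TestTorch:
    """Template class: it only describes tests, it is not runnable itself."""

    __test__ = False  # keep pytest from collecting the template directly

    def test_matmul(self, device):
        self.assertIn(device, DEVICES)


def toy_instantiate(template, scope, devices=DEVICES):
    """Create one concrete TestCase per device, then drop the template name."""
    for device in devices:
        members = {}
        for name, fn in vars(template).items():
            if not name.startswith("test_"):
                continue
            # Default arguments bind fn and device at definition time,
            # so every generated method keeps its own device value.
            members[f"{name}_{device}"] = (
                lambda self, fn=fn, device=device: fn(self, device)
            )
        class_name = f"{template.__name__}{device.upper()}"
        scope[class_name] = type(class_name, (unittest.TestCase,), members)
    # The template name disappears, which is why selecting it finds nothing.
    scope.pop(template.__name__, None)


scope = {"TestTorch": TestTorch}
toy_instantiate(TestTorch, scope)
print(sorted(scope))  # ['TestTorchCPU', 'TestTorchCUDA']
```

After the call, the only names left in scope are the generated classes, which mirrors why a test runner finds TestTorchCPU and TestTorchCUDA but not TestTorch.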

How Device-Generic Tests Work

PyTorch runs across devices such as CPU, CUDA, MPS, and XPU, and many tests need to validate behavior across float16, float32, float64, bfloat16, integer, and other dtypes. Writing a separate test for every device and dtype combination would quickly become a maintenance nightmare.

So PyTorch uses test templates. You write one test method with device and dtype parameters:

def test_basic(self, device, dtype):
    ...

When Python imports the test file, instantiate_device_type_tests() expands that template across the selected device types and dtypes. For example, one template class can produce generated classes such as TestMatmulCPU, TestMatmulCUDA, and TestMatmulMPS, with generated methods such as test_basic_cuda_float32.

Figure 1: Test Class Hierarchy & Instantiation Flow

The generated names follow this pattern:

<ClassName><DEVICE>.<method>_<device>_<dtype>

So a template like TestMatmul.test_basic may become TestMatmulCUDA.test_basic_cuda_float32. The device appears in uppercase in the class name and lowercase in the method name.

This is why CI failures show generated names instead of only the template name you wrote. The generated name tells you exactly which device and dtype failed.
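The naming pattern can be written down as a small helper. This is only a sketch of the convention described above; generated_test_id is a hypothetical function, not part of PyTorch.

```python
def generated_test_id(class_name, method, device, dtype=None):
    """Build '<ClassName><DEVICE>.<method>_<device>[_<dtype>]'."""
    generated_class = f"{class_name}{device.upper()}"  # device in uppercase
    generated_method = f"{method}_{device}"            # device in lowercase
    if dtype is not None:
        # The dtype suffix appears only for dtype-parameterized tests.
        generated_method += f"_{dtype}"
    return f"{generated_class}.{generated_method}"


print(generated_test_id("TestMatmul", "test_basic", "cuda", "float32"))
# TestMatmulCUDA.test_basic_cuda_float32
print(generated_test_id("TestMatmul", "test_basic", "cpu"))
# TestMatmulCPU.test_basic_cpu
```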

The Architecture at a Glance

PyTorch’s test infrastructure is easier to understand as a set of connected layers. Contributors typically interact with the middle layers: device instantiation, parametrization decorators, OpInfos, and test utilities. CI orchestration sits above them, while base utilities provide the shared foundation.

Figure 2: PyTorch Testing Architecture at a Glance

Key files contributors often encounter

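File	What it does
torch/testing/_internal/common_device_type.py	Device-generic testing utilities, including instantiate_device_type_tests(), @dtypes, and @ops
torch/testing/_internal/opinfo/core.py	Core OpInfo definitions: sample input metadata, dtype support, skips, decorators, and tolerance configuration
torch/testing/_internal/common_methods_invocations.py	The op_db registry that collects OpInfo entries used by generic operator tests
test/run_test.py	CI-style test runner: test selection, sharding, and affected-test execution

OpInfos: Testing Operators Through Metadata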

OpInfos are metadata entries that describe how a PyTorch operator should be tested. Instead of writing a separate test for every operator, PyTorch uses generic test templates that read OpInfo metadata and run the same checks across many operators.

An OpInfo can define things like the operator name, variants, supported dtypes, sample inputs, expected skips, decorators, and tolerance rules. Generic tests in files such as test_ops.py then consume op_db through @ops(…), which passes the selected op, device, and dtype into the test.

This is how one operator entry can participate in many kinds of coverage: forward correctness, dtype and device behavior, gradient checks, compile-related paths, and Meta/FakeTensor-style validation – depending on the test and the operator metadata.

So when you see a generated test such as TestCommonCUDA.test_variant_consistency_eager_torch_matmul_cuda_float32, it usually means a generic OpInfo-based test is running against the torch.matmul OpInfo for a specific device and dtype.
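The metadata-driven idea can be sketched without PyTorch at all. Everything below is a simplified stand-in: ToyOpInfo, toy_op_db, check_reference, and run_generic_test are hypothetical names, and the real OpInfo class carries far more fields than this.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ToyOpInfo:
    """A tiny stand-in for operator metadata (not the real OpInfo)."""
    name: str
    op: Callable[[float], float]
    reference: Callable[[float], float]
    dtypes: Tuple[str, ...]
    sample_inputs: Tuple[float, ...]


toy_op_db = [
    ToyOpInfo("square", lambda x: x * x, lambda x: x ** 2,
              ("float32", "float64"), (0.0, 1.5, -2.0)),
    ToyOpInfo("hypot1", lambda x: math.hypot(x, 1.0),
              lambda x: math.sqrt(x * x + 1.0), ("float64",), (0.0, 3.0)),
]


def check_reference(op_info, device, dtype):
    """One generic test body: compare the op with its reference on each sample."""
    for x in op_info.sample_inputs:
        assert math.isclose(op_info.op(x), op_info.reference(x), rel_tol=1e-9)


def run_generic_test(test_fn, op_db, devices):
    """Run test_fn for every (op, device, dtype) the metadata declares."""
    ran = []
    for op_info in op_db:
        for device in devices:
            for dtype in op_info.dtypes:
                test_fn(op_info, device, dtype)
                ran.append(f"{test_fn.__name__}_{op_info.name}_{device}_{dtype}")
    return ran


names = run_generic_test(check_reference, toy_op_db, ["cpu"])
print(len(names))  # 3
```

Adding a new entry to toy_op_db extends coverage without touching the test body, which is the property the OpInfo design is after.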

The @ops(…) decorator is one example of PyTorch’s broader parametrization pattern. For non-operator cases, tests can also use @parametrize(…) to generate variants over custom values such as modes, shapes, layouts, or configuration flags. PyTorch also provides @modules for module-specific tests. The idea across these patterns is the same: keep one test body and let the test infrastructure generate the useful combinations.

For example, a test can use @parametrize(…) to run the same test body across multiple custom values:

@parametrize("reduction", ["mean", "sum"])
def test_loss(self, device, dtype, reduction):
    ...
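To see how quickly variants multiply, the sketch below enumerates the combinations for a parameterized test. expand_variants is a hypothetical helper, and the suffix layout it produces is an assumption for illustration; check the names your PyTorch version actually generates before relying on them.

```python
from itertools import product


def expand_variants(method, devices, dtypes, **params):
    """List one name per (custom values, device, dtype) combination."""
    keys = list(params)
    names = []
    for device, dtype in product(devices, dtypes):
        for values in product(*(params[key] for key in keys)):
            parts = [method]
            parts += [f"{key}_{value}" for key, value in zip(keys, values)]
            parts += [device, dtype]
            names.append("_".join(parts))
    return names


variants = expand_variants(
    "test_loss", ["cpu", "cuda"], ["float32"], reduction=["mean", "sum"]
)
print(len(variants))  # 4: 2 devices x 1 dtype x 2 reductions
```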

Running Tests Locally

For day-to-day debugging, start with pytest -k. It works well with generated test names and avoids relying on template class names that may no longer be directly discoverable after test instantiation.

pytest test/test_torch.py -k "test_matmul"

pytest test/test_torch.py -k "test_matmul_cuda_float32" -x

For CI-like runs, use test/run_test.py. It is the PyTorch test runner used for running test files, affected-test selection, and CI-related behavior such as sharding.

python test/run_test.py test_torch

python test/run_test.py -h

Environment variables are also useful when you want CI-like behavior locally. For example, PYTORCH_TESTING_DEVICE_ONLY_FOR narrows tests to selected device types, PYTORCH_TEST_WITH_SLOW=1 includes tests marked with @slowTest, and PYTORCH_TEST_WITH_DYNAMO=1 runs regular PyTorch tests with TorchDynamo coverage.
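If you reproduce failures often, it can help to assemble the environment and command in one place. The sketch below only builds them and does not run anything; repro_command is a hypothetical helper, while the variable names and pytest flags are the ones mentioned in this post.

```python
import shlex


def repro_command(test_file, name_filter, device=None, slow=False, dynamo=False):
    """Return (extra_env, argv) for a local reproduction run."""
    extra_env = {}
    if device:
        extra_env["PYTORCH_TESTING_DEVICE_ONLY_FOR"] = device
    if slow:
        extra_env["PYTORCH_TEST_WITH_SLOW"] = "1"
    if dynamo:
        extra_env["PYTORCH_TEST_WITH_DYNAMO"] = "1"
    # -k filters by generated name, -x stops at the first failure.
    argv = ["pytest", test_file, "-k", name_filter, "-x"]
    return extra_env, argv


extra_env, argv = repro_command(
    "test/test_torch.py", "test_matmul_cuda_float32", device="cuda"
)
prefix = " ".join(f"{key}={value}" for key, value in extra_env.items())
print(prefix, shlex.join(argv))
# PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda pytest test/test_torch.py -k test_matmul_cuda_float32 -x
```

To actually run it, one option is to pass argv to subprocess.run with the extra variables merged into a copy of os.environ.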

Debugging CI Failures

When a PyTorch CI job fails, the most useful details are usually the generated test name and its device/dtype suffix. Since tests are expected to be atomic, start by reproducing the specific generated test locally with pytest -k. Shard information can help locate the CI job, but the generated test name is usually the key detail for reproduction.

Dr. CI and Failure Triage

On PyTorch pull requests, contributors may also see automated Dr. CI comments. Dr. CI helps summarize failing jobs, group recurring failure patterns, and point contributors toward relevant logs. It is not a replacement for reading the full CI output, but it is often a useful starting point for triage.

A practical debugging flow is: start with the Dr. CI summary, open the failing job logs on hud.pytorch.org, identify the generated test name and shard, then reproduce the failure locally with pytest -k or run_test.py.
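Reading a generated name back into its parts can be sketched as follows. split_generated_id and KNOWN_DEVICES are hypothetical, the device list covers only the devices named in this post, and for OpInfo-based tests the recovered method name still includes the operator name.

```python
KNOWN_DEVICES = ("cpu", "cuda", "mps", "xpu")  # devices named in this post


def split_generated_id(test_id):
    """Split 'TestFooCUDA.test_bar_cuda_float32' into its template parts."""
    generated_class, _, generated_method = test_id.partition(".")
    for device in KNOWN_DEVICES:
        if not generated_class.endswith(device.upper()):
            continue
        # Split on the last '_<device>' so earlier matches in the name are ignored.
        head, sep, tail = generated_method.rpartition(f"_{device}")
        if not sep:
            continue
        return {
            "template_class": generated_class[: -len(device)],
            "template_method": head,
            "device": device,
            "dtype": tail.lstrip("_") or None,
            "pytest_filter": generated_method,
        }
    return None  # not a recognized device-generic name


parts = split_generated_id("TestLinalgCUDA.test_matmul_cuda_float32")
print(parts["template_class"], parts["template_method"])  # TestLinalg test_matmul
print(f'pytest -k "{parts["pytest_filter"]}"')
```

The template class and method tell you where to look in the source file, while the full generated method name is what you pass to pytest -k.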

Figure 3: PyTorch CI Testing Pipeline Flow

Common CI-only failures usually come from environment differences, test pollution, sharding assumptions, or numeric precision differences. Tests should not depend on execution order or global state left behind by another test.

Common Pitfalls

Targeting template names directly: Use pytest -k filters or generated class and method names instead. After test templates are instantiated, the original template name may not be directly discoverable.

Using torch.randn in dtype-generic tests: torch.randn works for floating-point and complex inputs but fails on integer and boolean dtypes. Prefer make_tensor for dtype-generic tests, as it handles all dtype categories while still requiring explicit device and dtype.

Hardcoding devices: Use the device argument provided by the generated test instead of constants like device="cuda". This keeps the test portable across device types.

Quick Reference

Decorators and helpers

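Decorator / helper	Purpose
@ops(…)	Run one test body across selected OpInfo entries, passing op, device, and dtype
@onlyCUDA / @onlyCPU	Restrict a test to a single device type
@onlyAccelerator	Restrict a test to accelerator devices
@skipIfTorchDynamo("reason")	Skip a test when running under TorchDynamo
@toleranceOverride({…})	Override comparison tolerances for specific dtypes
load_tests	Hook that filters tests for CI sharding

Useful environment variables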

Variable	Purpose
PYTORCH_TESTING_DEVICE_ONLY_FOR	Run tests only for a selected device
PYTORCH_TEST_WITH_SLOW	Include slow tests
PYTORCH_TEST_WITH_DYNAMO	Run regular tests with TorchDynamo coverage
EXPECTTEST_ACCEPT	Update expected output snapshots

Summary

PyTorch’s testing infrastructure is designed to make a very large test suite manageable. Device-generic templates become concrete tests through instantiate_device_type_tests(), OpInfos centralize operator metadata, and CI sharding splits test execution across workers.

The key idea is that the test name you see in CI is often a generated name, not just the source-level template you wrote. Once you learn to read that generated name – the test, device, dtype, and sometimes operator – debugging becomes much easier.

Further Reading

For more details, refer to the official PyTorch testing documentation and source files:

PyTorch Wiki: Running and Writing Tests

Contributor guide for running tests, selecting generated tests, and understanding PyTorch’s test workflow. This document lists the environment variables used by the test system.

Public Testing APIs: torch.testing documentation

Public testing APIs such as torch.testing.assert_close for projects outside the PyTorch repository.

Device-Generic Test Infrastructure: common_device_type.py

Device-generic testing utilities, including instantiate_device_type_tests(), @dtypes, @ops, and device-specific test instantiation.

OpInfo Core Definitions: opinfo/core.py

Core OpInfo definitions, sample input metadata, dtype support, skips, decorators, and tolerance configuration.

Operator Registry: common_methods_invocations.py

The op_db registry that collects OpInfo entries used by generic operator tests.

CI Test Runner: test/run_test.py

PyTorch’s CI-style test runner, including test selection, sharding, and affected-test execution.

Source: original article on pytorch.org
