Understanding PyTorch’s Test Infrastructure

PyTorch's test infrastructure dynamically generates test names across devices and dtypes, causing CI failures to show names like TestLinalgCUDA.test_matmul_cuda_float32 that differ from source templates. The system uses device-generic tests, OpInfos, and CI sharding to validate thousands of combinations automatically. For local debugging, pytest -k and test/run_test.py are recommended for reproducing generated test failures.

TL;DR

- PyTorch tests are often generated at import time, so CI failures may show device/dtype-specific names that differ from the source template.
- For local debugging, `pytest -k` and `test/run_test.py` are usually the fastest ways to reproduce generated test failures.
- Device-generic tests, operator metadata through OpInfos, and CI sharding are the key pieces to understand when contributing or debugging PyTorch tests.

PyTorch tests are often generated dynamically across devices and dtypes, which is why test names in CI may look different from the class and method names in the source file. This post explains how device-generic tests, OpInfos, `instantiate_device_type_tests`, and CI sharding fit together, and how contributors can run and debug PyTorch tests more effectively.

Why PyTorch Testing Feels Different

If you have ever opened a pull request against PyTorch, watched a generated test like `TestLinalgCUDA.test_matmul_cuda_float32` fail in CI, and wondered where that name came from, or tried running a test by its source name and got “no tests collected”, this guide is for you.

[PyTorch’s test infrastructure](https://github.com/pytorch/pytorch/wiki/Running-and-writing-tests) is built for scale. Depending on the decorators used and the operator metadata provided through OpInfos, [a single test method](https://github.com/pytorch/pytorch/blob/main/test/test_ops.py) can expand across multiple devices, dtypes, and operators automatically. That is what lets PyTorch validate thousands of combinations without thousands of handwritten tests. It also means the test you write in the source file is not always the exact test that CI runs, which can be confusing the first time you encounter it.

Note: Many helpers discussed in this guide live under `torch.testing._internal`, which is PyTorch’s internal test infrastructure.
If you are testing your own project, use public APIs like `pytest` and [`torch.testing.assert_close`](https://docs.pytorch.org/docs/2.12/testing.html) instead.

The Naming Mystery: Why “No Tests Collected”?

One of the first confusing moments for new PyTorch contributors is trying to run a test by the class and method name they see in the source file:

    pytest test/test_torch.py::TestTorch::test_matmul

In many PyTorch test files, this may return “no tests collected.” That is usually not because the test is missing. It is because the class in the source file is a template, not the final class that the test runner sees.

When the file is imported, `instantiate_device_type_tests` expands the template into concrete device-specific classes such as `TestTorchCPU`, `TestTorchCUDA`, or `TestTorchMPS`. If the test is also parameterized by dtype, the generated method name may include the device and dtype as well, for example `test_matmul_cuda_float32`. These generated classes are built from the original template class and PyTorch’s device-specific test bases, so they still inherit the shared behavior provided by PyTorch’s internal `TestCase`.

For local debugging, it is usually easier to filter by the generated test name pattern instead of targeting the original template class directly:

    pytest test/test_torch.py -k "test_matmul"
    pytest test/test_torch.py -k "test_matmul_cuda_float32"

Once you know that PyTorch generates the runnable test names during import, CI failures become much easier to map back to the source test.

How Device-Generic Tests Work

PyTorch runs across devices such as CPU, CUDA, MPS, and XPU, and many tests need to validate behavior across float16, float32, float64, bfloat16, integer, and other dtypes. Writing a separate test for every device and dtype combination would quickly become a maintenance nightmare.

So PyTorch uses test templates. You write one test method with `device` and `dtype` parameters:

```python
def test_basic(self, device, dtype):
    ...
```
When Python imports the test file, `instantiate_device_type_tests` expands that template across the selected device types and dtypes. For example, one template class can produce generated classes such as `TestMatmulCPU`, `TestMatmulCUDA`, and `TestMatmulMPS`, with generated methods such as `test_basic_cuda_float32`.

Figure 1: Test Class Hierarchy & Instantiation Flow

The generated names follow this pattern:

    <ClassName><DEVICE>.<method>_<device>_<dtype>

So a template like `TestMatmul.test_basic` may become `TestMatmulCUDA.test_basic_cuda_float32`. The device appears in uppercase in the class name and lowercase in the method name. This is why CI failures show generated names instead of only the template name you wrote. The generated name tells you exactly which device and dtype failed.

The Architecture at a Glance

PyTorch’s test infrastructure is easier to understand as a set of connected layers. Contributors typically interact with the middle layers: device instantiation, parametrization decorators, OpInfos, and test utilities. CI orchestration sits above them, while base utilities provide the shared foundation.

Figure 2: PyTorch Testing Architecture at a Glance

Key files contributors often encounter

| File | What it does |
|---|---|
| [`torch/testing/_internal/common_device_type.py`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_device_type.py) | Device-generic testing utilities, including `instantiate_device_type_tests`, `@dtypes`, and `@ops` |
| [`torch/testing/_internal/opinfo/core.py`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/opinfo/core.py) | Core OpInfo definitions: sample inputs, dtype support, skips, decorators, and tolerances |
| [`torch/testing/_internal/common_methods_invocations.py`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_methods_invocations.py) | The `op_db` registry of OpInfo entries used by generic operator tests |
| [`test/run_test.py`](https://github.com/pytorch/pytorch/blob/main/test/run_test.py) | CI-style test runner: test selection, sharding, and affected-test execution |

OpInfos: Testing Operators Through Metadata

OpInfos are metadata entries that describe how a PyTorch operator should be tested.
Instead of writing a separate test for every operator, PyTorch uses generic test templates that read OpInfo metadata and run the same checks across many operators.

An OpInfo can define things like the operator name, variants, supported dtypes, sample inputs, expected skips, decorators, and tolerance rules. Generic tests in files such as `test_ops.py` then consume `op_db` through `@ops(…)`, which passes the selected op, device, and dtype into the test.

This is how one operator entry can participate in many kinds of coverage, depending on the test and the operator metadata: forward correctness, dtype and device behavior, gradient checks, compile-related paths, and Meta/FakeTensor-style validation.

So when you see a generated test such as `TestCommonCUDA.test_variant_consistency_eager_torch_matmul_cuda_float32`, it usually means a generic OpInfo-based test is running against the `torch.matmul` OpInfo for a specific device and dtype.

The `@ops(…)` decorator is one example of PyTorch’s broader parametrization pattern. For non-operator cases, tests can also use `@parametrize(…)` to generate variants over custom values such as modes, shapes, layouts, or configuration flags. PyTorch also provides `@modules` for module-specific tests. The idea across these patterns is the same: keep one test body and let the test infrastructure generate the useful combinations.

For example, a test can use `@parametrize(…)` to run the same test body across multiple custom values:

```python
@parametrize("reduction", ("mean", "sum"))
def test_loss(self, device, dtype, reduction):
    ...
```

Running Tests Locally

For day-to-day debugging, start with `pytest -k`. It works well with generated test names and avoids relying on template class names that may no longer be directly discoverable after test instantiation.
    # Run tests matching a generated name pattern
    pytest test/test_torch.py -k "test_matmul"

    # Run a specific generated device/dtype case
    pytest test/test_torch.py -k "test_matmul_cuda_float32" -x

For CI-like runs, use `test/run_test.py`. It is the PyTorch test runner used for running test files, affected-test selection, and CI-related behavior such as sharding.

    # Run a test file through the PyTorch test runner
    python test/run_test.py test_torch

    # Check available options, including shard-related flags
    python test/run_test.py -h

Environment variables are also useful when you want CI-like behavior locally. For example, `PYTORCH_TESTING_DEVICE_ONLY_FOR` narrows tests to selected device types, `PYTORCH_TEST_WITH_SLOW=1` includes tests marked with `@slowTest`, and `PYTORCH_TEST_WITH_DYNAMO=1` runs regular PyTorch tests with TorchDynamo coverage.

Debugging CI Failures

When a PyTorch CI job fails, the most useful details are usually the generated test name, with its device/dtype suffix, and the shard. Since tests are expected to be atomic, start by reproducing the specific generated test locally with `pytest -k`. Shard information can help locate the CI job, but the generated test name is usually the key detail for reproduction.

Dr. CI and Failure Triage

On PyTorch pull requests, contributors may also see automated Dr. CI comments. Dr. CI helps summarize failing jobs, group recurring failure patterns, and point contributors toward relevant logs. It is not a replacement for reading the full CI output, but it is often a useful starting point for triage.

A practical debugging flow is: start with the Dr. CI summary, open the failing job logs on [hud.pytorch.org](https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50), identify the generated test name and shard, then reproduce the failure locally with `pytest -k` or `run_test.py`.
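The last two steps of that flow, reading the generated name and turning it into a local command, can be sketched with a small helper. Everything here is hypothetical and not part of PyTorch: the function names, the `KNOWN_DEVICES` tuple, and the assumption that the caller already knows which test file the class lives in (`test/test_linalg.py` is used purely as an example path).

```python
import shlex

# Assumption for this sketch: the device suffixes we expect to see in CI names.
KNOWN_DEVICES = ("cpu", "cuda", "mps", "xpu")


def split_generated_name(full_name):
    """Split e.g. 'TestLinalgCUDA.test_matmul_cuda_float32' into its parts."""
    class_name, _, method = full_name.partition(".")
    for device in KNOWN_DEVICES:
        if not class_name.endswith(device.upper()):
            continue
        # Split on the last '_<device>' so method names containing the
        # device word earlier on are still handled.
        base, sep, tail = method.rpartition(f"_{device}")
        if sep and (tail == "" or tail.startswith("_")):
            return {
                "template_class": class_name[: -len(device)],
                "template_method": base,
                "device": device,
                "dtype": tail[1:] or None,  # None when there is no dtype suffix
            }
    raise ValueError(f"not a recognized generated test name: {full_name!r}")


def pytest_command(test_file, full_name):
    """Build a 'pytest -k' command line for one generated test."""
    split_generated_name(full_name)  # validate the name before building
    method = full_name.partition(".")[2]
    return f"pytest {shlex.quote(test_file)} -k {shlex.quote(method)}"


parts = split_generated_name("TestLinalgCUDA.test_matmul_cuda_float32")
print(parts)
print(pytest_command("test/test_linalg.py", "TestLinalgCUDA.test_matmul_cuda_float32"))
```

The `template_class` and `template_method` fields point back at what to search for in the source file, while the full generated method name is what goes into the `-k` filter.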
Figure 3: PyTorch CI Testing Pipeline Flow

Common CI-only failures usually come from environment differences, test pollution, sharding assumptions, or numeric precision differences. Tests should not depend on execution order or global state left behind by another test.

Common Pitfalls

- Targeting template names directly: Use `pytest -k` filters or generated class and method names instead. After test templates are instantiated, the original template name may not be directly discoverable.
- Using `torch.randn` in dtype-generic tests: `torch.randn` works for floating-point and complex inputs but fails on integer and boolean dtypes. Prefer `make_tensor` for dtype-generic tests, as it handles all dtype categories while still requiring an explicit device and dtype.
- Hardcoding devices: Use the `device` argument provided by the generated test instead of constants like `device="cuda"`. This keeps the test portable across device types.

Quick Reference

Decorators and helpers

| Decorator / helper | Purpose |
|---|---|
| [`@ops(…)`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_device_type.py#L1257) | Run a generic test across OpInfo entries, passing the op, device, and dtype |
| [`@onlyCUDA`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_device_type.py#L1874) / [`@onlyCPU`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_device_type.py#L1870) | Run the test only on CUDA or only on CPU |
| [`@onlyAccelerator`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_device_type.py#L1899) | Run the test only on accelerator devices |
| [`@skipIfTorchDynamo("reason")`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_utils.py#L1917) | Skip the test when running under TorchDynamo |
| [`@toleranceOverride({…})`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_device_type.py#L1782) | Override comparison tolerances for specific dtypes |
| [`load_tests`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_utils.py#L5444) | Test-loading hook used to filter tests, for example for sharding |

Useful environment variables

| Variable | Purpose |
|---|---|
| `PYTORCH_TESTING_DEVICE_ONLY_FOR` | Run tests only for a selected device |
| `PYTORCH_TEST_WITH_SLOW` | Include slow tests |
| `PYTORCH_TEST_WITH_DYNAMO` | Run tests with TorchDynamo coverage |
| `EXPECTTEST_ACCEPT` | Update expected output snapshots |

Summary

PyTorch’s testing infrastructure is designed to make a very large test suite manageable. Device-generic templates become concrete tests through `instantiate_device_type_tests`, OpInfos centralize operator metadata, and CI sharding splits test execution across workers.

The key idea is that the test name you see in CI is often a generated name, not just the source-level template you wrote. Once you learn to read that generated name (the test, device, dtype, and sometimes operator), debugging becomes much easier.

Further Reading

For more details, refer to the official PyTorch testing documentation and source files:

- [PyTorch Wiki: Running and Writing Tests](https://github.com/pytorch/pytorch/wiki/Running-and-writing-tests): Contributor guide for running tests, selecting generated tests, and understanding PyTorch’s test workflow. This document lists the environment variables used by the test system.
- [Public Testing APIs: torch.testing documentation](https://docs.pytorch.org/docs/2.12/testing.html): Public testing APIs such as `torch.testing.assert_close` for projects outside the PyTorch repository.
- [Device-Generic Test Infrastructure: common_device_type.py](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_device_type.py): Device-generic testing utilities, including `instantiate_device_type_tests`, `@dtypes`, `@ops`, and device-specific test instantiation.
- [OpInfo Core Definitions: opinfo/core.py](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/opinfo/core.py): Core OpInfo definitions, sample input metadata, dtype support, skips, decorators, and tolerance configuration.
- [Operator Registry: common_methods_invocations.py](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_methods_invocations.py): The `op_db` registry that collects OpInfo entries used by generic operator tests.
- [CI Test Runner: test/run_test.py](https://github.com/pytorch/pytorch/blob/main/test/run_test.py): PyTorch’s CI-style test runner, including test selection, sharding, and affected-test execution.