Good type hints lead to code that is more maintainable, easier to understand, and less buggy. If you'd like a quick, general introduction to why, see this article, but suffice it to say that types give us a way to automatically check assumptions and invariants [1]. There are ways to go further (see "Formal Methods", including the
What's that?
Ah, I see...
I come from a background where I used TypeScript frequently. TypeScript has, in my opinion, the best type system of any mainstream programming language by far. Python's type system isn't as good, but it isn't horrible either. We have the tools to do better than this! And to be clear, some AI Safety libraries do this well! Inspect is a great example. More should follow their lead.
Most common objections to static typing are well addressed in the article I referenced earlier, but there are a couple of objections specific to AI Safety:
The idea here is that since AI can understand much larger sections of the codebase, we no longer need to work out the shape of our data ourselves when there are no types to tell us. We can just have the AI do it for us! But there is some evidence pointing in the opposite direction. A 2024 paper by Blinn et al. argues that "AIs need IDEs too", and that AI agents using static type checkers get better results. Types can "tame hallucinations" and provide the hill-climbing feedback that LLMs need to be successful at coding. Some have found that type hints lead to easier code reviews and more maintainable AI-generated code.
The pushback here is in two parts:
For number 1, what if you're just hacking something together that isn't going to make it into the final published repo? Won't types just slow you down then? In that case, yes, you may decide that full, well-specified types aren't worth it. But if you're planning to reuse any of the code at all, you'll probably end up moving faster in the long run if you add good types.
For number 2, published research codebases shouldn't be thought of as one-off. Wolter and Veeramacheneni argue that good software engineering practices would benefit the ML research community through easier reproduction and, I would add, extension. Good types make it much easier for researchers who come after you (or even yourself a few months later, or your coding agent) to understand what is going on in the codebase and reuse what you've done. Otherwise, we risk wasting a lot of valuable researcher time! The ultimate examples of this are packages that are explicitly designed to be reused. If nothing else, these packages should be well typed!
While HuggingFace's libraries aren't AI Safety specific per se, they are very commonly used in AI Safety research, and I've found them to have particularly bad type hints. For example, HuggingFace's Dataset class isn't generic. It doesn't tell us anything about the shape of the data in the class! Some parts of the Dataset interface are difficult to type correctly with Python's type system (such as indexing on a column name), but others are relatively straightforward (such as indexing on a row, iterating the dataset, or using .map, mentioned earlier). I was frustrated enough by this that I've created a small package that wraps some common functions and methods from datasets, making them generic over a row TypedDict: https://github.com/Plyb/typed-datasets. It provides an escape hatch back to plain HF datasets for cases that aren't easily handled, but this is much better than nothing!
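To make "generic over a row TypedDict" concrete, here is a minimal sketch of the idea in plain Python. It is not the typed-datasets API and not HuggingFace's Dataset: TypedRows, QARow, ScoredRow, and add_length are hypothetical names made up for illustration, and the container is backed by a plain list so it runs without datasets installed. It covers the three operations called out above as straightforward to type: row indexing, iteration, and .map.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from typing import Generic, TypedDict, TypeVar

RowT = TypeVar("RowT")
NewRowT = TypeVar("NewRowT")


class TypedRows(Generic[RowT]):
    """Hypothetical dataset-like container that is generic over its row type.

    Backed by a plain list so this sketch runs without HuggingFace installed.
    """

    def __init__(self, rows: list[RowT]) -> None:
        self._rows = rows

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index: int) -> RowT:
        # Row indexing: the checker knows exactly which row type comes back.
        return self._rows[index]

    def __iter__(self) -> Iterator[RowT]:
        # Iteration yields rows of the same known type.
        return iter(self._rows)

    def map(self, fn: Callable[[RowT], NewRowT]) -> TypedRows[NewRowT]:
        # The new row type is inferred from fn's return annotation.
        return TypedRows([fn(row) for row in self._rows])


class QARow(TypedDict):
    question: str
    answer: str


class ScoredRow(TypedDict):
    question: str
    answer: str
    answer_length: int


def add_length(row: QARow) -> ScoredRow:
    # Hypothetical transform that adds a derived column.
    return {
        "question": row["question"],
        "answer": row["answer"],
        "answer_length": len(row["answer"]),
    }


qa: TypedRows[QARow] = TypedRows(
    [
        {"question": "2+2?", "answer": "4"},
        {"question": "Capital of France?", "answer": "Paris"},
    ]
)
scored = qa.map(add_length)  # a type checker infers TypedRows[ScoredRow]
```

Because map takes a Callable[[RowT], NewRowT], the checker derives the type of scored from add_length's signature alone, so a typo such as row["answer_lenght"] on one of its rows is reported by mypy or pyright before anything runs.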
Implementing good type hints for your code will speed up AI Safety research and make it more trustworthy. We are doing ourselves a huge disservice when we leave this powerful tool on the table. What can you do? At the bare minimum, annotate function parameters with basic types: is this a dict or a tuple? Going further, specify the contents of compound types (see TypedDict) or make your functions and classes generic. Finally, if you're planning on anyone else using your code in the future (including yourself!), make your types as specific as you can (for example, by using @overload keyed on Literal flag parameters).
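Each rung of that ladder can be sketched in a few lines. Everything below (mean_score, EvalResult, best_result, first, collect_scores) is a hypothetical example made up for illustration, assuming Python 3.9+ and a checker such as mypy or pyright; it shows one way to apply each step, not the only one.

```python
from __future__ import annotations

from typing import Literal, TypedDict, TypeVar, overload

T = TypeVar("T")


# 1. Bare minimum: annotate parameters with basic types.
def mean_score(values: list[float]) -> float:
    return sum(values) / len(values)


# 2. Spell out the contents of a compound type with a TypedDict.
class EvalResult(TypedDict):
    model: str
    score: float


def best_result(results: list[EvalResult]) -> EvalResult:
    return max(results, key=lambda r: r["score"])


# 3. Make functions generic so the caller's element type flows through.
def first(items: list[T]) -> T:
    return items[0]


# 4. As specific as possible: @overload keyed on a Literal flag parameter,
#    so the return type depends on the value passed for by_model.
@overload
def collect_scores(results: list[EvalResult], by_model: Literal[True]) -> dict[str, float]: ...
@overload
def collect_scores(results: list[EvalResult], by_model: Literal[False] = ...) -> list[float]: ...
# Fallback for callers whose flag is a plain bool not known statically.
@overload
def collect_scores(results: list[EvalResult], by_model: bool) -> dict[str, float] | list[float]: ...
def collect_scores(results: list[EvalResult], by_model: bool = False) -> dict[str, float] | list[float]:
    if by_model:
        return {r["model"]: r["score"] for r in results}
    return [r["score"] for r in results]


results: list[EvalResult] = [
    {"model": "a", "score": 0.5},
    {"model": "b", "score": 0.75},
]
per_model = collect_scores(results, by_model=True)  # checker infers dict[str, float]
flat = collect_scores(results)  # checker infers list[float]
```

Step 4 is what spares callers from narrowing a union by hand: with by_model=True the checker already knows the result is a dict[str, float], while the trailing bool overload still covers flags that are only known at runtime.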