A Call for Better Type Hints in AI Safety Tooling

Researchers and developers in AI safety are calling for improved type hinting in Python-based AI safety tooling, citing evidence that static typing reduces bugs and improves code maintainability. A 2024 study by Blinn et al. found that AI agents using static type checkers achieve better results, with types helping to "tame hallucinations" in LLM-generated code. The push for better type hints comes as AI safety libraries like Inspect demonstrate effective implementation, while others such as TransformerLens and HuggingFace datasets suffer from underspecified type annotations that create footguns for beginners.

Good type hints lead to code that is more maintainable, easier to understand (https://link.springer.com/article/10.1007/s10664-013-9289-1), and has fewer bugs (https://blog.acolyer.org/2017/09/19/to-type-or-not-to-type-quantifying-detectable-bugs-in-javascript/). If you'd like a quick, general intro into why, see this article (https://itnext.io/typescript-static-or-dynamic-64bceb50b93e), but suffice it to say that types give us a way to automatically check assumptions and invariants. There are ways to go further (see "Formal Methods").

What's that?

TransformerLens (from https://github.com/TransformerLensOrg/TransformerLens/blob/59a828a98bda340f11429038f4fdda10706303bc/transformer_lens/model_bridge/bridge.py#L2154):

    def run_with_hooks(
        self,
        input: Union[str, List[str], torch.Tensor],
        fwd_hooks: List[Tuple[Union[str, Callable], Callable]] = [],
        # Both fwd_hooks and bwd_hooks actually expect a HookFunction (a type
        # which TransformerLens defines), not just a Callable, but that isn't
        # declared here. Easy footgun as a beginner if you don't create the
        # correct hook function shapes.
        bwd_hooks: List[Tuple[Union[str, Callable], Callable]] = [],
        ...
    ) -> Any:

MACHIAVELLI Benchmark (from https://github.com/aypan17/machiavelli/blob/ebb44e197f663603427882c609dbc9d5fb883d63/machiavelli/game/player.py):

    class Player:
        def __init__(self, game, data=None, print_text=False, buffered_print=False, watch_history=False):
            # What is game? data? If I wanted to construct a Player, how
            # would I do so?

HuggingFace datasets (from https://github.com/huggingface/datasets/blob/8474a918565b6d55c7c81b39a487d1b79127f7ea/src/datasets/arrow_dataset.py#L3214):

    def map(
        self,
        function: Optional[Callable] = None,
        # Callable here is underspecified. Its shape depends on with_indices
        # and batched, but we could use @overload and at least specify the
        # number of parameters and whether they are integers, dicts, or
        # dicts of lists.
        with_indices: bool = False,
        batched: bool = False,
        ...
    ) -> "Dataset":
        # Ah, I see...

I come from a background where I used TypeScript frequently. TypeScript has, in my opinion, the best type system of any mainstream programming language by far. Python's type system isn't as good, but it isn't horrible either. We have the tools to do better than this.

And to be clear, some AI Safety libraries do this well. Inspect (https://github.com/UKGovernmentBEIS/inspect_ai) is a great example. More should follow their lead.

Most common objections to static typing are well addressed in the article I referenced earlier (https://itnext.io/typescript-static-or-dynamic-64bceb50b93e), but there are a couple of objections specific to AI Safety.

The first is that AI coding tools make types unnecessary. The idea here is that since AI can understand much larger sections of the codebase, we no longer need to hunt for the shape of our data ourselves when there are no types to tell us. We can just have the AI do it for us.

But there is some evidence pointing in the opposite direction. A 2024 paper by Blinn et al. (https://arxiv.org/pdf/2409.00921) argues that "AIs need IDEs too", and that AI agents using static type checkers get better results. Types can "tame hallucinations" (https://medium.com/@tl_99311/why-i-choose-typescript-for-llm-based-coding-19cbb19f3fa2) and provide the hill-climbing feedback that LLMs need to be successful at coding. Some (https://yuv.ai/blog/why-ai-is-settling-the-typed-vs-untyped-debate-for-us) have found that type hints lead to easier code reviews and more maintainable AI-generated code.

The second is that research code is one-off, so types only slow you down. The pushback here is in two parts: whether types really slow you down, and whether research code is really one-off.

For number 1, what if you're just hacking something together that isn't going to make it into the final published repo? Won't types just slow you down then? In that case, yes, you may decide that full, well-specified types aren't worth it. But if you're planning to reuse any of the code, really at all, you'll probably end up being faster in the long run if you add good types.

For number 2, published research codebases shouldn't be thought of as one-off.
Wolter and Veeramacheneni (https://arxiv.org/abs/2502.00902v1) argue that good software engineering practices would benefit the ML research community through easier reproduction and, I would add, extension. Good types make it much easier for researchers who come after you (or even yourself a few months later, or your coding agent) to understand what is going on in the codebase and reuse what you've done. Otherwise, we risk wasting a lot of valuable researcher time.

The ultimate example of this is packages that are explicitly designed to be reused. If nothing else, these packages should be well typed.

While HuggingFace's libraries aren't AI Safety specific per se, they are very commonly used in AI Safety research, and I've found them to have particularly bad type hints. For example, HuggingFace's Dataset class isn't generic. It doesn't tell us anything about the shape of the data in the class. Some parts of the Dataset interface are difficult to type correctly with Python's type system (such as indexing on a column name), but others are relatively straightforward (such as indexing on a row, iterating the dataset, or using .map, mentioned earlier).

I was frustrated enough by this that I've created a small package that wraps some common functions and methods from datasets, making them generic over a row TypedDict: https://github.com/Plyb/typed-datasets. It provides an escape hatch of going back to plain HuggingFace datasets for cases that aren't easily handled, but this is much better than nothing.

Implementing good type hints for your code will speed up AI Safety research and make it more trustworthy. We are doing ourselves a huge disservice when we leave this powerful tool on the table.

What can you do?

At the bare minimum, annotate function parameters with basic types. Is this a dict or a tuple?
Going further would be to specify the contents of compound types (see TypedDict) or to make your functions and classes generic.
Finally, if you're planning on anyone else using your code in the future (including yourself), include types that are as specific as you can get them (such as using @overload keyed on Literal flag parameters).
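
As a rough sketch of that last suggestion, here is a toy in-memory dataset whose map method is overloaded on Literal flag parameters. TinyDataset, Row, and Batch are hypothetical names made up for illustration; this is not the datasets API, only a stand-in with the same two flags. The point is that each combination of with_indices and batched gets its own callback signature, so a type checker can tell you what shape of function to pass.

```python
from typing import Any, Callable, Literal, overload

# Hypothetical aliases for illustration: one row, and one batch of columns.
Row = dict[str, Any]
Batch = dict[str, list[Any]]


class TinyDataset:
    """Toy in-memory dataset; NOT the HuggingFace `datasets` API."""

    def __init__(self, rows: list[Row]) -> None:
        self.rows = rows

    # One overload per flag combination, so the expected callback shape
    # follows from the literal values of the flags at the call site.
    @overload
    def map(self, function: Callable[[Row], Row], *,
            with_indices: Literal[False] = False,
            batched: Literal[False] = False) -> "TinyDataset": ...

    @overload
    def map(self, function: Callable[[Row, int], Row], *,
            with_indices: Literal[True],
            batched: Literal[False] = False) -> "TinyDataset": ...

    @overload
    def map(self, function: Callable[[Batch], Batch], *,
            with_indices: Literal[False] = False,
            batched: Literal[True]) -> "TinyDataset": ...

    @overload
    def map(self, function: Callable[[Batch, list[int]], Batch], *,
            with_indices: Literal[True],
            batched: Literal[True]) -> "TinyDataset": ...

    def map(self, function: Callable[..., Any], *,
            with_indices: bool = False,
            batched: bool = False) -> "TinyDataset":
        if batched:
            # Treat the whole dataset as a single batch: a dict of columns.
            keys = self.rows[0].keys() if self.rows else []
            columns: Batch = {key: [row[key] for row in self.rows] for key in keys}
            indices = list(range(len(self.rows)))
            out = function(columns, indices) if with_indices else function(columns)
            # Turn the returned columns back into a list of rows.
            length = len(next(iter(out.values()), []))
            return TinyDataset([{k: v[i] for k, v in out.items()} for i in range(length)])
        if with_indices:
            return TinyDataset([function(row, i) for i, row in enumerate(self.rows)])
        return TinyDataset([function(row) for row in self.rows])
```

With these overloads, a checker such as mypy or pyright should flag a call like `ds.map(lambda row: row, with_indices=True)`, because the callback takes one argument where two are expected. With a bare Callable, that mistake only surfaces at runtime.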
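
For the TypedDict and generics suggestions, here's a minimal sketch of a container that is generic over its row type. TypedRows and ReviewRow are hypothetical names for illustration only; this is not the API of typed-datasets or of datasets, just the general idea of carrying the row shape through indexing and iteration.

```python
from typing import Generic, Iterator, Sequence, TypedDict, TypeVar

RowT = TypeVar("RowT")


class ReviewRow(TypedDict):
    """Shape of one row; the field names are made up for this example."""
    text: str
    label: int


class TypedRows(Generic[RowT]):
    """Hypothetical read-only container that remembers its row type."""

    def __init__(self, rows: Sequence[RowT]) -> None:
        self._rows = list(rows)

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index: int) -> RowT:
        # Indexing on a row returns the declared row type, not Any.
        return self._rows[index]

    def __iter__(self) -> Iterator[RowT]:
        return iter(self._rows)


def count_positive(dataset: TypedRows[ReviewRow]) -> int:
    # A type checker knows row["label"] is an int here, and would reject
    # a misspelled key such as row["lable"].
    return sum(1 for row in dataset if row["label"] == 1)


reviews = TypedRows[ReviewRow]([
    {"text": "great", "label": 1},
    {"text": "meh", "label": 0},
])
```

The signature of count_positive now answers the "what is data?" question from the Player example without the reader having to open the function body.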