{"slug": "record-type-inference-for-dummies", "title": "Record type inference for dummies", "summary": "A developer explains the basics of type inference for anonymous records, arguing that good type inference for such records is a key missing feature in statically typed languages. The post introduces type theory notation and inference rules for record types, aiming to bridge the gap between programming language experts and lay programmers.", "body_md": "The reason I'm writing this post is because I actually wanted to write a more advanced post on type inference for anonymous records, but then I realized that most of my readers wouldn't understand the latter post in isolation. So I figured I would write this introductory post to teach people new to type theory the basics.\n\nThe reason I'm writing *both* posts is because I believe that good type inference for anonymous records is one of the big things holding back statically typed languages 1 and not enough people appreciate this or understand why. I also think that there is a large disconnect between what programming language experts understand is possible and what lay programmers are comfortable or familiar with. To make a play on\n\n[XKCD #2501](https://xkcd.com/2501/):\n\nSo this post (and the next one) are going to be an exposition of where the field of type theory was at **over three decades ago**, which our industry still hasn't really caught up to, yet.\n\n## Anonymous records\n\nI mentioned that this is a post about type inference for *anonymous records* and I've met some programmers who either don't know what anonymous records are or at least don't recognize them by that name, so I'll briefly touch upon them.\n\nAn anonymous record is a record that doesn't require an associated datatype declaration, and they're quite common in dynamically typed languages. 
For example, JavaScript calls them \"objects\":\n\n```\n{ name: \"Alice\", age: 25 }\n```\n\nPython calls them \"dictionaries\":\n\n```\n{ \"name\": \"Alice\", \"age\": 25 }\n```\n\nRuby calls them \"hashes\":\n\n``` ruby\n{ :name => \"Alice\", :age => 25 }\n\n# … or equivalently:\n{ name: \"Alice\", age: 25 }\n```\n\n… and Nix calls them \"attribute sets\":\n\n```\n{ name = \"Alice\"; age = 25; }\n```\n\nIf you've ever worked with JSON then you've worked with anonymous records because JSON objects (just like JavaScript objects) are anonymous records.\n\n## Static typing\n\nA smaller number of statically typed programming languages support anonymous records because statically typed languages usually prefer named datatypes.\n\nFor example, Haskell does not support anonymous records and requires a datatype declaration for all records, like this:\n\n```\ndata Person = Person{ name :: Text, age :: Integer }\n\nexample :: Person\nexample = Person{ name = \"Alice\", age = 25 }\n```\n\n… but there are still some statically typed programming languages that do support anonymous records, like TypeScript:\n\n```\n{ name: \"Alice\", age: 25 } : { name: string, age: number }\n```\n\n… or C# (which calls them \"anonymous types\"):\n\n```\nnew { Name = \"Alice\", Age = 25 }\n```\n\n… or PureScript (which calls them records):\n\n```\n{ name: \"Alice\", age: 25 } :: { name :: String, age :: Int }\n```\n\nIf all we needed was language support for record literals without any operations on records then it would be fairly easy to infer their types. To do so, though, I'm going to introduce some type theory notation.\n\nLet's start by defining a basic abstract syntax tree that says that an expression (\n\n- a\n(e.g. ) - a\n(e.g. 
) - a record containing 0 or more fields (e.g.\n)\n\nThe way we would write that 2 formally is:\n\n… and the equivalent Haskell code would be something like:\n\n```\ntype Identifier = Text\n\ndata Expression\n    = Boolean Bool\n    | String Text\n    | Number Double\n    | Record (Map Identifier Expression)\n```\n\nFields can store arbitrary expressions, which means you can nest records, like:\n\nWe'll also need to define an abstract syntax tree for our inferred types, which can be either:\n\n- the\ntype - the\ntype - the\ntype - a record type containing 0 or more fields (e.g.\n)\n\nThe notation we'd use for that is:\n\nThe equivalent Haskell code would be something like:\n\n```\ndata Type\n    = BooleanType\n    | StringType\n    | NumberType\n    | RecordType (Map Identifier Type)\n```\n\nNow that we have a syntax for expressions and types we can define some type inference rules:\n\nand always have type - a\n(like ) always has type - a\n(like ) always has type\n\nWe write out those rules using this notation:\n\nIf you've never seen this notation before, you can think of it as mathematical pseudocode for how to implement a type inference function. The equivalent Haskell function would be something like:\n\n```\ninfer\n    :: [(Identifier, Type)]\n    -- ^ context, a.k.a. \"Γ\" (currently unused)\n    -> Expression\n    -- ^ input expression\n    -> Either Text Type\n    -- ^ output inferred type (currently never fails)\ninfer context (Boolean _) = return BooleanType\ninfer context (String _) = return StringType\ninfer context (Number _) = return NumberType\n```\n\nNote: You can find the complete Haskell code in the Appendix.\n\nNow suppose that we wanted to infer the type of a record literal like this one:\n\nNormally we'd reason about the expression's type by hand like this:\n\n-\nto infer the type of\n\nwe need to infer the type of each field: A. first, infer the type of\n\n(which is ) B. 
then, infer the type of\n\n(which is ) -\nnow combine those into the final record type:\n\nType theorists have a notation for that sort of reasoning process, which looks like this:\n\nThat is known as a \"typing derivation\" and the way it works is that \"outer\" reasoning steps (like steps 1 and 2) go on the bottom and \"inner\" reasoning steps (like steps A and B) go on top.\n\nIf we take that reasoning process and generalize it to all records we might write something like this:\n\n… which you can read as saying \"if you want to infer the type of a record then infer the type of each field and replace each field with its inferred type\". In Haskell this would be:\n\n```\ninfer context (Record fields) = do\n    fieldTypes <- traverse (infer context) fields\n    \n    return (RecordType fieldTypes)\n```\n\n… and we can verify this all works in the Haskell REPL:\n\n```\nghci> :set -XOverloadedStrings -XOverloadedLists\nghci> infer [] (Record [(\"name\", String \"Alice\"), (\"age\", Number 25)])\nRight (RecordType (fromList [(\"age\",NumberType),(\"name\",StringType)]))\n```\n\n## Field access\n\nAny programming language worth its salt 3 will also support record field access using something like dot notation (e.g.\n\nFor example, in Nix that would look like this:\n\n```\nnix-repl> { name = \"Alice\"; age = 25; }.name\n\"Alice\"\n```\n\nSo we'll add a type inference rule for field access, but first we need to extend our expression syntax to support dot notation:\n\n… and now we can add this type inference rule:\n\nThis says that we can access a field named\n\nThe equivalent Haskell code would be something like:\n\n```\ndata Expression =\n    …\n    | FieldAccess Expression Identifier\n\n…\n\ninfer context (FieldAccess expression field) = do\n    expressionType <- infer context expression\n    \n    case expressionType of\n        RecordType fieldTypes ->\n            case Map.lookup field fieldTypes of\n                Just fieldType ->\n                    return 
fieldType\n                Nothing ->\n                    Left \"missing field\"\n        _ ->\n            Left \"not a record\"\n```\n\nThis is the first type inference rule we've added that can fail. If we were to infer the type of an expression like\n\nHowever, the rule would also reject a field access if the field is missing, too. If we were to infer the type of *also* doesn't match our type inference rule so we would reject the expression with a type error (\"missing field\").\n\nBefore we move on, let's test out this type inference rule on an example. Suppose that we want to infer the type of this expression:\n\nOur reasoning process might go something like:\n\n- to know the type of the field access (\n) I need the type of the record - to know the type of the record I need the type of the\nfield - the type of the\nfield (set to ) is\n\n- the type of the\n- the type of the record is\n\n- to know the type of the record I need the type of the\n- the type of the field access (\n) is\n\n… and the equivalent formal derivation would be:\n\n… and we can confirm in Haskell that the inferred type is indeed\n\n```\nexampleRecord :: Expression\nexampleRecord = Record [(\"name\", String \"Alice\"), (\"age\", Number 25)]\n\nexampleAccess :: Expression\nexampleAccess = FieldAccess exampleRecord \"age\"\nghci> infer [] exampleAccess\nRight NumberType\n```\n\n## Variables\n\nYou might wonder why we don't just write the rule like this:\n\nIn other words, why don't we consult the expression's *value* instead of the expression's *type* when inferring the field's type? 
After all, that would make our reasoning process much more direct for that last example:\n\n- to know the type of the field access (\n) I need the type of the field (set to ) - the type of\nis\n\n- the type of\n- the type of the field access (\n) is\n\n… and the equivalent formal derivation would also be simpler:\n\nConsulting the value instead of the type would work for our current (incredibly simple) programming language, but would no longer work once we add support for variables because then an expression like this would be rejected:\n\nYou can read that as assigning an expression (\n\nWe can't consult *could* evaluate our expression first to get *before* you evaluate the expression in order to catch mistakes before evaluation begins 4.\n\nOn that note, let's go ahead and add variables and variable assignment to our very minimal language:\n\nNow\n\n… which evaluates to\n\nNote:I'm not going to spell out the rules for evaluation in this post since I'm just focused on explaining type inference.\n\nThe type inference rule for\n\nThis says that in order to infer the type of a\n\n```\ndata Expression =\n    …\n    | Let [(Identifier, Expression)] Expression\n\n…\n\ninfer context (Let [] expression) = do\n    infer context expression\n\ninfer context (Let ((x, assignment) : assignments) expression) = do\n    assignmentType <- infer context assignment\n    \n    infer ((x, assignmentType) : context) (Let assignments expression)\n```\n\nThis rule pairs with the type inference rule for variables, which is:\n\n… which you can read as saying \"to infer the type of a variable named\n\n```\ndata Expression =\n    …\n    | Variable Identifier\n\ninfer context (Variable identifier) = do\n    case lookup identifier context of\n        Just assignmentType ->\n            return assignmentType\n        Nothing ->\n            Left \"unbound variable\"\n```\n\nArmed with those two new rules we can now write out a typing derivation to infer the type of our earlier example:\n\nThis 
is essentially saying:\n\nhas type because… has type because … has type\n\n- … therefore\nhas type\n\n- … therefore\nhas type\n\n… and we can confirm that in Haskell, too:\n\n```\nexampleLet :: Expression\nexampleLet =\n    Let [(\"r\", Record [(\"x\", Number 1)])]\n        (FieldAccess (Variable \"r\") \"x\")\nghci> infer [] exampleLet\nRight NumberType\n```\n\n## Functions\n\nIf that's all our programming language needed then type inference for anonymous records would be easy and most statically typed programming languages would support anonymous records. However, every programming language also supports functions and that's where type inference begins to get tricky.\n\nTo see why, consider this TypeScript function:\n\n``` js\nconst getName = person => person.name;\n```\n\n… which in the lambda calculus would be written as\n\n```\ngetName = person: person.name;\n```\n\nWhat type would we infer for the function\n\nIn order to answer that question we need to first extend our syntax to use\n\n… and we also need to add a new syntax for function types:\n\n… where\n\nThen we can add a type inference rule for functions:\n\nThis is the first type inference rule we've written that can't just be directly translated to Haskell code as written (it *is* possible to codify using [unification](https://en.wikipedia.org/wiki/Unification_(computer_science)#Application:_type_inference) or similar, but that's outside the scope of this post). However, you can read that rule as saying that\n\nEven without code, we can still use that rule to reason through what the type for\n\nIn other words,\n\nHowever, this is not *exactly* the type most programming languages would infer or expect. 
Specifically, most programming languages do not support an ellipsis like\n\n### Subtyping\n\nSome statically typed programming languages (like TypeScript) treat a record type like\n\nUnder this approach the ellipsis is redundant because all record types have an implied ellipsis.\n\nThat means that in TypeScript you can write:\n\n``` js\nconst getName = (person: { name: string }) => person.name;\n```\n\n… and `getName`\n\nwill still work even if you call it on a larger record (e.g. `{ name: \"Alice\", age: 25 }`\n\n) 5.\n\nThis approach works okay, but there is an even better approach for handling extra record fields, which brings us to:\n\n### Row polymorphism\n\nThe second approach is that instead of *dropping* the ellipsis we *name* the ellipsis, meaning that instead of this:\n\n… we write something like this:\n\nThis is the approach taken by languages like PureScript and Elm, where PureScript will write that type like this:\n\n```\n{ name :: String | other }\n```\n\n… and Elm uses a similar syntax:\n\n```\n{ other | name : String }\n```\n\nThis has some nice upsides, which we'll get to in the next section, but first let's formalize what it means to \"name the ellipsis\".\n\nFirst, we'd create a new class of identifier (known as a \"row variable\"):\n\n\"Row\" is an antiquated name for \"a set of fields\". Remember that the research into this sort of type inference occurred a long time ago, before JSON even existed. 
However, the upside of the name is that type theorists get to be a little cheeky and use the Greek letter ρ (\"rho\") to represent a \"row\".\n\nThen we'd change the abstract syntax for record types to permit an optional row variable:\n\n… and update our type inference rule for field access to use a row variable instead of an ellipsis:\n\nThis approach of using named ellipses is known as \"row polymorphism\" because it lets us abstract (be \"polymorphic\") over the set of other fields (\"row\").\n\n## Record extension\n\nYou might wonder: why do we want to abstract over the set of other fields? In particular, why do we need to give a *name* to these other fields?\n\nOne reason why is to support our next record operator: record extension. We'll use this syntax:\n\n… which you can read as \"extend the record\n\nSo let's add that to our abstract syntax for expressions:\n\nNow what type would we infer for a record update? Our first stab at a type inference rule might look like this:\n\n… but that is **not** correct because the field", "url": "https://wpnews.pro/news/record-type-inference-for-dummies", "canonical_source": "https://haskellforall.com/2026/06/record-type-inference-for-dummies", "published_at": "2026-06-23 12:46:26+00:00", "updated_at": "2026-06-24 00:51:18.282894+00:00", "lang": "en", "topics": ["developer-tools", "machine-learning"], "entities": ["Haskell", "TypeScript", "C#", "PureScript", "Nix", "JavaScript", "Python", "Ruby"], "alternates": {"html": "https://wpnews.pro/news/record-type-inference-for-dummies", "markdown": "https://wpnews.pro/news/record-type-inference-for-dummies.md", "text": "https://wpnews.pro/news/record-type-inference-for-dummies.txt", "jsonld": "https://wpnews.pro/news/record-type-inference-for-dummies.jsonld"}}