{"slug": "type-inference-part-1", "title": "Type Inference (Part 1)", "summary": "A tutorial series on type inference begins, covering the Damas-Hindley-Milner type system, unification, bidirectional type-checking, Algorithm J, row polymorphism, newtype declarations, type annotations, mutually recursive declarations, and side-effects/value restriction. The series assumes familiarity with abstract syntax trees, type systems, and OCaml, with code available in a companion repository.", "body_md": "# Type Inference (Part 1)\n\nJune 25, 2026\n\nWelcome to part 1 of my tutorial series on type inference!\n\nThis post will cover\n\n- The Damas-Hindley-Milner type system\n- Unification\n- Bidirectional type-checking\n- Algorithm J\n- Row polymorphism\n- Newtype declarations\n- Type annotations\n- Mutually recursive declarations\n- Side-effects / value restriction\n\nWe assume you\n\n- Understand abstract syntax trees.\n- Vaguely understand what a type system is.\n- Have some familiarity with reading OCaml.\n\nAll the code in this article can be found in the companion repository [https://github.com/smasher164/hm_tut](https://github.com/smasher164/hm_tut). Each file under the `lib` directory corresponds to the features added in each section of this article.\n\nType inference is the process of taking some expression that represents (part of) a program and returning its type.\n\nIf the expression is invalid, i.e. it performs an operation that the rules of your language forbid (like adding a `bool` to a `string`), type inference will fail.\n\nIf the expression lacks sufficient information to determine a type, e.g. it is missing a binding in its scope or type information for a binding, type inference will fail.\n\nIn this way, type inference is a superset of type-checking. 
Another term that is used synonymously is type reconstruction.\n\nWhat type inference is useful for:\n\n- Reducing the number of type annotations you have to write.\n- Validating that a program is type-safe, i.e. the program is safe to compile and execute and won't encounter a TypeError.\n- Using types to generate code. For example, knowing the types of a struct's fields can help you determine the struct's layout.\n\n## Aside\n\nType safety is often summarized as \"well-typed programs do not get stuck\". What this means is that if you have an interpreter for a language and you pass it a program that has been successfully type-checked, it will never reach an unexpected state where it doesn't know how to evaluate something. For example, a multiplication operator in a VM might expect there to be two operands on the stack. Having fewer than two operands would be unexpected.\n\nDifferent kinds of type systems place different requirements and limitations on type inference.\n\nFor instance, subtyping might allow a `List<Square>` to be passed into any function that expects a `List<Rectangle>`. Overloading might require some automatic resolution scheme (based on the number of parameters, their types, etc.) to find the correct overload for a function. A borrow checker might want to infer the lifetime of a function parameter based on the lifetime of another. In this article, we'll be covering an ML-like type system.\n\nML has the remarkable property of being able to infer the type of an expression in a program without *any* annotations. What's more, the types it infers for an expression are as general as they can be.\n\n## Aside\n\nThis is known as the principal type property. If an expression's principal type is `P` and the expression can also be given the type `T`, you can always substitute some of the type variables in `P` to get `T`. For instance, `fun x -> x` has the principal type `'a -> 'a`, and substituting `bool` for `'a` gives `bool -> bool`.\n\nFor most expressions, ML will return types based on the constructors used. 
For example, an expression like `true && false` would be inferred as `bool`. A lambda like `fun x -> x`, when applied to `true`, would be given the type `bool -> bool`. However, for `let` bindings, ML will perform a process called *let generalization*: the type of the variable bound by the `let` declaration will be made as polymorphic as possible.\n\nSo for example, in\n\n``` ocaml\nlet id = fun x -> x\nin id true\n```\n\nthe variable named `id` will be given the type `forall 'a. 'a -> 'a`. For folks more familiar with languages like Java, it'd be as if we declared `id` as the generic method\n\n``` java\nstatic <T> T id(T x) {\n    return x;\n}\n```\n\nWhen `id` gets applied to `true`, its type gets *instantiated*: `bool` (the type of `true`) is substituted for `'a`, resulting in a concrete instance of `id` whose type is `bool -> bool`, so the expression `id true` ends up with the type `bool`.\n\nHow do we actually do this process of *generalization* and *instantiation*? How are we able to invoke this generic function without passing in a type argument? How does the type of a function get inferred based on the type of its argument? There are certain rules, known under *Damas-Hindley-Milner* (typically shortened to HM for Hindley-Milner) type inference, that we have to follow to make inferring types like this possible.\n\nThe specific implementation of HM we will be covering in this series is called Algorithm J. Another approach to implementing HM is called Algorithm W, but we won't be covering its implementation here, for reasons we'll expand on later. The key difference between the two is that Algorithm J solves type variables in place using mutable references, while Algorithm W builds up explicit substitutions (solution sets) and applies them as it goes.\n\nWe'll start by addressing the question of how we infer the type of a function based on the argument it's applied to. 
Already, this kind of inference is more powerful than the kind present in languages like Java or Go.\n\nLet's start with our signature for `infer`.\n\n``` ocaml\nlet rec infer (env : env) (exp : exp) : texp = ...\n```\n\n`infer` takes the environment in the form of an `env` and an expression in the form of an `exp`, returning a typed expression in the form of a `texp`.\n\n```\ntype env = (id * bind) list\n```\n\nThis is where our variable bindings will go during the type-checking process. It will grow as we introduce new bindings.\n\n`id` is just a type alias for `string`.\n\n```\n(* Represents identifiers like variables, type names, and type variables. *)\ntype id = string\n```\n\nWe'll declare `bind` to be a sum type, since we'll introduce other kinds of bindings (like type declarations) later.\n\n```\ntype bind =\n    | VarBind of ty (* A variable binding maps to a type. *)\n```\n\nSo for example, if the user wrote\n\n``` ocaml\nfun x -> foo x\n```\n\nthe body of the lambda needs to see `x`. In this case, the top of `env` would look like\n\n```\n...| \"x\": ?0 |\n```\n\nwhere `?0` stands for a type we don't know yet (we'll see how these placeholders get solved shortly).\n\n`exp` represents an expression in our program. This is basically our abstract syntax tree (AST). For now, we'll just focus on booleans, variables, lambdas, and function application.\n\n```\ntype exp =\n    | EBool of bool (* true/false *)\n    | EVar of id (* x *)\n    | ELam of id * exp (* fun x -> x *)\n    | EApp of exp * exp (* f arg *)\n```\n\n`texp` represents a typed expression, a.k.a. an expression that holds a `ty`. There are a number of ways a type-checker could choose to define this. You can simply return a type and keep around a map from `exp |=> ty`. You could parameterize the definition of `exp` so that it can be extended with more variants or fields. 
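\n\nAs a rough sketch of that second option (a hypothetical design, not the one this series uses), the AST can take a type parameter `'a` for a per-node annotation, so that `unit exp` is the untyped tree and `ty exp` is the typed one. The simplified `ty` and the `annot` helper below are assumptions made for illustration.\n\n``` ocaml\n(* Hypothetical: a simplified type language used only in this sketch. *)\ntype ty =\n  | TyBool\n  | TyArrow of ty * ty\n\n(* The AST is parameterized by the annotation 'a stored on every node. *)\ntype 'a exp =\n  | EBool of bool * 'a\n  | EVar of string * 'a\n  | ELam of string * 'a exp * 'a\n  | EApp of 'a exp * 'a exp * 'a\n\n(* Read the annotation stored at the root of a tree. *)\nlet annot (e : 'a exp) : 'a =\n  match e with\n  | EBool (_, a) | EVar (_, a) | ELam (_, _, a) | EApp (_, _, a) -> a\n\n(* Untyped trees use unit as the annotation... *)\nlet untyped : unit exp =\n  EApp (ELam (\"x\", EVar (\"x\", ()), ()), EBool (true, ()), ())\n\n(* ...while typed trees store a ty on every node. *)\nlet typed : ty exp =\n  EApp\n    ( ELam (\"x\", EVar (\"x\", TyBool), TyArrow (TyBool, TyBool)),\n      EBool (true, TyBool),\n      TyBool )\n\nlet () =\n  assert (annot untyped = ());\n  assert (annot typed = TyBool);\n  print_endline \"ok\"\n```\n\nThe trade-off is that every node carries an annotation slot even before type-checking, whereas a separate typed AST keeps the untyped and typed trees fully independent.\n\n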
In our case, we're going to take a more straightforward approach of defining another typed AST that just duplicates all the variants of our untyped AST.\n\n```\ntype texp =\n    | TEBool of bool * ty\n    | TEVar of id * ty\n    | TELam of id * texp * ty\n    | TEApp of texp * texp * ty\n```\n\nAnd the types themselves are represented by `ty`:\n\n```\n(* A type *)\ntype ty =\n    | TyBool (* Bool *)\n    | TyArrow of ty * ty (* Function type: T1 -> T2 *)\n    | TyVar of tv ref (* Type variable: held behind a mutable reference. *)\n```\n\nWe will discuss `tv`'s role shortly.\n\n## Unification\n\nHere's the expression we're trying to type-check:\n\n``` ocaml\n(fun x -> x) true\n```\n\nAs an AST, this looks like\n\n```\nEApp(ELam(\"x\", EVar \"x\"), EBool true)\n```\n\nWe'd like to infer types for this expression so that we get\n\n```\nTEApp(TELam(\"x\", TEVar(\"x\", ?0), ?1), TEBool(true, ?2), ?3)\n```\n\nwhere each `?n` is a placeholder for the type of that subexpression.\n\nHow do we fill in those holes? To start off, let's fill in the information we already know.\n\n- The type of `EBool true` should be `TyBool`.\n- The return type of the lambda should be the same as its parameter's type, i.e. the type of `x`.\n- The parameter type of the lambda should be the same as the type of the lambda's argument, i.e. the type of `EBool true`.\n- The type of the application should be the same as the return type of the lambda.\n\nIf we wrote these constraints out symbolically, it would look like\n\n```\n?2 = TyBool\n?1 = TyArrow(?0, ?0)\n?1 = TyArrow(?2, ?3)\n```\n\nThe types represented by a `?` are called type *variables*. If we successfully solve for all of these type variables, we'll have type-checked our program.\n\nThe above constraints are basically a system of equations involving types. If you've taken an algebra class, you might recall that one way of solving a system of equations is substitution. 
That is, you isolate a variable on one side of an equation, and plug the other side in wherever that variable shows up in the other equations.\n\nSo far, our set of solutions contains `{?2 = TyBool}`.\n\nWe can start by plugging in `TyBool` wherever we see `?2`. We then get\n\n```\n?1 = TyArrow(?0, ?0)\n?1 = TyArrow(TyBool, ?3)\n```\n\nNow we have two equations for the same variable. Let's set their right-hand sides equal to each other.\n\n```\nTyArrow(?0, ?0) = TyArrow(TyBool, ?3)\n```\n\nAt this point, we recurse down and equate the corresponding types in the arrow.\n\n```\n?0 = TyBool\n?0 = ?3\n```\n\nWe can expand our solution set to `{?2 = TyBool, ?0 = TyBool, ?1 = TyArrow(TyBool, TyBool)}` (the last entry comes from plugging `?0 = TyBool` into `?1 = TyArrow(?0, ?0)`).\n\nSubstituting `TyBool` for `?0`, we get\n\n```\nTyBool = ?3\n```\n\nand our final solution set is `{?2 = TyBool, ?0 = TyBool, ?1 = TyArrow(TyBool, TyBool), ?3 = TyBool}`.\n\nAnd that's it: we've inferred all the types for this expression.\n\nWhat does this look like in the failure case?\n\nLet's take the following example:\n\n``` ocaml\n(fun f -> f true) true\n```\n\nAs an AST it looks like\n\n```\nEApp(ELam(\"f\", EApp(EVar \"f\", EBool true)), EBool true)\n```\n\nAt a glance, we can tell this shouldn't type-check. If evaluated, it's going to try applying a `bool` as if it were a function.\n\nLet's annotate the tree with type variables and see what happens when we try to solve the resulting equations. (Since we know that `TEBool` is going to have the type `TyBool`, we don't need a type variable or an extra equation for it.)\n\n```\nTEApp(\n  TELam(\"f\",\n    TEApp(\n      TEVar(\"f\", ?0),\n      TEBool(true, TyBool),\n      ?1\n    ),\n    ?2\n  ),\n  TEBool(true, TyBool),\n  ?3\n)\n```\n\nHere's the information we know:\n\n- The parameter type of the lambda is the same as its argument's type, `TyBool`.\n- The parameter type of the lambda is the type of `f`.\n
- The type of the application is the same as the return type of the lambda.\n- The parameter `f` is applied to `true`, so its type needs to be a `TyArrow` whose parameter type is `TyBool`.\n- The return type of the lambda is the return type of `f`.\n\nWriting these constraints out more precisely:\n\n```\n?2 = TyArrow(TyBool, ?3)\n?2 = TyArrow(?0, ?1)\n?0 = TyArrow(TyBool, ?1)\n```\n\nTo solve this system of equations, we can start by setting the two definitions of `?2` equal to each other.\n\n```\nTyArrow(TyBool, ?3) = TyArrow(?0, ?1)\n?0 = TyArrow(TyBool, ?1)\n```\n\nNow let's substitute `TyArrow(TyBool, ?1)` for every occurrence of `?0`.\n\n```\nTyArrow(TyBool, ?3) = TyArrow(TyArrow(TyBool, ?1), ?1)\n```\n\nNow we start to recurse down both sides and...\n\n```\nTyBool = TyArrow(TyBool, ?1)\n```\n\nImmediately, we reach a contradiction. `TyBool` is not an arrow type; it is just `TyBool`.\n\nSince this equation would have to hold in order for this program to type-check, the program does not type-check.\n\nThis process of solving equations on types is called *unification*. When we unify two types, we are trying to make them equal to each other by solving any type variables in them. If there's no solution to those type variables such that the two types can be made equal, then, as in the previous example, the program will not type-check.\n\nThe Algorithm W that was mentioned earlier is essentially doing unification by building up a solution set (a substitution), which is then applied to all the type variables.\n\nHowever, building up a solution set this way gets a bit unwieldy and slow, since we have to rewrite our entire set of equations every time we want to perform a substitution. That may be fine for visualizing these small examples, but in a large program there can be many thousands of constraints.\n\nWe can speed up this substitution process by observing that rewriting the equations is essentially broadcasting to the world the solution to some type variable. 
And \"broadcasting updates to some data\" is exactly the kind of problem that mutable references solve well. We make each of those type variables a mutable reference, and any type that mentions that variable automatically sees updates to it. This is the essence of Algorithm J: when we solve a type variable, we simply update the type at its reference.\n\nThis is what the `tv ref` inside `TyVar` is for.\n\n```\n(* A type variable *)\ntype tv =\n    | Unbound of id\n      (* Unbound type variable: Holds the type variable's unique name. *)\n    | Link of ty (* Link type variable: Holds the type it has been solved to. *)\n```\n\nAn unsolved type variable is what we consider `Unbound`, as in not bound to any type. It has a unique identifier (like `\"?0\"` or `\"?42\"`).\n\nA solved type variable is what we consider \"bound\". Here, we call it `Link`, as in \"linked\" to a type. Why \"link\" and not \"bound\"? Well, you can imagine a type like `TyVar (ref (Link (TyVar (ref (Link TyBool)))))`, i.e. a chain of links that ends at some type. Linking is a convenient way to make one type equal to another. We simply update the type variable to be a `Link` to the `other_type`, like so\n\n```\ntv := Link(other_type)\n```\n\nNow let's consider what the implementation of `infer` actually looks like. Given some `exp`, we want to return a `texp` (its typed version). 
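\n\nBefore writing the cases, it may help to see `Link` chains in action. The following self-contained sketch re-declares `ty` and `tv` (with `id` specialized to `string`), solves `?1` to `?0` and then `?0` to `TyBool` by mutating their references, and follows the chains with a `force` helper like the one `unify` will use later. The helpers `fresh` and `show` are made up for this illustration.\n\n``` ocaml\n(* Types and type variables, mirroring the definitions above. *)\ntype ty =\n  | TyBool\n  | TyArrow of ty * ty\n  | TyVar of tv ref\n\nand tv =\n  | Unbound of string\n  | Link of ty\n\n(* Hypothetical helper: make an unbound type variable with a given name. *)\nlet fresh (name : string) : ty = TyVar (ref (Unbound name))\n\n(* Follow Link chains until we reach a type that is not a solved variable. *)\nlet rec force (t : ty) : ty =\n  match t with\n  | TyVar { contents = Link t' } -> force t'\n  | t -> t\n\n(* Hypothetical pretty-printer used only for this demo. *)\nlet rec show (t : ty) : string =\n  match force t with\n  | TyBool -> \"bool\"\n  | TyArrow (a, b) -> \"(\" ^ show a ^ \" -> \" ^ show b ^ \")\"\n  | TyVar { contents = Unbound name } -> name\n  | TyVar { contents = Link _ } -> assert false (* force removed top-level links *)\n\nlet () =\n  let a = fresh \"?0\" in\n  let b = fresh \"?1\" in\n  let f = TyArrow (a, b) in\n  print_endline (show f);\n  (* Solve ?1 := ?0, then ?0 := bool, by mutating the references. *)\n  (match b with TyVar r -> r := Link a | _ -> ());\n  (match a with TyVar r -> r := Link TyBool | _ -> ());\n  print_endline (show f)\n```\n\nThis prints `(?0 -> ?1)` and then `(bool -> bool)`: solving `?0` once is enough for `?1` to see the answer too, without rewriting any equations.\n\n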
Let's start by matching on the `exp`:\n\n``` ocaml\nlet rec infer (env : env) (exp : exp) : texp =\n  match exp with\n  | EBool b -> ...\n  | EVar name -> ...\n  | ELam (param, body) -> ...\n  | EApp (fn, arg) -> ...\n```\n\nWe'll take it case-by-case.\n\nThe `EBool` case is trivial, since its type is always `TyBool`.\n\n``` ocaml\n  | EBool b -> TEBool (b, TyBool) (* A true/false value is of type Bool. *)\n```\n\n## Aside\n\nThe formation rule for booleans looks like\n\n$$\\frac{}{\\mathsf{Bool}\\;\\mathsf{type}}$$\n\nThis is basically an axiom that says `Bool` is a known type.\n\nThe typing rule for booleans would look like\n\n$$\\frac{b \\in \\{\\mathsf{true}, \\mathsf{false}\\}}{\\Gamma \\vdash b : \\mathsf{Bool}}$$\n\nThis basically says that given `b`, which is one of `true` or `false`, `b` is of type `Bool` under the typing context `Γ`.\n\nFor `EVar`, there is some mention of a variable (the `x` being passed as an argument to `foo` in `fun x -> foo x`, for example). We want to return the type of that variable. In order to do that, this variable must have already been declared somewhere before this occurrence, like as the parameter to a lambda. Because of that, we know it should be in the environment. We just have to search our environment from the innermost scope to the outermost for this variable, and grab its type. Let's write that function:\n\n``` ocaml\n(* Look up a variable's type in the environment. *)\nlet lookup_var_type name (e : env) : ty =\n    match List.Assoc.find e ~equal name with\n    | Some (VarBind t) -> t\n    | _ -> raise (undefined_error \"variable\" name)\n```\n\nWe just call `List.Assoc.find` (from Jane Street's Base/Core libraries, not the OCaml standard library) to search the association list for a `VarBind` that matches `name`. Since new bindings are pushed onto the front of the list, the first match is the innermost one. If there is no match, we raise an undefined-variable error.\n\nAfter invoking this helper on our environment, we get the type associated with the variable name and return it up.\n\nSo all in all, the `EVar` case looks like\n\n``` ocaml\n| EVar name ->\n    (* Variable is being used. 
Look up its type in the environment. *)\n    let var_ty = lookup_var_type name env in\n    TEVar (name, var_ty)\n```\n\n## Aside\n\nThe typing rule for variables would look like\n\n$$\\frac{x : T \\in \\Gamma}{\\Gamma \\vdash x : T}$$\n\nThis basically says that if `x` has a binding to the type `T` inside the context (environment) `Γ`, then we can conclude that under the context `Γ`, `x` has the type `T`. Bewildering, I know.\n\nThe next case is `ELam`. Given some lambda like `fun param -> body`, we want to return its type, which will be a `TyArrow`. First, we need the type for `param`. If you recall from the example, `param`'s type gets inferred based on the argument of the lambda. What we need to do is associate it with a fresh `Unbound` type variable (call it `ty_param`), so that when the argument's type *is* available, the type variable will get bound to it (they will be unified).\n\nLet's take a look at `fresh_unbound_var`. It creates a fresh type variable with a unique id. The unique id is based on an increasing counter variable called `gensym_counter` that gets incremented every time we call this function. We just convert it to a string and make it the basis of the id.\n\n```\n(* Global state that stores a counter for generating fresh unbound type variables. *)\nlet gensym_counter = ref 0\n\n(* Generate a fresh unbound type variable. *)\nlet fresh_unbound_var () =\n    let n = !gensym_counter in\n    Int.incr gensym_counter;\n    let tvar = \"?\" ^ Int.to_string n in\n    TyVar (ref (Unbound tvar))\n```\n\nAfter creating a fresh type variable for the `param`, we add an entry to our environment for the `param` and this type variable. 
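\n\nBecause `env` is an association list, \"adding an entry\" just means consing a pair onto the front, and searching front to back finds the innermost binding first, which is what gives us shadowing. The sketch below illustrates this using the standard library's `List.assoc_opt` rather than the Base `List.Assoc.find` used above; the simplified `ty` and the `lookup` helper are stand-ins for illustration only.\n\n``` ocaml\n(* Simplified stand-ins for the article's ty / bind / env types. *)\ntype ty = TyBool | TyArrow of ty * ty\n\ntype bind = VarBind of ty\n\ntype env = (string * bind) list\n\n(* Hypothetical lookup using the standard library instead of Base. *)\nlet lookup (name : string) (env : env) : ty option =\n  match List.assoc_opt name env with\n  | Some (VarBind t) -> Some t\n  | None -> None\n\nlet () =\n  (* Outer scope: x : bool *)\n  let outer : env = [ (\"x\", VarBind TyBool) ] in\n  (* Inner scope (e.g. a lambda body): a new x shadows the outer one. *)\n  let inner : env = (\"x\", VarBind (TyArrow (TyBool, TyBool))) :: outer in\n  assert (lookup \"x\" inner = Some (TyArrow (TyBool, TyBool)));\n  assert (lookup \"x\" outer = Some TyBool);\n  assert (lookup \"y\" inner = None);\n  print_endline \"shadowing works\"\n```\n\nThe outer binding is never removed; it is only hidden while the extended environment is in use, and `outer` itself is unchanged, so code that still holds it sees the original binding.\n\n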
Then we can type-check the body of the lambda with that new environment.\n\nFinally, the resultant type will be the arrow type from `ty_param` to whatever the type of the lambda's body is.\n\nSo ultimately, the `ELam` case looks like\n\n``` ocaml\n| ELam (param, body) ->\n    (* Instantiate a fresh type variable for the lambda parameter, and\n        extend the environment with the param and its type. *)\n    let ty_param = fresh_unbound_var () in\n    let env' = (param, VarBind ty_param) :: env in\n    (* Typecheck the body of the lambda with the extended environment. *)\n    let body = infer env' body in\n    (* Return a synthesized arrow type from the parameter to the body. *)\n    TELam (param, body, TyArrow (ty_param, typ body))\n```\n\nI'll briefly mention that `typ` just takes a `texp` and returns its `ty` field, since we'll sometimes need to grab the `ty` from the result of a recursive call to `infer`.\n\n``` ocaml\n(* Get the type of a typed expression. *)\nlet typ (texp : texp) : ty =\n  match texp with\n  | TEBool _ -> TyBool\n  | TEVar (_, ty) -> ty\n  | TEApp (_, _, ty) -> ty\n  | TELam (_, _, ty) -> ty\n```\n\n## Aside\n\nThe formation rule for function types looks like\n\n$$\\frac{A\\;\\mathsf{type} \\qquad B\\;\\mathsf{type}}{A \\to B\\;\\mathsf{type}}$$\n\nThis says that if `A` and `B` are types, then `A -> B` is a type.\n\nThe typing rule for lambdas looks like\n\n$$\\frac{\\Gamma, x : A \\vdash e : B}{\\Gamma \\vdash \\mathsf{fun}\\;x \\to e : A \\to B}$$\n\nIf, under the context `Γ` extended with the parameter `x` having type `A`, we can infer that the body `e` has type `B`, then we can conclude that the lambda's type is `A -> B`.\n\nThe final case is `EApp`. Given some function `fn` and an argument `arg`, we want to return the type of the result after applying `fn` to `arg`. We'll first want the types of `fn` and `arg`, which we get by calling `infer` recursively on `fn` and `arg`, respectively. `fn`'s type should be some `TyArrow(?A, ?B)`. 
`arg`'s type will be some `?C`.\n\nFrom our earlier discussion of constraints, we know that `TyArrow(?A, ?B) = TyArrow(?C, ?D)`, where `?D` is some unbound type variable. `?D` is the type we want to return for this `EApp`.\n\nTo accomplish this, we create a fresh unbound type variable to represent `?D`, a.k.a. the return type of the `TyArrow`. Then we synthesize a `TyArrow(?C, ?D)` type and unify it with `fn`'s type. This is the first case where unification gets involved. The fresh type variables created by the `ELam` case can get resolved here.\n\nAll in all, the final case of `infer`, `EApp`, looks like\n\n``` ocaml\n| EApp (fn, arg) ->\n    (* To typecheck a function application, first infer the types of the\n        function and the argument. *)\n    let fn = infer env fn in\n    let arg = infer env arg in\n    (* Instantiate a fresh type variable for the result of the application,\n       and synthesize an arrow type going from the argument to the\n       result. *)\n    let ty_res = fresh_unbound_var () in\n    let ty_arr = TyArrow (typ arg, ty_res) in\n    (* Unify it with the function's type. *)\n    unify (typ fn) ty_arr;\n    (* Return the result type. *)\n    TEApp (fn, arg, ty_res)\n```\n\n## Aside\n\nThe typing rule for function application looks like\n\n$$\\frac{\\Gamma \\vdash f : A \\to B \\qquad \\Gamma \\vdash x : A}{\\Gamma \\vdash f\\;x : B}$$\n\nIf under the context `Γ`, `f` (the function)'s type is `A -> B` and `x` (the argument)'s type is `A`, then the type of `f x` (`f` applied to `x`) is `B`.\n\nThe crux of implementing this case is in how `unify` works, so let's discuss that now.\n\n``` ocaml\n(* Unify two types. If they are not unifiable, raise an exception. *)\nlet rec unify (t1 : ty) (t2 : ty) : unit =\n    (* Follow all the links. If we see any type variables, they will only be\n       Unbound. 
*)\n    let t1, t2 = (force t1, force t2) in\n    match (t1, t2) with\n    | _ when equal t1 t2 -> ...\n    | TyArrow (f1, d1), TyArrow (f2, d2) -> ...\n    | TyVar tv, ty | ty, TyVar tv -> ...\n    | _ -> ...\n```\n\nSo to start off, we accept two types that we're going to try to equate.\n\nWe `force` them and then match on both. `force` follows the chain of `Link`s at the top of a type until it reaches something that isn't a solved type variable.\n\n``` ocaml\nlet rec force (ty : ty) : ty =\n  match ty with\n  | TyVar { contents = Link ty } -> force ty\n  | ty -> ty\n```\n\nNow, if we match on a type variable, we know it's definitely `Unbound`, so we won't have to do any dereferencing inside the body of `unify`.\n\nLet's take `unify` case-by-case. First, we check whether the two types are already equal, using structural equality. We could've explicitly written out this case as `TyBool, TyBool -> ()`, but structural equality handles it for us, and it also covers unifying an unbound type variable with itself.\n\n``` ocaml\n| _ when equal t1 t2 ->\n    () (* The types are trivially equal (e.g. TyBool). *)\n```\n\nNext, we'll deal with the `TyArrow` case. If both types are `TyArrow`, a.k.a. we're trying to equate `TyArrow(A, B) = TyArrow(C, D)`, we should `unify A C` and `unify B D`. This corresponds to our earlier example, where we recursed down the corresponding types.\n\n``` ocaml\n| TyArrow (f1, d1), TyArrow (f2, d2) ->\n    (* If both types are function types, unify their corresponding types\n       with each other. *)\n    unify f1 f2;\n    unify d1 d2\n```\n\nFinally, we get to the interesting case: when one of the types is a type variable.\n\nYou might be wondering why this case is interesting. 
After all, if one of the types is a type variable, isn't all we have to do just to bind it to the other type, like `tv := Link ty`?\n\nWe do have to bind it, but we also have to ask: what happens if the binding creates a cycle?\n\nLet's trace through the following example and see what happens:\n\n```\nunify TyArrow(?0, ?0) TyArrow(TyArrow(TyBool, ?0), ?0)\n  unify ?0 TyArrow(TyBool, ?0)\n    ?0 := Link(TyArrow(TyBool, ?0))\n  unify ?0 ?0\n    unify TyArrow(TyBool, ?0) TyArrow(TyBool, ?0)\n      unify TyBool TyBool\n      unify ?0 ?0 <-- uh-oh! cycle\n```\n\nWe created a cycle in the first part of the unification (by binding `?0` to `TyArrow(TyBool, ?0)`).\n\nThen the second part of unification (`unify ?0 ?0`) dereferences both sides and recursively processes the corresponding types in the arrow. When it does that, it eventually hits `unify ?0 ?0` again, resulting in infinite recursion.\n\nThis happens because the type pointed to by the type variable `?0` mentions `?0` itself, i.e. a cycle. For this reason, before we bind a type variable to another type, we want to ensure that the type variable is not mentioned in that other type.\n\nOne might ask, \"aren't recursive types useful for things like trees and lists?\" They are, but that is a different kind of recursive type than the one we are talking about. A recursive tree like\n\n```\ntype tree =\n| Leaf\n| Branch of int * tree * tree\n```\n\ncan always be compared by its name, `tree`, because the recursion goes through a named declaration.\n\nHowever, if we allowed the cyclic types we're talking about, an anonymous version of `tree` would be an infinite structure, and comparing two such types would require detecting cycles (for example, by memoizing the pairs of types already being compared).\n\n## Aside\n\nOne can think of cyclic types like these as recursive type aliases or anonymous recursive types. The formal name for these is *equirecursive types*, and they exist in some languages, such as OCaml with the `-rectypes` flag, and MLScript, which implements algebraic subtyping. 
A tree type would be expressed as `μT. Leaf | Branch(int, T, T)`. Notice how the recursion is extracted out as the parameter `T`. `μ` is basically a fixed-point combinator for types. When unifying a type variable with another type, we can normalize both types into this `μ` form, and then compare them. With nominal types like `tree`, constructing and pattern matching on values explicitly folds and unfolds the recursive type. This is what's called an *iso-recursive type*.\n\nThis process of checking for a cycle before binding a type variable is called an *occurs check*, since we are checking that a type variable does not *occur* in the type we are binding it to.\n\nHere is what an occurs check looks like. We again match on the forced type, recursively call `occurs` if it's a `TyArrow`, and if we encounter a `TyVar` that's the same reference as `src`, we have a cycle.\n\n```\n(* Occurs check: check if a type variable occurs in a type. If it does, raise\n   an exception. *)\nlet rec occurs (src : tv ref) (ty : ty) : unit =\n  (* Follow all the links. If we see a type variable, it will only be\n     Unbound. *)\n  match force ty with\n  | TyVar tgt when phys_equal src tgt ->\n    (* src type variable occurs in ty. *)\n    raise OccursCheck\n  | TyArrow (from, dst) ->\n    (* Check whether src occurs in either side of the arrow type. *)\n    occurs src from;\n    occurs src dst\n  | _ -> ()\n```\n\nWith that, the `TyVar` case of unification can be written as\n\n``` ocaml\n| TyVar tv, ty | ty, TyVar tv ->\n    (* If either type is a type variable, ensure that the type variable does\n       not occur in the type. *)\n    occurs tv ty;\n    (* Link the type variable to the type. *)\n    tv := Link ty\n```\n\nOur last case for unification covers any pair of types that don't match the previous cases, and hence don't unify.\n\n``` ocaml\n| _ ->\n    (* Unification has failed. 
*)\n    raise (unify_failed t1 t2)\n```\n\nwhere `unify_failed` is just a helper that constructs an exception whose message shows the types that failed to unify:\n\n``` ocaml\nlet unify_failed t1 t2 =\n  UnificationFailure\n    (Printf.sprintf \"failed to unify type %s with %s\" (ty_debug t1) (ty_debug t2))\n```\n\nWith that, unification is done, and so is our first implementation of type inference.\n\nLet's test it out with some examples. In our case, a program is just an expression. We'll define an alias for it, since we'll change this as we add extensions to the language.\n\n```\ntype prog = exp\n```\n\nYou can find and run these examples in [lib/one.ml](lib/one.ml).\n\n### Examples\n\n``` ocaml\ntypecheck_prog\n    (* (fun x -> x) true *)\n    (EApp(ELam(\"x\", EVar \"x\"), EBool true))\n```\n\nOutput: `bool`\n\n``` ocaml\ntypecheck_prog\n    (* (fun f -> f true) true *)\n    (EApp(ELam(\"f\", EApp(EVar \"f\", EBool true)), EBool true))\n```\n\nOutput: `UnificationFailure \"failed to unify type bool -> ?1 with bool\"`\n\n## Simple extensions\n\nLet's add some simple extensions to our language, namely `if` expressions for branching on a `bool`, `let` bindings with type annotations, mutually recursive `let rec` bindings, and nominal type declarations.\n\n### If expressions\n\nTo start, `if` expressions require us to add a new variant to both our `exp` and `texp` types:\n\n```\ntype exp =\n    ...\n    | EIf of exp * exp * exp (* if <exp> then <exp> else <exp> *)\n```\n\n```\ntype texp =\n    ...\n    | TEIf of texp * texp * texp * ty\n```\n\nWe also need to update our `typ` helper to extract the `ty` field:\n\n``` ocaml\nlet typ (texp : texp) : ty =\n  match texp with\n  ...\n  | TEIf (_, _, _, ty) -> ty\n```\n\nIn terms of type inference, we only need to add one case to `infer` to handle `EIf`, since we haven't added any new types.\n\nFirst, we must ensure that the condition is of type `bool`, by calling 
`infer` on the condition and `unify`ing the resultant type with `TyBool`.\n\nThen, we must ensure that the `then` branch and `else` branch have the same type, by calling `infer` on both branches and `unify`ing their types together.\n\nThe resultant type of the `if` can be the type of either branch, since the two have been unified.\n\n``` ocaml\n| EIf (cond, thn, els) ->\n    (* Check that the type of the condition is Bool. *)\n    let cond = infer env cond in\n    unify (typ cond) TyBool;\n    (* Check that the types of the branches are equal to each other. *)\n    let thn = infer env thn in\n    let els = infer env els in\n    unify (typ thn) (typ els);\n    (* Return the type of one of the branches. (we'll pick the \"then\"\n        branch) *)\n    TEIf (cond, thn, els, typ thn)\n```\n\n## Aside\n\nThe typing rule for `if` expressions is\n\n$$\\frac{\\Gamma \\vdash \\mathit{cond} : \\mathsf{Bool} \\qquad \\Gamma \\vdash e_1 : T \\qquad \\Gamma \\vdash e_2 : T}{\\Gamma \\vdash \\mathsf{if}\\;\\mathit{cond}\\;\\mathsf{then}\\;e_1\\;\\mathsf{else}\\;e_2 : T}$$\n\nThis says that if, under the context `Γ`, `cond` (the expression in the condition)'s type is `Bool`, `e1` (the expression in the `then` branch)'s type is `T`, and `e2` (the expression in the `else` branch)'s type is `T`, then `if cond then e1 else e2`'s type is `T`.\n\nThat was pretty quick! Let's test it out.\n\nYou can find and run these examples in [lib/two.ml](lib/two.ml).\n\n#### Examples\n\n``` ocaml\n(* if true then false else (fun x -> x) true *)\nEIf(EBool true, EBool false, EApp(ELam(\"x\", EVar \"x\"), EBool true))\n```\n\nOutput: `bool`\n\n``` ocaml\n(* if true then false else (fun x -> x) *)\nEIf(EBool true, EBool false, ELam(\"x\", EVar \"x\"))\n```\n\nOutput: `UnificationFailure \"failed to unify type bool with ?0 -> ?0\"`\n\n### Let bindings\n\nA let binding is an expression like `let x = true in f x` or `let x : bool = true in f x`. Basically, it's a way of binding an identifier (with an optional annotation) to the result of evaluating some expression, and using that binding in the evaluation of another expression. 
Apart from the optional annotation, `let x = exp in body` is basically sugar for `(fun x -> body) exp`. However, since we want to handle type annotations, and set ourselves up for generalization later on, we will handle this case separately.\n\nTo start, we'll need to add an `ELet` variant to our `exp` type. We separate out the binding portion (as `let_decl`) and body portion (as `exp`) for readability and easier destructuring.\n\n``` ocaml\ntype exp =\n    ...\n    | ELet of let_decl * exp (* let x : <type-annotation> = <exp> in <exp> *)\n\nand let_decl = id * ty option * exp\n```\n\nWe similarly update our `texp`.\n\n```\ntype texp =\n    ...\n    | TELet of tlet_decl * texp * ty\n\nand tlet_decl = id * ty option * texp\n```\n\nLet's look at type inference for our `ELet` case.\n\n``` ocaml\n| ELet ((id, ann, rhs), body) ->\n    ...\n```\n\nFirst, we want to `infer` the type of the right-hand side of the binding. If there is a type annotation, we want to check that the inferred type unifies with the annotation. This pattern of comparing an inferred type with a declared type will come up again and again in our type-checker, so we extract it into a helper called `check`. Since `check` calls `infer`, and `infer` (in the `ELet` case) calls `check`, the two have to be defined together, with `check` introduced by `and`:\n\n``` ocaml\nand check env ty exp =\n    let texp = infer env exp in\n    (try\n        unify ty (typ texp);\n        texp\n    with UnificationFailure _ ->\n        raise (type_error ty))\n```\n\nThis is a simple form of the *checking mode* of bidirectional type-checking. `infer` is what's called *inference mode* (or sometimes *synthesis mode*): given an expression, we produce its type. `check` is its dual: given an expression *and* an expected type, we verify the expression has that type. 
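\n\nOur `check` always infers first and unifies afterwards. In a fuller bidirectional checker, checking mode also *pushes* the expected type into the expression, which lets it accept forms that cannot be inferred on their own. The following self-contained sketch is hypothetical: a simply-typed checker with no type variables or unification, plus an `EAnn` annotation node that our language does not have. In it, an unannotated lambda can be checked but never inferred.\n\n``` ocaml\n(* Hypothetical simply-typed language: no type variables, no unification. *)\ntype ty = TyBool | TyArrow of ty * ty\n\ntype exp =\n  | EBool of bool\n  | EVar of string\n  | ELam of string * exp (* fun x -> e, with no parameter annotation *)\n  | EApp of exp * exp\n  | EAnn of exp * ty (* (e : T) *)\n\nexception Type_error of string\n\n(* Inference (synthesis) mode: produce a type from the expression alone. *)\nlet rec infer (env : (string * ty) list) (e : exp) : ty =\n  match e with\n  | EBool _ -> TyBool\n  | EVar x -> (\n      match List.assoc_opt x env with\n      | Some t -> t\n      | None -> raise (Type_error (\"unbound variable \" ^ x)))\n  | EAnn (e', t) ->\n      check env e' t;\n      t\n  | EApp (f, arg) -> (\n      match infer env f with\n      | TyArrow (dom, cod) ->\n          (* The function's domain is the type the argument is checked against. *)\n          check env arg dom;\n          cod\n      | TyBool -> raise (Type_error \"applying a non-function\"))\n  | ELam _ -> raise (Type_error \"cannot infer an unannotated lambda\")\n\n(* Checking mode: verify e against an expected type, pushing it inward. *)\nand check (env : (string * ty) list) (e : exp) (expected : ty) : unit =\n  match (e, expected) with\n  | ELam (x, body), TyArrow (dom, cod) ->\n      (* The expected arrow type supplies the parameter's type. *)\n      check ((x, dom) :: env) body cod\n  | ELam _, TyBool -> raise (Type_error \"lambda checked against bool\")\n  | _ ->\n      (* Otherwise, fall back to inference and compare the types. *)\n      if infer env e <> expected then raise (Type_error \"type mismatch\")\n\nlet () =\n  let id = ELam (\"x\", EVar \"x\") in\n  (match infer [] id with\n   | _ -> print_endline \"unexpected\"\n   | exception Type_error msg -> print_endline (\"infer: \" ^ msg));\n  let annotated = EApp (EAnn (id, TyArrow (TyBool, TyBool)), EBool true) in\n  assert (infer [] annotated = TyBool);\n  print_endline \"annotated application has type bool\"\n```\n\nUnder these assumptions, `fun x -> x` is rejected in inference mode, but `((fun x -> x) : bool -> bool) true` is accepted, because checking hands the annotated parameter type to the lambda.\n\n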
We use checking mode when the expected type is known from the context (like an annotation), and inference mode when it isn't.\n\nNow we can match against the annotation and `check` that `rhs` satisfies it.\n\n``` ocaml\nlet rhs =\n    match ann with\n    | Some ann -> check env ann rhs\n    | None -> infer env rhs\nin\n```\n\nWe then extend our environment with the binding and its type.\n\n``` ocaml\nlet env = (id, VarBind (typ rhs)) :: env in\n```\n\nFinally, we `infer` the body of the binding (the part after the `in`) with the extended environment. This is the type we return up.\n\n``` ocaml\nlet body = infer env body in\nTELet ((id, ann, rhs), body, typ body)\n```\n\nOur `ELet` case ends up looking like\n\n``` ocaml\n| ELet ((id, ann, rhs), body) ->\n    let rhs =\n        match ann with\n        | Some ann -> check env ann rhs\n        | None -> infer env rhs\n    in\n    let env = (id, VarBind (typ rhs)) :: env in\n    let body = infer env body in\n    TELet ((id, ann, rhs), body, typ body)\n```\n\n## Aside\n\nThe typing rules for `ELet` are split into two: one for a let binding without an annotation and one for a let binding with an annotation.\n\nThe first rule says that if, under the context `Γ`, `rhs` can be inferred to have the type `A`, and the context extended with `x` having the type `A` lets us give `body` the type `B`, then the entire expression `let x = rhs in body` can be given the type `B`.\n\nThe second rule says that if `A` is a well-formed type, `rhs` can be checked against the type `A`, and the context extended with `x` annotated with the type `A` lets us give `body` the type `B`, then the entire annotated expression `let x : A = rhs in body` can be given the type `B`.\n\nNote that the only thing that has really changed here is that we need to make sure that `A` is a well-formed annotation. 
Other than that, the first rule derives `A`, while in the second, `A` is an annotation supplied by the programmer.\n\nNow let's test it out!\n\nYou can find and run these examples in [lib/three.ml](lib/three.ml).\n\n### Examples\n\n``` ocaml\ntypecheck_prog\n    (* let x = true in if x then false else true *)\n    (ELet((\"x\", None, EBool true), EIf(EVar \"x\", EBool false, EBool true)))\n```\n\nOutput: `bool`\n\n``` ocaml\ntypecheck_prog\n    (* let x : bool = fun y -> y in x *)\n    (ELet((\"x\", Some TyBool, ELam(\"y\", EVar \"y\")), EVar \"x\"))\n```\n\nOutput: `TypeError \"expression does not have type bool\"`\n\n## (Mutually) recursive definitions\n\nIf we wanted to write a recursive `factorial` function or mutually recursive `is_even`/`is_odd` functions, we would need a `let rec` construct. To make type inference work in this situation, we need `factorial` and `is_even`/`is_odd` in the environments of the function bodies when type-checking them. We also want to make sure there aren't any duplicate definitions.\n\nTo start, we'll add an `ELetRec` variant to our `exp` type.\n\n``` ocaml\ntype exp =\n    ...\n    | ELetRec of let_decl list * exp (* let rec <decls> in <exp> *)\n```\n\nWe similarly update our `texp`.\n\n``` ocaml\ntype texp =\n    ...\n    | TELetRec of tlet_decl list * texp * ty\n```\n\nLet's take a look at type inference for `ELetRec`.\n\n``` ocaml\n| ELetRec (decls, body) ->\n    ...\n```\n\nFirst, we'd like to extend the environment with all of the declarations in the recursive let binding. 
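\n\nThe reason for binding every name before looking at any right-hand-side is easiest to see in isolation. Below is a hypothetical, self-contained miniature with its own `ty`, `exp`, `unify`, and `infer`, written against the OCaml standard library rather than Base. It follows the same three steps:\n\n1. Give each declaration its annotation or a fresh type variable.\n2. Extend the environment with all of them at once.\n3. Only then, infer each right-hand-side and unify it with its binding's type.\n\nIt calls `unify` directly where the article uses `check`.\n\n``` ocaml\n(* Hypothetical miniature for illustrating let rec; not the article's code. *)\ntype ty = TBool | TArrow of ty * ty | TVar of ty option ref\n\ntype exp =\n  | EBool of bool\n  | EVar of string\n  | ELam of string * exp\n  | EApp of exp * exp\n  | EIf of exp * exp * exp\n  | ELetRec of (string * ty option * exp) list * exp\n\nexception Type_error of string\n\nlet fresh () = TVar (ref None)\n\n(* Follow solved type variables to the type they stand for. *)\nlet rec force t = match t with TVar { contents = Some t' } -> force t' | _ -> t\n\n(* Occurs check: does the variable r appear inside t? *)\nlet rec occurs r t =\n  match force t with\n  | TVar r' -> r == r'\n  | TArrow (a, b) -> occurs r a || occurs r b\n  | TBool -> false\n\nlet rec unify a b =\n  match force a, force b with\n  | TBool, TBool -> ()\n  | TArrow (a1, r1), TArrow (a2, r2) -> unify a1 a2; unify r1 r2\n  | TVar r1, TVar r2 when r1 == r2 -> ()\n  | TVar r, t | t, TVar r ->\n    if occurs r t then raise (Type_error \"occurs check\") else r := Some t\n  | _ -> raise (Type_error \"cannot unify\")\n\nlet rec infer env e =\n  match e with\n  | EBool _ -> TBool\n  | EVar x -> (try List.assoc x env with Not_found -> raise (Type_error (\"unbound \" ^ x)))\n  | ELam (x, body) ->\n    let a = fresh () in\n    TArrow (a, infer ((x, a) :: env) body)\n  | EApp (f, arg) ->\n    let res = fresh () in\n    unify (infer env f) (TArrow (infer env arg, res));\n    res\n  | EIf (c, t, f) ->\n    unify (infer env c) TBool;\n    let ty = infer env t in\n    unify ty (infer env f);\n    ty\n  | ELetRec (decls, body) ->\n    (* Steps 1 and 2: bind every name before looking at any right-hand-side. *)\n    let bound =\n      List.map (fun (x, ann, _) -> (x, match ann with Some t -> t | None -> fresh ())) decls\n    in\n    let env = bound @ env in\n    (* Step 3: each right-hand-side may now refer to any of the bindings. *)\n    List.iter2 (fun (_, ty) (_, _, rhs) -> unify ty (infer env rhs)) bound decls;\n    infer env body\n\n(* let rec f = fun x -> if x then g x else x\n   and g = fun x -> if x then f x else x in f true *)\nlet f_and_g =\n  ELetRec\n    ( [ (\"f\", None, ELam (\"x\", EIf (EVar \"x\", EApp (EVar \"g\", EVar \"x\"), EVar \"x\")));\n        (\"g\", None, ELam (\"x\", EIf (EVar \"x\", EApp (EVar \"f\", EVar \"x\"), EVar \"x\"))) ],\n      EApp (EVar \"f\", EBool true) )\n```\n\nForcing the result of `infer [] f_and_g` gives `TBool`. If the bindings were instead added one at a time, inferring `f`'s body would fail with `unbound g`, because `g` would not be in scope yet.\n\n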
We map over the declarations and create a `VarBind` for each one. The `VarBind` uses the annotation if there is one, and a fresh type variable otherwise.\n\n``` ocaml\nlet env_decls = List.map decls ~f:(fun (id, ann, _) ->\n    let ty_decl =\n        match ann with\n        | Some ann -> ann\n        | None -> fresh_unbound_var ()\n    in (id, VarBind ty_decl)\n) in\nlet env = env_decls @ env in\n```\n\nNext, we use the extended environment to check that the right-hand-side of each let binding matches its corresponding type in the environment.\n\nWe zip over the `VarBind` list we created earlier together with the let declarations themselves, checking that each right-hand-side's type matches the type in its `VarBind`. For each declaration we return a `tlet_decl` corresponding to the typed let declaration, ultimately giving us the `tlet_decl list` that `TELetRec` needs.\n\n``` ocaml\nlet decls = List.map2_exn env_decls decls ~f:(\n    fun (id, VarBind ty_bind) (_, ann, rhs) ->\n        let trhs = check env ty_bind rhs in\n        (id, ann, trhs))\nin\n```\n\nFinally, we `infer` the body of the `let rec` using the extended environment, returning that type up.\n\n``` ocaml\nlet body = infer env body in\nTELetRec (decls, body, typ body)\n```\n\nOverall, our `ELetRec` case looks like\n\n``` ocaml\n| ELetRec (decls, body) ->\n    let env_decls = List.map decls ~f:(fun (id, ann, _) ->\n        let ty_decl =\n            match ann with\n            | Some ann -> ann\n            | None -> fresh_unbound_var ()\n        in (id, VarBind ty_decl)\n    ) in\n    let env = env_decls @ env in\n    let decls : tlet_decl list = List.map2_exn env_decls decls ~f:(\n        fun (id, VarBind ty_bind) (_, ann, rhs) ->\n            let trhs = check env ty_bind rhs in\n            (id, ann, trhs))\n    in\n    let body = infer env body in\n    TELetRec (decls, body, typ body)\n```\n\n## Aside\n\nThe typing rule for `ELetRec` uses the following notation.\n\n`decls` is the set of names being bound in the 
let-rec. `A_x` is the type for binding `x` and `rhs_x` is its right-hand-side. `Γ_rec` is the context `Γ` extended with all of the bindings, so each `rhs_x` can refer to any of them. The rule says that if each `rhs_x` has the type `A_x` under `Γ_rec`, and the body has the type `B` under `Γ_rec`, then `let rec { x = rhs_x | x ∈ decls } in body` has type `B`.\n\nFor annotated bindings, assuming `A_x` is well-formed, `A_x` comes from its annotation, with its quantified type variables made rigid in `Γ_rec`.\n\nThat's our `let rec` case! Let's test it out with some examples. At this point, manually writing out the AST is going to get tedious, so I'll just show the source. If you're following along with the repo, you'll notice that we've integrated a parser so that we can avoid writing the AST out by hand.\n\nYou can find and run these examples in [lib/four.ml](lib/four.ml).\n\n### Examples\n\n``` ocaml\ntypecheck_source {|\n    let rec f = fun x -> if x then g x else x\n    and g = fun x -> if x then f x else x\n    in f true\n|}\n```\n\nOutput: `bool`\n\n``` ocaml\ntypecheck_source {|\n    let rec f = fun x -> if x then g x else x\n    and g : bool -> bool -> bool = fun x -> if x then f x else x in\n    f true\n|}\n```\n\nOutput: `UnificationFailure \"failed to unify type bool -> bool with bool\"`\n\n## Type declarations\n\nAn example type declaration in this language will be of the form\n\n```\ntype Foo = {\n    bar: bool,\n    baz: bool -> bool\n}\n```\n\nNote that the type name corresponds to a record.\n\nIt can be constructed and projected (selected from) with\n\n```\nlet foo : Foo = {bar = true, baz = fun x -> x} in\nfoo.baz\n```\n\nA declaration is *nominal*. So another type with the same fields, like\n\n```\ntype Qux = {\n    bar: bool,\n    baz: bool -> bool\n}\n```\n\ncannot be used in place of `Foo`. 
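\n\nThe difference between nominal and structural equality fits in a few lines. The sketch below is hypothetical and uses only the OCaml standard library, with field types written as plain strings for brevity. It is not how the article's `unify` represents types. `Foo` and `Qux` have identical fields, so a structural comparison treats them as the same type. A nominal comparison looks only at their declared names.\n\n``` ocaml\n(* A record declaration: its name and its (field name, field type) pairs. *)\ntype decl = { name : string; fields : (string * string) list }\n\nlet foo = { name = \"Foo\"; fields = [ (\"bar\", \"bool\"); (\"baz\", \"bool -> bool\") ] }\nlet qux = { name = \"Qux\"; fields = [ (\"bar\", \"bool\"); (\"baz\", \"bool -> bool\") ] }\n\n(* Nominal equality: only the declared names matter. *)\nlet nominal_equal a b = String.equal a.name b.name\n\n(* Structural equality: compare the fields, ignoring their order. *)\nlet structural_equal a b = List.sort compare a.fields = List.sort compare b.fields\n\nlet () =\n  print_endline\n    (Printf.sprintf \"nominal: %b, structural: %b\" (nominal_equal foo qux)\n       (structural_equal foo qux))\n```\n\nRunning it prints `nominal: false, structural: true`. Our type-checker takes the nominal side, which is why a `Qux` is rejected where a `Foo` is expected.\n\n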
For example, this program won't type-check:\n\n```\nlet qux : Qux = {bar = true, baz = fun x -> x} in\nlet foo : Foo = qux in (* Type error: expected Foo, got Qux *)\nfoo\n```\n\nNominal types (or newtypes) are called that because you only need to compare their names to know whether they are different, and this is how our `unify` will treat them as well. However, notice that in the previous examples we construct these records without mentioning the type name on the record literal, i.e. there is no `Foo{...}` form. So while we can compare types by their names to tell them apart, we still have to infer that a record literal corresponds to some pre-declared type. The secret sauce we need to do that is called *rows*, which we'll describe shortly.\n\nLet's discuss the implementation of our nominal types.\n\nFirst, let's introduce type declarations (type constructors) to our language.\n\n``` ocaml\ntype tycon = {\n    name : id;\n    ty : record_ty;\n}\nand record_ty = (id * ty) list\n```\n\nOur `prog` is now more than just an `exp`. Rather, it is a list of type constructors and an `exp`.\n\n``` ocaml\ntype prog = tycon list * exp\n```\n\nWe also need a way of referring to a user-defined type by name.\n\n``` ocaml\ntype ty =\n    ...\n    | TyName of id (* Type name: Foo *)\n```\n\nTypes get added to the environment just like variables. When we search for and find a type in the environment, we get back a `TypeBind` holding our `tycon`.\n\n``` ocaml\ntype bind =\n    ...\n    | TypeBind of tycon (* A type binding maps to a type constructor. 
*)\n```\n\nAt the expression level, we need a way to construct these records by specifying all of their fields.\n\n``` ocaml\ntype exp =\n    ...\n    | ERecord of record_lit (* {x = true, y = false} *)\n```\n\nWe'll add a way to update fields using a `with` expression, which produces a record of the same type with the provided fields set to different values,\n\n``` ocaml\ntype exp =\n    ...\n    | EWith of exp * record_lit (* { r with x = true, y = false } *)\n```\n\nand a way to access fields (also called projection),\n\n``` ocaml\ntype exp =\n    ...\n    | EProj of exp * id (* r.y *)\n```\n\nwhere `record_lit` is a list of (field name, expression) pairs.\n\n``` ocaml\nand record_lit = (id * exp) list\n```\n\nCorrespondingly, we update `texp` and define `tyrecord_lit`.\n\n``` ocaml\ntype texp =\n    ...\n    | TERecord of tyrecord_lit * ty\n    | TEWith of texp * tyrecord_lit * ty\n    | TEProj of texp * id * ty\nand tyrecord_lit = (id * texp) list\n```\n\nWe can go ahead and declare the `lookup_tycon` helper that (similar to `lookup_var_type`) will search the `env` for a `TypeBind` with a given name.\n\n``` ocaml\n(* Look up a type constructor in the environment. *)\nlet lookup_tycon name (e : env) : tycon =\n    match List.Assoc.find e ~equal name with\n    | Some (TypeBind t) -> t\n    | _ -> raise (undefined_error \"type\" name)\n```\n\nFinally, in order to infer that a record literal constructs some predeclared type, we need the secret sauce I mentioned earlier. Remember how unification used type variables in place of types we hadn't figured out yet? We want to use those type variables for records as well, but we also want to say \"we haven't figured out the type of this record yet, but we know it should have *these* fields\". 
To do this, we are going to stash a constraint called a `row_constraint` inside our `Unbound` type variable.\n\nBasically, a *row* is a set of (label, type) pairs, a.k.a. a set of field names and their types. Rows can also correspond to variant names and their types, but we will not be covering those in this post. If you recall, `record_ty` is exactly the structure we want to represent such a set. Our constraints, then, are how we indicate that a type variable should have *at least* certain fields or *exactly* certain fields.\n\n``` ocaml\nand row_constraint =\n    | NoRow (* No row constraint. *)\n    | OpenRow of record_ty (* Must contain at least these fields (from EProj/EWith). *)\n    | ClosedRow of record_ty (* Must contain exactly these fields (from ERecord). *)\n```\n\nand so we extend the definition of an `Unbound` type variable to be\n\n``` ocaml\nand tv = (* A type variable *)\n    | Unbound of id * row_constraint\n    ...\n```\n\n`NoRow` represents a regular type variable as we used it earlier, `OpenRow` says that a type must have at least these fields, and `ClosedRow` says that a type must have exactly these fields. For example, `A::{x: int, ...}` says that the type `A` must be a record type that has at least a field named `x` whose type is `int`. Similarly, `B::{x: int, y: int}` says that the type `B` must be a record type that has exactly the fields `x` and `y`, which are both of type `int`.\n\nDuring type inference, we will encounter places where, given two types that both contain row constraints, we need to combine them. 
For this reason, we will define a `union_rows` helper.\n\n``` ocaml\nlet rec union_rows env (row_a: row_constraint) (row_b: row_constraint) : row_constraint =\n    match (row_a, row_b) with\n    ...\n```\n\nIf `NoRow` is unioned with something else, that other row is the result.\n\n``` ocaml\n| NoRow, row | row, NoRow -> row\n```\n\nIf two `OpenRow`s are unioned together, we want to merge their fields, producing a new `OpenRow`. If both rows have a field with the same name, we unify those fields' types.\n\n``` ocaml\n| OpenRow row_a, OpenRow row_b ->\n    (* Sort by field name; fields with equal names are deduplicated after\n       unifying their types. *)\n    OpenRow (List.dedup_and_sort (row_a @ row_b) ~compare:(fun (f1,t1) (f2,t2) ->\n    if String.equal f1 f2 then (unify env t1 t2; 0)\n    else String.compare f1 f2))\n```\n\nIf an `OpenRow` is unioned with a `ClosedRow`, we return the `ClosedRow`. However, we need to ensure that the fields in the `OpenRow` are a subset of the fields in the `ClosedRow`, since an `OpenRow` says we want *at least* the fields held by its constraint. For this case, we will define a `fld_exists` helper. It checks that a field name exists in a `record_ty` and, if so, that its type unifies with our expected type.\n\n``` ocaml\nand fld_exists env (rcd: record_ty) id ty =\n    List.exists rcd ~f:(fun (f,t) -> String.equal f id && (unify env t ty; true))\n```\n\nUsing this, our next case inside `union_rows` is as follows:\n\n``` ocaml\n| OpenRow o_row, ClosedRow c_row | ClosedRow c_row, OpenRow o_row ->\n    List.iter o_row ~f:(fun (id,ty) ->\n        if not (fld_exists env c_row id ty) then\n            raise (row_mismatch row_a row_b)); ClosedRow c_row\n```\n\nNow if two `ClosedRow`s are unioned together, we need to ensure they have exactly the same fields. 
We can first check that their lengths are the same and then check that each of the first row's fields is present in the second.\n\n``` ocaml\n| ClosedRow flds1, ClosedRow flds2 when Int.equal (List.length flds1) (List.length flds2) ->\n    List.iter flds1 ~f:(fun (id,ty) ->\n        if not (fld_exists env flds2 id ty) then\n            raise (row_mismatch row_a row_b)); ClosedRow flds1\n```\n\nFinally, in any other case, the rows don't match, so we raise a `RowMismatch` exception.\n\n``` ocaml\n| _ -> raise (row_mismatch row_a row_b)\n```\n\nLet's see how these constraints come up during type inference.\n\nWe'll start off by extending `infer` to handle the record literal `ERecord`. Right off the bat, even though we don't currently know the name of its type, we know exactly the fields it must have, so it must be a `ClosedRow` of some kind. Also, to know the type of a record, we need to know the type of each of its fields, so we call `infer` on each field's expression.\n\n``` ocaml\n| ERecord rec_lit ->\n    let rec_lit = List.map rec_lit ~f:(fun (id, x) -> (id, infer env x)) in\n```\n\nWe get back a `tyrecord_lit`. Since we want to return its type up, we map over it to get the name and type of each field, giving us our `record_ty`.\n\n``` ocaml\n    let flds = List.map ~f:(fun (id, x) -> (id, typ x)) rec_lit in\n```\n\nFinally, to represent the type we'll return up, we create an `Unbound` type variable with a `ClosedRow` constraint holding `flds`.\n\n``` ocaml\nfresh_unbound_var ~row:(ClosedRow flds) ()\n```\n\nOverall, our `ERecord` case of `infer` will look like this:\n\n``` ocaml\n| ERecord rec_lit ->\n    let rec_lit = List.map rec_lit ~f:(fun (id, x) -> (id, infer env x)) in\n    let flds = List.map ~f:(fun (id, x) -> (id, typ x)) rec_lit in\n    TERecord (rec_lit, fresh_unbound_var ~row:(ClosedRow flds) ())\n```\n\nThe next expression we have to typecheck is `EWith`, which is of the form `{ r with x = true }`. 
Basically, given some record and some fields that are being assigned, we need to ensure that the record has at least the fields that are being assigned. That is, the type we return for an `EWith(rcd, flds)` expression is an `Unbound` type variable with an `OpenRow` constraint holding the types of `flds`.\n\nWe start by inferring the type of the record.\n\n``` ocaml\n| EWith (rcd, flds) ->\n    let rcd = infer env rcd in\n```\n\nAnd like before, we map over `flds` and infer the type of each one of them, followed by mapping over the resulting `tyrecord_lit` to get a `record_ty`.\n\n``` ocaml\nlet rec_lit = List.map flds ~f:(fun (id, x) -> (id, infer env x)) in\nlet flds = List.map ~f:(fun (id, x) -> (id, typ x)) rec_lit in\n```\n\nWe can now generate an `Unbound` type variable for the return type holding the `OpenRow` constraint.\n\n``` ocaml\nlet row = fresh_unbound_var ~row:(OpenRow flds) () in\n```\n\nWe want to ensure that the type of `rcd` is equivalent to `row`, since `EWith` guarantees that the input and output types are the same. This is why we unify them.\n\n``` ocaml\nunify env (typ rcd) row;\n```\n\nWe can then return the `TEWith` up with the typed record, fields, and unified type. Overall, our `EWith` case should look like this:\n\n``` ocaml\n| EWith (rcd, flds) ->\n    let rcd = infer env rcd in\n    let rec_lit = List.map flds ~f:(fun (id, x) -> (id, infer env x)) in\n    let flds = List.map ~f:(fun (id, x) -> (id, typ x)) rec_lit in\n    let row = fresh_unbound_var ~row:(OpenRow flds) () in\n    unify env (typ rcd) row;\n    TEWith (rcd, rec_lit, typ rcd)\n```\n\nThe last expression we need to infer the type of is `EProj`, which involves accessing a field from a record. We follow a similar approach here. 
Given an `EProj(rcd, fld)`, we synthesize an `Unbound` type variable with an `OpenRow` constraint containing `fld` and its type, and unify that type variable with the type of `rcd`.\n\n``` ocaml\n| EProj (rcd, fld) ->\n    let rcd = infer env rcd in\n    let fld_ty = fresh_unbound_var () in\n    let row = fresh_unbound_var ~row:(OpenRow [(fld, fld_ty)]) () in\n    unify env (typ rcd) row;\n    TEProj (rcd, fld, fld_ty)\n```\n\nThose are all the changes to `infer`. We now need to update `unify` to handle type names and row constraints.\n\nThe bulk of the new logic happens when we try to unify a `TyVar` with some other type. Since a `TyVar` can now hold a `row_constraint`, if the other type is record-shaped, we need to combine the constraints of both types.\n\nWe start off by destructuring the type variable.\n\n``` ocaml\nand unify env (t1 : ty) (t2 : ty) : unit =\n    ...\n    | TyVar tv, ty | ty, TyVar tv ->\n        let Unbound(_, tv_row) = !tv in\n```\n\nWe then match against `ty` and, depending on what it is, union its rows with `tv_row`.\n\nFirst, if `ty` is a `TyName`, we look up that type name in the environment, get its underlying record, and union its rows with `tv_row`.\n\n``` ocaml\nmatch ty with\n| TyName tname ->\n    let tc = lookup_tycon tname env in\n    ignore (union_rows env tv_row (ClosedRow tc.ty))\n...\n```\n\nNext, we handle the case where `ty` is a type variable that is distinct from `tv`.\n\n``` ocaml\n| TyVar other when not (phys_equal tv other) ->\n    let Unbound(id, other_row) = !other in\n```\n\nWe will union `tv_row` with `other_row`. 
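\n\nSince `union_rows` does the real work both in the `TyName` case and in this one, it can help to see its four cases in isolation. The sketch below is a hypothetical, standalone model that uses only the OCaml standard library. Field types are plain strings here, so \"unifying\" two field types is modeled as an equality check that raises on a clash. The article's version calls `unify` instead.\n\n``` ocaml\ntype row_constraint =\n  | NoRow\n  | OpenRow of (string * string) list (* at least these fields *)\n  | ClosedRow of (string * string) list (* exactly these fields *)\n\nexception Row_mismatch\n\n(* Is the field (id, ty) present in flds? A field with the same name but a\n   different type is a clash, standing in for a unification failure. *)\nlet fld_exists flds (id, ty) =\n  match List.assoc_opt id flds with\n  | None -> false\n  | Some ty' -> if String.equal ty ty' then true else raise Row_mismatch\n\nlet union_rows a b =\n  match a, b with\n  | NoRow, row | row, NoRow -> row\n  | OpenRow ra, OpenRow rb ->\n    (* Keep every field from both sides; shared fields must agree. *)\n    let extra = List.filter (fun f -> not (fld_exists ra f)) rb in\n    OpenRow (List.sort compare (ra @ extra))\n  | OpenRow o, ClosedRow c | ClosedRow c, OpenRow o ->\n    (* The closed row must provide every field the open row asks for. *)\n    if List.for_all (fld_exists c) o then ClosedRow c else raise Row_mismatch\n  | ClosedRow c1, ClosedRow c2 ->\n    (* Same number of fields, and every field of c1 appears in c2. *)\n    if List.length c1 = List.length c2 && List.for_all (fld_exists c2) c1\n    then ClosedRow c1\n    else raise Row_mismatch\n```\n\nFor example, unioning `OpenRow [(\"x\", \"bool\")]` with `ClosedRow [(\"x\", \"bool\"); (\"y\", \"bool\")]` returns the `ClosedRow`. Unioning `OpenRow [(\"y\", \"bool\")]` with `ClosedRow [(\"x\", \"bool\")]` raises `Row_mismatch`, just like the `{ foo with y = true }` example that fails with a `RowMismatch`.\n\n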
Let's define a helper `row_iter` that walks the fields in a `row_constraint`.\n\n``` ocaml\nlet row_iter (row : row_constraint) f =\n  match row with\n  | NoRow -> ()\n  | OpenRow flds | ClosedRow flds -> List.iter flds ~f\n```\n\nFirst, we should make sure `tv` isn't mentioned in the types of the fields in `other_row`.\n\n``` ocaml\nrow_iter other_row (fun (_, ty) -> occurs tv ty);\n```\n\nThen we can call our `union_rows` helper and set `other`'s row to be the unioned row.\n\n``` ocaml\nlet row = union_rows env tv_row other_row in\nother := Unbound(id, row)\n```\n\nNext, if `tv_row` is `NoRow`, i.e. `tv` is a regular type variable without a row constraint, we do a standard occurs check between `tv` and `ty`.\n\n``` ocaml\n| _ when equal tv_row NoRow ->\n    (* If either type is a type variable, ensure that the type variable does\n       not occur in the type. *)\n    occurs tv ty\n```\n\nOtherwise, `tv_row` has some row constraint and `ty` is not record-shaped, which means unification has failed.\n\n``` ocaml\n| _ ->\n    (* ty is not record-like. Can't unify with a row. *)\n    raise (unify_failed t1 t2));\n```\n\nIf we haven't failed unification by now, we can link `tv` to `ty` by setting `tv := Link ty`.\n\nOverall, the `TyVar` case of the match looks like\n\n``` ocaml\n| TyVar tv, ty | ty, TyVar tv ->\n    let Unbound(_, tv_row) = !tv in\n    (match ty with\n    | TyName tname ->\n        let tc = lookup_tycon tname env in\n        ignore (union_rows env tv_row (ClosedRow tc.ty))\n    | TyVar other when not (phys_equal tv other) ->\n        (* Union the rows of these two distinct type variables. 
*)\n        let Unbound(id, other_row) = !other in\n        row_iter other_row (fun (_, ty) -> occurs tv ty);\n        let row = union_rows env tv_row other_row in\n        other := Unbound(id, row)\n    | _ when equal tv_row NoRow ->\n        (* If either type is a type variable, ensure that the type\n           variable does not occur in the type. *)\n        occurs tv ty\n    | _ ->\n        (* ty is not record-like. Can't unify with a row. *)\n        raise (unify_failed t1 t2));\n    (* Link the type variable to the type. *)\n    tv := Link ty\n```\n\nFinally, `unify` needs to handle the trivial case where two `TyName`s have the same name.\n\n``` ocaml\nand unify env (t1 : ty) (t2 : ty) : unit =\n    ...\n    | TyName a, TyName b when equal a b -> () (* The type names are the same. *)\n```\n\nWith that, we've added type declarations and inference for records (literals, `with` expressions, and projections) to our language.\n\n## Aside\n\nThe formation rule for user-defined types says that if `T` is a user-defined type, then in order for it to be well-formed, each of its fields' types must be well-formed.\n\nNote: From this point on, `Γ` needs to be threaded through the formation rules for well-formedness to hold, including WF-Bool and WF-Arrow. 
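\n\nAs a rough illustration of threading `Γ` through the formation rules, here is a hypothetical well-formedness check over a cut-down `ty`, using only the OCaml standard library. `well_formed` and `well_formed_decl` are names made up for this sketch. `Bool` is always well-formed, an arrow is well-formed when both of its sides are, and a type name is well-formed only if the environment binds it. One simple design choice, assumed here, is to check a declaration's field types once, when the declaration is added, so that a later mention of its name only needs a lookup.\n\n``` ocaml\ntype ty = TyBool | TyArrow of ty * ty | TyName of string\n\n(* The only kind of binding this sketch needs: a declared record type. *)\ntype bind = TypeBind of (string * ty) list\n\nlet rec well_formed (env : (string * bind) list) (t : ty) : bool =\n  match t with\n  | TyBool -> true (* WF-Bool *)\n  | TyArrow (a, b) -> well_formed env a && well_formed env b (* WF-Arrow *)\n  | TyName n -> List.mem_assoc n env (* the name must be bound in env *)\n\n(* A declaration is well-formed when each of its fields' types is. *)\nlet well_formed_decl env (fields : (string * ty) list) : bool =\n  List.for_all (fun (_, t) -> well_formed env t) fields\n\nlet env = [ (\"Foo\", TypeBind [ (\"x\", TyBool); (\"y\", TyArrow (TyBool, TyBool)) ]) ]\n```\n\nWith `Foo` in the environment, `Foo -> bool` is well-formed, while any type mentioning an undeclared `Bar` is not.\n\n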
For example, WF-Arrow relies on `A` and `B` being well-formed for `A -> B` to be well-formed.\n\nThe typing rule for records says that if there is a type `T` in the context whose fields match those of the record, the record's type is `T`.\n\nThe rule for `with` expressions says that if `r` is a record of type `T` that contains each of the fields in the `with` expression, then the value returned by the `with` expression is also of type `T`.\n\nThe rule for projection says that if `r` is a record of type `T` containing a field `f`, then `r.f` has the type of that field in `T`.\n\nLet's take a look at some examples.\n\nYou can find and run these examples in [lib/five.ml](lib/five.ml).\n\n### Examples\n\n``` ocaml\ntypecheck_source {|\n    type Foo = { x : bool, y : bool -> bool }\n    let foo : Foo = { x = true, y = fun x -> x } in foo.y true\n|}\n```\n\nOutput: `bool`\n\n``` ocaml\ntypecheck_source {|\n    type Foo = { x : bool }\n    let foo : Foo = { x = true } in { foo with y = true }\n|}\n```\n\nOutput: `RowMismatch \"{y: bool, ...} and {x: bool}\"`\n\n## Polymorphism\n\nUp until now, the language we have type-checked does not have any polymorphism.\n\nFor example, in the following program, `f` cannot be applied both to a value of type `A` and to a value of type `B`.\n\n```\ntype A = {}\ntype B = {}\n\nlet f = fun x -> x in\nlet a : A = {} in\nlet b : B = {} in\nlet _ = f a in\nf b\n```\n\nOur type inference algorithm gives `f` the type `A -> A`, so when it tries to type-check the application to `B`, it fails.\n\nWe'd have to rewrite the example to have a separate `fA` and `fB` to get it to type-check.\n\n```\nlet fA = fun x -> x in\nlet fB = fun x -> x in\nlet a : A = {} in\nlet b : B = {} in\nlet _ = fA a in\nfB b\n```\n\nBut when we look at `f`, nothing about its definition requires it to be restricted to `A`. 
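\n\nTo see concretely why the first program is rejected, here is a hypothetical miniature with mutable unification variables (OCaml standard library only, occurs check omitted for brevity). Without generalization, `f` has a single type `?0 -> ?0` that every use shares. The first application fixes `?0` to `A`, so the second application then has to unify `A` with `B`.\n\n``` ocaml\ntype ty = TName of string | TArrow of ty * ty | TVar of ty option ref\n\nexception Unify_error\n\nlet fresh () = TVar (ref None)\n\n(* Follow solved type variables to the type they stand for. *)\nlet rec force t = match t with TVar { contents = Some t' } -> force t' | _ -> t\n\nlet rec unify a b =\n  match force a, force b with\n  | TName x, TName y when String.equal x y -> ()\n  | TArrow (a1, r1), TArrow (a2, r2) -> unify a1 a2; unify r1 r2\n  | TVar r1, TVar r2 when r1 == r2 -> ()\n  | TVar r, t | t, TVar r -> r := Some t (* occurs check omitted *)\n  | _ -> raise Unify_error\n\n(* Type an application: unify the function's type with arg -> result. *)\nlet apply fn_ty arg_ty =\n  let res = fresh () in\n  unify fn_ty (TArrow (arg_ty, res));\n  res\n\n(* Without generalization, f : ?0 -> ?0 is one type shared by every use. *)\nlet f_ty = let v = fresh () in TArrow (v, v)\n\nlet succeeds thunk = try ignore (thunk ()); true with Unify_error -> false\nlet first_ok = succeeds (fun () -> apply f_ty (TName \"A\"))\nlet second_ok = succeeds (fun () -> apply f_ty (TName \"B\"))\n```\n\nHere `first_ok` is `true` and `second_ok` is `false`.\n\n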
How do we make `f` polymorphic (or generic) over its arguments?\n\n### Instantiation\n\nWe'd like `f`'s type to be something like `forall 'a. 'a -> 'a`. But hang on, how would we even use a type like that? We can't exactly unify `'a` with `A`. We need to treat `'a` as a placeholder (or type parameter) that gets substituted with a concrete type argument. This process of taking a generic type and replacing its type parameters with concrete types is called *instantiation*. When `f` gets applied to an `A` or a `B`, we look up its type (which will now be generic) and instantiate it, substituting fresh type variables for its type parameters.\n\nSo, for example, when `f` is applied to an `A`, `forall 'a. 'a -> 'a` gets instantiated to `?0 -> ?0`. Then `A -> ?1` gets unified with `?0 -> ?0` as normal, resulting in `A -> A` as the type of this *concrete* instance of `f`.\n\nLikewise, when `f` is applied to a `B`, `forall 'a. 'a -> 'a` gets instantiated to `?2 -> ?2`, and the same process results in its concrete type becoming `B -> B`.\n\nTo get generic instantiation working, we first introduce `generic_ty`, which contains a type as well as a list of type parameters.\n\n``` ocaml\n(* A generic type. Should be read as forall p1..pn. ty, where p1..pn\n   are the type parameters. It is separated from ty because in HM, a\n   forall can only be at the top level of a type. *)\ntype generic_ty = {\n    type_params: id list;\n    ty : ty;\n}\n```\n\nYou might be wondering why `generic_ty` is not just another variant under `ty`, like `Forall of id list * ty`. The short answer is that HM deliberately rules out this kind of polymorphism: allowing it means you no longer get complete type inference, and you lose principal types. In HM, the type parameters in a generic type are always at the outermost level. So you can have `forall 'a 'b. 'a -> 'b`, but you cannot have `forall 'a. 
'a -> forall 'b. 'b`. Supporting the latter is called *higher-rank polymorphism* (not to be confused with rank polymorphism).\n\nFor example, the following function can't be written in HM. It accepts a polymorphic identity function as a parameter and instantiates it at two different types.\n\n```\nfun f (g: forall a. a -> a) => (g 1, g \"hello\")\n```\n\nHowever, this is not very limiting in practice. Most of the polymorphic functions you'd ever want to write can be written in HM.\n\n## Aside\n\nThere are some useful exceptions, though. For example, Haskell's `ST` monad uses a rank-2 type to keep the state of a stateful computation from escaping it:\n\n``` haskell\nrunST :: forall a. (forall s. ST s a) -> a\n```\n\nAnother example is existentials, like Rust's `dyn` traits or Go's interface values. These can be encoded with higher-rank polymorphism: `exists X. T` (which is roughly like `dyn Trait`) can be written as `forall Y. (forall X. T -> Y) -> Y`. However, existentials can also be added on their own, and don't necessarily need higher-rank polymorphism/polymorphic lambda calculus/System F.\n\nNext, we will update the definition of `VarBind` to hold a `generic_ty` instead of a `ty`. This is needed because, as the example above shows, `f`'s type in the environment needs to be generic so that it can be instantiated. So now `bind` is defined as\n\n``` ocaml\ntype bind =\n    | VarBind of generic_ty (* A variable binding maps to a generic type. *)\n    | TypeBind of tycon (* A type binding maps to a type constructor. *)\n```\n\nOur `lookup_var_type` function should be modified accordingly to return a `generic_ty`.\n\n``` ocaml\n(* Look up a variable's type in the environment. 
*)\nlet lookup_var_type name (e : env) : generic_ty =\n    match List.Assoc.find e ~equal name with\n    | Some (VarBind t) -> t\n    | _ -> raise (undefined_error \"variable\" name)\n```\n\nWe also want to update the type annotations in let bindings to be generic, so that, if someone were so inclined, they could write `let f : forall 'a. 'a -> 'a = ...`.\n\n``` ocaml\nand let_decl = id * generic_ty option * exp\n...\nand tlet_decl = id * generic_ty option * texp\n```\n\nNow let's discuss the changes to type inference. The first thing that's different is that when we look up a variable in our environment, we get a `generic_ty`. In order to actually use it, we need to instantiate it, turning it into a regular `ty` in which fresh type variables are substituted for the type parameters.\n\nSo in our `EVar` case, after calling `lookup_var_type`, we instantiate `var_ty` with `inst` before returning it up.\n\n``` ocaml\n| EVar name ->\n    (* Variable is being used. Look up its type in the environment, *)\n    let var_ty = lookup_var_type name env in\n    (* instantiate its type by replacing all of its quantified type\n       variables with fresh unbound type variables. *)\n    TEVar (name, inst var_ty)\n```\n\nNow let's delve into how `inst` is implemented. Let's run through a simple example first, to get the point across.\n\nGiven the generic type `forall 'a 'b. 
'a -> ('b -> 'a)`, if `'a` were assigned the fresh type variable `?0` and `'b` the fresh type variable `?1`, the instantiated type would be `?0 -> (?1 -> ?0)`.\n\nMore concretely, if the `generic_ty` is\n\n``` ocaml\n{\n    type_params = [\"'a\"; \"'b\"];\n    ty = TyArrow(TyName \"'a\", TyArrow(TyName \"'b\", TyName \"'a\"))\n}\n```\n\nthe instantiated type is something like\n\n``` ocaml\nlet a = TyVar(ref(Unbound(\"?0\", NoRow))) in\nlet b = TyVar(ref(Unbound(\"?1\", NoRow))) in\nTyArrow(a, TyArrow(b, a))\n```\n\n(Note: I bound the `TyVar`s to variables here to show that both occurrences of `'a` share the same reference.)\n\nThe actual implementation of `inst` just needs to create a hash table mapping each type parameter to a fresh unbound type variable, traverse the type, and replace each reference to a type parameter with its type variable.\n\n``` ocaml\n(* Instantiate a generic type by replacing all the type parameters\n   with fresh unbound type variables. Ensure that the same ID gets\n   mapped to the same unbound type variable by using an (id, ty) Hashtbl. *)\nlet inst (gty: generic_ty) : ty =\n  let tbl = Hashtbl.create (module String) in\n  List.iter gty.type_params ~f:(fun pid ->\n    Hashtbl.set tbl ~key:pid ~data:(fresh_unbound_var ()));\n  let rec inst' ty =\n    match force ty with\n    | TyName id as ty -> (\n      match Hashtbl.find tbl id with\n      | Some tv -> tv\n      | None -> ty)\n    | TyArrow (from, dst) -> TyArrow (inst' from, inst' dst)\n    | ty -> ty\n  in\n  if Hashtbl.is_empty tbl then gty.ty else inst' gty.ty\n```\n\nThe next place things are different is `ELet` (and similarly `ELetRec`). Let's look at how `ELet` changes first. Because a let binding can have annotations and we now have generic types, those annotations can be polymorphic. Perhaps more interestingly, when a let binding is unannotated, it gets inferred to be as polymorphic as possible. 
This latter behavior is what we mentioned at the beginning of the tutorial as *generalization*. We'll cover the simpler case first: type-checking a let binding that has a polymorphic type annotation, where there is essentially nothing to infer about the binding's type.\n\nWhat do we want from a polymorphic type annotation? Imagine we had the following program:\n\n```\nlet f : forall 'a. 'a -> 'a = fun x ->\n    let y : 'a = x in y\nin f true\n```\n\nNotice how the annotation on `f` introduces `'a` with a `forall`. `'a` is a type variable that can range over any type, so it can be instantiated with, for example, `int`, `bool`, `string`, etc. On the right-hand-side of the let binding, we see a function whose body has another let binding, this time with an annotation mentioning the same `'a`. Basically, if `f` gets instantiated with `int`, the instantiation's type would be `int -> int`, and the inner let binding's type would be `int`. We can say that the `'a` introduced by `forall 'a` on `f`'s annotation is scoped to the right-hand-side of `f`'s binding.\n\nAnother factor we should consider when we have type variables inside annotations is rigidity. What would happen if we tried to instantiate a generic annotation? Let's try to type-check the following program and see what happens:\n\n```\nlet f : forall 'a 'b. 'a -> 'b = fun x -> x in f\n```\n\nTo be clear, we do *not* want this to type-check. We assigned an identity function the generic type `forall 'a 'b. 
'a -> 'b`. That type promises the function works for *any* `'a` and `'b`, including two different types. In an identity function like `fun x -> x`, however, the input type must match the output type.\n\nHowever, if we instantiate this annotation as usual, we get `TyArrow(?a, ?b)`, which `check` then tries to `unify` with the function's type `TyArrow(?0, ?0)`:\n\n```\nunify TyArrow(?0, ?0) TyArrow(?a, ?b)\n    unify ?0 ?a\n        ?0 := Link(?a)\n    unify ?0 ?b\n        force ?0 = ?a\n            unify ?a ?b\n                ?a := Link(?b)\n```\n\nWe just unified `?a` with `?b`, and type-checking passed, which is *not* what we want. So how do we fix this?\n\nImagine that instead of working with `Unbound` type variables for `?a` and `?b`, we were working with `TyName \"a\"` and `TyName \"b\"`. Two `TyName`s with different names are inherently unequal and do not unify. That's the key: we treat the annotation's type variables as plain `TyName`s.\n\n```\nunify TyArrow(?0, ?0) TyArrow(TyName \"a\", TyName \"b\")\n  unify ?0 TyName \"a\"\n    ?0 := Link(TyName \"a\")\n  unify ?0 TyName \"b\"\n    force ?0 = TyName \"a\"\n        unify TyName \"a\" TyName \"b\"\n```\n\n`unify TyName \"a\" TyName \"b\"` fails, and so this program doesn't type-check.\n\nHere `TyName` serves as a *rigid* type variable: two rigid type variables unify only if they have the same name.\n\nSo instead of our usual instantiation, we turn all the type variables in a `generic_ty` into `TyName`s (a.k.a. rigid type variables) and extend our environment to hold those names. Extending the environment this way gives these type variables a scope, hence they are *scoped*, *rigid* type variables. That is a lot of jargon for saying that the type variables in an annotation are put in our environment and referenced as type names.\n\nTo implement this, we first introduce another kind of binding to our environment called `TypeVarBind`. 
This just indicates that a name we look up is in fact a type variable.\n\n```ocaml\ntype bind =\n    ...\n    | TypeVarBind (* A type variable binding marks some rigid type. *)\n```\n\nThen, we extend our environment with a `TypeVarBind` for every type parameter in our generic type. This is where we make the annotation rigid. We don't need to modify the body of the type: since we never call `inst` on it, references to the type parameters stay plain `TyName`s.\n\n```ocaml\n(* Turn a generic_ty into its rigid form, so that when annotations are instantiated,\n   they don't produce Unbound type variables that can unify with each other. *)\nlet as_rigid (gty: generic_ty) : env * ty =\n    let extras = List.map gty.type_params ~f:(fun id -> (id, TypeVarBind)) in\n    (extras, gty.ty)\n```\n\n`unify` also has to recognize `TypeVarBind`, since it looks up type names. We generalize `lookup_tycon` into a `lookup_binding` function that returns any kind of binding, and the `TyName` arm becomes\n\n```ocaml\n| TyName tname ->\n  (match lookup_binding tname env with\n   | TypeVarBind ->\n     (match tv_row with\n      | NoRow -> ()\n      | _ -> raise (expected_ty_error \"record\" tname))\n   | TypeBind tc -> ignore (union_rows env tv_row (ClosedRow tc.ty))\n   | VarBind _ -> raise (undefined_error \"type\" tname))\n```\n\nA rigid type variable isn't known to be a record, so it is only compatible with `NoRow`.\n\nNow let's look at how we modify `ELet`.\n\n```ocaml\nmatch ann with\n| Some ann ->\n    let (extras, check_ty) = as_rigid ann in\n    check (extras @ env) check_ty rhs\n| None -> infer env rhs\n```\n\nWhen there is a generic annotation, we extend the environment with its type parameters (`extras`) and type-check the right-hand-side against the annotation. 
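\n\nTo see why rigidity matters, here is a minimal, self-contained sketch that replays the two `unify` traces above. It is not the tutorial's actual code. It assumes a stripped-down `ty` with no rows, scopes, or environment, uses OCaml's standard library instead of Base, and omits the occurs check. The helpers `identity_ty` and `succeeds` are hypothetical, introduced only for this demo.\n\n```ocaml\n(* Stripped-down types for illustration: no rows, scopes, or environment. *)\ntype ty =\n  | TyName of string   (* a named type, or a rigid type variable *)\n  | TyVar of tv ref    (* a flexible (unification) type variable *)\n  | TyArrow of ty * ty\nand tv =\n  | Unbound of string\n  | Link of ty\n\nexception Unify_error of string\n\n(* Follow Link chains to the representative type. *)\nlet rec force ty =\n  match ty with\n  | TyVar { contents = Link t } -> force t\n  | t -> t\n\n(* Simplified unify: no occurs check and no row constraints. *)\nlet rec unify t1 t2 =\n  match force t1, force t2 with\n  | TyName a, TyName b when String.equal a b -> ()\n  | TyArrow (a1, r1), TyArrow (a2, r2) -> unify a1 a2; unify r1 r2\n  | TyVar v1, TyVar v2 when v1 == v2 -> ()\n  | TyVar v, t | t, TyVar v -> v := Link t\n  | _ -> raise (Unify_error \"types do not unify\")\n\nlet counter = ref 0\nlet fresh_unbound_var () =\n  let n = !counter in\n  incr counter;\n  TyVar (ref (Unbound (\"?\" ^ string_of_int n)))\n\n(* The type inferred for the identity function: ?n -> ?n *)\nlet identity_ty () =\n  let v = fresh_unbound_var () in\n  TyArrow (v, v)\n\n(* Run a unification and report whether it succeeded. *)\nlet succeeds f = try f (); true with Unify_error _ -> false\n\n(* Flexible instantiation of the annotation: ?a -> ?b *)\nlet flexible_ok =\n  succeeds (fun () ->\n    unify (identity_ty ()) (TyArrow (fresh_unbound_var (), fresh_unbound_var ())))\n\n(* Rigid version: the parameters stay as TyNames *)\nlet rigid_ok =\n  succeeds (fun () ->\n    unify (identity_ty ()) (TyArrow (TyName \"'a\", TyName \"'b\")))\n```\n\nHere `flexible_ok` is `true`, because the two unbound variables get linked together just like `?a := Link(?b)` in the first trace. `rigid_ok` is `false`, because `TyName \"'a\"` and `TyName \"'b\"` never unify.\n\n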
The annotation puts us back in checking mode. Because `as_rigid` introduces rigid type variables, the check enforces the polymorphic annotation rather than falling back to inference.\n\nNow what if we want a generic function that doesn't need a type annotation? This is where generalization comes in.\n\n## Generalization\n\nNow that we've covered instantiating generic types and checking generic annotations, it's time to get to the meat of Hindley-Milner: generalization. Simply put, generalization takes a type that's not generic and makes it generic by turning its unbound type variables into type parameters.\n\nSo given a type like\n\n```ocaml\nlet t = TyVar(ref(Unbound(\"?0\", NoRow))) in\nTyArrow(t, t)\n```\n\ngeneralization could turn it into something like\n\n```ocaml\n{\n    type_params = [\"?0\"];\n    ty = TyArrow(TyName \"?0\", TyName \"?0\")\n}\n```\n\nI say it *could* turn into this, not that it *will*, because one crucial piece of information decides whether a type variable gets generalized: its scope.\n\nThe rule is that a type variable on the right-hand-side of a let binding is only generalized if it was created in the right-hand-side of that let binding. In other words, for an expression like\n\n```ocaml\nlet x = RHS\nin BODY\n```\n\nif the type variable's scope is within RHS, it can be generalized.\n\nBefore we talk about how to implement this, let's build up an intuition for why the rule is needed.\n\nWhen a type variable is generalized at a let binding, each use of the binding gets a fresh copy of it, disconnected from the original. So if the original type variable is also used outside the let binding and later gets resolved to `bool`, that information never reaches the generalized copies.\n\nLet's work through the following example.\n\n```ocaml\n(fun x -> let y = x in y) true true\n```\n\nRight off the bat, we can tell that this program shouldn't type-check. 
We effectively have an identity function applied to `true`, and the result, `true`, is then applied to `true` again. Since `Bool` is not an arrow type, it can't be applied to anything. (We saw a similar contradiction in an earlier example.)\n\nSo what would happen in this example if we generalized *every* type variable on the right-hand-side of a let binding?\n\nIf we trace the process of inference, it looks like this:\n\n```\n 1. (fun x -> let y = x in y) true true              EApp\n 2.     (fun x -> let y = x in y) true               EApp\n 3.         fun x -> let y = x in y                  ELam\n 4.             ty_param = ?0\n 5.             env' = (\"x\", forall _. ?0) :: env\n 6.                 let y = x in y                   ELet\n 7.                     x                            EVar\n 8.                         inst (forall _. ?0)\n 9.                         ?0\n10.                     ty_gen = gen ?0\n11.                     env = (\"y\", forall ?0. ?0)\n12.                     y                            EVar\n13.                         inst (forall ?0. ?0)\n14.                         ?1\n15.                 ?0 -> ?1\n16.         true                                     EBool\n17.         ty_res = ?2\n18.         ty_arr = Bool -> ?2\n19.         unify (?0 -> ?1) (Bool -> ?2)\n20.         ?2\n21.     true                                         EBool\n22.     ty_res = ?3\n23.     ty_arr = Bool -> ?3\n24.     unify ?2 (Bool -> ?3)\n25.     
?3\n```\n\n- On line 4, `x` is given the type variable `?0`.\n- On line 6, it's bound to `y`.\n- On line 10, `y`'s type is generalized.\n- On line 13, it's instantiated to get the type variable `?1`.\n- On line 15, we see the type of the lambda ends up being `?0 -> ?1`.\n- On line 19, this type is unified with `Bool -> ?2` because of the inner application to `true`.\n- On line 24, the result type `?2` is unified with `Bool -> ?3` because of the outer application to `true`.\n\nWe end up with `?3`, an unbound type variable, as the program's type.\n\nBut this program should have produced a type error, because `Bool` is applied to `Bool`!\n\nThe issue is that the type of the lambda became `?0 -> ?1`. This is an identity function, so the type of the lambda should be `?0 -> ?0`. However, generalizing `y`'s type (which is `x`'s type, `?0`) cuts the instantiated copy `?1` off from `?0`. So while `?0` continues to get unified with `Bool`, that information doesn't propagate to `?1`.\n\nBecause generalization happens at a let binding, we can't allow type variables that are mentioned outside of the let binding to be generalized. The outer type variable and the inner type variable (once instantiated) can end up unifying with different types and diverging, letting you make incorrect assumptions about that type.\n\nIn the above example, if we knew that `?0`'s scope extended outside the let binding, we would not generalize it. The lambda's type would be `?0 -> ?0`, and we'd end up trying to unify `Bool` with an arrow type, leading to a type error.\n\nSo if our rule is to only generalize type variables created within the right-hand-side of the let binding, how do we actually track the scope of a type variable?\n\nSince the scope corresponds to the right-hand-side of a let binding, the more nested a let binding is, the deeper its scope. 
This means we can treat the scope as a positive integer, where a deeper scope (i.e. a more deeply nested let binding) has a higher number.\n\nIt turns out we just need a single global integer holding the scope the type-checker is currently in; we'll call it `current_scope`. When we enter a let binding, we increment its value via `enter_scope`. When we exit the right-hand-side of the let binding, we decrement its value via `leave_scope`.\n\n```ocaml\n(* The scope is an integer counter that holds the depth of the current\n   let binding. Every unbound type variable contains the scope at which\n   it was created. *)\ntype scope = int\n\n(* Global state that stores the current scope. *)\nlet current_scope = ref 1\nlet enter_scope () = Int.incr current_scope\nlet leave_scope () = Int.decr current_scope\n```\n\nNow that we know the current scope, we need to associate each type variable with the scope it was created at. We modify our definition of `tv` (type variable) to store its scope. We also modify our `fresh_unbound_var` function to construct a fresh type variable with the current scope.\n\n```ocaml\n(* A type variable *)\ntype tv =\n  | Unbound of id * row_constraint * scope\n    (* Unbound type variable: Holds the type variable's unique name, any\n       row constraint, and the scope at which it was created. *)\n  | Link of ty (* Link type variable: Holds a reference to a type. *)\n\n(* Generate a fresh unbound type variable with a unique name, an optional\n   row constraint, and the current scope. *)\nlet fresh_unbound_var ?(row=NoRow) () =\n    let n = !gensym_counter in\n    Int.incr gensym_counter;\n    let tvar = \"?\" ^ Int.to_string n in\n    TyVar (ref (Unbound (tvar, row, !current_scope)))\n```\n\nGeneralization happens after `leave_scope`, so at that point `current_scope` is the scope *outside* the let binding. If the scope of a type variable is greater than `current_scope`, we know it was created deeper (inside the right-hand-side), and it is safe to generalize. 
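\n\nTo make the rule concrete, here is a minimal sketch. It assumes a stripped-down `ty` without rows and uses OCaml's standard library instead of Base. The helper `generalizable` is hypothetical: it only collects the ids that `gen` would turn into type parameters, without rebuilding the type. A type variable created outside the let binding (like the lambda parameter `?0`) stays monomorphic. One created while inferring the right-hand-side (`?1`) is generalized, because the check runs after `leave_scope`.\n\n```ocaml\n(* Stripped-down types for illustration: no rows. *)\ntype ty =\n  | TyName of string\n  | TyVar of tv ref\n  | TyArrow of ty * ty\nand tv =\n  | Unbound of string * int  (* unique name, scope it was created at *)\n  | Link of ty\n\nlet current_scope = ref 1\nlet enter_scope () = incr current_scope\nlet leave_scope () = decr current_scope\n\nlet counter = ref 0\nlet fresh_unbound_var () =\n  let n = !counter in\n  incr counter;\n  TyVar (ref (Unbound (\"?\" ^ string_of_int n, !current_scope)))\n\n(* Follow Link chains to the representative type. *)\nlet rec force ty =\n  match ty with\n  | TyVar { contents = Link t } -> force t\n  | t -> t\n\n(* Collect, in order of first appearance, the unbound type variables whose\n   scope is deeper than current_scope: the ones gen would generalize. *)\nlet generalizable ty =\n  let rec go acc ty =\n    match force ty with\n    | TyVar { contents = Unbound (id, scope) } when scope > !current_scope ->\n      if List.mem id acc then acc else id :: acc\n    | TyArrow (from, dst) -> go (go acc from) dst\n    | _ -> acc\n  in\n  List.rev (go [] ty)\n\n(* A lambda parameter, created outside any let binding: ?0 at scope 1. *)\nlet outer = fresh_unbound_var ()\n\n(* Inferring the right-hand-side of a let: ?1 is created at scope 2. *)\nlet () = enter_scope ()\nlet inner = fresh_unbound_var ()\nlet () = leave_scope ()\n\n(* gen runs after leave_scope, so current_scope is back to 1 here. *)\nlet params = generalizable (TyArrow (outer, inner))\n```\n\nHere `params` is `[\"?1\"]`. `?0` has scope `1`, which is not greater than `current_scope`, so it stays shared with the enclosing lambda.\n\n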
If we take another look at our previous example\n\n```ocaml\n(fun x -> let y = x in y) true true\n```\n\nwith the updated generalization logic, we'll see\n\n```\n 1. (fun x -> let y = x in y) true true              EApp\n 2.     (fun x -> let y = x in y) true               EApp\n 3.         fun x -> let y = x in y                  ELam\n 4.             ty_param = (?0, 1)                   (scope is 1)\n 5.             env' = (\"x\", forall _. ?0) :: env\n 6.                 let y = x in y                   ELet (scope is 2)\n 7.                     x                            EVar\n 8.                         inst (forall _. ?0)\n 9.                         ?0\n10.                     ty_gen = gen ?0              1 <= 1\n11.                     env = (\"y\", forall _. ?0)\n12.                     y                            EVar\n13.                         inst (forall _. ?0)\n14.                         ?0\n15.                 ?0 -> ?0\n16.         true                                     EBool\n17.         ty_res = ?1\n18.         ty_arr = Bool -> ?1\n19.         unify (?0 -> ?0) (Bool -> ?1)\n20.         Bool\n21.     true                                         EBool\n22.     ty_res = ?2\n23.     ty_arr = Bool -> ?2\n24.     unify Bool (Bool -> ?2)                      TYPE ERROR\n```\n\nNote how the generalized type (the type of `y` on line 11) does not have `?0` as a type parameter. `?0`'s scope (`1`) is not greater than `current_scope`, which is back to `1` after `leave_scope` (the right-hand-side itself was inferred at scope `2`). When the generalized type gets instantiated, it just returns `?0`, giving the lambda the type `?0 -> ?0`. We see further down that this leads to unifying `Bool` with `Bool -> ?2`, which is a type error, as we expected.\n\nThis leads us to another question. What happens when a type variable `tv` in an outer scope gets unified with a type `ty` containing type variables in an inner scope? 
Since we think of unification as equating two types, we should interpret this scenario as `ty` replacing all occurrences of `tv`. This means that every type variable inside `ty` should end up in `tv`'s (outer) scope, unless it is already in an even outer one. In implementation terms, each of those type variables gets the minimum of its own scope and `tv`'s scope.\n\nSo when `unify`ing a `TyVar` with another type, we want to traverse the other type's type variables and update their scopes to that minimum. Our `TyVar` case in `unify` already traverses the other type inside `occurs`. We can take advantage of this and update the scopes inside the occurs check.\n\n```ocaml\n| TyVar tv, ty | ty, TyVar tv ->\n    (* If either type is a type variable, ensure that the type variable does\n       not occur in the type. Update the scopes while you're at it. *)\n    occurs tv ty;\n    (* Link the type variable to the type. *)\n    tv := Link ty\n```\n\nAfter the occurs check, linking `tv` to `ty` makes every occurrence of `tv` point to `ty`, whose type variables now carry the lowered scopes.\n\nLet's look at how we modify the `occurs` check. We need to add one more case.\n\n```ocaml\n| TyVar ({ contents = Unbound (id, tgt_row, tgt_scope) } as tgt) ->\n    (* Recurse into the target's row constraint to lower scopes there too. *)\n    row_iter tgt_row (fun (_, ty) -> occurs src ty);\n    (* Grab src's scope (tgt's scope comes from the pattern above). *)\n    let { contents = Unbound(_, _, src_scope) } = src in\n    (* Compute the minimum of their scopes (the outermost scope). *)\n    let min_scope = min src_scope tgt_scope in\n    (* Update the tgt's scope to be the minimum, preserving its row. *)\n    tgt := Unbound (id, tgt_row, min_scope)\n```\n\nSince `Unbound` now carries a `row_constraint` as well as a `scope`, both the destructuring and the update have to include it. 
We also recurse into the row's types so their scopes get lowered too.\n\nNow the overall occurs check looks like\n\n```ocaml\n(* Occurs check: check if a type variable (src) occurs in a type (ty).\n   If it does, raise an exception. Otherwise, update the scopes of the\n   type variables in ty to be the minimum of its scope and the scope of src. *)\nlet rec occurs (src : tv ref) (ty : ty) : unit =\n    (* Follow all the links. If we see a type variable, it will only be Unbound. *)\n    match force ty with\n    | TyVar tgt when phys_equal src tgt ->\n        (* src type variable occurs in ty. *)\n        raise OccursCheck\n    | TyVar ({ contents = Unbound (id, tgt_row, tgt_scope) } as tgt) ->\n        row_iter tgt_row (fun (_, ty) -> occurs src ty);\n        let { contents = Unbound(_, _, src_scope) } = src in\n        let min_scope = min src_scope tgt_scope in\n        tgt := Unbound (id, tgt_row, min_scope)\n    | TyArrow(from, dst) ->\n        occurs src from;\n        occurs src dst\n    | _ -> ()\n```\n\nTo fully ensure that each type variable's scope gets updated to the outermost one, we also need to update the `TyVar`-`TyVar` arm in `unify` that unions row constraints. That arm doesn't go through the `occurs` check, so we set the minimum scope there directly:\n\n```ocaml\n| TyVar other when not (phys_equal tv other) ->\n    let Unbound(id, other_row, other_scope) = !other in\n    row_iter other_row (fun (_, ty) -> occurs tv ty);\n    let min_scope = min src_scope other_scope in\n    let row = union_rows env tv_row other_row in\n    other := Unbound(id, row, min_scope)\n```\n\nThe change is just pulling out `other_scope`, taking the minimum with `src_scope` (`tv`'s own scope), and stamping it on the updated `other`.\n\nNow let's actually look into how generalization is implemented. 
`gen` will be a function that accepts a `ty` and returns a `generic_ty`.\n\n```ocaml\nlet gen (ty: ty) : generic_ty =\n    let type_params = Hash_set.create (module String) in\n```\n\nWe want to walk over the type to find all of its type variables and grab the `id`s of the ones whose scope is greater than `current_scope`. We keep track of those ids in a `Hash_set` called `type_params`. Those will be the type parameters of our generalized type.\n\nWe create a helper (`gen'`) to recurse down the type and return the generalized type. The first case is the only interesting one (the rest just recurse over the `ty`). Suppose the `scope` of an `Unbound` type variable is greater than `current_scope`. Then we add the type variable's `id` to the `type_params` hash set and return a `TyName` with that `id` to reference the type parameter. (Remember that when `inst`antiation comes around, it will look for `TyName`s that correspond to those type parameters.) 
We also mutate the type variable to `Link` to the generalized `TyName`, so any later code that walks the typed AST doesn't see it as still `Unbound`.\n\n```ocaml\nlet rec gen' ty =\n    match force ty with\n    | TyVar ({ contents = Unbound (id, _, scope) } as tv) when scope > !current_scope ->\n        Hash_set.add type_params id;\n        tv := Link (TyName id);\n        TyName id\n    | TyArrow (from, dst) -> TyArrow (gen' from, gen' dst)\n    | ty -> ty\nin\n```\n\nWe ignore the row constraint for now and come back to it when we handle row polymorphism.\n\nFinally, we call `gen'` on the input `ty`, get a sorted list of type parameters from the hash set, and return our `generic_ty`.\n\n```ocaml\nlet ty = gen' ty in\nlet type_params = Hash_set.to_list type_params |> List.sort ~compare:String.compare in\n{ type_params; ty }\n```\n\nOverall, our `gen` implementation looks as follows:\n\n```ocaml\nlet gen (ty: ty) : generic_ty =\n    let type_params = Hash_set.create (module String) in\n    let rec gen' ty =\n        match force ty with\n        | TyVar ({ contents = Unbound (id, _, scope) } as tv) when scope > !current_scope ->\n            Hash_set.add type_params id;\n            tv := Link (TyName id);\n            TyName id\n        | TyArrow (from, dst) -> TyArrow (gen' from, gen' dst)\n        | ty -> ty\n    in\n    let ty = gen' ty in\n    let type_params = Hash_set.to_list type_params |> List.sort ~compare:String.compare in\n    { type_params; ty }\n```\n\nNow that we've finished writing the generalization procedure, let's update our type inference logic in `infer`. Most of the cases are unaffected; only `ELam`, `ELet`, and `ELetRec` change.\n\nYou might wonder why `ELam` is affected, since we only want to generalize at let bindings. Indeed, we don't want to generalize in `ELam`. However, the bindings in our environment now hold a `generic_ty`. 
In `ELam`, we add the `param` to the environment with a fresh type variable. We need to wrap that type in a `generic_ty` without actually generalizing it. Let's make a function `dont_generalize` for this purpose.\n\n```ocaml\n(* The environment stores generic types, but sometimes, we need to\n   associate a non-generalized type to a variable. This function\n   wraps a type into a generic type. *)\nlet dont_generalize ty : generic_ty = { type_params = []; ty }\n```\n\nSo now, our `ELam` case has to add the binding `(param, VarBind (dont_generalize ty_param))` instead of `(param, VarBind ty_param)`. All in all, it should look like\n\n```ocaml\n| ELam (param, body) ->\n    (* Create a fresh type variable for the lambda parameter, and\n       extend the environment with the param and its type. *)\n    let ty_param = fresh_unbound_var () in\n    let env' = (param, VarBind (dont_generalize ty_param)) :: env in\n    (* Typecheck the body of the lambda with the extended environment. *)\n    let body = infer env' body in\n    (* Return a synthesized arrow type from the parameter to the body. *)\n    TELam (param, body, TyArrow (ty_param, typ body))\n```\n\nNext, `ELet` is the first interesting case. As mentioned before, we want to `enter_scope()` at its beginning and `leave_scope()` after inferring the right-hand-side of the binding. The beginning of the case should now look like\n\n```ocaml\n| ELet ((id, ann, rhs), body) ->\n    enter_scope();\n    let rhs =\n        match ann with\n        | Some ann ->\n            let (extras, check_ty) = as_rigid ann in\n            check (extras @ env) check_ty rhs\n        | None -> infer env rhs\n    in\n    leave_scope();\n    ...\n```\n\nOn top of this change, we want to generalize the type of the right-hand-side when there's no annotation. 
If there is an annotation, we already have the polymorphic type we want, so we use the annotation directly.\n\n```ocaml\nlet ty_gen =\n    match ann with\n    | Some ann -> ann\n    | None -> gen (typ rhs)\nin\n```\n\nThat's the type we add to the environment for the binding.\n\n```ocaml\nlet env = (id, VarBind ty_gen) :: env in\n```\n\nThe rest stays as before. Overall, the `ELet` case should look like\n\n```ocaml\n| ELet ((id, ann, rhs), body) ->\n    enter_scope();\n    let rhs =\n        match ann with\n        | Some ann ->\n            let (extras, check_ty) = as_rigid ann in\n            check (extras @ env) check_ty rhs\n        | None -> infer env rhs\n    in\n    leave_scope();\n    let ty_gen =\n        match ann with\n        | Some ann -> ann\n        | None -> gen (typ rhs)\n    in\n    let env = (id, VarBind ty_gen) :: env in\n    let body = infer env body in\n    TELet ((id, ann, rhs), body, typ body)\n```\n\nThe `ELetRec` case is slightly more complicated. To type-check recursive let bindings, we want to *delay* generalization until all of the bindings have been inferred. This means that, inside their right-hand-sides, mutually recursive bindings only see non-generic versions of each other. Why is this? 
It turns out that *polymorphic recursion* is undecidable to infer. Polymorphic recursion means the bindings may use generic versions of each other inside their own definitions.\n\nHere is an example of polymorphic recursion from the [OCaml docs](https://v2.ocaml.org/manual/polymorphism.html#s%3Apolymorphic-recursion).\n\n```ocaml\ntype 'a nested =\n    | List of 'a list\n    | Nested of 'a list nested\n\nlet rec depth = function\n    | List _ -> 1\n    | Nested n -> 1 + depth n\n```\n\nIt looks like a fairly simple function, but the recursive call `depth n` ends up trying to unify `'a nested` with `'a list nested`. That is not satisfiable, since it would require `'a` to equal `'a list`. Since `depth` is not generic inside its own definition, the type-checker can't tell two types apart: the `'a nested` that `depth` was originally called on, and the `'a list nested` carried by the `Nested` constructor. It can't treat them as different instantiations. In OCaml, this can be solved with an explicit polymorphic annotation, `let rec depth : 'a. 'a nested -> int = function ...`.\n\nPolymorphic recursion has other costs, too. A polymorphically recursive function can be used at infinitely many types (here `'a`, `'a list`, `'a list list`, and so on). So it can't in general be monomorphized, which can lead to less efficient implementations. In practice, not having polymorphic recursion is rarely an issue. Most of the recursive functions you'll ever want to define can be written and inferred in this setting.\n\n## Aside\n\nThe undecidability of type inference for polymorphic recursion was shown by Fritz Henglein in the paper \"Type Inference with Polymorphic Recursion\". The paper shows that semi-unification reduces to type inference with polymorphic recursion. Since semi-unification is undecidable, so is type inference with polymorphic recursion.\n\nWe still want to `enter_scope()` at the beginning. 
After that, we run `as_rigid` on each declaration's annotation (if it has one). We store the results in a list called `prepared` so we can reuse them across the next few passes. A declaration without an annotation gets no `extras` and a fresh unbound type variable as its `check_ty`.\n\n```ocaml\n| ELetRec (decls, body) ->\n    enter_scope();\n    let prepared = List.map decls ~f:(fun (id, ann, rhs) ->\n        match ann with\n        | Some ann ->\n            let (extras, check_ty) = as_rigid ann in\n            (id, Some ann, rhs, extras, check_ty)\n        | None ->\n            (id, None, rhs, [], fresh_unbound_var ()))\n    in\n```\n\nSince we aren't generalizing until later, each prepared declaration goes into the environment in non-generalized form, with its `check_ty` wrapped in `dont_generalize`. This time, the extended environment is stored in a variable named `env_with_decls` instead of `env`. We want to keep the old `env` around so we can later add the generalized bindings to it.\n\n```ocaml\nlet env_decls = List.map prepared ~f:(fun (id, _, _, _, check_ty) ->\n    (id, VarBind (dont_generalize check_ty)))\nin\nlet env_with_decls = env_decls @ env in\n```\n\nTo check each right-hand-side, we extend `env_with_decls` with that declaration's rigid `extras` and check against its `check_ty`. Only after all of the right-hand-sides in the recursive let binding have been checked do we `leave_scope()`. The resulting list of declarations needs to be a `tlet_decl list`, since that's what `TELetRec` expects.\n\n```ocaml\nlet tdecls : tlet_decl list = List.map prepared ~f:(fun (id, ann, rhs, extras, check_ty) ->\n    let trhs = check (extras @ env_with_decls) check_ty rhs in\n    (id, ann, trhs))\nin\nleave_scope();\n```\n\nNow we generalize the types of all the bindings. 
As with `ELet`, if there's an annotation we use it directly; otherwise we call `gen` on the inferred type.\n\n```ocaml\nlet generalized_bindings = List.map tdecls ~f:(fun (id, ann, rhs) ->\n    let ty_gen =\n        match ann with\n        | Some ann -> ann\n        | None -> gen (typ rhs)\n    in\n    (id, VarBind ty_gen))\nin\n```\n\nFinally, we add the generalized bindings to the original `env` and use the result to infer the body of the recursive let binding.\n\n```ocaml\nlet env_body = generalized_bindings @ env in\nlet body = infer env_body body in\nTELetRec (tdecls, body, typ body)\n```\n\nOverall, our updated implementation of `ELetRec` looks like\n\n```ocaml\n| ELetRec (decls, body) ->\n    enter_scope();\n    let prepared = List.map decls ~f:(fun (id, ann, rhs) ->\n        match ann with\n        | Some ann ->\n            let (extras, check_ty) = as_rigid ann in\n            (id, Some ann, rhs, extras, check_ty)\n        | None ->\n            (id, None, rhs, [], fresh_unbound_var ()))\n    in\n    let env_decls = List.map prepared ~f:(fun (id, _, _, _, check_ty) ->\n        (id, VarBind (dont_generalize check_ty)))\n    in\n    let env_with_decls = env_decls @ env in\n    let tdecls : tlet_decl list = List.map prepared ~f:(fun (id, ann, rhs, extras, check_ty) ->\n        let trhs = check (extras @ env_with_decls) check_ty rhs in\n        (id, ann, trhs))\n    in\n    leave_scope();\n    let generalized_bindings = List.map tdecls ~f:(fun (id, ann, rhs) ->\n        let ty_gen =\n            match ann with\n            | Some ann -> ann\n            | None -> gen (typ rhs)\n        in\n        (id, VarBind ty_gen))\n    in\n    let env_body = generalized_bindings @ env in\n    let body = infer env_body body in\n    TELetRec (tdecls, body, typ body)\n```\n\n## Aside\n\nHere's our typing rule for `ELet`:\n\n$$\\frac{\\Gamma \\vdash \\mathit{rhs} : A \\qquad \\mathit{vars} = FV(A) \\setminus FV(\\Gamma) \\qquad \\Gamma, x : \\forall\\, \\mathit{vars}.\\, A \\vdash \\mathit{body} : B}{\\Gamma \\vdash \\texttt{let } x = \\mathit{rhs} \\texttt{ in } \\mathit{body} : B}\\;(\\text{T-Let})$$\n\n`FV(A)` and `FV(Γ)` are the sets of type variables that appear free in `A` and `Γ` respectively. 
We write `vars` for `FV(A) \\ FV(Γ)`, the type variables free in `A` but not in `Γ`. This rule basically says two things. First, `rhs` has the type `A` in `Γ`. Second, extending `Γ` with `x` having the type `∀ vars. A` lets us give the body the type `B`. If both hold, then `let x = rhs in body` has type `B`.\n\nAnd here's our typing rule for `ELetRec`:\n\n$$\\frac{\\forall x \\in \\mathit{decls}.\\ \\Gamma_{rec} \\vdash \\mathit{rhs}_x : A_x \\qquad \\Gamma, \\{x : \\forall\\, \\mathit{vars}_x.\\, A_x \\mid x \\in \\mathit{decls}\\} \\vdash \\mathit{body} : B}{\\Gamma \\vdash \\texttt{let rec } \\{x = \\mathit{rhs}_x \\mid x \\in \\mathit{decls}\\} \\texttt{ in } \\mathit{body} : B}\\;(\\text{T-LetRec})$$\n\nHere, we have the same `decls`, `A_x`, `rhs_x`, and `Γ_rec` as in the original T-LetRec rule. `vars_x` is the set of type variables we generalize for the binding `x`. This rule basically says two things. First, each `rhs_x` has the type `A_x` in `Γ_rec`. Second, extending `Γ` with each `x` having the type `∀ vars_x. A_x` lets us give the body the type `B`. If both hold, then `let rec { x = rhs_x | x ∈ decls } in body` has type `B`. Note that when the rhs is inferred, `x` is not polymorphic (i.e. it's a monotype). So in our formulation of T-LetRec, polymorphic recursion isn't supported.\n\nThe original Damas-Milner formulation factors generalization out as its own rule:\n\n$$\\frac{\\Gamma \\vdash e : A \\qquad a \\notin FV(\\Gamma)}{\\Gamma \\vdash e : \\forall a.\\, A}\\;(\\text{T-Gen-DM})$$\n\nThis says that if `e` has the type `A` in `Γ`, and `a` is any type variable not free in `Γ`, then `e` can be given the type `∀a. A`. With T-Gen-DM in place, the simpler versions of T-Let and T-LetRec don't need to generalize themselves. They assume `A` is already polymorphic where needed, and just thread it through. Chaining applications of T-Gen-DM for each variable in `FV(A) \\ FV(Γ)` gives us the same generalization present in our T-Let and T-LetRec rules.\n\nWoo! That was a doozy, but we got through it. 
How about we take a look at some examples to celebrate?\n\nYou can find and run these examples in [lib/six.ml](lib/six.ml).\n\n## Examples\n\n```ocaml\ntypecheck_source {|\n    type A = {}\n    let a : A = {} in\n    let f = fun x -> x in\n    let _ = f a in\n    f true\n|}\n```\n\nOutput: `bool`\n\n```ocaml\ntypecheck_source {|\n    let rec fix = fun f -> fun x -> f (fix f) x in\n    fix (fun f -> fun arg -> if arg then f false else true)\n|}\n```\n\nOutput: `bool -> bool`\n\n```ocaml\ntypecheck_source {|\n    type A = {}\n    let a : A = {} in\n    let f : forall 'a. 'a -> bool = fun x -> true in\n    f a\n|}\n```\n\nOutput: `bool`\n\n```ocaml\ntypecheck_source {|\n    type A = {}\n    let f : forall 'a. 'a -> A = fun x -> true in\n    f true\n|}\n```\n\nOutput: `TypeError \"expression does not have type 'a -> A\"`\n\n```ocaml\ntypecheck_source \"(fun x -> let y = x in y) true true\"\n```\n\nOutput: `UnificationFailure \"failed to unify type bool with bool -> ?2\"`\n\n## Row polymorphism\n\nCurrently, we can infer the types of expressions involving records, as long as they ultimately unify to some concrete type. However, one of the beautiful things about let generalization in Hindley-Milner is that we can define a function without a type signature that operates on values of different types. What if we could apply that polymorphism to records?\n\nFor example, given the records\n\n```ocaml\ntype Foo = { x : bool }\ntype Bar = { x : bool -> bool }\nlet r1 : Foo = { x = true } in\nlet r2 : Bar = { x = fun y -> y } in\n```\n\nand the function\n\n```ocaml\nlet f = fun r -> r.x in\n```\n\nwe'd like to be able to invoke `f` on both records, that is, `f r1` and `f r2`. There's no reason why `r.x` can't operate on both records, since they both have fields named `x`. Similarly, with `{ r with x = true }`, the only requirement on `r` is that it contain a field `x` whose type is `bool`. 
How do we encode this in our type system?\n\nRecall that when inferring the type of a record projection, we give the record a fresh unbound type variable with an `OpenRow` constraint. This tells the type-checker \"we don't know which record this is, but we know it should contain a field named `x`.\" So at the expression level, that constraint is already attached to the `Unbound` type variable for `r`.\n\nThe problem is what happens at generalization. When `f` is generalized by our current `gen`, the `OpenRow` constraint isn't preserved, so we end up with `forall 'a 'b. 'a -> 'b`. We need some way of encoding that `'a` is a record with a field `x : 'b`.\n\nHere's the syntax for that:\n\n```\nforall 'a 'b. 'a :: { x : 'b, ... } => 'a -> 'b\n```\n\nThe piece between `::` and `=>` is a row constraint. It corresponds directly to the `OpenRow` that expression-level inference produced for `r`. So row polymorphism is really about carrying that constraint through `gen`, `inst`, and `unify`. The expression-level inference doesn't have to change at all.\n\n## Aside\n\nOur approach borrows ideas from Atsushi Ohori's [A Polymorphic Record Calculus and Its Compilation](https://dl.acm.org/doi/10.1145/218570.218572), including the `::` syntax for kind-based constraints on type variables. There are many other approaches to row polymorphism. Mitchell Wand's original row variables and Didier Rémy's presence/absence type system handle simple rows where labels must be unique. Daan Leijen's [Extensible records with scoped labels](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/scopedlabels.pdf) lifts that restriction with scoped rows. More recent work by Morris and McKinna in [Abstracting Extensible Data Types: Or, Rows by Any Other Name](https://dl.acm.org/doi/10.1145/3290325) introduces a system called Rose. Rose abstracts over rows via qualified types and captures both simple and scoped rows. 
Its successor, [Generic Programming with Extensible Data Types](https://dl.acm.org/doi/10.1145/3607843) by Hubers and Morris, extends Rose with first-class labels and generic programming over rows, enabling polymorphic equality on extensible records and variants.\n\nOkay, so given that type signature, we need to update `generic_ty` and `TypeVarBind` to carry row constraints, and then update `gen`, `inst`, and `unify` accordingly.\n\n```ocaml\ntype generic_ty = {\n    type_params : (id * row_constraint) list;\n    ty : ty;\n}\n```\n\nSimilarly, we want the type variables in our environment to hold onto those row constraints as well.\n\n```ocaml\ntype bind =\n    ...\n    | TypeVarBind of row_constraint\n```\n\nCorrespondingly, we have to update the producers and consumers of `generic_ty`. Let's start with the simplest: `as_rigid`.\n\n```ocaml\nlet as_rigid (gty: generic_ty) : env * ty =\n    (* Each type parameter becomes a rigid type variable binding that remembers its row constraint. *)\n    let extras = List.map gty.type_params ~f:(fun (id, row) -> (id, TypeVarBind row)) in\n    (extras, gty.ty)\n```\n\nNext, let's update `gen`. 
We'll create a small helper to map over the fields of a row constraint.\n\n```ocaml\n(* Apply f to the type of every field in a row constraint, keeping the same kind of row. *)\nlet map_row ~f row =\n    match row with\n    | NoRow -> NoRow\n    | OpenRow flds -> OpenRow (List.map flds ~f:(fun (id, ty) -> (id, f ty)))\n    | ClosedRow flds -> ClosedRow (List.map flds ~f:(fun (id, ty) -> (id, f ty)))\n```\n\nNow `gen` just needs to map over the row constraint, so that unbound type variables appearing inside the row are generalized too.\n\n```ocaml\n| TyVar ({ contents = Unbound (id, row, scope) } as tv) when scope > !current_scope ->\n    Hashtbl.set type_params ~key:id ~data:(map_row ~f:gen' row);\n```\n\nWe also change `type_params` from a `Hash_set` to a `Hashtbl` so we can map each type variable to its `row_constraint`.\n\nOverall, our `gen` implementation looks like this:\n\n```ocaml\nlet gen (ty: ty) : generic_ty =\n    (* Maps each generalized type variable to its generalized row constraint. *)\n    let type_params : (id, row_constraint) Hashtbl.t = Hashtbl.create (module String) in\n    let rec gen' ty =\n        match force ty with\n        | TyVar ({ contents = Unbound (id, row, scope) } as tv) when scope > !current_scope ->\n            Hashtbl.set type_params ~key:id ~data:(map_row ~f:gen' row);\n            tv := Link (TyName id);\n            TyName id\n        | TyArrow (from, dst) -> TyArrow (gen' from, gen' dst)\n        | ty -> ty\n    in\n    let ty = gen' ty in\n    (* Sort by name so the order of the type parameters is deterministic. *)\n    let type_params =\n        Hashtbl.to_alist type_params\n        |> List.sort ~compare:(fun (a,_) (b,_) -> String.compare a b)\n    in\n    { type_params; ty }\n```\n\nSimilarly, we update `inst`. 
It's mostly the same as before: create fresh type variables for each type parameter, then walk the type body replacing each `TyName` with its corresponding type variable.\n\nHowever, this time we also need to preserve whatever row constraint the type parameter had, recursing over the row to instantiate its type variables as well.\n\nOverall, our `inst` implementation looks like this:\n\n```ocaml\nlet inst (gty: generic_ty) : ty =\n    let tbl = Hashtbl.create (module String) in\n    List.iter gty.type_params ~f:(fun (pid, _) ->\n        Hashtbl.set tbl ~key:pid ~data:(fresh_unbound_var ()));\n    let rec inst' ty =\n        match force ty with\n        | TyName id as ty -> (\n            match Hashtbl.find tbl id with\n            | Some tv -> tv\n            | None -> ty)\n        | TyArrow (from, dst) -> TyArrow (inst' from, inst' dst)\n        | ty -> ty\n    in\n    (* Attach the instantiated row constraint to each fresh type variable in the table. *)\n    List.iter gty.type_params ~f:(fun (pid, row) ->\n        match row with\n        | NoRow -> ()\n        | _ ->\n            match Hashtbl.find_exn tbl pid with\n            | TyVar ({ contents = Unbound (id, _, scope) } as tv) ->\n                tv := Unbound (id, map_row ~f:inst' row, scope)\n            | _ -> failwith \"unreachable: tbl always holds fresh unbound TyVars\");\n    inst' gty.ty\n```\n\nWe now need to update `unify` to handle the rigid type variables that `as_rigid` puts in the environment, since those can now carry row constraints too. When an unbound type variable is unified with such a rigid `TyName`, we need to ensure that every field in the type variable's row constraint also exists in the rigid type variable's row constraint. 
We'll pull this into a helper called `check_rigid_subset`.\n\nIf the type variable has no row constraint, it's trivially a subset of anything else.\n\n```ocaml\nand check_rigid_subset env tname tv_row rigid_row =\n    match tv_row, rigid_row with\n    | NoRow, _ -> ()\n    ...\n```\n\nIf the type variable has a row constraint but the rigid type variable doesn't, then the rigid type variable isn't known to be record-shaped, so we throw an error.\n\n```ocaml\n| _, NoRow -> raise (expected_ty_error \"record\" tname)\n...\n```\n\nIf the type variable has an `OpenRow`, we check that each of its fields exists in the rigid type variable's row by calling `fld_exists`, which, if you recall, also tries to unify the field's type.\n\n```ocaml\n| OpenRow flds, OpenRow rigid_flds\n| OpenRow flds, ClosedRow rigid_flds ->\n    List.iter flds ~f:(fun (id, ty) ->\n        if not (fld_exists env rigid_flds id ty) then\n            raise (row_mismatch tv_row rigid_row))\n...\n```\n\nThe catch-all covers the case where the type variable has a `ClosedRow`, which means it comes from a record literal that hasn't been matched to its nominal type yet. A rigid type variable isn't a nominal type, so the two can't be unified.\n\n```ocaml\n| _ -> raise (row_mismatch tv_row rigid_row)\n```\n\nOverall, `check_rigid_subset` looks like this:\n\n```ocaml\n(* Check that tv_row's fields are contained within rigid_row. 
*)\nand check_rigid_subset env tname tv_row rigid_row =\n    match tv_row, rigid_row with\n    | NoRow, _ -> ()\n    | _, NoRow -> raise (expected_ty_error \"record\" tname)\n    | OpenRow flds, OpenRow rigid_flds\n    | OpenRow flds, ClosedRow rigid_flds ->\n        List.iter flds ~f:(fun (id, ty) ->\n            if not (fld_exists env rigid_flds id ty) then\n                raise (row_mismatch tv_row rigid_row))\n    | _ -> raise (row_mismatch tv_row rigid_row)\n```\n\nNow the `TypeVarBind` case just becomes a call to `check_rigid_subset`.\n\n```ocaml\n| TyVar tv, ty | ty, TyVar tv ->\n    (* Links are assumed to have been followed already, so tv is Unbound here. *)\n    let tv_row = match !tv with Unbound (_, row, _) -> row | Link _ -> assert false in\n    (match ty with\n    | TyName tname ->\n        (match lookup_binding tname env with\n        | TypeVarBind rigid_row -> check_rigid_subset env tname tv_row rigid_row\n        ...)\n    ...\n```\n\n## Aside\n\nHere are the typing rules for projection and with-expressions on row-polymorphic type variables. They mirror T-Proj and T-With from the section on [Type declarations](#type-declarations), but get their fields from `TypeVarBind` instead of `TypeBind`.\n\nWe update the T-Let rule to hold row constraints in the quantified type, where the row constraint for type variable `a` is denoted by `R_a`.\n\nT-LetRec is updated in the same way, carrying row constraints in the polymorphic type of each binding.\n\nNow let's look at some examples.\n\nYou can find and run these examples in [lib/seven.ml](lib/seven.ml).\n\n### Examples\n\nIn this example, `'r` is a rigid type variable in the environment with an `OpenRow` constraint `{ x : bool, ... }`. The body's `r.x` creates an `OpenRow` constraint `{ x : ?_ }` on the type variable for `r`. `check_rigid_subset` confirms that `r`'s row is contained in `'r`'s row.\n\n```ocaml\ntypecheck_source {|\n    type Foo = { x : bool, y : bool }\n    let get_x : forall 'r. 'r :: { x : bool, ... 
} => 'r -> bool =\n        fun r -> r.x\n    in\n    let foo : Foo = { x = true, y = false } in\n    get_x foo\n|}\n```\n\nOutput: `bool`\n\nIn this example, `gen` gives `f` a row-polymorphic type: a function whose input can be any record with an `x` field. Each application instantiates `f`'s type parameter with a fresh type variable carrying an `OpenRow` constraint, which is then unified with the argument's type.\n\n```ocaml\ntypecheck_source {|\n    type Foo = { x : bool }\n    type Bar = { x : bool -> bool }\n    let r1 : Foo = { x = true } in\n    let r2 : Bar = { x = fun y -> y } in\n    let f = fun r -> r.x in\n    let _ = f r1 in\n    f r2\n|}\n```\n\nOutput: `bool -> bool`\n\nIn this example, `get_x`'s type parameter `'r` is instantiated to a fresh type variable with the `OpenRow` constraint `{ x : bool, ... }`. That type variable is unified with `Foo`, but when `unify` tries to union their fields, it finds that `Foo` doesn't have an `x` field.\n\n```ocaml\ntypecheck_source {|\n    type Foo = { y : bool }\n    let get_x : forall 'r. 'r :: { x : bool, ... } => 'r -> bool =\n        fun r -> r.x\n    in\n    let foo : Foo = { y = true } in\n    get_x foo\n|}\n```\n\nOutput: `RowMismatch \"{x: bool, ...} and {y: bool}\"`\n\nThat's row polymorphism! We can write functions that are polymorphic over the record types they operate on. Note that we didn't have to touch `infer` at all. All of our changes were to `generic_ty`, `gen`, `inst`, and `unify`, plus some new helpers. This makes sense given that nothing about our expression language has changed; we just want our functions to be more generic.\n\n## Generic type declarations\n\nYou'll notice that in our examples above, the type declarations are not generic. However, languages like Java or ML give you types like `List` that are instantiated with some type argument. As with row polymorphism, we won't need to change anything about our expression-level inference. 
All of the work happens at the level of `gen`, `inst`, `unify`, and `occurs`. The first thing we'll want to do is modify our definition of `tycon` to contain type parameters.\n\n```ocaml\ntype tycon = {\n    name : id;\n    type_params : id list;\n    ty : record_ty;\n}\n```\n\nThis allows us to write definitions like\n\n```\ntype Box 'a = {\n    x : 'a\n}\n```\n\nwhere `Box` is a type constructor with a single type parameter. We need some way to represent a concrete type like `Box bool`, which is the result of type *application*. We will modify our `ty` definition to add a `TyApp` variant.\n\n```ocaml\ntype ty =\n    ...\n    | TyApp of ty list (* Type application: head :: args, e.g. TyApp [TyName \"Box\"; TyBool] *)\n```\n\nA `Box bool` would be represented as `TyApp [TyName \"Box\"; TyBool]`.\n\nNext, we need to update our `gen`, `inst`, and `occurs` implementations to handle `TyApp`. They all follow the same pattern of recursively mapping over the `TyApp`'s list.\n\nHere's `gen`:\n\n```ocaml\nlet rec gen' ty =\n    match force ty with\n    ...\n    | TyApp app -> TyApp (List.map app ~f:gen')\n    ...\n```\n\nHere's `inst`:\n\n```ocaml\nlet rec inst' ty =\n    match force ty with\n    ...\n    | TyApp app -> TyApp (List.map app ~f:inst')\n    ...\n```\n\nAnd here's `occurs`:\n\n```ocaml\nlet rec occurs (src : tv ref) (ty : ty) : unit =\n    match force ty with\n    ...\n    | TyApp app -> List.iter app ~f:(occurs src)\n    ...\n```\n\nWe now want to update `unify` to handle type applications. Conceptually, this divides into two cases. The first is where both types are `TyApp`s. This is just a matter of checking that their type argument lists unify. 
That is, they must have the same length, and each element of the first list must unify with the corresponding element of the second.\n\n```ocaml\nand unify env (t1 : ty) (t2 : ty) : unit =\n    ...\n    (* The length guard ensures List.iter2_exn cannot raise. *)\n    | TyApp app1, TyApp app2 when List.length app1 = List.length app2 ->\n      List.iter2_exn app1 app2 ~f:(unify env)\n    ...\n```\n\nThe second case is where a type variable gets unified with a reference to some previously declared type constructor, possibly applied to some arguments. This case lives in the branch of `unify` where one of the two types is a `TyVar`.\n\n```ocaml\nmatch (t1, t2) with\n...\n| TyVar tv, ty | ty, TyVar tv ->\n  ...\n```\n\nWithin that branch, we handle the other `ty` being either a `TyName` (just a type constructor's name) or a `TyApp` (a type constructor applied to type arguments). For example, this program hits the first case, where we'd unify `Foo` with `ClosedRow { x: bool }`.\n\n```\ntype Foo = { x: bool }\nlet foo : Foo = { x = true } in foo\n```\n\nFor `Foo`, there's nothing to apply. Its fields `{ x: bool }` are already what we want to unify against.\n\nAnd this program hits the second case, where we'd unify `Box bool` with `ClosedRow { x: bool }`.\n\n```\ntype Box 'a = { x: 'a }\nlet box : Box bool = { x = true } in box\n```\n\nHere, however, we want to actually apply the type. That is, we want to substitute `bool` for `Box`'s type parameter `'a` to get a `{ x: bool }` we can actually unify against.\n\nWe define a helper `apply_tyapp` to do this for us.\n\n```ocaml\nlet apply_tyapp env (ty : ty) : record_ty =\n    match force ty with\n    ...\n```\n\nIf it's a `TyName`, there's nothing to apply. We just look up the binding in the environment, expecting a `TypeBind` holding a type constructor. 
Before we return its underlying record fields, we validate that the type constructor's list of type parameters is empty, since we don't want anyone referring to a generic type like `Box` in an annotation without applying it to arguments.\n\n```ocaml\n| TyName name ->\n  (match lookup_binding name env with\n  | TypeBind tc when List.is_empty tc.type_params -> tc.ty\n  | TypeBind _ -> raise (cannot_apply name)\n  | TypeVarBind _ | VarBind _ -> raise (undefined_error \"type\" name))\n```\n\nIf it's a `TyApp`, we still look up the binding, but now we have to substitute all the type arguments (like `bool` in `Box bool`) for the type parameters (like `'a` in `Box 'a`) wherever they show up in the underlying record type (giving `{ x : bool }` instead of `{ x : 'a }`). Recall that in the [Instantiation](#instantiation) section, we also performed substitution on types. It turns out that if we create a hash table mapping the type parameters to their corresponding type arguments, we can reuse exactly the same substitution logic here. 
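\n\nTo make the idea concrete, here is a small standalone sketch of that substitution. It uses a cut-down `ty` type and OCaml's standard `Hashtbl` rather than the Base library used in the companion repository. The names `apply_fields` and `box_fields` are hypothetical, and unlike the real code, the sketch returns `None` on an arity mismatch instead of raising.\n\n```ocaml\n(* A cut-down type language: names, bool, arrows, and type applications *)\ntype ty =\n  | TyName of string\n  | TyBool\n  | TyArrow of ty * ty\n  | TyApp of ty list\n\n(* Replace every TyName bound in tbl by its argument; other names are left alone *)\nlet rec substitute (tbl : (string, ty) Hashtbl.t) (ty : ty) : ty =\n  match ty with\n  | TyName id -> (match Hashtbl.find_opt tbl id with Some t -> t | None -> ty)\n  | TyBool -> TyBool\n  | TyArrow (a, b) -> TyArrow (substitute tbl a, substitute tbl b)\n  | TyApp app -> TyApp (List.map (substitute tbl) app)\n\n(* Instantiate a generic record body: pair params with args, then substitute into each field *)\nlet apply_fields (params : string list) (args : ty list)\n    (fields : (string * ty) list) : (string * ty) list option =\n  if List.length params <> List.length args then None\n  else begin\n    let tbl = Hashtbl.create 8 in\n    List.iter2 (Hashtbl.replace tbl) params args;\n    Some (List.map (fun (id, t) -> (id, substitute tbl t)) fields)\n  end\n\n(* Box 'a = { x : 'a, f : 'a -> bool }, applied to bool *)\nlet box_fields = [ (\"x\", TyName \"'a\"); (\"f\", TyArrow (TyName \"'a\", TyBool)) ]\nlet applied = apply_fields [ \"'a\" ] [ TyBool ] box_fields\n```\n\nHere `applied` is `Some [(\"x\", TyBool); (\"f\", TyArrow (TyBool, TyBool))]`. Passing the wrong number of arguments to `apply_fields` yields `None`.\n\n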
Let's first avoid some duplication by extracting the substitution logic into a helper (`inst` now calls `substitute` instead of `inst'`).\n\n```ocaml\n(* Replace every TyName bound in tbl with its mapped type, recursing through arrows and applications. *)\nlet rec substitute (tbl : (id, ty) Hashtbl.t) (ty : ty) : ty =\n    match force ty with\n    | TyName id as ty ->\n        (match Hashtbl.find tbl id with\n        | Some t -> t\n        | None -> ty)\n    | TyArrow (from, dst) -> TyArrow (substitute tbl from, substitute tbl dst)\n    | TyApp app -> TyApp (List.map app ~f:(substitute tbl))\n    | ty -> ty\n```\n\nOur type application logic then pairs the type constructor's parameters with the arguments, builds the table, and substitutes into each field's type, returning the resulting record. If the number of arguments doesn't match the number of parameters, `List.zip` returns `Unequal_lengths` and we raise an error.\n\n```ocaml\n| TyApp (TyName name :: args) ->\n  (match lookup_binding name env with\n  | TypeBind tc ->\n    (match List.zip tc.type_params args with\n     | Ok pairs ->\n       let tbl = Hashtbl.of_alist_exn (module String) pairs in\n       List.map tc.ty ~f:(fun (id, t) -> (id, substitute tbl t))\n     | Unequal_lengths -> raise (cannot_apply name))\n  | TypeVarBind _ | VarBind _ -> raise (undefined_error \"type\" name))\n```\n\nFinally, if we see neither a `TyName` nor a `TyApp`, we throw an error.\n\n```ocaml\n| _ -> failwith \"apply_tyapp: expected TyName or TyApp\"\n```\n\nOverall, our `apply_tyapp` function looks like this:\n\n```ocaml\nlet apply_tyapp env (ty : ty) : record_ty =\n    match force ty with\n    | TyName name ->\n        (match lookup_binding name env with\n        | TypeBind tc when List.is_empty tc.type_params -> tc.ty\n        | TypeBind _ -> raise (cannot_apply name)\n        | TypeVarBind _ | VarBind _ -> raise (undefined_error \"type\" name))\n    | TyApp (TyName name :: args) ->\n        (match lookup_binding name env with\n        | TypeBind tc ->\n            (match List.zip tc.type_params args with\n            | Ok pairs ->\n                let tbl = Hashtbl.of_alist_exn (module String) pairs in\n                List.map tc.ty ~f:(fun (id, t) -> (id, substitute 
tbl t))\n            | Unequal_lengths -> raise (cannot_apply name))\n        | TypeVarBind _ | VarBind _ -> raise (undefined_error \"type\" name))\n    | _ -> failwith \"apply_tyapp: expected TyName or TyApp\"\n```\n\nThat might've seemed like a long detour, but let's reorient ourselves. We're back to updating `unify` under the `TyVar` branch. We want to handle the cases where the other type is a `TyName` or `TyApp` corresponding to some type constructor.\n\nHere's how we handle these cases:\n\n```ocaml\nmatch (t1, t2) with\n...\n| TyVar tv, ty | ty, TyVar tv ->\n  (* Links are assumed to have been followed already, so tv is Unbound here. *)\n  let tv_row, src_scope =\n    match !tv with Unbound (_, row, scope) -> (row, scope) | Link _ -> assert false in\n  (match ty with\n   | TyName tname ->\n     (match lookup_binding tname env with\n      ...\n      | TypeBind _ ->\n        let tycon_row = ClosedRow (apply_tyapp env ty) in\n        ignore (union_rows env tv_row tycon_row))\n   | TyApp _ ->\n     let tycon_row = ClosedRow (apply_tyapp env ty) in\n     ignore (union_rows env tv_row tycon_row)\n   ...)\n```\n\nIn both cases, we apply the type to get its underlying record type, wrap it in a `ClosedRow`, and union it with the row constraint of the type variable we're unifying.\n\n## Aside\n\nWe update our formation rules for type constructors by updating `TypeBind` to carry a list of type parameters.\n\nWe add T-Record-App, T-Proj-App, and T-With-App to mirror T-Record, T-Proj, and T-With from the section on [Type declarations](#type-declarations), but they substitute the type arguments into the tycon's body. We write `T[params ↦ args]` for `T` with each parameter in `params` replaced by the corresponding argument in `args`.\n\nAnd that completes type-checking for generic type declarations! Let's look at some examples.\n\nYou can find and run these examples in [lib/eight.ml](lib/eight.ml).\n\n### Examples\n\nHere's the `box bool` example from earlier. `{ x = true }` is inferred as a `ClosedRow { x : bool }` that's checked against the annotation. 
`apply_tyapp` takes `box` and substitutes `bool` for `'a` in its body, giving us `{ x : bool }`, which matches that closed row constraint.\n\n```ocaml\ntypecheck_source {|\n    type box 'a = { x : 'a }\n    let r : box bool = { x = true } in r.x\n|}\n```\n\nOutput: `bool`\n\nHere's an identity function of type `forall 'a. box 'a -> box 'a`. It gets instantiated to `box ?0 -> box ?0`, so when it's applied to `r`, `box ?0` gets unified with `box bool`. This hits the `(TyApp, TyApp)` case of our `unify`.\n\n```ocaml\ntypecheck_source {|\n    type box 'a = { x : 'a }\n    let identity : forall 'a. box 'a -> box 'a = fun b -> b in\n    identity (let r : box bool = { x = true } in r)\n|}\n```\n\nOutput: `box bool`\n\nHere's a case where the literal's fields don't match the type constructor's. `box` requires `x` and `y`, but the literal only provides `x`. Going through `apply_tyapp` as before, we find that `{ x : bool, y : bool }` doesn't unify with `ClosedRow { x : bool }`, so we raise a `RowMismatch`.\n\n```ocaml\ntypecheck_source {|\n    type box 'a = { x : 'a, y : bool }\n    let r : box bool = { x = true } in r.x\n|}\n```\n\nOutput: `RowMismatch \"{x: bool} and {x: bool, y: bool}\"`\n\n## Side effects\n\nSo far, our language has been pure. You might think that adding mutability and other side effects would pose no problem for a polymorphic type system like ours. However, there are gotchas. Let's say we added mutability to our language with a `Ref 'a` type. For example, `Ref int` corresponds to a memory location containing an `int` value. A `Ref 'a` can be built with a `ref` function, whose type is `forall 'a. 'a -> Ref 'a`. You can retrieve the value at a `Ref 'a` via `deref`, whose type is `forall 'a. Ref 'a -> 'a`. A shorthand operator for this is `*r`, where `r` is the name of the reference. 
Finally, you can update the value at an existing memory location via `update`, whose type is `forall 'a. Ref 'a -> 'a -> Unit` (`Unit` is just an empty record type). A shorthand syntax for this operation is `*r = v`, where `r` is the name of the reference and `v` is the value being stored.\n\nFor example, this program creates a reference to a memory location containing the integer `0`, increments its contents by one, and then returns the new value:\n\n```\nlet r = ref 0 in\n*r = (*r) + 1;\n*r\n```\n\nSo far so good. Now let's see what happens when we combine references with polymorphism.\n\n```\ntype A = {}\nlet r = ref (fun x -> true) in\n*r = fun x -> if x then false else true;\nlet a : A = {} in\n(*r) a\n```\n\nIn this example, we create a reference to a lambda and then generalize it. `r` would have the generic type `forall 'a. Ref('a -> bool)`. The third line instantiates the generic type to `Ref(?0 -> bool)` and unifies `?0 -> bool` with the right-hand side, whose type is `bool -> bool`. All well and good.\n\nNow comes a problem. The last line again instantiates the generic type, this time to `Ref(?1 -> bool)`, dereferences it, and applies the lambda to `a`. The type-checker accepts this program.\n\nHowever, if we actually run it, the third line stores a lambda that takes a boolean `x` and evaluates `if x then ...`. The fifth line then invokes that lambda on a value of type `A`. But `A` is not `bool`!\n\nWhat we learn from this example is that when we generalize a mutable reference, each instantiation gets to ignore the type changes that other updates made to that memory location. We don't want that, because it means our program is not type-safe! In other words, the evaluator of this language would reach an invalid state at runtime when it tries to evaluate `if a then ...`, since `a` has type `A`, not `bool`.\n\nWhat can we do about this? 
The immediate answer is to not generalize expressions like `ref <exp>`, so that `r`'s type is just `Ref(?0 -> bool)`. Then, when the third line updates the location to hold a `bool -> bool`, that information carries over to the last line, where we try to unify `bool -> bool` with `A -> ?1` and fail.\n\nOur intuition is mostly correct, but it turns out we need to ensure that the expressions we generalize can't evaluate `ref <exp>` *anywhere*, not just at the top level. In our case, we can safely generalize *any* expression that is a constant (`EBool`), variable (`EVar`), lambda (`ELam`), or a record literal (`ERecord`) whose elements are all generalizable. Everything else, like a function application, if expression, record projection, with-expression, or let binding, we won't generalize.\n\nThis is called the *syntactic* value restriction, and it is the criterion Standard ML uses to handle mutability. We are essentially ensuring that we only generalize *values*, but we take a conservative approach and define a value as any constant, variable, lambda, or record literal whose elements are all values.\n\nThere are other approaches as well. OCaml, for instance, does a deeper syntactic check that also allows nested let bindings, record projections, some lambda applications, and so on. Other options include changing the evaluation model from eager to lazy, analyzing the bodies of applied functions for observable side effects, and using an effect system to track the effects of expressions. That last one (which Koka employs) is arguably the ultimate solution to the problem, because it gives precise type-level tracking of which expressions don't perform side effects and can therefore be generalized. 
However, discussing effect systems is outside the scope of this article, and restricting generalization to syntactic values turns out not to be a problem in practice.\n\n## Aside\n\nStephen Dolan recently took a different approach in [Rethinking the Value Restriction](https://www.youtube.com/watch?v=C1g_PO_xcI8) that supports full rank-1 types in ML, restricts generalization to lambda abstractions, and lazily instantiates foralls only when forced to by application or projection. There is some nuance around covariance annotations and curried functions, but consider taking a look at this talk!\n\nLet's take a look at how we can modify `infer` to implement the value restriction. We'll start with `is_value`, a helper that determines whether a typed expression is syntactically a value.\n\n```ocaml\nlet rec is_value (x: texp) : bool =\n    match x with\n    | TEBool _ | TEVar _ | TELam _ -> true\n    | TERecord (rec_lit, _) -> List.for_all rec_lit ~f:(fun (_, fld) -> is_value fld)\n    | TEWith _ | TEApp _ | TEIf _ | TEProj _ | TELet _ | TELetRec _ -> false\n```\n\nThe only interesting case is `TERecord`, where we check that all of its fields are also values.\n\nNext, let's write `generalize_if_value`, a helper that wraps `gen` so we only generalize when the `rhs` is a value. Otherwise, we fall back to `dont_generalize`.\n\n```ocaml\nlet generalize_if_value rhs : generic_ty =\n    if is_value rhs then gen (typ rhs)\n    else dont_generalize (typ rhs)\n```\n\nNow in our `ELet` case, the `None` branch where we previously called `gen (typ rhs)` becomes:\n\n```ocaml\n| None -> generalize_if_value rhs\n```\n\nThe `ELetRec` case gets the same treatment in the `List.map` that produces `generalized_bindings`. Its `None` branch, where we previously called `gen (typ rhs)`, likewise becomes:\n\n```ocaml\n| None -> generalize_if_value rhs\n```\n\nPutting this all together, we have implemented the value restriction. 
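\n\nOCaml itself enforces a relaxed form of this restriction, which gives an easy way to see the idea in action. In the standalone sketch below (plain OCaml, not our type-checker), `id` is bound to a lambda and is generalized, so it can be used at both `int` and `bool`. By contrast, `r` is bound to an application of `ref`, so its type variable is only *weakly* polymorphic (recent OCaml versions print it as `'_weak1`), and the first assignment fixes it to `bool`.\n\n```ocaml\n(* A lambda is a syntactic value, so id : 'a -> 'a is generalized *)\nlet id = fun x -> x\nlet n : int = id 1\nlet b : bool = id true\n\n(* ref (fun x -> x) is an application, so r is not generalized:\n   its type stays ('_weak1 -> '_weak1) ref until something fixes '_weak1 *)\nlet r = ref (fun x -> x)\n\n(* This assignment fixes '_weak1 := bool for every later use of r *)\nlet () = r := (fun x -> not x)\n\n(* Unlike in our unsound example, applying !r to a non-bool is now rejected:\n   let _ = !r 1   (type error) *)\nlet flipped : bool = !r true\n```\n\n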
If we added `ref`, `deref`, and `update` built-ins, our language would now handle mutability soundly.\n\n## Aside\n\nGiven the `is_value` predicate we defined in this section, the typing rules for let and let-rec are updated to generalize only when the rhs is a value.\n\nLet's take a look at some examples.\n\nYou can find and run these examples in [lib/nine.ml](lib/nine.ml).\n\n### Examples\n\nWe can simulate the example from before with our type-checker. We add definitions and signatures for `Ref`, `ref`, `deref`, and `update`. We can't really implement `update` without an actual memory store, so it just returns an empty `Unit` record.\n\nBecause of the value restriction, `r` isn't generalized, so once `update` stores a `bool -> bool` in it, the reference's type is fixed to `Ref (bool -> bool)`. When we then try to apply the dereferenced function to a `Unit`, it fails to typecheck, as expected.\n\n```ocaml\ntypecheck_source {|\n    type Ref 'a = { value : 'a }\n    type Unit = {}\n    let ref : forall 'a. 'a -> Ref 'a = fun x -> { value = x } in\n    let deref : forall 'a. Ref 'a -> 'a = fun r -> r.value in\n    let update : forall 'a. Ref 'a -> 'a -> Unit = fun r -> fun x -> {} in\n    let r = ref (fun x -> x) in\n    let _ = update r (fun x -> if x then false else true) in\n    let u : Unit = {} in\n    deref r u\n|}\n```\n\nOutput: `UnificationFailure \"failed to unify type bool with Unit\"`\n\nAnd here's how it looks being used successfully.\n\n```ocaml\ntypecheck_source {|\n    type Ref 'a = { value : 'a }\n    type Unit = {}\n    let ref : forall 'a. 'a -> Ref 'a = fun x -> { value = x } in\n    let deref : forall 'a. Ref 'a -> 'a = fun r -> r.value in\n    let update : forall 'a. 
Ref 'a -> 'a -> Unit = fun r -> fun x -> {} in\n    let r = ref (fun x -> x) in\n    let _ = update r (fun x -> if x then false else true) in\n    update r (fun x -> false)\n|}\n```\n\nOutput: `Unit`\n\n## Conclusion\n\nThe features covered in this article already give you a sound and flexible system to program in, with generics, row polymorphism, side effects, and type safety without requiring annotations. As mentioned, the companion repository [https://github.com/smasher164/hm_tut](https://github.com/smasher164/hm_tut) contains the source corresponding to each section. In the next post, we'll cover how to implement typeclasses/traits and their various extensions.\n\n## Acknowledgements\n\nI'd like to thank Ashley Chekhova, Russell Johnston, and others for their helpful feedback.", "url": "https://wpnews.pro/news/type-inference-part-1", "canonical_source": "https://www.blog.akhil.cc/type-inference-part-1", "published_at": "2026-06-26 20:58:36+00:00", "updated_at": "2026-06-26 21:05:22.702160+00:00", "lang": "en", "topics": ["machine-learning", "developer-tools"], "entities": ["Damas-Hindley-Milner", "Algorithm J", "OCaml", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/type-inference-part-1", "markdown": "https://wpnews.pro/news/type-inference-part-1.md", "text": "https://wpnews.pro/news/type-inference-part-1.txt", "jsonld": "https://wpnews.pro/news/type-inference-part-1.jsonld"}}