{"slug": "50-years-of-proof-assistants", "title": "50 Years of Proof Assistants", "summary": "The first LCF-style proof assistant, Edinburgh LCF, was introduced in 1975, establishing the foundational principles of a proof kernel, natural deduction, and goal-directed proof that underpin modern systems like Isabelle, Coq, and Lean. This milestone counters claims of scientific stagnation by demonstrating 50 years of continuous academic and government-funded progress in formal verification, a field that has advanced from proving simple parsing algorithms to enabling complex mathematical formalizations.", "body_md": "## 50 years of proof assistants\n\nCrackpots ranging from billionaire Peter Thiel to random YouTube influencers claim that science has been stagnating for the past 50 years. They admit that computing is an exception: they don’t pretend that my personal 32GB laptop is not an advance over the 16MB mainframe that served the whole Caltech community when I was there. Instead they claim that advances in computing were driven solely by industrial research, quite overlooking the role of academia\nand government funding\nin pushing the VLSI revolution, RISC processor design, networking, hypertext, virtual memory and indeed computers themselves. As for the industrial research,\nmost of it came from just two “blue sky” institutes – [Bell Labs](https://sites.stat.columbia.edu/gelman/research/published/bell.pdf)\nand [Xerox PARC](https://spectrum.ieee.org/xerox-parc) – that closed a long time ago.\nLCF-style proof assistants are a world away from mainstream computing,\nso let’s look at 50 years of progress there.\n\n### 1975–1985: Edinburgh LCF\n\nThe first instance of LCF was Stanford LCF, developed by Robin Milner in 1972, but it was **not** an LCF-style proof assistant! 
LCF meant “Logic for Computable Functions”, a quirky formalism based on Scott domains and intended for reasoning about small functional programs. But “LCF-style proof assistant” means one that, like Edinburgh LCF, was coded in some form of\nthe ML programming language and provided a proof kernel,\nencapsulated in an abstract type definition, to ensure that a theorem could only be generated\nby applying inference rules to axioms or other theorems:\n\n… the ML type discipline is used… so that—whatever complex procedures are defined—all values of type `thm` must be theorems, as only inferences can compute such values…. This security releases us from the need to preserve whole proofs… — an important practical gain since large proofs tended to clog up the working space… [Edinburgh LCF, page IV]\n\nEdinburgh LCF was first announced in 1975, which conveniently is exactly 50 years ago,\nat the almost mythical conference on *Proving and Improving Programs* held at Arc-et-Senans.\nThe [user manual](https://link.springer.com/book/10.1007/3-540-09724-4), published in the Springer lecture notes series, came out in 1979.\nEdinburgh LCF introduced some other principles that people still adhere to today:\n\n- inference rules in the *natural deduction* style, with a dynamic set of assumptions\n- a *goal-directed* proof style, where you start with the theorem statement and work backwards\n- a structured system of *theories* to organise groups of definitions\n\nEdinburgh LCF had its own version of the ML language.\nIt supported a fragment of first-order logic containing\nthe logical symbols $\\forall$, $\\land$ and $\\to$ along with\nthe relation symbols $\\equiv$ and $\\sqsubseteq$.\nIt introduced proof tactics and also *tacticals*:\noperators for combining tactics.\nTactics supported goal-directed proof,\nbut Edinburgh LCF had no notion of the current goal or anything to help the user manage the tree of subgoals.\nIts user interface was simply the ML top level and the various 
theorem-proving primitives were simply ML functions.\nML stood for *metalanguage*, since managing the process of proof was its exact job.\n\nAvra Cohn and Robin Milner wrote a [report](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-20.html)\non proving the correctness of a parsing algorithm\nusing Edinburgh LCF.\nThe proof consists of one single induction followed by\na little simplification and other reasoning.\nThe report includes a succinct description of Edinburgh LCF.\nIt is a nice snapshot of the state of the art in 1982,\nwhen I arrived in Cambridge to join a project run by Robin Milner and Mike Gordon.\nFull of youthful enthusiasm, I told Mike that it would be great\nif one day we could formalise the Prime Number Theorem.\nI hardly knew what the theorem was about or how to prove it,\nbut my college roommate had told me it was really deep.\n\nDisappointed to discover that we only had $\\forall$, $\\land$ and $\\to$,\nI set out to fix that, to support full first-order logic.\nI ended up changing so much\n(backwards compatibility is overrated) that people eventually shamed me into writing my own [user manual](https://www.cambridge.org/gb/universitypress/subjects/computer-science/programming-languages-and-applied-logic/logic-and-computation-interactive-proof-cambridge-lcf).\nCambridge LCF never caught on because, well,\nnobody liked the LCF formalism.\nBut I used it for a development that seemed big at the time: to [verify the unification algorithm](https://doi.org/10.1016/0167-6423(85)90009-7).\nThis development was later [ported to Isabelle](https://isabelle.in.tum.de/dist/library/HOL/HOL-ex/Unification.html).\nIt contains 36 inductions, so we were making progress.\nAnd this takes us to 1985, exactly 40 years ago;\nsee also [this survey](https://doi.org/10.48456/tr-54) of the state of play.\nBut there was almost no mathematics: no negative numbers and no decimal notation, so you could not even write 2+2=4.\nAs far as the broader computer science community 
was concerned, we were a joke.\n\n### 1985–1995: Cambridge LCF and HOL\n\nCambridge LCF was in itself a dead end, but because it included a much faster ML compiler,\nit ended up [being incorporated](/2022/09/28/Cambridge_LCF.html) into a lot of other proof assistants, notably Mike’s [HOL88](https://github.com/theoremprover-museum/HOL88).\nAnd just like that, [hardware verification](/2023/01/04/Hardware_Verification.html) became a reality.\nAlthough software verification seemed stuck in the doldrums,\na couple of production-ready chip designs were verified!\nMike’s explanation was that hardware verification was simply easier.\n\nAlso in 1985, we got a new [standard for the ML language](https://doi.org/10.1145/3386336)\nand, soon, two compilers for it.\nSo then I started working on experiments that would\n[lead to Isabelle](/2022/07/13/Isabelle_influences.html).\nIt would be like LCF but would support constructive type theory,\ncrucially allowing both unification and backtracking, like in Prolog.\nBut there was no working system yet, just a grant application.\nAnd that was the state of play 40 years ago.\n\nFunding secured, Isabelle development started in earnest in 1986.\nIt was coded in [Standard ML](https://www.lfcs.inf.ed.ac.uk/software/ML/) from the start, while HOL88 was ported from the Cambridge LCF version of ML\nto Standard ML, emerging as HOL90.\nMike acquired a bevy of energetic PhD students,\nwho engaged in verification projects or built extensions for HOL.\nVersions of HOL were being used in institutes around the world.\n\nStepping aside from HOL for a moment, other proof assistants had made great progress\nby the mid 1990s.\nThe addition of inductive definitions to the calculus of constructions\ngave us the [calculus of inductive constructions](https://rdcu.be/eR7e8),\nwhich in essence is the formalism used today by Rocq and Lean.\nThe very first release of Isabelle/HOL [happened in 1991](https://rdcu.be/eR7gl),\nprimarily the work of Tobias Nipkow, 
though I was soon to\n[join in](https://www.cl.cam.ac.uk/~lp15/Grants/holisa.html).\nIsabelle/ZF, which was my pet project, formalised axiomatic set theory\nto some [quite deep results](https://arxiv.org/abs/cs/9612104).\n\nBut I am still not certain whether negative numbers were supported (can somebody help me?).\nOur weak support for arithmetic may seem odd\nwhen our research community was aware that the real numbers\nhad been [formalised in AUTOMATH](/2022/06/22/Why-formalise.html),\nbut we didn’t seem to want them.\nTo many, we were still a joke. This was about to change.\n\n### 1995–2005: Proof assistants come of age\n\nIn 1994 came the Pentium with its [FDIV bug](https://www.techradar.com/news/computing-components/processors/pentium-fdiv-the-processor-bug-that-shook-the-world-1270773):\na probably insignificant but detectable error in floating-point division.\nThe subsequent product recall cost Intel nearly half a billion dollars.\nJohn Harrison, a student of Mike’s, decided to devote his PhD research\nto the verification of floating-point arithmetic.\nBy June 1996 he had submitted an extraordinary [thesis](https://doi.org/10.48456/tr-408),\n*Theorem Proving with the Real Numbers*,\nwhich described a formidable series of achievements:\n\n- a formalisation of the real number system in HOL\n- formalised analysis including metric spaces, sequences and series, limits, continuity and differentiation, power series and transcendental functions, integration\n- proper numerals represented internally by symbolic binary, and calculations on them\n- computer algebra techniques including a decision procedure for real algebra\n- tools and techniques for floating-point verification by reference to the IEEE standard\n\nThis thesis, which I had the privilege to examine, won a Distinguished Dissertation Award\nand was [published as a book](https://link.springer.com/book/10.1007/978-1-4471-1591-5) by Springer.\nSo by the middle of the 1990s, which was 30 years ago,\nwe had 
gone from almost no arithmetic to a decent chunk of formalised real analysis\nthat was good enough to verify actual floating-point algorithms.\n\nThis period also saw something of an arms race in automation.\nMy earlier, Prolog-inspired vision of backtracking search\nhad led to some [fairly general automation](https://doi.org/10.48456/tr-396) that was effective not just in standard predicate logic\nbut with any theorems that were expressed in a form suitable for forward or backward chaining.\nI had also done experiments with classical automatic techniques such as model elimination, which, although pathetic compared with automatic provers of that era,\nwas good enough to troll users on the `hol-info` mailing list.\nSoon I had provoked John Harrison to build a superior version of ME for HOL Light.\nLater, Joe Hurd built his `metis` superposition prover, which found its way into HOL4.\nNot to be outdone, Tobias made Isabelle’s simplifier the best in its class, incorporating a number of sophisticated refinements, including some great ideas from Nqthm.\n\nTwenty years from the start of this chronology we now had\nseveral reasonably mature and powerful systems, including Isabelle/ZF, Isabelle/HOL,\nmultiple versions of the HOL system, and Coq (now Rocq).[1](#fn:1)\nMany of them used [Proof General](https://proofgeneral.github.io), a common user interface for tactic-based proof assistants based on the Emacs editor. And we had 100MHz machines, some with 64MB of memory! 
We were ready to do big things.\n\nDuring this period, I did a lot of work on the\n[verification of cryptographic protocols](https://doi.org/10.3233/JCS-1998-61-205),\nalso [here](https://doi.org/10.48550/arXiv.2105.06319).\nSuch protocols secure Internet connections and other network communications;\nthey are valuable when you need to know who is on the other end\nand need to keep messaging secure from eavesdropping and tampering.\nAmong the protocols investigated were the ubiquitous TLS\nand the late, unlamented SET protocol.\nThese proofs were not at the level of code or bits;\nbuggy implementations could and did emerge.\n\nIn 2005, the big thing that caught everyone’s eye\nwas [Georges Gonthier’s formalisation](https://rdcu.be/eSgTy) (in Coq)\nof the Four Colour Theorem.\nMost educated people had heard of the theorem already,\nand its history is fascinating:\nnumerous proofs had been attempted and rejected since the mid 19th century.\nThe 1977 proof by Appel and Haken was questioned\nbecause it relied on a lot of ad-hoc computer code.\nSuddenly, despite the still unwelcome involvement of computers,\nno one could doubt the theorem anymore.\n\nAt the opposite extreme was [my own formalisation](https://doi.org/10.1112/S1461157000000449) of Gödel’s proof of the relative consistency of the axiom of choice in Isabelle/ZF.\nThis was the apex of my ZF work, technically difficult but incomprehensible to most people.\nMy early dream of having a formalisation of the Prime Number Theorem came true in 2005\nwhen Jeremy Avigad [formalised](https://arxiv.org/abs/cs/0509025) the theorem in Isabelle.\nSomewhat later, John Harrison [formalised a different proof](https://rdcu.be/eShga) in HOL Light.\nAnd there was much more. 
Without any doubt, our systems were capable of serious mathematics.\n\nPerhaps the most consequential achievement of this period was Mike Gordon’s collaboration\nwith Graham Birtwistle and Anthony Fox to [verify the ARM6 processor](https://rdcu.be/eShzn).\nGraham, at Leeds, formally specified the instruction set architecture of the processor\n(i.e. the assembly language level), while Mike and Anthony at Cambridge verified the implementation of that architecture in terms of lower level hardware components.\nEventually a [number of other processors](https://doi.org/10.1145/3290384) were similarly specified,\nand some verified.\nWithout any doubt, our systems were capable of serious verification.\n\nDespite the focus on applications in this section, system development continued in the run-up to 2005. I am only familiar with Isabelle developments, but they were tremendous:\n\n- the *Isar language* for structured, legible proofs (a break with the LCF idea that the top level must be a programming language, i.e. 
ML)\n- *axiomatic type classes*, providing principled overloading\n- *counterexample finders*: [Quickcheck](https://doi.org/10.1109/SEFM.2004.1347524) and Refute (now Nitpick)\n- *code generation* from the executable fragment of higher-order logic, and reflection\n- *sledgehammer* was under active development, but only ready a couple of years later.\n\nWith so much going on, it’s not surprising that our community started doing big things, and other people were starting to notice.\n\n### 2005–2015: The first landmarks\n\nI am not used to phone calls from journalists:\nfor most of my career, formal verification has been seen as (at best) niche.\nBut the journalist on the end of the line was asking for information about\n[seL4](https://doi.org/10.1145/1629575.1629596),\nthe first operating system kernel ever to be formally verified.\nTools for extended static checking were by then able to detect a lot of program faults, but the seL4 verification claimed to cover *full functional correctness*:\nthe code did exactly what it was supposed to do.\nThere is now an [entire ecosystem](https://sel4.systems) around seL4,\nbacked by a million lines of Isabelle/HOL proofs.\n\nPeople have wanted to verify compilers\n[since forever](https://doi.org/10.1007/3-540-10886-6).\nThe task of fully specifying a programming language, target machine\nand compiler already seemed impossible, let alone providing the actual proof.\nWith [CompCert](https://inria.hal.science/hal-01238879v1), that task was finally fulfilled, for a large subset of the C language:\n\nWhat sets CompCert apart from any other production compiler, is that it is formally verified, using machine-assisted mathematical proofs, to be exempt from miscompilation issues. 
In other words, the executable code it produces is proved to behave exactly as specified by the semantics of the source C program.\n\nA seemingly intractable problem with compiler verification\nwas how to translate your verified compiler into machine code.\nFor example, CompCert is mostly written in Rocq,\nwhich is then extracted to OCaml code.\nThe OCaml compiler had never been verified,\nso how do we know that its compiled code is correct?[2](#fn:2)\n\n[CakeML](https://cakeml.org) squares this circle through [bootstrapping](https://doi.org/10.1145/3437992.3439915).\nCakeML translates from its source language (a dialect of ML)\nto assembly language, accompanied by a proof that the two pieces of code are equivalent.\nThis work was an outgrowth of the ARM6 project mentioned earlier.\n[Magnus Myreen](https://www.cse.chalmers.se/~myreen/)\nhad [developed techniques](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-765.html) for\nautomatically and verifiably translating between assembly language\nand recursive functions in higher-order logic, in both directions.\nAt the start of the bootstrapping process,\na tiny compiler was written in pure logic and proved correct.\nIt was now safe to run this compiler\nand use its tiny language to implement a bigger language.\nThis process ultimately produced a verified compiler in both source form\nand assembly language form, with a proof of their equivalence,\nas well as [verified extraction](https://doi.org/10.1145/2364527.2364545) from higher-order logic to ML.\n\nThe end of the decade also saw impressive results in the formalisation of mathematics:\n\n- [Gödel’s second incompleteness theorem](https://rdcu.be/eSZwv), by yours truly, in Isabelle/HOL\n- the [Central Limit Theorem](https://arxiv.org/abs/1405.7012), by Avigad et al., ditto\n- the [Flyspeck](https://github.com/flyspeck/flyspeck) project, by Hales et al., in Isabelle/HOL and HOL Light\n- the [odd order theorem](https://doi.org/10.1145/2480359.2429071), in Rocq\n\nWithout 
going into details here, each of these was an ambitious proof, combining in various ways deep mathematics, intricate technicalities and sheer bulk. Our community was proud of our achievements. We were no longer a joke, but what exactly were we good for?\n\n### 2015–2025: Breaking through\n\nThis period brought something astonishing: acceptance of proof assistants by many mainstream mathematicians. I mostly recall mathematicians regarding computers with something close to contempt. Even some logicians regarded formalised mathematics as impossible, somehow fixating on Gödel’s incompleteness or that notorious proof of 1+1=2 on page 360. Regarding my work formalising big chunks of ZF theory, someone commented “only for finite sets obviously”.\n\nMy EU-funded [ALEXANDRIA](https://www.cl.cam.ac.uk/~lp15/Grants/Alexandria/) project started in 2017.\nMy team formalised more advanced and deep mathematics\nthan I ever imagined to be possible, using Isabelle/HOL.\n(I have told this story in an [earlier blogpost](/2023/08/31/ALEXANDRIA_finished.html).)\nBut ALEXANDRIA alone would not have had much of an impact on mathematical practice.\nWhat made a difference was [Kevin Buzzard](https://xenaproject.wordpress.com/what-is-the-xena-project/) and his enthusiastic, tireless promotion of the idea of formalising mathematics\nin [Lean](https://lean-lang.org).\nHe recruited a veritable army.\nI got the idea of blogging from him, but my blog has not had the same impact. 
Where are you guys?\n\nIn 2022, for the first time ever, machine assistance\nwas [used to confirm](https://leanprover-community.github.io/blog/posts/lte-final/)\nbrand-new mathematics that a Fields Medallist had concerns about.\nMathematicians will for the most part continue to work the way they always have done,\nbut proof assistants are getting better and better,\nand they will encroach more and more on the everyday practice of mathematics.\n\nMeanwhile, Isabelle continued to be useful for verification.\nI was amazed to hear that the systems group here in the Computer Lab\nhad completed a [major verification](https://doi.org/10.1145/3133933) using Isabelle/HOL.\nThe tradition is for systems people to despise verification tools\nfor sweeping aside ugly things like overflow and floating point errors, even though they no longer do.\nBesides, a research tool like Isabelle is only used by its own developer and his students.\nTimes were changing.\n\nIsabelle is also one of the several proof assistants involved\nwith [CHERI](https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/), a large-scale project\nreviving the old idea of *capabilities* to ensure security at the hardware level.\nCHERI has produced numerous publications, some of which\n(for example [this one](https://doi.org/10.1007/978-3-030-99336-8_7)\nand [that one](https://doi.org/10.1109/SP40000.2020.00055)) describe very large proofs.\nThese concern the design and implementation of novel computer architectures\nwith fine-grained memory protection,\nand a design process with formal verification at its heart.\n\nIsabelle has also contributed to the design of [WebAssembly](https://webassembly.org),\na relatively new platform for web applications.\nBy subjecting the WebAssembly specification to [formal scrutiny](https://doi.org/10.1145/3167082),\nConrad Watt was able to identify a number of issues in time for them to be fixed.\n\nFinally, I’d like to mention this announcement (4 December 2025) by Dominic 
Mulligan of Amazon Web Services (AWS):\n\nOver three years, lots of hard work, and 260,000 lines of Isabelle/HOL code later, the Nitro Isolation Engine (NIE) is finally announced alongside Graviton5. Working with our colleagues in EC2, Annapurna, and AWS AppSec, we have been working to rearchitect the Nitro system for Graviton5+ instances around a small, trusted separation kernel. Written from scratch in Rust, we have additionally specified the behaviour of a core subset of the Nitro Isolation Engine kernel, verified that the implementation meets this specification, and additionally proved deep security properties—confidentiality and integrity—of the implementation.\n\nI am biased, since I’ve been working with AWS on [this exact project](https://www.youtube.com/watch?v=hqqKi3E-oG8), but it’s a big deal.\nAWS has been using formal verification tools for a considerable time.\nA notable earlier accomplishment was to verify tricky but efficient algorithms using HOL Light,\n[speeding up](https://www.amazon.science/blog/formal-verification-makes-rsa-faster-and-faster-to-deploy)\nRSA encryption by a massive factor.\n\n### 2025–2035: Becoming ordinary\n\nA couple of months ago, Apple announced new models in their iPhone range, but no crowds formed around Apple Stores. They once did: the iPhone was once regarded as revolutionary. Now, smartphones are a commodity, which is the final stage of a new technology. Formal verification is not ordinary yet. But it’s coming: more and more software will be seen as too important to develop any other way, as is already the case for hardware.\n\n### Postscript\n\nI am well aware that there is much outstanding work adjacent to that described here, e.g. using other interactive tools, such as Nqthm and ACL2, PVS and Agda, and much else using Rocq. There have been amazing advances in the broader theorem proving world, also in model checking, SAT/SMT solving and their applications to extended static checking of software. 
I have related what I personally know. And remember, the point of this post is not (simply) to boast but to demonstrate the progress of our research community, so the more achievements the better. Feel free to add some in the comments!\n\nThis post does not prove anything about other fields of science, such as solid-state physics, molecular biology or mathematics. But it’s fair to assume that such fields have not been idle either. People have proved Fermat’s Last Theorem and the Poincaré conjecture, and settled more obscure questions such as the projective plane of order 10. People have located the remains of King Richard III, who died in 1485, excavating and positively identifying the body by its DNA. People have linked a piece of bloody cloth to Adolf Hitler and diagnosed that he had a specific genetic condition. The immensely complex James Webb Space Telescope was successfully deployed; it is now revealing secrets about the early Universe.\n\nSometimes I wonder about the motives of those who claim that science is moribund. Do they have political aims, or just unrealistic expectations? Were they expecting time travel or some sort of warp drive? People need to remember that movies are fiction.\n\n- Cool things were also done in [LEGO](https://era.ed.ac.uk/handle/1842/504), another type theory proof assistant, but sadly it soon fell by the wayside. 
And they were sued by some crazy guys from Billund.[↩](#fnref:1)\n- In fact, the correctness of CompCert is delicate for [a number of reasons](https://doi.org/10.1007/978-3-030-99336-8_8).[↩](#fnref:2)", "url": "https://wpnews.pro/news/50-years-of-proof-assistants", "canonical_source": "https://lawrencecpaulson.github.io/2025/12/05/History_of_Proof_Assistants.html", "published_at": "2026-05-26 14:31:17+00:00", "updated_at": "2026-05-26 14:38:22.909166+00:00", "lang": "en", "topics": ["ai-research"], "entities": ["Peter Thiel", "Robin Milner", "Bell Labs", "Xerox PARC", "LCF", "HOL system", "Isabelle", "Coq"], "alternates": {"html": "https://wpnews.pro/news/50-years-of-proof-assistants", "markdown": "https://wpnews.pro/news/50-years-of-proof-assistants.md", "text": "https://wpnews.pro/news/50-years-of-proof-assistants.txt", "jsonld": "https://wpnews.pro/news/50-years-of-proof-assistants.jsonld"}}