What’s the big deal about proof assistants? Why would anyone bother to write an article about them? These are good questions. In this post, I will try to answer them.
A proof assistant is a computer program that checks a mathematical proof for correctness. For this to happen, the proof must be formalised, that is, written into the computer in such a way that the program can read it and act on it. The proof assistant then checks that the argument of the proof is valid: that each step logically follows from the ones before. The point is that this process mechanically checks the correctness of a mathematical proof. This is especially useful in cases where large numbers of details or computations make it highly likely for human error to creep in.
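To make this concrete, here is a minimal sketch of what a formalised proof looks like, written for the proof assistant Coq (one of the systems discussed below). The theorem name is my own choice for illustration; `le_n` and `le_S` are the two standard constructors of Coq's ≤ relation, and the checker verifies that each step follows from the ones before:

```coq
(* A minimal formal proof, checkable by Coq: every natural number
   is less than or equal to its successor. *)
Theorem n_le_succ : forall n : nat, n <= S n.
Proof.
  intro n.        (* fix an arbitrary natural number n *)
  apply le_S.     (* n <= S n follows if we can show n <= n *)
  apply le_n.     (* n <= n holds by reflexivity *)
Qed.
```

If any step were invalid, Coq would reject the proof at exactly that point rather than accept the theorem.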
This table, taken from Wiedijk [2007], lists some currently used proof assistants:
proof assistant   proof style
HOL Light         procedural
Mizar             declarative
ProofPower        procedural
Isabelle          both procedural & declarative possible
Coq               procedural
The proof style refers to how the program is used for proof-checking. In a declarative system, one writes the proof in a highly formal language with very small steps of inference and inputs it to the program. In a procedural system, instead of writing the proof explicitly, the user interacts with the proof assistant by executing commands that reduce proof goals to zero or more new and hopefully simpler subgoals (Wiedijk [2007]).
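The procedural style can be illustrated with a small hedged sketch in Coq (the theorem name is my own). Each command, called a tactic, reduces the current goal to zero or more subgoals, exactly as described above; the comments record what remains to prove after each step:

```coq
(* Procedural style: each tactic reduces the current goal to zero or
   more subgoals; the proof is complete when no subgoals remain. *)
Theorem add_zero_r : forall n : nat, n + 0 = n.
Proof.
  induction n.                         (* two subgoals: the base case and the inductive step *)
  - reflexivity.                       (* base case: 0 + 0 = 0 holds by computation *)
  - simpl. rewrite IHn. reflexivity.   (* step: S n + 0 simplifies to S (n + 0),
                                          then the induction hypothesis IHn closes it *)
Qed.
```

A declarative system such as Mizar would instead have the user write out the statements of the base case and inductive step explicitly, with the program checking that each small inference is justified.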
There is something irresistible about the idea of a complete formalization of mathematics. One would have absolute certainty of the correctness of a result, verified by objective, mechanical procedures rather than by the mistake-prone human mind (no more pesky sign-omission errors!). There is an aesthetic value to an elegant and powerful computer script that can deal with the most complicated proofs. One would have the freedom to leave tedious computations, or checkings of cases, to a machine and instead use that time to dream up lofty new mathematical ideas.
However, a number of potential caveats quickly arise: how long will it take to learn to use one of these programs (compared to, say, learning to use LaTeX to typeset mathematical formulae)? What are the chances of a significant bug somewhere in the program? Even if a proof is correct, or checked, surely it matters that someone actually understands why it’s true? A vague yet persistently niggling memory of some “Incompleteness” theorem or other also contributes to the sense of unease. Let’s investigate…
1. A Very Brief History Of Formalization and Computer-Assisted Proof
Logic was developed in ancient Greece, China, and India as the science of valid inference and correct argumentation. Plato (428-347 BC) raised the question "What can properly be called true or false?". Arguably, Aristotle (384-322 BC) provided the foundation for the formal logic we use today with his theory of the syllogism. Gottfried Leibniz (1646-1716), who, amongst other significant achievements, developed calculus independently of Newton and who is also noted for his philosophical optimism, imagined a universal, formal language (characteristica universalis) that could express all mathematical, physical, or metaphysical arguments. Any dispute could in principle be translated into the characteristica and, upon logical calculation, the true assertion would be revealed, thus settling the dispute. The very great difficulty of this project eventually became apparent, and indeed for much of the 20th century such formalization was an impossible goal even for mathematics, that most rigorous of subjects. Indeed, members of the Bourbaki group abandoned formalized mathematics in their writings because even the smallest proof "would already require several hundreds of signs for its complete formalization" (see Avigad & Harrison [2014], p. 67).
In the later 20th century, technological advancements began to affect mathematical research. The first "proof by computation" came in the form of the Four Colour Theorem (FCT). The theorem asserts that if a plane (2D flat surface) is divided up into different regions, as a map of the world might be divided into countries, no more than four colours are ever needed to colour the map so that no two adjacent regions (or countries) have the same colour. Appel and Haken proved the theorem in 1976 by using a computer to check through nearly 2,000 possible map configurations, something that could not feasibly have been done by hand. The approach aroused controversy and many mathematicians did not accept the result, precisely because it was infeasible for humans to check the proof by hand. And so the discussion of the philosophical implications of computer-assisted proof began (for example, see Tymoczko [1979]). More recent proofs of the FCT have simplified the Appel-Haken approach, but all fundamentally rely on computers. Notably, the theorem was proved again in 2005 by Gonthier, using the proof assistant Coq.
Advances in logic and computer science in the 20th century also enabled another role for computers in mathematics. Early computational proof systems, the precursors of today's proof assistants, appeared by the 1970s, but their primary purpose was the formalized writing of mathematical proofs rather than true computer assistance. The declarative Mizar system, introduced in 1973, provided significant computational assistance and is still in use today. In the 1980s, further influential, procedural, systems were introduced, including HOL and Coq, and Isabelle arrived in the 1990s. By 2005, a number of important theorems, including Gödel's First Incompleteness Theorem, the Jordan Curve Theorem, and the Prime Number Theorem, had been formalized and verified in at least one of these systems. Today, in addition to much activity in proving new theorems, there is great interest in the theory that underlies automated proof checking. For example, the Univalent Foundations project, which developed out of the axiomatic basis for the proof assistant Coq, analyzes foundations of mathematics based on homotopy type theory rather than set theory (more on this below).
Formal verification by proof assistant is perhaps the closest we have ever come to Leibniz's ideal of the characteristica universalis. However, Leibniz presumably imagined that the calculations would be performed by humans and hence would be understandable to humans, a property that modern computer-assisted proofs generally cannot fulfil.
2. The Good
Many mathematicians are excited about formally verified mathematics and what proof assistants can do, and this enthusiasm is well-founded. It would be wonderful to know that there are no mistakes in an important proof, especially when one is unable to check it oneself for lack of time or expertise. It would be very useful to leave the filling-in of tedious details in a long proof to a machine that is guaranteed to perform its function correctly and will not get bored. We could build huge digital libraries of mathematical results (oracles) that could be used for quick access.
To many people working in the intersection of mathematics and computer science, it is also fundamentally fascinating to explore how far mathematics can be formalized and proofs automated. The more we explore this frontier, the more we learn about mathematics itself. For example, as mentioned above, the Univalent Foundations of Mathematics project arose from the axiomatic basis of the proof assistant Coq. Using type theory combined with homotopy theory and constructive logic, Univalent Foundations seeks to simplify certain mathematical concepts in a way not achievable in set theory, the current generally accepted foundation of mathematics. A Special Year on Univalent Foundations was held in 2012-2013 at the Institute for Advanced Study, Princeton, and the status of the Univalence Axiom is an active point of research in many university departments. Certainly, deep insights into mathematics, logic, knowledge representation and even artificial intelligence will arise from research into automated formal verification.
3. The Bad
There are a number of difficulties that a full formalization of mathematics, through widespread use of proof assistants, would need to overcome. Many arise from the fact that, at present, we must formalise our proofs ourselves because proof assistants cannot (yet) read mathematical papers directly. The following issues immediately spring to mind:
(i) A question of reliability: Large programs contain bugs. This has been a basic fact of software engineering since 1947. Does the use of proof assistants simply place the burden of accuracy onto potentially buggy programs, rather than potentially “buggy” proofs (or reviewers)? In other words, why should we believe that mistakes in proof assistants should arise significantly less often than the old sorts of mistakes?
(ii) A related question is whether typographical mistakes introduced when a proof is formalised for input to a program will end up being just as annoying a source of error as typos in ordinary proofs. Does this simply move the burden of accuracy to a different stage of the proof process, without decreasing that burden? Will the peer-reviewer's burden include not only checking the proof but also checking the formalisation?
(iii) A question of efficiency: Bourbaki abandoned formalization because the length of formal proofs of the simplest statements quickly ran into hundreds of symbols. How can the underlying construction of proof assistant systems ensure that the length of the formalization does not grow exponentially, or by some other impractical function, with the length of the ordinary proof? [Note: this issue is under active research. Assuming a constant multiplicative loss, the de Bruijn factor is the ratio of the length of a formalization to the length of the original proof. Preliminary results suggest this value is around 4.]
(iv) More generally, why should we believe that formalizing and verifying a non-trivial proof with a proof assistant would, in the general case, take fewer human-hours (or any other measure of effort) than simply checking the proof carefully by enough mathematicians, and enough times, to reach an equivalent level of certainty? What if we add in the human-hours required to develop sufficient expertise with at least one proof assistant program?
(v) A question of philosophy: What’s the point of mathematics? Most mathematicians would say something about beautiful ideas. In an article entitled “Proof and progress in mathematics”, W.P. Thurston wrote “The measure of [mathematicians’] success is whether what we do enables people to understand and think more clearly and effectively about mathematics” [1994] (original’s italics). Will this aim remain at the forefront of mathematical practice during a time of ultraformalization?
Some of these caveats are already solved, openly discussed, or under active research. For example, even the most enthusiastic proponents of formalized mathematics do not claim absolute certainty from the results of formalization, merely a reliability "orders of magnitude larger than if one had just used human minds" (Wiedijk [2007], p. 1). And the worry about incompleteness can at least be set aside: proof assistants do not search for proofs of arbitrary statements, a task that cannot be carried out in general; they only check proof ideas supplied by a person, so there is no risk of the program spinning into an endless search. However, it is still an open question how long the other issues will remain significant limiting factors on the widespread use of proof assistants in everyday mathematical practice.
4. The Ug…
The pursuit of formally verified mathematics throws up deep and exciting questions, but also faces potentially severe limitations. In future posts on this blog, I will discuss aspects of proof assistants and mathematical formalization, highlighting particularly interesting or controversial ideas as they appear. In the next post, I'll look more closely at Wiedijk's [2007] paper "Formal proof – getting started", which investigates how the Law of Quadratic Reciprocity, a deep theorem of elementary number theory, is formalized in two different proof assistants: Mizar and HOL Light.
REFERENCES
Avigad, J. and Harrison, J. Formally verified mathematics. Communications of the ACM, Vol. 57, No. 4, pp. 66-75. 2014.
Thurston, W.P. On proof and progress in mathematics. Bulletin of the AMS, Vol. 30, No. 2, pp. 161-177. 1994.
Tymoczko, T. The four-color problem and its philosophical significance. The Journal of Philosophy, Vol. 76, No. 2, pp. 57-83. 1979.
Homotopy Type Theory: Univalent foundations of mathematics. The Univalent Foundations Program, 2013. http://homotopytypetheory.org/book/.
Wiedijk, F. Formal proof – getting started. Notices of the AMS, Vol. 55, No. 11, pp. 1408-1414. 2007.