In 1979, Michael Rabin proved that his encryption system could be inverted — so as to decrypt the encrypted message — only if an attacker could factor n. And since this factoring task is computationally hard for any sufficiently large n, Rabin’s encryption scheme was said to be “provably secure” so long as one used a sufficiently large n.
Since then, creating encryption algorithms with this kind of “provable security” has been a major goal of cryptography,1 and new encryption algorithms that meet these criteria are sometimes marketed as “provably secure.”
First, Rabin-style security proofs don’t actually prove security. Instead, they prove that an efficient algorithm for breaking the encryption system would imply an efficient algorithm for solving the underlying computational problem (e.g. integer factorization), which is assumed to be computationally hard. And though these underlying problems are assumed to be computationally hard, they aren’t actually proven to be computationally hard. (Hence, the word “proof” in “security proof” is different from what “proof” usually means in mathematics, where a proof can be deductively traced all the way from the proven mathematical statement to the axioms of the mathematical system being used.)
Second, the system’s formal security requirements might fail to capture everything the attacker can do to break the system, and what information is available to the attacker. For example, it wasn’t long after Rabin (1979) that Ron Rivest (the R in RSA) pointed out4 that if an adversary could trick the user of Rabin’s crypto system to decrypt a cipher text of the adversary’s choosing, this would make it much easier for the adversary to break the whole system. As another example, Bleichenbacher (1998) showed that an early version of the RSA encryption system could be successfully attacked by making intelligent use of the error messages received after failed attacks.
In general, security proofs usually do not account for side channel attacks, which make use of information taken from the crypto system’s physical implementation. For example, a physical computer will inevitably use power and give off electromagnetic radiation when running a crypto system. In some cases, statistical analyses of this power consumption or radiation can be used to break the crypto system.
Third, sometimes, mathematical proofs themselves are wrong, including security proofs. For example, Boldyreva et al. (2007) created a new kind of digital signature (called OMS) which they claimed was more efficient and more secure than other systems with similar functionality. But then Hwang et al. (2009) showed that the “provably secure” OMS protocol could easily be broken. As it turned out, there was an undetected error in Boldyreva et al.’s 4-page proof for their Theorem 5.1.5
In fact, no system can be “provably secure” in the strongest sense, since (1) we can’t be 100% certain that the system’s formal security requirements have been specified properly, and (2) we can’t be 100% certain the security proof itself is without error.
Similarly, some computer systems are labeled “provably safe,” usually because the software has (in whole or in part) been formally verified against formally specified safety criteria.6 But once again, we must remember that (1) we can never be 100% certain the formal safety specification captures everything we care about, and (2) we can never be 100% certain that a complex mathematical proof (formal verification) is without error.
The same reasoning applies to AGI “friendliness.” Even if we discover (apparent) solutions to known open problems in Friendly AI research, this does not mean that we can ever build an AGI that is “provably friendly” in the strongest sense, because (1) we can never be 100% certain that our formal specification of “friendliness” captures everything we care about, and (2) we can never be 100% certain that there are no errors in our formal reasoning. (In fact, the “friendliness specification” problem seems much harder than the “security specification” and “safety specification” problems, since proper friendliness specification involves a heavy dose of philosophy, and humans are notoriously bad at philosophy.)
Thus, the approaches sometimes called “provable security,” “provable safety,” and “provable friendliness” should not be misunderstood as offering 100% guarantees of security, safety, and friendliness.7 Rather, these approaches are meant to provide more confidence than we could otherwise have, all else equal, that a given system is secure, safe, or “friendly.”
Especially for something as complex as Friendly AI, our message is: “If we prove it correct, it might work. If we don’t prove it correct, it definitely won’t work.”8
Humans often want zero-risk solutions, but there aren’t any zero-risk solutions for computer security, safety, or friendliness. On the other hand, we shouldn’t ignore the value of formal proofs. They give us more precision than any other method available, and are a useful complement to (e.g.) testing.9
And when it comes to building self-improving AGI, we’d like to have as much confidence in the system’s friendliness as we can. A self-improving AGI is not just a “safety-critical system,” but a world-critical system. Friendliness research is hard, but it’s worth it, given the stakes.10
- An encryption system is said to be provably secure if its security requirements are stated formally, and proven to be satisfied by the system, as was the case with Rabin’s system. See Wikipedia. ↩↩
- Security reductions can still be useful (Damgård 2007). My point is just that term “provable security” can be misleading, especially to non-experts. ↩↩
- For more details, and some additional problems with the term “provable security,” see Koblitz & Menezes’ Another Look website and its linked articles, especially Koblitz & Menezes (2010). ↩↩
- See Williams (1980). ↩↩
- See Koblitz & Menezes (2010). For additional examples of errors in mathematical proofs, see Branwen (2012); Kornai (2013). ↩↩
- See also Transparency in Safety-Critical Systems. ↩↩
- Because this misunderstanding is so common, MIRI staff try to avoid using phrases like “provably Friendly.” MIRI research fellow Eliezer Yudkowsky is often accused of advocating for “provably friendly” AGI, but I was unable to find an example of Yudkowsky himself using that phrase. If Yudkowsky has used that phrase in the past, it seems likely that he was talking about having certain proofs about the internal structure of the AGI (to give us more confidence in its friendliness than would otherwise be the case), not proofs about its behavior in the physical world.
Added 10-16-2014: A reader pointed me to several passages from Yudkowsky’s writings that could have contributed to this confusion. In an October 2008 post, Yudkowsky used the phrase “provably correct Friendly AI,” but didn’t explain what he meant by the phrase. Yudkowsky also used the phrase “Friendly AI that (provably correctly) self-improves” in an October 2008 blog post comment. In a December 2008 post, Yudkowsky wrote: “Programmers operating with strong insight into intelligence, directly create along an efficient and planned pathway, a mind capable of modifying itself with deterministic precision – provably correct or provably noncatastrophic self-modifications. This is the only way I can see to achieve narrow enough targeting to create a Friendly AI.” And finally, in a September 2005 mailing list comment, Yudkowsky wrote a clearer exposition of what he tends to mean when using the word “provable” in the context of Friendly AI:
No known algorithm could independently prove a CPU design correct in the age of the universe, but with human-chosen *lemmas* we can get machine-*verified* correctness proofs. The critical property of proof in an axiomatic system is not that it’s certain, since the system might be inconsistent for all we know or can formally prove. The critical property is that, *if* the system is consistent, then a proof of ten thousand steps is as reliable as a proof of ten steps. There are no independent sources of failure. I hope that provably correct rewrites, like provably correct CPUs, will be managable if the AI can put forth deductive reasoning with efficiency, tractability, and scalability at least equalling that of a human mathematician. An AI-complete problem? Sure, but let’s not forget – we *are* trying to design an AI.
- This isn’t to say we expect to be able to (or need to) formally verify an entire AGI system. One possibility is that early AGIs will be hierarchical autonomous agents — ala Fisher et al. (2013) — and only the “top” control layer of the agent will be constructed to be amenable to formal correctness proofs of one kind or another. ↩↩
- See also Kornai (2012); Muehlhauser (2013); Damgård (2007). ↩↩
- My thanks to Louie Helm and Eliezer Yudkowsky for their feedback on this post. ↩↩