# Mathematical Proofs Improve But Don’t Guarantee Security, Safety, and Friendliness

|   |  Analysis

In 1979, Michael Rabin proved that his encryption system could be inverted — so as to decrypt the encrypted message — only if an attacker could factor n. And since this factoring task is computationally hard for any sufficiently large n, Rabin’s encryption scheme was said to be “provably secure” so long as one used a sufficiently large n.

Since then, creating encryption algorithms with this kind of “provable security” has been a major goal of cryptography,1 and new encryption algorithms that meet these criteria are sometimes marketed as “provably secure.”

Unfortunately, the term “provable security” can be misleading,2 for several reasons3.

First, Rabin-style security proofs don’t actually prove security. Instead, they prove that an efficient algorithm for breaking the encryption system would imply an efficient algorithm for solving the underlying computational problem (e.g. integer factorization), which is assumed to be computationally hard. And though these underlying problems are assumed to be computationally hard, they aren’t actually proven to be computationally hard. (Hence, the word “proof” in “security proof” is different from what “proof” usually means in mathematics, where a proof can be deductively traced all the way from the proven mathematical statement to the axioms of the mathematical system being used.)

Second, the system’s formal security requirements might fail to capture everything the attacker can do to break the system, and what information is available to the attacker. For example, it wasn’t long after Rabin (1979) that Ron Rivest (the R in RSA) pointed out4 that if an adversary could trick the user of Rabin’s crypto system to decrypt a cipher text of the adversary’s choosing, this would make it much easier for the adversary to break the whole system. As another example, Bleichenbacher (1998) showed that an early version of the RSA encryption system could be successfully attacked by making intelligent use of the error messages received after failed attacks.

In general, security proofs usually do not account for side channel attacks, which make use of information taken from the crypto system’s physical implementation. For example, a physical computer will inevitably use power and give off electromagnetic radiation when running a crypto system. In some cases, statistical analyses of this power consumption or radiation can be used to break the crypto system.

Third, sometimes, mathematical proofs themselves are wrong, including security proofs. For example, Boldyreva et al. (2007) created a new kind of digital signature (called OMS) which they claimed was more efficient and more secure than other systems with similar functionality. But then Hwang et al. (2009) showed that the “provably secure” OMS protocol could easily be broken. As it turned out, there was an undetected error in Boldyreva et al.’s 4-page proof for their Theorem 5.1.5

In fact, no system can be “provably secure” in the strongest sense, since (1) we can’t be 100% certain that the system’s formal security requirements have been specified properly, and (2) we can’t be 100% certain the security proof itself is without error.

Similarly, some computer systems are labeled “provably safe,” usually because the software has (in whole or in part) been formally verified against formally specified safety criteria.6 But once again, we must remember that (1) we can never be 100% certain the formal safety specification captures everything we care about, and (2) we can never be 100% certain that a complex mathematical proof (formal verification) is without error.

The same reasoning applies to AGIfriendliness.” Even if we discover (apparent) solutions to known open problems in Friendly AI research, this does not mean that we can ever build an AGI that is “provably friendly” in the strongest sense, because (1) we can never be 100% certain that our formal specification of “friendliness” captures everything we care about, and (2) we can never be 100% certain that there are no errors in our formal reasoning. (In fact, the “friendliness specification” problem seems much harder than the “security specification” and “safety specification” problems, since proper friendliness specification involves a heavy dose of philosophy, and humans are notoriously bad at philosophy.)

Thus, the approaches sometimes called “provable security,” “provable safety,” and “provable friendliness” should not be misunderstood as offering 100% guarantees of security, safety, and friendliness.7 Rather, these approaches are meant to provide more confidence than we could otherwise have, all else equal, that a given system is secure, safe, or “friendly.”

Especially for something as complex as Friendly AI, our message is: “If we prove it correct, it might work. If we don’t prove it correct, it definitely won’t work.”8

Humans often want zero-risk solutions, but there aren’t any zero-risk solutions for computer security, safety, or friendliness. On the other hand, we shouldn’t ignore the value of formal proofs. They give us more precision than any other method available, and are a useful complement to (e.g.) testing.9

And when it comes to building self-improving AGI, we’d like to have as much confidence in the system’s friendliness as we can. A self-improving AGI is not just a “safety-critical system,” but a world-critical system. Friendliness research is hard, but it’s worth it, given the stakes.10

1. An encryption system is said to be provably secure if its security requirements are stated formally, and proven to be satisfied by the system, as was the case with Rabin’s system. See Wikipedia
2. Security reductions can still be useful (Damgård 2007). My point is just that term “provable security” can be misleading, especially to non-experts.
3. For more details, and some additional problems with the term “provable security,” see Koblitz & Menezes’ Another Look website and its linked articles, especially Koblitz & Menezes (2010)
4. See Williams (1980)
5. See Koblitz & Menezes (2010). For additional examples of errors in mathematical proofs, see Branwen (2012); Kornai (2013)