Mathematical Proofs Improve But Don’t Guarantee Security, Safety, and Friendliness


In 1979, Michael Rabin proved that his encryption system could be inverted (so as to decrypt the encrypted message) only if an attacker could factor n. Since this factoring task is believed to be computationally hard for any sufficiently large n, Rabin’s encryption scheme was said to be “provably secure” so long as one used a sufficiently large n.
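Rabin’s construction is simple enough to sketch in a few lines. The toy below is my own illustration, with absurdly small primes (real keys use primes of a thousand or more bits): encryption is just squaring modulo n, while decryption requires knowing the factors p and q.

```python
# Toy sketch of Rabin encryption (wildly insecure key sizes, for
# illustration only). Requires Python 3.8+ for pow(x, -1, m).
p, q = 7, 11          # both congruent to 3 mod 4, as Rabin requires
n = p * q             # public key; p and q are the private key

def encrypt(m, n):
    return (m * m) % n

def decrypt(c, p, q):
    n = p * q
    # With p, q = 3 (mod 4), square roots mod each prime are easy:
    mp = pow(c, (p + 1) // 4, p)
    mq = pow(c, (q + 1) // 4, q)
    # Combine root candidates via the Chinese Remainder Theorem.
    roots = set()
    for a in (mp, p - mp):
        for b in (mq, q - mq):
            # CRT: find x with x = a (mod p) and x = b (mod q)
            x = (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % n
            roots.add(x)
    return roots  # the true plaintext is one of (up to) four roots

c = encrypt(20, n)
assert 20 in decrypt(c, p, q)
```

Decryption yields up to four candidate roots, and the legitimate user picks out the intended plaintext (e.g. via padding). Rabin’s result is that computing these roots *without* p and q is as hard as factoring n.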

Since then, creating encryption algorithms with this kind of “provable security” has been a major goal of cryptography,1 and new encryption algorithms that meet these criteria are sometimes marketed as “provably secure.”

Unfortunately, the term “provable security” can be misleading,2 for several reasons.3

First, Rabin-style security proofs don’t actually prove security. Instead, they prove that an efficient algorithm for breaking the encryption system would imply an efficient algorithm for solving the underlying computational problem (e.g. integer factorization). But those underlying problems are only assumed to be computationally hard; they haven’t been proven to be. (Hence, the word “proof” in “security proof” means something weaker than “proof” usually means in mathematics, where a proof can be deductively traced all the way from the proven statement back to the axioms of the mathematical system being used.)

Second, the system’s formal security requirements might fail to capture everything an attacker can do to break the system, or what information is available to the attacker. For example, it wasn’t long after Rabin (1979) that Ron Rivest (the R in RSA) pointed out4 that if an adversary could trick the user of Rabin’s cryptosystem into decrypting a ciphertext of the adversary’s choosing, this would make it much easier for the adversary to break the whole system. As another example, Bleichenbacher (1998) showed that an early version of the RSA encryption system could be successfully attacked by making intelligent use of the error messages received after failed decryption attempts.
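Rivest’s observation translates into a strikingly short attack. In the toy sketch below (my own illustration, with tiny primes), the `oracle` function stands in for a user who can be tricked into decrypting attacker-chosen ciphertexts; with only that access, the attacker factors n and thereby breaks the system completely.

```python
import math
import random

# Chosen-ciphertext attack on Rabin's scheme (toy parameters).
p, q = 7, 11
n = p * q

def oracle(c):
    # Stands in for the tricked user: returns *some* square root of
    # c mod n. (A real attacker cannot compute this without p and q;
    # brute force works here only because n is tiny.)
    for x in range(n):
        if (x * x) % n == c:
            return x

def factor_via_oracle(n):
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) != 1:
            continue  # stumbled on a factor by luck; not the attack
        s = oracle((r * r) % n)
        # With probability 1/2 the returned root is not +/- r mod n,
        # and then gcd(r - s, n) reveals a prime factor of n.
        if s not in (r, n - r):
            return math.gcd(r - s, n)

assert factor_via_oracle(n) in (p, q)
```

Nothing in Rabin’s security reduction is wrong here; the attack simply exploits a capability (a decryption oracle) that the original formal model didn’t include.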

In general, security proofs usually do not account for side channel attacks, which make use of information taken from the crypto system’s physical implementation. For example, a physical computer will inevitably use power and give off electromagnetic radiation when running a crypto system. In some cases, statistical analyses of this power consumption or radiation can be used to break the crypto system.
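As a small illustration of the side-channel idea (the function names are my own; the standard Python remedy is `hmac.compare_digest`): the naive comparison below returns as soon as one byte differs, so its running time tells an attacker how many leading bytes of a guess are correct, which in a networked setting can be enough to recover a secret MAC tag byte by byte.

```python
import hmac

def insecure_equals(secret: bytes, guess: bytes) -> bool:
    if len(secret) != len(guess):
        return False
    for a, b in zip(secret, guess):
        if a != b:
            return False  # early exit: running time depends on the data
    return True

def constant_time_equals(secret: bytes, guess: bytes) -> bool:
    # Examines every byte regardless of mismatches, so timing
    # reveals nothing about where the first difference occurs.
    return hmac.compare_digest(secret, guess)

assert insecure_equals(b"tag1", b"tag1")
assert not constant_time_equals(b"tag1", b"tag2")
```

Both functions return the same answers; they differ only in a physical property (time) that lies outside the usual formal security model.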

Third, sometimes mathematical proofs themselves are wrong, including security proofs. For example, Boldyreva et al. (2007) created a new kind of digital signature (called OMS) which they claimed was more efficient and more secure than other systems with similar functionality. But then Hwang et al. (2009) showed that the “provably secure” OMS protocol could easily be broken. As it turned out, there was an undetected error in Boldyreva et al.’s 4-page proof of their Theorem 5.1.5

In fact, no system can be “provably secure” in the strongest sense, since (1) we can’t be 100% certain that the system’s formal security requirements have been specified properly, and (2) we can’t be 100% certain the security proof itself is without error.

Similarly, some computer systems are labeled “provably safe,” usually because the software has (in whole or in part) been formally verified against formally specified safety criteria.6 But once again, we must remember that (1) we can never be 100% certain the formal safety specification captures everything we care about, and (2) we can never be 100% certain that a complex mathematical proof (formal verification) is without error.
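A toy example (my own; the code and names are illustrative) makes point (1) vivid. Suppose a formal specification for a sorting routine demands only that the output be in nondecreasing order, forgetting to require that the output be a permutation of the input. Then an obviously useless function satisfies the spec perfectly:

```python
def is_sorted(xs):
    # The (incomplete) formal spec: output is in nondecreasing order.
    return all(a <= b for a, b in zip(xs, xs[1:]))

def bogus_sort(xs):
    # Vacuously "verified": the empty list is always sorted.
    return []

assert is_sorted(bogus_sort([3, 1, 2]))            # the spec holds...
assert bogus_sort([3, 1, 2]) != sorted([3, 1, 2])  # ...the intent doesn't
```

The verification here could be carried out with complete rigor, and the system would still be useless, because the specification failed to capture what we actually wanted.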

The same reasoning applies to AGI “friendliness.” Even if we discover (apparent) solutions to known open problems in Friendly AI research, this does not mean that we can ever build an AGI that is “provably friendly” in the strongest sense, because (1) we can never be 100% certain that our formal specification of “friendliness” captures everything we care about, and (2) we can never be 100% certain that there are no errors in our formal reasoning. (In fact, the “friendliness specification” problem seems much harder than the “security specification” and “safety specification” problems, since proper friendliness specification involves a heavy dose of philosophy, and humans are notoriously bad at philosophy.)

Thus, the approaches sometimes called “provable security,” “provable safety,” and “provable friendliness” should not be misunderstood as offering 100% guarantees of security, safety, and friendliness.7 Rather, these approaches are meant to provide more confidence than we could otherwise have, all else equal, that a given system is secure, safe, or “friendly.”

Especially for something as complex as Friendly AI, our message is: “If we prove it correct, it might work. If we don’t prove it correct, it definitely won’t work.”8

Humans often want zero-risk solutions, but there aren’t any zero-risk solutions for computer security, safety, or friendliness. On the other hand, we shouldn’t ignore the value of formal proofs. They give us more precision than any other method available, and are a useful complement to (e.g.) testing.9

And when it comes to building self-improving AGI, we’d like to have as much confidence in the system’s friendliness as we can. A self-improving AGI is not just a “safety-critical system,” but a world-critical system. Friendliness research is hard, but it’s worth it, given the stakes.10

  1. An encryption system is said to be provably secure if its security requirements are stated formally and proven to be satisfied by the system, as was the case with Rabin’s system. See Wikipedia.
  2. Security reductions can still be useful (Damgård 2007). My point is just that the term “provable security” can be misleading, especially to non-experts.
  3. For more details, and some additional problems with the term “provable security,” see Koblitz & Menezes’ Another Look website and its linked articles, especially Koblitz & Menezes (2010).
  4. See Williams (1980).
  5. See Koblitz & Menezes (2010). For additional examples of errors in mathematical proofs, see Branwen (2012); Kornai (2013).
  6. See also Transparency in Safety-Critical Systems.
  7. Because this misunderstanding is so common, MIRI staff try to avoid using phrases like “provably Friendly.” MIRI research fellow Eliezer Yudkowsky is often accused of advocating for “provably friendly” AGI, but I was unable to find an example of Yudkowsky himself using that phrase. If Yudkowsky has used that phrase in the past, it seems likely that he was talking about having certain proofs about the internal structure of the AGI (to give us more confidence in its friendliness than would otherwise be the case), not proofs about its behavior in the physical world.

    Added 10-16-2014: A reader pointed me to several passages from Yudkowsky’s writings that could have contributed to this confusion. In an October 2008 post, Yudkowsky used the phrase “provably correct Friendly AI,” but didn’t explain what he meant by the phrase. Yudkowsky also used the phrase “Friendly AI that (provably correctly) self-improves” in an October 2008 blog post comment. In a December 2008 post, Yudkowsky wrote: “Programmers operating with strong insight into intelligence, directly create along an efficient and planned pathway, a mind capable of modifying itself with deterministic precision – provably correct or provably noncatastrophic self-modifications. This is the only way I can see to achieve narrow enough targeting to create a Friendly AI.” And finally, in a September 2005 mailing list comment, Yudkowsky wrote a clearer exposition of what he tends to mean when using the word “provable” in the context of Friendly AI:

    No known algorithm could independently prove a CPU design correct in the age of the universe, but with human-chosen *lemmas* we can get machine-*verified* correctness proofs. The critical property of proof in an axiomatic system is not that it’s certain, since the system might be inconsistent for all we know or can formally prove. The critical property is that, *if* the system is consistent, then a proof of ten thousand steps is as reliable as a proof of ten steps. There are no independent sources of failure. I hope that provably correct rewrites, like provably correct CPUs, will be managable if the AI can put forth deductive reasoning with efficiency, tractability, and scalability at least equalling that of a human mathematician. An AI-complete problem? Sure, but let’s not forget – we *are* trying to design an AI.


  8. This isn’t to say we expect to be able to (or need to) formally verify an entire AGI system. One possibility is that early AGIs will be hierarchical autonomous agents — à la Fisher et al. (2013) — and only the “top” control layer of the agent will be constructed to be amenable to formal correctness proofs of one kind or another.
  9. See also Kornai (2012); Muehlhauser (2013); Damgård (2007).
  10. My thanks to Louie Helm and Eliezer Yudkowsky for their feedback on this post. 
  • Paul Crowley

    On your second point, it’s worth mentioning that both of those examples are from a long time ago; we have got much better at capturing the formal requirements to mesh with the capabilities that are actually available to an attacker. The protocol that Bleichenbacher attacked didn’t have a security reduction at all IIRC; a modern padding scheme like OAEP+ does, and is safe against that attack assuming the proof is good.

    On your third point, a much more famous example is the flaw in the proof for the widely used OAEP padding scheme for RSA encryption. OAEP was widely used for seven years before the flaw was found by Victor Shoup, leading to OAEP+.

    More broadly: you know the way the news media always get your particular specialisation wrong? Well, no crypto expert is credited or listed as co-author on this post, yet Luke just drew on some pretty detailed stuff in my field to make a point about AI safety and got essentially everything right. This seems slightly miraculous, and reasonable evidence for something good 🙂

    • lukeprog

      Thanks, I’ll look into the OAEP affair!

  • Chris Warburton

    It is certainly important to take the adjective “proven” with a pinch of salt. Proofs never apply to the real world, they only ever apply to abstract mathematical models which may or may not relate to the real world. However, the same can be said for any form of mental model.

    The nice property of formal systems is that we don’t lose confidence with each step, so once we’ve entered the mathematical realm by choosing a model (eg. a wave model for light), we can transform it into something that’s easier to study (for example, wave properties are equivalent to complex numbers, so we can study complex numbers with no further loss of accuracy).

  • Joshua Fox

    This is great! It expresses well what I’ve been trying to say for a while: Formal proofs are important, maybe more important than any other research, but far from a perfect solution, and so other measures should be found to complement formal proofs.

    By the way, I have spoken with two chip-design experts who have confirmed that the commonly-used analogy of (future) AGI friendliness proof to formal chip verification is inapt, for a reason you give here: You have to specify what you want to prove.

    This brings in another reason that proofs can fail to guarantee what you want. Your proofs will probably have to be restricted to a very low-level technical component, not the system as a whole.

    For processors, the proofs are still only at a component level not much higher than the gate level. In fact, as I understand it, even proving that dividing floating-point X by Y indeed produces X/Y is still impossible.

    And here is a third, related reason that proofs don’t work: Implementations, even when working as specified, do not follow the purest abstractions that we’d like to run our proofs on. For example, floating-point division is NOT real-number division. (E.g., two numbers are never really equal, just closer-than-epsilon.) Likewise, integer arithmetic is not the Peano arithmetic we’re used to — it can overflow. One can think of similar practical limitations in a future AGI implementation.
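    Both of the gaps described above can be checked directly. In the illustration below (my own; `add_i32` emulates 32-bit two’s-complement addition, since Python’s native integers never overflow), floating-point arithmetic visibly violates the laws of real arithmetic, and fixed-width integer addition wraps around:

```python
import struct

# Floats are not real numbers: addition is only approximately exact.
assert 0.1 + 0.2 != 0.3                 # fails exact equality...
assert abs((0.1 + 0.2) - 0.3) < 1e-15   # ...but is closer-than-epsilon

def add_i32(a, b):
    # Emulate 32-bit two's-complement addition: mask to 32 bits,
    # then reinterpret the unsigned result as a signed integer.
    return struct.unpack("<i", struct.pack("<I", (a + b) & 0xFFFFFFFF))[0]

# Machine integers are not Peano naturals: max int + 1 wraps negative.
assert add_i32(2**31 - 1, 1) == -(2**31)
```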

    Of course, we can compose proofs even about such “modified” arithmetic, but it is at one step removed from the logical structures that the mathematicians usually work with.

    I think proofs are top-priority, but it is good to recognize their limitations.

    • Chris Warburton

      I agree with your main point, but disagree about proofs being ‘low-level’ and your particular choice of invalid abstractions. It is certainly possible to prove properties of whole systems, but the main factor is how the property relates to the system.

      For example, in proof assistants like Coq and Agda our models are represented as computer programs and our properties are represented as (very strong!) types. To state that a model M satisfies a property P, we declare a function which accepts M as an argument and returns a value of type P. To prove this statement we must implement the function. Now, this can be easy or difficult depending on how M relates to P. The easiest case is when P is already the type of M, in which case we refer to M as being “correct by construction” (ie. it serves as its own correctness proof). The hardest case is when M ‘just so happens’ to be correct, ie. the way it is constructed coincidentally satisfies our property. In this case, we must show that every possible branch of the constructor code satisfies our property, which can get very tedious very quickly.
      The reason I take issue with your examples of floats not being Reals and words not being Integers is because those are such terrible models that anybody using them as assumptions in their formal proofs is DoingItWrong. Many verified systems use Integers, but they don’t implement them with machine words, they usually use a library like GMP. There are some verified systems which use floats, but they follow the IEEE754 specification rather than making such a terrible assumption.

      I don’t know of any verified system that uses Real arithmetic, but even if there are some, they certainly wouldn’t use floats as their implementation. Many verification tools are built on constructive logics, and in a constructive logic a Real number isn’t a simple value like a float; it’s actually a function which takes a Natural number representing the desired accuracy (eg. number of decimal places) and returns a Rational, which approximates the Real to that accuracy. This is clearly very different from a float.

      A good example of a dangerous assumption would be assuming infinite memory. Low-level systems will take this into account, eg. verified chips and possibly even operating system kernels, but most high-level systems are verified in some abstract language like lambda calculus, with the assumption that proofs at the source level will be maintained during compilation, but out-of-memory is a situation which doesn’t exist at the source level (hence it doesn’t feature in the correctness proof) but does at the machine level, even for the most faithful compilation. It’s also a manifestation of the halting problem, which makes it unsolvable in general.

      • Joshua Fox


        Thank you, it’s good to see that perspective on correct formal proof vs dangerous real-world assumptions.

        My comment was directed at formal chip verification. Eliezer and others in MIRI have often used chip verification as an example of an area where formal proof is possible, and my comment was contributing something I learned from conversations with two experts in the area.

        In chip verification, they tell me, these problems do exist. So, it would make sense for MIRI to stop using chip verification as an example of an area where full formal verification, in the sense intended by MIRI, is possible.

  • Patrick LaVictoire

    I’m surprised you didn’t mention the most striking example of Case 1: factoring integers was assumed to be computationally hard, but it turns out that a quantum computer could handle it in polynomial time! (Fortunately, scalable quantum computing seems to be well out of our grasp for now, and at any rate there are other problems we suspect aren’t in BQP that can be used for cryptosystems.)

    • Luke Muehlhauser

      I figured this wasn’t a great example *yet*, since we don’t have quantum computers yet.