Nik Weaver on Paradoxes of Rational Agency


Nik Weaver is a professor of mathematics at Washington University in St. Louis. He did his graduate work at Harvard and Berkeley and received his Ph.D. in 1994. His main interests are functional analysis, quantization, and the foundations of mathematics. He is best known for his work on independence results in C*-algebras and his role in the recent solution of the Kadison-Singer problem. His most recent book is Forcing for Mathematicians.

Luke Muehlhauser: In Weaver (2013) you discuss some paradoxes of rational agency. Can you explain roughly what these “paradoxes” are, for someone who might not be all that familiar with provability logic?


Nik Weaver: Sure. First of all, these are “paradoxes” in the sense of being highly counterintuitive — they’re not outright contradictions.

They all relate to the basic Löbian difficulty that if you reason within a fixed axiomatic system S, and you know that some statement A is provable within S, you’re generally not able to deduce that A is true. This may be an inference that you and I would be willing to make, but if you try to build it into a formal system then the system becomes inconsistent. So, for a rational agent who reasons within a specified axiomatic system, knowing that a proof exists is not as good as actually having a proof.
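To make the difficulty precise, here is one standard statement of Löb's theorem, written as a gloss on the remark above (the formalization is added here, not Nik's wording; S is a consistent, recursively axiomatized theory containing enough arithmetic, and Prov_S(A) abbreviates "A is provable in S"):

```latex
% Löb's theorem: if S proves "provability of A implies A", then S already proves A.
\[
  S \vdash \mathrm{Prov}_S(A) \rightarrow A
  \quad\Longrightarrow\quad
  S \vdash A .
\]
% Consequence: a system that proved, about its own provability predicate,
% "if A is provable then A" for every statement A would thereby prove
% every statement A, i.e. it would be inconsistent.
```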

This leads to some very frustrating consequences. Let’s say I want to build a spaceship, but first I need to be sure that it’s not going to blow up. I have an idea about how to prove this, but it’s extremely tedious, so I write a program to work out the details of the proof and verify that it’s correct. The good news is that when I run the program, it informs me that it was able to fill in the details successfully. The bad news is that I now know that there is a proof that the ship won’t blow up, but I still don’t know that the ship won’t blow up! I’m going to have to check the proof myself, line by line. It’s a complete waste of time, because I know that the program functions correctly (we can assume I’ve proven this), so I know that the line-by-line verification is going to check out, but I still have to do it.

You may say that I have been programmed badly. Whoever wrote my source code ought to have allowed me to accept “there is a proof that the ship won’t blow up” as sufficient justification for building the ship. This can be a general rule: for any statement A, let “there exists a proof of A” license all the actions that A licenses. We’re not contradicting Löb’s theorem — we still can’t deduce A from knowing there is a proof of A — but we’re finessing it by stipulating that knowing there’s a proof of A is good enough. But there are still problems. Imagine that I can prove that if the Riemann hypothesis is true, then the ship won’t blow up, and if it’s false then there exists a proof that the ship won’t blow up. Then I’m in a situation where I know that either A is true or there is a proof that A is true, but I don’t know which one. So even with the more liberal licensing condition, I still can’t build my lovely spaceship.
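Spelling the trap out in symbols (a gloss, with B standing for "the ship won't blow up", RH for the Riemann hypothesis, and Prov(B) for "there exists a proof of B"): the agent has proofs of both conditionals, so it can derive the disjunction, but it cannot establish either disjunct on its own.

```latex
% What the agent can prove:
%   RH -> B        and        (not RH) -> Prov(B)
% Combining the two cases gives only the disjunction:
\[
  B \;\lor\; \mathrm{Prov}(B),
\]
% but neither B nor Prov(B) is itself established, so neither the original
% licensing rule (act on B) nor the liberalized rule (act on Prov(B))
% ever applies.
```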


Luke: Section 2 of the paper develops an approach to solving this kind of problem via the notion of warranted assertibility. How does that work?


Nik: “Warranted assertibility” is a philosophical term that, in mathematical settings, refers to statements which can be asserted with perfect rational justification. My idea was that this concept could help with these Löbian-type problems because it is nicer in some ways than formal provability. For instance, in any consistent axiomatic system there are statements A(n) with the property that for each value of n we can prove A(n), yet there is no single proof of the statement “for all n, A(n)”. However, it is always the case that if each A(n) is warranted then “for all n, A(n)” is also warranted.
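Schematically, the contrast is the following (notation added as a gloss; S is the agent's axiomatic system):

```latex
% Provability: each numerical instance is provable, but the universal claim is not.
\[
  S \vdash A(n) \ \text{ for every particular } n,
  \qquad\text{yet}\qquad
  S \not\vdash \forall n\, A(n).
\]
% Warrant: the corresponding infinitary step does go through:
% if each A(n) is warranted, then "for all n, A(n)" is warranted.
```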

So I introduce a predicate Box(A) to express “A is warranted”, and given a suitable axiomatization (which is a little subtle) I show that we can always infer Box(A) from knowing that there is a proof of A. We still can’t infer A — Löb’s theorem is not violated — but, along the lines I suggested earlier, we can program our agents to accept Box(A) as licensing the same actions that A licenses. Thus, it suffices to prove that the statement “the ship won’t blow up” is warranted. If you go this route then all the paradoxes seem to disappear. If an agent is able to reason about warranted assertibility, then it gets to perform all the actions it intuitively ought to be allowed to perform.
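Roughly, the key inference pattern is this (again a gloss, with Prov(A) short for "there exists a proof of A" and Box(A) Nik's "A is warranted"; the precise axiomatization is, as he says, more subtle):

```latex
% From the fact that A is provable, the agent may infer that A is warranted,
% though still not that A is true:
\[
  \mathrm{Prov}(A) \;\Longrightarrow\; \mathrm{Box}(A),
  \qquad\text{but not}\qquad
  \mathrm{Prov}(A) \;\Longrightarrow\; A.
\]
% By stipulation, Box(A) licenses the same actions that A licenses, so proving
% that "the ship won't blow up" is warranted suffices to build the ship.
```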


Luke: Later in the paper, you write:

[Yudkowsky & Herreshoff (2013)] involves a rational agent who is seeking… general permission to delegate the performance of actions to a second agent… The same issues appear in the case of an intelligent machine that is considering modifying its own source code (in order to make itself more intelligent, say). Before doing this it would want to be sure that its post-modification state will reason correctly, i.e., any theorem it proves after the modification should actually be true. This runs into the familiar Löbian difficulty that the agent is not even able to affirm the soundness of its pre-modification reasoning.

… In Section 4 of [Yudkowsky & Herreshoff (2013)] two constructions are presented of an infinite sequence of independently acting agents, each of whom can give a provable justification for activating the next one, yet none of whose deductive power falls below an initially prescribed level. The constructions are clever but they have a nonstandard flavor. Probably this is unavoidable, unless the problem description were to be altered in some fundamental way. In the remainder of this section we present another solution which uses nonstandard assertibility reasoning.

Can you describe the solution you go on to present — again, in terms that someone lacking a background in provability logic might be able to understand, at least in an intuitive way?


Nik: I will try! First, in regard to my comment that only nonstandard solutions seem possible, I should mention that there is now a theorem due to Herreshoff, Fallenstein, and Armstrong that makes this idea precise. The point is that it’s not clear I should be allowed to build a duplicate copy of myself and delegate my goals to it, for fear that it would then do the same thing, and so on ad infinitum (the “procrastination problem”).

Now, going back to my spaceship example, let B be the statement that the ship won’t blow up. A basic feature of assertibility is that we have a general law that A implies Box(A), but we don’t have the converse. So Box(B) is “weaker” than B, and my proposal was that we should allow our agents to accept Box(B) to license the same action (building the spaceship) that B does. This helps with Löbian issues because the statement that B is provable doesn’t imply B, but it does imply Box(B).
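As a compact picture of the relations just described (notation added as a gloss):

```latex
% B implies Box(B), and Prov(B) implies Box(B), but neither arrow reverses:
\[
  B \;\longrightarrow\; \mathrm{Box}(B) \;\longleftarrow\; \mathrm{Prov}(B).
\]
% So Box(B) is weaker than B, yet it is reachable from "B is provable",
% which B itself is not.
```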

But if Box(B) is good enough, why not accept the even weaker statement Box(Box(B))? Or Box(Box(Box(B)))? Let me write Box^k(B) for the general expression with k boxes. The idea about delegating tasks to successors is that if I am willing to accept Box^9(B), say, then I should be willing to build a successor who can’t accept Box^9(B) but can accept Box^8(B). I can reason that if my successor proves Box^8(B), then Box^8(B) is provable, and therefore Box^9(B) is true, which is good enough for me. Then my successor should be happy with a subsequent model that is required to prove Box^7(B), and so on. So we can create finite chains in this way, without doing anything nonstandard.
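Written out, the step that lets one agent trust the next looks like this (a rendering of the argument above, assuming as before that provability of a statement yields one extra box):

```latex
% Suppose I accept Box^9(B) and my successor is required to prove Box^8(B).
% If the successor succeeds, then Box^8(B) is provable, and therefore:
\[
  \mathrm{Prov}(\mathrm{Box}^{8}(B))
  \;\Longrightarrow\;
  \mathrm{Box}(\mathrm{Box}^{8}(B))
  \;=\;
  \mathrm{Box}^{9}(B),
\]
% which is exactly the statement I have agreed to act on. The same reasoning
% lets the successor trust a further model that must prove Box^7(B), and so on.
```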

The way to get infinite chains is to introduce a nonstandard “infinite” natural number kappa, and start with an initial agent who is required to verify Box^kappa(B). Then it can build a successor who is required to verify Box^{kappa – 1}(B), and so on. All the agents in the resulting sequence have the same formal strength because a simple substitution (replacing kappa with kappa + 1) turns each agent into its predecessor, without affecting what they’re able to prove.
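A sketch of the resulting chain (notation added as a gloss, with kappa the nonstandard number just described):

```latex
% Obligation of agent n in the chain (n = 0, 1, 2, ...):
\[
  \mathrm{Box}^{\kappa - n}(B).
\]
% Substituting kappa + 1 for kappa turns agent (n+1)'s obligation,
% Box^{kappa-(n+1)}(B), into Box^{kappa-n}(B), which is agent n's obligation,
% so each agent is a copy of its predecessor up to this renaming and all of
% them have the same formal strength.
```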


Luke: Now let me ask about research of this type in general. What’s your model for how this kind of research might be useful? It’s pretty distant from current methods for constructing actual software agents.


Nik: Yes, it’s very theoretical, no question. One model is pure mathematics, which is actually my background. In pure math, most of the time we’re working on questions that don’t have any immediate practical applications. I think experience has shown that this kind of work occasionally becomes very important, while most of it does not, but it’s hard to predict which directions are going to pan out.

Or, you know, maybe we’re all living inside a simulation like Nick Bostrom says. In which case, the fact that we’re trying to figure out how to help machines make themselves smarter may be a clue as to the nature of that simulation.


Luke: What do you think are some “promising avenues” for making additional progress on reasoning our way through these “paradoxes” of rational agency?


Nik: I don’t know. If you’re talking about the Löbian obstacle, I feel that something essential is missing from our current problem description. Among the directions that are available now my preference is for Benja Fallenstein’s parametric polymorphism. But I also think we should be looking at ways of modifying the problem description.


Luke: What is the origin of your interest in paradoxes of rational agency?


Nik: I’ve been thinking about issues surrounding truth and circularity for around five years. (I just checked: my earliest paper on the subject was posted on the Mathematics arXiv in May 2009.) This was purely theoretical work, coming from an interest in the foundations of mathematics and logic.

My involvement in paradoxes of rational agency was sparked by Eliezer Yudkowsky and Marcello Herreshoff’s paper on the Löbian obstacle. In fact I think I got the term “paradoxes of rational agency” from that paper. When I read it I had the immediate reaction that my ideas about assertibility should be relevant, so that’s how I got interested.


Luke: Thanks, Nik!