Existential Risk Strategy Conversation with Holden Karnofsky

On January 16th, 2014, MIRI met with Holden Karnofsky to discuss existential risk strategy. The participants were:

Eliezer Yudkowsky (research fellow at MIRI)
Luke Muehlhauser (executive director at MIRI)
Holden Karnofsky (co-CEO at GiveWell)

We recorded and transcribed the conversation, and then edited and paraphrased the transcript for clarity, conciseness, and to protect the privacy of some content. The resulting edited transcript is available in full here (41 pages).

Below is a summary of the conversation written by Karnofsky, then edited by Muehlhauser and Yudkowsky. Below the summary are some highlights from the conversation chosen by Karnofsky.

See also three previous conversations between MIRI and Holden Karnofsky: on MIRI strategy, on transparent research analyses, and on flow-through effects.

Summary

Pages 1-5:

Defining some basic terms, such as the hypothesis that far future considerations dominate utility (the “N lives” hypothesis). Holden temporarily accepts this hypothesis for the sake of argument.

Pages 5-9:

Discussion of how the path to colonization of the stars is likely to play out.

There has been significant confusion over the definition of the term “existential risk.” Going into the conversation, Holden interpreted this term to mean “risks of events that would directly and more-or-less immediately wipe out the human race,” whereas Eliezer and Luke meant to use it to mean anything that causes humanity to fall drastically short of its potential, including paths that simply involve never succeeding in e.g. space colonization. This confusion took up much of pages 1-9 and was clarified at pages 8-9.

Pages 9-13:

Further discussion of how the path to colonization of the stars is likely to play out. There isn’t much in the way of clear and major disagreement except for the dynamics around artificial general intelligence.

Pages 14-19:

Holden argues for an emphasis on “doing good without worrying much about its connection to the far future” as plausibly the best way to improve the far future. At the same time, he thinks it’s likely that global catastrophic risks are a promising area for philanthropy and thinks it’s likely that GiveWell will make recommendations relevant to these.

Holden thinks that many of the arguments advanced in favor of focusing on global catastrophic risks are “right for the wrong reasons,” focusing on Pascal’s Mugging type arguments and not sufficiently addressing the question of whether these risks are neglected, and how neglected compared to other areas. Eliezer and Luke agree that addressing this question is crucial (though they are confident that the areas they emphasize as promising are sufficiently neglected to justify this).

Holden wants to do more investigation into what’s neglected before being confident that these areas are relatively neglected, but his current guess is that they are. He discusses to some extent why GiveWell hasn’t focused on them previously despite suspecting this.

Pages 19-27:

Holden thinks that donating to AMF (a) may make the world more robust to catastrophe in multiple ways, if in fact “robust to catastrophe” turns out to be the most important criterion; (b) is a good thing to do if that criterion does not turn out to be so important and if our ability to identify key risk factors for the far future is essentially nil; (c) has helped GiveWell along its path to becoming influential and able to work on other fronts. He believes that the amount of good done on the (a) front is small relative to the amount of influence over the future of humanity that people like Eliezer and Luke seek to have, and that people who think they can have a great deal of influence on the future of humanity have reason to pursue other opportunities, but that it isn’t obvious that a donor (especially the sort of individual donor GiveWell has historically targeted) can do as much good as Luke and Eliezer seem to aim for.

Eliezer and Luke agree that doing “normal good stuff” has a good track record and is likely to produce significant goods in the near term, but they worry that faster economic growth accelerates AI capabilities research relative to AI safety/Friendliness research, since the former parallelizes more easily across multiple researchers and funding sources than the latter does. If smarter-than-human AI is figured out before Friendliness is, they expect global catastrophe. Hence, faster economic growth could be net bad in the long run. But, they’re much, much less sure of the sign of this argument than they are about e.g. the intelligence explosion and the orthogonality of capabilities and values.

Pages 28-32:

It’s possible that the disagreements between Holden and Luke/Eliezer reduce to their disagreements over the specific dynamics of AI risk. Holden feels that the “Astronomical Waste” essay has been misleadingly cited as establishing that one should directly focus on x-risk reduction work over other causes; Eliezer feels the essay didn’t say anything incorrect but has been misinterpreted on this point.

Pages 32-35:

Discussion of how to deal with a situation in which there are multiple models of the world that might be right, and some involve much higher claimed utilities than others. Does one do a straightforward expected-value calculation, which means essentially letting the model with the bigger numbers in it dominate? Or does one do something closer to “giving each model votes/resources in proportion to its likelihood”? Every principled/explicit answer seems to have problems. Holden, Luke and Eliezer agree that the “multiply big utilities by small probabilities” approach is not optimal.
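To make the contrast between the two approaches concrete, here is a minimal sketch. The two world models, their probabilities, and their claimed values are illustrative assumptions and do not come from the conversation; the function names are likewise hypothetical. The sketch only shows how a straightforward expected-value weighting lets the model with the larger claimed numbers dominate, whereas a proportional ("votes/resources") weighting does not.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    probability: float    # credence that this world model is right (assumed)
    claimed_value: float  # good done per unit of resources, by the model's own lights (assumed)


def expected_value_weights(models):
    """Weight each model by probability * claimed value, then normalize."""
    scores = [m.probability * m.claimed_value for m in models]
    total = sum(scores)
    return {m.name: s / total for m, s in zip(models, scores)}


def proportional_weights(models):
    """Give each model a share of resources equal to its normalized probability."""
    total = sum(m.probability for m in models)
    return {m.name: m.probability / total for m in models}


# Purely illustrative numbers: a likely model with modest claims and an
# unlikely model with very large claims.
models = [
    Model("near_term", probability=0.9, claimed_value=1.0),
    Model("far_future", probability=0.1, claimed_value=1e6),
]

ev = expected_value_weights(models)
prop = proportional_weights(models)
print("expected-value weights:", ev)
print("proportional weights:  ", prop)
```

Under these assumed numbers, the expected-value weighting hands almost everything to the low-probability model, while the proportional weighting gives it only its 10% share. Neither function is offered as the right answer; as the summary notes, every explicit rule seems to have problems.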

Pages 35-39:

Discussion of likelihood that humanity will eventually colonize the stars. Luke and Eliezer think that this is a default outcome if things continue to go moderately well; Holden had the impression that this was not the case, but hasn’t thought about it much. Possibilities for updating on this question are discussed.

Pages 39-45:

Discussion of whether “causing N future lives to exist when they would have otherwise not existed” should be valued at least 10%*N times as much as “saving one life today.” Holden has high uncertainty on this question, more so than Eliezer and Luke. He mistrusts the methodology of thought experiments to a greater degree than Eliezer and Luke.

Some highlights selected by Holden

Holden: Yeah, I agree. I think you want a list of all the things that could be highly disruptive and you want to consider them all risks, and you want to consider them all possibilities, I’m not really sure what else there is here.

Luke: I think we might also disagree on what you can figure out, and what you can’t, about the future.

Holden: Yeah, I think that’s our main disagreement.

Luke: Because I think we make a list and we think we know some things about the items on that list and therefore we can figure out which ones to focus on more.

Holden: Well, no, I would agree with what you just stated, as stated, but I just think that you are more confident than I am. I also believe we can make a list of things that are much more likely to be disruptive than other things, and then we should go and look into them and pay attention to them, but I just think that you guys are much more confident in your view of what are the things. My feeling is this: my feeling is basically we know very little. It’s very important to keep this [upward trend] going. That is not something we should neglect or ignore. So generally helping people is good, that’s kind of how we’ve gotten to where we’ve gotten, is that people have just done things that are good, without visualizing where they’re going.

The track record of visualizing where it’s all going is really bad. The track record of doing something good is really good. So I think we should do good things, I also think that we should list things that we think are in the far future or just relevant to the far future that are especially important. I think we should look into all of them. Another point worth noting is that my job is different from you guys’ job. You guys are working in an organization that’s trying to … it’s a specialized organization, it’s knowledge production. My job is explicitly to be broad. My job is to basically be able to advise a philanthropist and part of what I want to be able to do is to be able to talk about a lot of different options and know about a lot of different things. I don’t think it’s a good way for me to do my job, to just pick the highest expected value thing and put all my eggs in that basket. But perhaps that would be a good job for many people, just not for someone whose explicit value-add is breadth of knowledge.

So part of it is the role, but I do think that you guys are much more confident. My view is that we should list things we think we know, we should look into doing something about them. At the same time, we should also just do things that are good, because doing things that are good has a better track record of getting us closer to colonizing the stars than doing things that are highly planned out.

Eliezer: So indeed, if I tried to pass your ideological Turing test, I would have said some mixture of “we can’t actually model the weird stuff and people trying to do good is what got us where we are and it will probably take us to the galaxy as well,” that would have been the very thing …

Holden: You just need to water down a little.

Eliezer: Sure, so: “insofar as we’re likely to get to the galaxy at all, and it’s highly probable that a lot of the pathway will be people just trying to do good, so just try to do good and get there.”

Holden: Yeah, and it especially will come from people just doing good, as a proximate goal, and then having kind of civilizational consequences in ways that were hard to foresee, which is why I’m particularly interested in opportunities to do good that just feel big. Even if the definition of big is different from opportunity to opportunity, so like a way to help a lot of animals. A way to help a lot of Africans, a way to help a lot of Americans. These are all, in some absolute sense, it seems unlikely that they could be in the same range, but in some market efficiency sense, they’re in the same range. This is: whoa, I don’t see something this good every day, most things this good someone else snaps up, let me grab this one, because this is the kind of thing that could be like a steam engine, where it’s like, this thing is cool, I built it, it’s super cool. Then it actually has civilizational consequences.

Eliezer: So in order to get an idea of what you think Earth’s ideal allocation of resources should be, if you were appointed economic czar of the world, you formed the sort of dangerous to think about counterfactual… or maybe a better way of putting it would be: how much money would you need to have personal control over before you started trying to fund, say, bioterror, nanotech and AI risk type stuff? Not necessarily any current organization, but before you start trying to do something about it?

Holden: I mean, less than Good Ventures has.

Eliezer: Interesting.

Holden: Like, I think we’re probably going to do something. But it depends. We want to keep looking into it. Part of this is that I don’t have all the information and you guys may have information that I don’t, but I think a lot of people in your community don’t have information and are following a Pascal’s mugging-type argument to a conclusion that they should have just been a lot more critical of, and a lot more interested in investigating the world about. So my answer is: we’re still looking at all this stuff, but my view is that no, existential risks are a great opportunity. There’s not nearly enough funding there.

***

Holden: So anyway, that was an aside. I think you guys are more in the camp of thinking you understand the issues really well and not only understanding what the issues are, but who is working on what and believing that the neglectedness of x-risk is a large part of your interest in x-risk, I think. I think there are a lot of people who reason so quickly to believing x-risk is paramount that I don’t believe they’ve gone out and looked at the world and seen what is neglected and what isn’t neglected. I think they’re instead doing a version of Pascal’s mugging. But I’m happy to engage with you guys and just say that I don’t know everything about what’s neglected and what isn’t, I think existential risk looks pretty neglected, preliminarily, but I want to look at more things before I really decide how neglected it is and what else might be more neglected. Do you agree with me that the neglectedness of x-risk is a major piece of why you think it’s a good thing to work on?

Luke: I think it is for me.

Eliezer: I think I would like specialize that to say that there are particular large x-risks that look very neglected, which means you get a big marginal leverage from acting on them. But even that wouldn’t really honestly actually carry the argument.

Holden: But if you read “Astronomical Waste,” it concludes that x-risk is the thing to work on, without discussing whether it’s neglected, and I think that’s the chain of reasoning most people are following. I think that is screwed up.

Eliezer: Yeah, that can’t possibly be right. Or a sane Earth has some kind of allocation across all philanthropies. And insofar as things drop below their allocations, you’ll get benefit from putting stuff into them, and if they go above their allocations, you’re better off putting your money somewhere else. There exists some amount of investment we can make in x-risk, such that our next investment should be in Against Malaria Foundation or something. Although actually that still isn’t right, because that’s a better argument now because GiveWell actually did say Against Malaria Foundation is temporarily overinvested, let’s see what they can do with their existing inflow.

Luke: Though not necessarily relative to the current allocation in the world!

Holden: Yeah, absolutely.

***

Holden: Yeah. Okay. I think some disagreements we have, which I think are like not enormous disagreements, I think they mostly have to do with how confident we can be. I think we agree that there are many things that are important, we agree that being neglected is part of what makes a cause good. If there are other causes that are really important and really neglected, those are good, too. We agree that everything that is good has some value, but I think the things that are good have more value relative to the things that seem to fit into the long-term plan and that has a lot to do with my feeling about how confident we can be about the long-term plan.

Eliezer: My reasoning for CFAR sounds a lot like this. Why, to some extent, I sort of, in practice, divide my efforts between MIRI and CFAR, is sort of like this, except that no matter what happens, I expect the causal pathways to galactic colonization to go down the “something weird happens and other weird things potentially prevent you from doing it” path.

I think that human colonization of the galaxy has probability nearly zero.

Holden: Right, you think it would be something human-like.

Eliezer: I’m hoping that they’re having fun and that they have this big, complicated civilization and that there was sort of a continuous inheritance from human values, because I think fun is at present, a concept that exists among humans and maybe to some lesser extent, other mammals, but not in the rocks. So you don’t get it for free, you don’t want a scenario where the galaxies end up being turned into paperclips or something. But: “humane” life might be a better term.

Holden: Sure, sure.

Eliezer: I think that along the way there you get weird stuff happening and weird emergencies. So CFAR can be thought of as a sort of generalized weird emergencies handler.

Holden: There’s a lot of generalized weird emergencies handlers.

Luke: Yeah, you can improve decision-making processes in the world in general, by getting prediction market standard or something.

Holden: Also just by making people wealthier and happier.

Eliezer: Prediction markets have a bit of trouble with x-risks for obvious reasons, like the market can’t pay off in most of the interesting scenarios.

Holden: I think you can make humanity smarter by making it wealthier and happier. It certainly seems to be what’s happened so far.

Eliezer: Yeah, and intelligence enhancement?

Holden: Yeah, well, that, too. But that’s further off and that’s more specific and that’s more speculative. I think the world really does get smarter, as an ecosystem. I don’t mean the average IQ. I think the ecosystem gets smarter. If you believe that MIRI is so important, I think the existence of MIRI is a testament to this, because I think the existence of MIRI is made possible by a lot of this wealth and economic development, certainly it’s true for GiveWell. If you take my egg and run it back 20 years, my odds of being able to do anything like this are just so much lower.

Eliezer: CFAR, from my perspective, it’s sort of like: generalize those kind of skills required to handle a weird emergency like MIRI and have them around for whatever other weird stuff happens.

Holden: I think the world ecosystem has been getting better in handling weird emergencies like that. I think that part of that, if you want to put a lot of weight on your CFARs, then I think that’s evidence, and if you don’t want to put a lot of weight, then I think there’s other evidence. There are more nonprofits that deal with random stuff, because we have more money.

Eliezer: I’m not sure if I’d rate our ability to handle weird emergencies as having increased. Nuclear weapons are the sort of classic weird emergency that actually did get handled by this lone genius figure who saw it coming and tried to mobilize efforts and so on, I’m talking about Leó Szilárd. So there was a letter to President Roosevelt, which Einstein wrote, except Einstein didn’t write it. It was ghost-written by someone who did it because Leó Szilárd told them to, and then Einstein sent it off. There is this sort of famous story about the conversation where Leó Szilárd explains to Einstein about the critical fission chain reaction and Einstein sort of goes “I never thought of that.” Then came the Manhattan Project, which was this big mobilization of government effort to handle it.

So my impression is that if something like that happened again, modern day Einstein’s letter does not get read by Obama. My impression is that we’ve somehow gotten worse at this.

Holden: I don’t agree with that.

Luke: Eliezer, why do you think that?

Holden: You’re also pointing to a very specific pathway. I’m also thinking about all the institutions that exist to deal with random stuff these days. And all the people who have the intellectual freedom, the financial freedom, to think about this stuff, and not just this stuff, other stuff that we aren’t thinking about, that can turn out to be more important.

Eliezer: We don’t seem to be doing very well with sort of demobilizing the nuclear weapons of the former Soviet Republics, for example.

Holden: We’re also talking about random response to random stuff. I think we just have a greater degree of society’s ability to notice random stuff and to think about random stuff.

Eliezer: That’s totally what I would expect on priors, I’m just wondering if we can actually see evidence that it’s true. On priors, I agree that that’s totally expected.

Eliezer: We could somehow be worse than it was in the 1940s and yet, still, increasing development could all else equal improve our capacity to handle weird stuff. I think I’d agree with that. I think that I would like also sort of agree that all else being equal, as society becomes wealthier, there are more nonprofits, there is like more room to handle weird stuff.

Holden: Yeah, it’s also true that as we solve more problems, people go down the list, so I think if it hadn’t been for all the health problems in Africa, Bill Gates might be working on [GCRs], or he might be working on something else with global civilizational consequences. So when I’m sitting here not knowing what to do and not feeling very educated in the various speculative areas, but knowing that I can save some lives, that’s another reason there is something to that.

But it’s certainly like: the case for donating to AMF, aside from the way in which it helps GiveWell, is definitely in a world in which I feel not very powerful and not very important [relative to the world Eliezer and Luke envision]. I feel like, you know, I’m going to do [a relatively small amount of] good and that’s what I’m trying to do.

So in some sense, when you say, AMF isn’t like a player in the story or something, I think that’s completely fair, but also by trying to take a lot of donors who are trying to do this much [a small amount] and trying to help them, we’ve hopefully gotten ourselves in a position to also be a player in the story, if in fact the concept of a player in the story ends up making sense. If it doesn’t and this [small amount of good] turns out to be really good, we’ll at least have done that.

Eliezer: The sort of obvious thing that I might expect Holden to believe, but I’m not sure that that actually passes your ideological Turing test is that collectively, fixing this stuff collectively, is like a bigger player than collectively the people who go off and try to fix weird things that they think that the fate of the future will hinge on.

Holden: I just think it’s possible that what you just said is true, and possible that it isn’t. If I’m sitting here, knowing very little about anything, and I want to do a little bit of good, I think doing a little bit of good is better than taking a wild guess on something that I feel very ill-informed about. On the other hand, our ideal at GiveWell is to really be playing both sides.

***

Eliezer: Do we think a necessary and sufficient cause of our disagreement is just our visualization of how AI plays out?

Holden: I think it’s possible.

Eliezer: If your visualization of how AI worked magically, instantly switched to Eliezer Yudkowsky’s visualization of how AI worked. I mean, Eliezer Yudkowsky, given sudden magical control of GiveWell, does not just [change] GiveWell to be all about x-risk. Eliezer puts it on the link like three steps deep, and just sort of tries to increase the degree to which incoming effective altruists are funneled toward…

…

Holden: I think it’s pretty possible, and I just want to contrast what you guys think with the normal tenor of the arguments I have over x-risk, which… I just talk to a lot of people who are just like, look, x-risk is clearly the most important thing. Why do you think that? Well, have you read “Astronomical Waste?” Well, that’s a little bit absurd. You have an essay that doesn’t address whether something is neglected, concludes what’s most important, and we’re not even talking about AI and path to AI and why AI, it’s just x-risk, [which people interpret to mean things like] asteroids, come on.

Eliezer: I endorse your objection. We can maybe issue some kind of joint statement, if you want, to inform people.

Holden: Yeah, perhaps. I was going to write something about this, so maybe I’ll run it by you guys. To the extent that I’m known as Mr. X-Risk Troll, or whatever, it’s because those are the arguments I’m always having. When I think about you guys, I think that you and I do not see eye to eye on AI, and that goes back to that conversation we had last time, and that may be a lot of the explanation. At the same time, it’s certainly on the table for us to put some resources into this.

***

Eliezer: Okay, I checked the “Astronomical Waste” paper and everything in there seemed correct, but I can see how we would all now wish, in retrospect, that a caveat had been added along the lines of “and in the debate over what to do nowadays, this doesn’t mean that explicit x-risk focused charities are the best way to maximize the probability of okay outcome.”

Holden: Right, and in fact, this doesn’t tell us very much. This may prove a useful framework, it may prove a useless framework. There’s many things that have been left unanswered, whereas the essay really had a conclusion of: we’ve narrowed it down from a lot to a little.

Eliezer: I don’t remember that being in that essay. It was just sort of like, this is the criterion by which we should choose between actions, which seems like obviously correct in my own ethical framework.

Holden: I also don’t agree with that, so maybe that’s the next topic.

Eliezer: Yeah. Suppose that you accepted Maximized Probability of Okay Outcome, not as a causal model of how the world works, but just as a sort of a determining ethical criterion. Would anything you’re doing change?

Holden: I’ve thought about this, maybe not as hard as I should. I don’t think much would change. I think I would be relatively less interested in direct, short-term suffering stuff. But I’m not sure by a lot. Actually, I think I would be substantially now. I think five years ago, I wouldn’t have changed much. I think right now I would be, because I feel like we’re becoming better positioned to actually target things, I think I would be a little bit more confident about zeroing in on extreme AI and the far future and all that stuff. And the things that I think matter most to that, but I don’t think it would be a huge change.

***

Holden: I just think there’s also a chance that this whole argument is crap and… so there is one guy [at GiveWell] who is definitely representing more the view that we’re not going to have any causal impact on all this [far future] stuff and there is suffering going on right now and we should deal with it, and I place some weight on that view. I don’t do it the way that you would do it in an expected value framework, where it’s like according to this guy, we can save N lives and according to this guy, we could save Q lives and they have very different world models. So therefore, the guy saying N lives wins because N is so much bigger than Q. I don’t do the calculation that way. I’m closer to equal weight, right.

Eliezer: Yeah, you’re going to have trouble putting that on a firm epistemic foundation but Nick Bostrom has done some work on what he calls parliamentary models of decision-making. I’m not sure Nick Bostrom would endorse their extension to this case, but descriptively, it seems a lot of what we do is sort of like the different things we think might be true get to be voices in our head in proportion to how true they are and then they negotiate with each other. This has the advantage of being robust against Pascal’s Mugging-type stuff, which I’d like to once again state for the historical record: I invented that term and not as something that you ought to do! So anyway, it’s robust against Pascal’s Mugging-type stuff, and it has the disadvantage of plausibly failing the what-if-everyone-did-that test.

***

Holden: Let me step back a second. I hear your claim that I should assign a very high probability that we can — if we survive — colonize the stars. I believe this to be something that smart technical people would not agree with. I’ve outlined why I think they wouldn’t agree with it, but not done a great job with it and that’s something that I’d be happy to think more about and talk more about.

Eliezer: Are there reasons apart from the Fermi Paradox?

Holden: I don’t know what all the reasons are. I’ve given my loose impression and it’s not something that I’ve looked into much, because I didn’t really think there was anyone on the other side.
