Friendly AI Research as Effective Altruism


MIRI was founded in 2000 on the premise that creating1 Friendly AI might be a particularly efficient way to do as much good as possible.

Some developments since then include:

  • The field of “effective altruism” — trying not just to do good but to do as much good as possible2 — has seen more publicity and better research than ever before, in particular through the work of GiveWell, the Center for Effective Altruism, the philosopher Peter Singer, and the community at Less Wrong.3
  • In his recent PhD dissertation, Nick Beckstead has clarified the assumptions behind the claim that shaping the far future (e.g. via Friendly AI) is overwhelmingly important.
  • Due to research performed by MIRI, the Future of Humanity Institute (FHI), and others, our strategic situation with regard to machine superintelligence is more clearly understood, and FHI’s Nick Bostrom has organized much of this work in a forthcoming book.4
  • MIRI’s Eliezer Yudkowsky has begun to describe in more detail which open research problems constitute “Friendly AI research,” in his view.

Given these developments, we are in a better position than ever before to assess the value of Friendly AI research as effective altruism.

Still, this is a difficult question. It is challenging enough to evaluate the cost-effectiveness of anti-malaria nets or direct cash transfers; evaluating the cost-effectiveness of attempts to shape the far future (e.g. via Friendly AI) is more difficult still. Hence, this short post merely sketches an argument that can be given in favor of Friendly AI research as effective altruism, to enable future discussion; it is not intended as a thorough analysis.

An argument for Friendly AI research as effective altruism

Beckstead (2013) argues5 for the following thesis:

From a global perspective, what matters most (in expectation) is that we do what is best (in expectation) for the general trajectory along which our descendants develop over the coming millions, billions, and trillions of years.

Why think this? Astronomical facts suggest that humanity (including “post-humanity”) could survive for billions or trillions of years (Adams 2008), and could thus produce enormous amounts of good.6 But the value produced by our future depends on our development trajectory. If humanity destroys itself with powerful technologies in the 21st century, then nearly all that future value is lost. And if we survive but develop along a trajectory dominated by conflict and poor decisions, then the future could be much less good than if our trajectory is dominated by altruism and wisdom. Moreover, some of our actions today can have “ripple effects”7 which determine the trajectory of human development, because many outcomes are path-dependent. Hence, actions which directly or indirectly precipitate particular trajectory changes (e.g. mitigating existential risks) can have vastly more value (in expectation) than actions with merely proximate benefits (e.g. saving the lives of 20 wild animals). Beckstead calls this the “rough future-shaping argument.”

If we accept the normative assumptions lurking behind this argument (e.g. risk neutrality; see Beckstead’s dissertation), then the far future is enormously valuable (if it goes at least as well on average as the past century), and existential risk reduction is much more important than producing proximate benefits (e.g. global health, poverty reduction) or speeding up development (which could in fact increase existential risks, and even if it doesn’t, has lower expected value than existential risk reduction).

However, Beckstead’s conclusion is not necessarily that existential risk reduction should be our global priority, because

there may be other ways to have a large, persistent effect on the far future without reducing existential risk… Some persistent changes in values and social norms could make the future [some fraction] better or worse… Sure, succeeding in preventing an existential catastrophe would be better than making a smaller trajectory change, but creating a small positive trajectory change may be significantly easier.

Instead, Beckstead’s arguments suggest that “what matters most for shaping the far future is producing positive trajectory changes and avoiding negative ones.” Existential risk reduction is one important kind of positive trajectory change that could turn out to be the intervention with the highest expected value.

One important clarification is in order. It could turn out that working toward proximate benefits or development acceleration does more good than “direct” efforts at trajectory change, if those efforts have major ripple effects which produce important trajectory changes. For example, perhaps an “ordinary altruistic effort” like solving India’s iodine deficiency problem would cause there to be thousands of “extra” world-class elite thinkers two generations from now, which could increase humanity’s chances of intelligently navigating the crucial 21st century and spreading to the stars. (I don’t think this is likely; I suggest it merely for illustration.)

For the sake of argument, suppose you agree with Beckstead’s core thesis that “what matters most (in expectation) is that we do what is best (in expectation) for the general trajectory along which our descendants develop.” Suppose you also think, as I do, that machine superintelligence is probably inevitable.8

In that case, you might think that Friendly AI research is a uniquely foreseeable and impactful way to shape the far future in an enormously positive way, because “our effects on the far future must almost entirely pass through our effects on the development of machine superintelligence.” All other developing trends might be overridden by the overwhelming effectiveness of machine superintelligence — and specifically, by the values that were (explicitly or implicitly, directly or indirectly) written into the machine superintelligence(s).

If that’s right, our situation is a bit like sending an interstellar probe to colonize distant solar systems before they recede beyond the cosmological horizon and can thus never be reached from Earth again due to the expansion of the universe. Anything on Earth that doesn’t affect the content of the probe will have no impact on those solar systems. (See also this comment.)


Potential defeaters

The rough argument above — in favor of Friendly AI research as an efficient form of effective altruism — deserves to be “fleshed out” in more detail.9

Potential defeaters should also be examined.

In future blog posts, members of the effective altruist community (including myself) will expand on the original argument and examine potential defeaters.


My thanks to those who provided feedback on this post: Carl Shulman, Nick Beckstead, Jonah Sinick, and Eliezer Yudkowsky.

  1. In this post, I talk about the value of humanity in general creating Friendly AI, though MIRI co-founder Eliezer Yudkowsky usually talks about MIRI in particular — or at least, a functional equivalent — creating Friendly AI. This is because I am not as confident as Yudkowsky that it is best for MIRI to attempt to build Friendly AI. When updating MIRI’s bylaws in early 2013, Yudkowsky and I came to a compromise on the language of MIRI’s mission statement, which now reads: “[MIRI] exists to ensure that the creation of smarter-than-human intelligence has a positive impact. Thus, the charitable purpose of [MIRI] is to: (a) perform research relevant to ensuring that smarter-than-human intelligence has a positive impact; (b) raise awareness of this important issue; (c) advise researchers, leaders and laypeople around the world; and (d) as necessary, implement a smarter-than-human intelligence with humane, stable goals” (emphasis added). My own hope is that it will not be necessary for MIRI (or a functional equivalent) to attempt to build Friendly AI itself. But of course I must remain open to the possibility that this will be the wisest course of action as the first creation of AI draws nearer. There is also the question of capability: few people think that a non-profit research organization has much chance of being the first to build AI. I worry, however, that the world’s elites will not find it fashionable to take this problem seriously until the creation of AI is only a few decades away, at which time it will be especially difficult to develop the mathematics of Friendly AI in time, and humanity will be forced to take a gamble on its very survival with powerful AIs we have little reason to trust. 
  2. One might think of effective altruism as a straightforward application of decision theory to the subject of philanthropy. Philanthropic agents of all kinds (individuals, groups, foundations, etc.) ask themselves: “How can we choose philanthropic acts (e.g. donations) which (in expectation) will do as much good as possible, given what we care about?” The consensus recommendation for all kinds of choices under uncertainty, including philanthropic choices, is to maximize expected utility (Chater & Oaksford 2012; Peterson 2004; Stein 1996; Schmidt 1998:19). Different philanthropic agents value different things, but decision theory suggests that each of them can get the most of what they want if they each maximize their expected utility. Choices which maximize expected utility are in this sense “optimal,” and thus another term for effective altruism is “optimal philanthropy.” Note that effective altruism in this sense is not too dissimilar from earlier approaches to philanthropy, including high-impact philanthropy (making “the biggest difference possible, given the amount of capital invested”), strategic philanthropy, effective philanthropy, and wise philanthropy. Note also that effective altruism does not say that a philanthropic agent should specify complete utility and probability functions over outcomes and then compute the philanthropic act with the highest expected utility — that is impractical for bounded agents. We must keep in mind the distinction between normative, descriptive, and prescriptive models of decision-making (Baron 2007): “normative models tell us how to evaluate… decisions in terms of their departure from an ideal standard. Descriptive models specify what people in a particular culture actually do and how they deviate from the normative models. Prescriptive models are designs or inventions, whose purpose is to bring the results of actual thinking into closer conformity to the normative model.” The prescriptive question — about what bounded philanthropic agents should do to maximize expected utility with their philanthropic choices — tends to be extremely complicated, and is the subject of most of the research performed by the effective altruism community. 
  3. See, for example: Efficient Charity, Efficient Charity: Do Unto Others, Politics as Charity, Heuristics and Biases in Charity, Public Choice and the Altruist’s Burden, On Charities and Linear Utility, Optimal Philanthropy for Human Beings, Purchase Fuzzies and Utilons Separately, Money: The Unit of Caring, Optimizing Fuzzies and Utilons: The Altruism Chip Jar, Efficient Philanthropy: Local vs. Global Approaches, The Effectiveness of Developing World Aid, Against Cryonics & For Cost-Effective Charity, Bayesian Adjustment Does Not Defeat Existential Risk Charity, How to Save the World, and What is Optimal Philanthropy? 
  4. I believe Beckstead and Bostrom have done the research community an enormous service in creating a framework, a shared language, for discussing trajectory changes, existential risks, and machine superintelligence. When discussing these topics with my colleagues, it has often been the case that the first hour of conversation is spent merely trying to understand what the other person is saying — how they are using the terms and concepts they employ. Beckstead’s and Bostrom’s recent work should enable clearer and more efficient communication between researchers, and therefore greater research productivity. Though I am not aware of any controlled, experimental studies on the effect of shared language on research productivity, a shared language is widely considered to be of great benefit for any field of research, and I shall provide a few examples of this claim which appear in print. Fuzzi et al. (2006): “The use of inconsistent terms can easily lead to misunderstandings and confusion in the communication between specialists from different [disciplines] of atmospheric and climate research, and may thus potentially inhibit scientific progress.” Hinkel (2008): “Technical languages enable their users, e.g. members of a scientific discipline, to communicate efficiently about a domain of interest.” Madin et al. (2007): “terminological ambiguity slows scientific progress, leads to redundant research efforts, and ultimately impedes advances towards a unified foundation for ecological science.” 
  5. In addition to Beckstead’s thesis, see also A Proposed Adjustment to the Astronomical Waste Argument. 
  6. Beckstead doesn’t mention this, but I would like to point out that moral realism is not required for Beckstead’s arguments to go through. In fact, I generally accept Beckstead’s arguments even though most philosophers would not consider me a moral realist, though to some degree that is a semantic debate (Muehlhauser 2011; Joyce 2012). If you’re a moral realist and you believe your intuitive moral judgments are data about what is morally true, then Beckstead’s arguments (if successful) have something to say about what is morally true, and about what you should do if you want to act in morally good ways. If you’re a moral anti-realist but you think your intuitive judgments are data about what you value — or about what you would value if you had more time to think about your values and how to resolve the contradictions among them — then Beckstead’s arguments (if successful) have something to say about what you value, and about what you should do if you want to help achieve what you value. 
  7. Karnofsky calls these “flow-through effects.” 
  8. See Bostrom (forthcoming) for an extended argument. Perhaps the most likely defeater for machine superintelligence is that global catastrophe may halt scientific progress before human-level AI is created. 
  9. Beckstead, in personal communication, suggested (but didn’t necessarily endorse) the following formalization of the rough argument sketched in the main text of the blog post: “(1) To a first approximation, the future of humanity is all that matters. (2) To a much greater extent than anything else, the future of humanity is highly sensitive to how machine intelligence unfolds. (3) Therefore, there is a very strong presumption in favor of working on any project which makes machine intelligence unfold in a better way. (4) FAI research is the most promising route to making machine intelligence unfold in a better way. (5) Therefore, there is a very strong presumption in favor of doing FAI research.” Beckstead (2013) examines the case for (1). Bostrom (forthcoming), in large part, examines the case for (2). Premise (3) informally follows from (1) and (2), and the conclusion (5) informally follows from (3) and (4). Premise (4) appears to me to be the most dubious part of the argument, and the least explored in the extant literature. 
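The expected-utility recommendation described in footnote 2 can be sketched as a tiny calculation. This is a minimal illustration only; the acts, probabilities, and utilities below are invented, not anyone's actual estimates:

```python
# Hypothetical illustration of footnote 2 (all numbers invented):
# a philanthropic agent picks the act with the highest expected utility.

def expected_utility(outcomes):
    """Sum of probability * utility over an act's possible outcomes."""
    return sum(p * u for p, u in outcomes)

# Each act maps to a list of (probability, utility) pairs.
acts = {
    "direct_cash_transfer": [(1.0, 100)],                # certain, modest good
    "speculative_research": [(0.01, 20_000), (0.99, 0)], # tiny chance of huge good
}

best_act = max(acts, key=lambda a: expected_utility(acts[a]))
print(best_act)
```

In this toy example the speculative act has the higher expected utility (0.01 × 20,000 = 200 vs. 100), which is why risk-neutral expected-utility reasoning can favor low-probability, high-stakes interventions such as existential risk reduction.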
  • David Pearce

    “(1) To a first approximation, the future of humanity is all that matters.”
    Is it wise to formalise anthropocentric bias? Insofar as we aspire to impartial benevolence, perhaps use instead:
    “(1) To a first approximation, the future of sentience is all that matters.”
    Maybe the first formulation is a corollary of the second; this remains to be shown.

  • jbash

    Supposing I don’t care if humanity spreads to the stars? What do you and Bostrom and Beckstead have for me and people like me? I mean, there must be some other people like me…

    On a brief glance over Beckstead, and on general familiarity with the rest:

    Like Yudkowsky, I believe human ethical and aesthetic intuitions are complex. I’m not sure if this next part is shared with Yudkowsky, but I personally see no reason to expect ever to be able to develop a logically consistent formulation of ethics that doesn’t lead to throwing out some intuitions. And throwing major intuitions out willy-nilly is probably worse than accepting inconsistency.

    So the best we can do in real, practical ethics seems to be to muddle along, apply intuitions where they intuitively seem to apply, and deal with the inevitable border wars between them as best we can. Different sets of intuitions belong to different magisteria, if you will.

    In this particular magisterium, I don’t care much about humanity; I care about humans (and certain kinds of AIs, aliens, successors, maybe animals, etc.). For me, the ethic that intuitively applies to X-risk is what Beckstead identifies as (relatively) “Strict Asymmetric Actualism”. I’m not willing to deal in the potential experiences of potential people. I’m interested in the actual experience of actual people. That includes people who come into being in the future… but only if they actually do come into being.

    So, for example, if, in the “voluntary extinction” scenario, everybody happily decided to stop reproducing tomorrow, I wouldn’t be very bothered. Any concern I did have would be over in some other magisterium somewhere, and frankly I’d need relatively sensitive instruments to detect it. Beckstead seems to see acceptance of extinction as a huge intuitive problem… but frankly it seems to me to be kind of weird to worry about it. Species don’t matter; people do.

    On the other hand, if a bunch of people were made miserable, now or in the future, I’d be pretty upset about it. Once having come into existence, I’d certainly prefer people and AIs to have a good time. But only real ones, thanks.

    If I followed Beckstead’s view, it seems to me that I’d just have to spend all my time mourning the more or less infinite number of happy beings who will never exist no matter what. That seems to me to be a much worse intuitive knock against his system than acceptance of extinction is against mine.

    Where I have to admit to trouble is in the “miserable child” example (perhaps oddly, the “happy child” doesn’t give me any trouble at all). Although I think Beckstead is pretty cavalier in ignoring the very significant, widely accepted ethical systems that say it’s good and right to have the miserable child, I don’t subscribe to any of those systems, and the miserable child bothers me.

    I think what may be going on there is that individual children are in a separate magisterium from species survival. Or maybe I’m just applying a precautionary meta-ethic that drives me into an asymmetric view for certain cases. Perhaps what underlies that, in an “evolutionary” sense, might be some kind of heuristic for dealing with limited information about effects. Nonetheless, my intuitions are the ones I have, and it’s not clear understanding the source would license me to throw one out even if I were sure about it.

    Whatever the answer, I’m not really upset that I can’t reconcile my intuitions on the miserable child with my intuitions on X-risk, mostly because it’s unreasonable to expect to be able to reconcile them in the first place. What Beckstead, and many others, are trying to create simply can’t exist.

    The good news is that these border wars mostly seem to arise in completely weird hypothetical cases. Having the miserable child would never really be a neutral event for the child’s mother. It seems as though our intuitions have worked out their differences pretty well for most practical cases.

    Also, unrelated to the above, although it’s true that the future could be hugely good (not infinitely good; it’s a finite universe), I’m definitely convinced that some possible futures could be a lot worse than no future at all. Existential “risk” forecloses those futures.

    • lukeprog

      Several chapters of Beckstead’s thesis contain thought experiments meant to suggest that you might care more about future people than you think. Also, do you accept the standard “block universe” view of the universe, according to which future people exist the same way you do? Sometimes, people’s intuitions about what they value change after learning about that. In any case, this is something people disagree about, and Beckstead’s dissertation addresses the arguments back and forth in great detail.

  • Glue

    Very interesting comment, jbash. I basically agree: Who cares if potential future people don’t come into existence? They certainly don’t, and if we don’t either, then nobody does. The “x-risk” dogma is based on the assumption that ensuring that there will be a future is extremely valuable. But that’s doubtful on ethical as well as empirical grounds. (Even if bringing potential future people into existence is hugely important, who says our future is net good? As you say, existential “risk” forecloses bad futures too.) What I didn’t get is why your view on the miserable child conflicts with your view on x-risk?

  • Tim Tyler

    Few take such a global perspective. Typically, those that do don’t leave very many offspring, so we should not expect to see very many organisms with such concerns around. Those vulnerable to this sort of meme will typically be those with weakened memetic immune systems, susceptible to this sort of imaginary superstimulus.

  • Vlastimil

    I have wondered about the objection that “Friendly AI research is not (today) a particularly efficient way to positively affect the development of machine superintelligence”, which runs against the premise that “FAI research is [today] the most promising route to making machine intelligence unfold in a better way.”

    Where has this issue been discussed?

    Some people expect AI soon but believe that making AI friendly will be a relatively trivial task compared with creating some AI or other. No need, they think, to fund any special AI safety research today, in advance. AI friendliness will be an important but easy thing.

    Similarly, there are people who expect AI soon but admit that nobody has any exact idea how to guarantee that AI will be nice to humans. So what? they retort. When we know how to program an AI, we will know how to program a friendly AI. But at this moment, they think, trying to set out the goals of friendly AI would be sheer speculation and a waste of money.

    Not that I am myself this happy-go-lucky kind of guy. I do not claim that the challenge of friendly AI will be met ambulando, in the process of creating some AI or other. And I do not claim that present-day AI safety research is useless because premature. But I guess there are people who think so. Is their position uninformed? Maybe. Is it unintelligible or patently absurd? No.

    So what should one say to the objection that the problem of friendly AI will solve itself while people work on other things?