Friendly AI Research as Effective Altruism

MIRI was founded in 2000 on the premise that creating ((In this post, I talk about the value of humanity in general creating Friendly AI, though MIRI co-founder Eliezer Yudkowsky usually talks about MIRI in particular — or at least, a functional equivalent — creating Friendly AI. This is because I am not as confident as Yudkowsky that it is best for MIRI to attempt to build Friendly AI. When updating MIRI’s bylaws in early 2013, Yudkowsky and I came to a compromise on the language of MIRI’s mission statement, which now reads: “[MIRI] exists to ensure that the creation of smarter-than-human intelligence has a positive impact. Thus, the charitable purpose of [MIRI] is to: (a) perform research relevant to ensuring that smarter-than-human intelligence has a positive impact; (b) raise awareness of this important issue; (c) advise researchers, leaders and laypeople around the world; and (d) as necessary, implement a smarter-than-human intelligence with humane, stable goals” (emphasis added). My own hope is that it will not be necessary for MIRI (or a functional equivalent) to attempt to build Friendly AI itself. But of course I must remain open to the possibility that this will be the wisest course of action as the first creation of AI draws nearer. There is also the question of capability: few people think that a non-profit research organization has much chance of being the first to build AI. I worry, however, that the world’s elites will not find it fashionable to take this problem seriously until the creation of AI is only a few decades away, at which time it will be especially difficult to develop the mathematics of Friendly AI in time, and humanity will be forced to take a gamble on its very survival with powerful AIs we have little reason to trust.)) Friendly AI might be a particularly efficient way to do as much good as possible.

Some developments since then include:

The field of “effective altruism” — trying not just to do good but to do as much good as possible ((One might think of effective altruism as a straightforward application of decision theory to the subject of philanthropy. Philanthropic agents of all kinds (individuals, groups, foundations, etc.) ask themselves: “How can we choose philanthropic acts (e.g. donations) which (in expectation) will do as much good as possible, given what we care about?” The consensus recommendation for all kinds of choices under uncertainty, including philanthropic choices, is to maximize expected utility (Chater & Oaksford 2012; Peterson 2004; Stein 1996; Schmidt 1998:19). Different philanthropic agents value different things, but decision theory suggests that each of them can get the most of what they want if they each maximize their expected utility. Choices which maximize expected utility are in this sense “optimal,” and thus another term for effective altruism is “optimal philanthropy.” Note that effective altruism in this sense is not too dissimilar from earlier approaches to philanthropy, including high-impact philanthropy (making “the biggest difference possible, given the amount of capital invested“), strategic philanthropy, effective philanthropy, and wise philanthropy. Note also that effective altruism does not say that a philanthropic agent should specify complete utility and probability functions over outcomes and then compute the philanthropic act with the highest expected utility — that is impractical for bounded agents. We must keep in mind the distinction between normative, descriptive, and prescriptive models of decision-making (Baron 2007): “normative models tell us how to evaluate… decisions in terms of their departure from an ideal standard. Descriptive models specify what people in a particular culture actually do and how they deviate from the normative models. Prescriptive models are designs or inventions, whose purpose is to bring the results of actual thinking into closer conformity to the normative model.” The prescriptive question — about what bounded philanthropic agents should do to maximize expected utility with their philanthropic choices — tends to be extremely complicated, and is the subject of most of the research performed by the effective altruism community.)) — has seen more publicity and better research than ever before, in particular through the work of GiveWell, the Center for Effective Altruism, the philosopher Peter Singer, and the community at Less Wrong. ((See, for example: Efficient Charity, Efficient Charity: Do Unto Others, Politics as Charity, Heuristics and Biases in Charity, Public Choice and the Altruist’s Burden, On Charities and Linear Utility, Optimal Philanthropy for Human Beings, Purchase Fuzzies and Utilons Separately, Money: The Unit of Caring, Optimizing Fuzzies and Utilons: The Altruism Chip Jar, Efficient Philanthropy: Local vs. Global Approaches, The Effectiveness of Developing World Aid, Against Cryonics & For Cost-Effective Charity, Bayesian Adjustment Does Not Defeat Existential Risk Charity, How to Save the World, and What is Optimal Philanthropy?))
In his recent PhD dissertation, Nick Beckstead has clarified the assumptions behind the claim that shaping the far future (e.g. via Friendly AI) is overwhelmingly important.
Due to research performed by MIRI, the Future of Humanity Institute (FHI), and others, our strategic situation with regard to machine superintelligence is more clearly understood, and FHI’s Nick Bostrom has organized much of this work in a forthcoming book. ((I believe Beckstead and Bostrom have done the research community an enormous service in creating a framework, a shared language, for discussing trajectory changes, existential risks, and machine superintelligence. When discussing these topics with my colleagues, it has often been the case that the first hour of conversation is spent merely trying to understand what the other person is saying — how they are using the terms and concepts they employ. Beckstead’s and Bostrom’s recent work should enable clearer and more efficient communication between researchers, and therefore greater research productivity. Though I am not aware of any controlled, experimental studies on the effect of shared language on research productivity, a shared language is widely considered to be of great benefit for any field of research, and I shall provide a few examples of this claim which appear in print. Fuzzi et al. (2006): “The use of inconsistent terms can easily lead to misunderstandings and confusion in the communication between specialists from different [disciplines] of atmospheric and climate research, and may thus potentially inhibit scientiﬁc progress.” Hinkel (2008): “Technical languages enable their users, e.g. members of a scientiﬁc discipline, to communicate eﬃciently about a domain of interest.” Madin et al. (2007): “terminological ambiguity slows scientiﬁc progress, leads to redundant research efforts, and ultimately impedes advances towards a uniﬁed foundation for ecological science.”))
MIRI’s Eliezer Yudkowsky has begun to describe in more detail which open research problems constitute “Friendly AI research,” in his view.

Given these developments, we are in a better position than ever before to assess the value of Friendly AI research as effective altruism.

Still, this is a difficult question. It is challenging enough to evaluate the cost-effectiveness of anti-malaria nets or direct cash transfers. Evaluating the cost-effectiveness of attempts to shape the far future (e.g. via Friendly AI) is even more difficult than that. Hence, this short post sketches an argument that can be given in favor of Friendly AI research as effective altruism, to enable future discussion, and is not intended as a thorough analysis.

An argument for Friendly AI research as effective altruism

Beckstead (2013) argues ((In addition to Beckstead’s thesis, see also A Proposed Adjustment to the Astronomical Waste Argument.)) for the following thesis:

From a global perspective, what matters most (in expectation) is that we do what is best (in expectation) for the general trajectory along which our descendants develop over the coming millions, billions, and trillions of years.

Why think this? Astronomical facts suggest that humanity (including “post-humanity”) could survive for billions or trillions of years (Adams 2008), and could thus produce enormous amounts of good. ((Beckstead doesn’t mention this, but I would like to point out that moral realism is not required for Beckstead’s arguments to go through. In fact, I generally accept Beckstead’s arguments even though most philosophers would not consider me a moral realist, though to some degree that is a semantic debate (Muehlhauser 2011; Joyce 2012). If you’re a moral realist and you believe your intuitive moral judgments are data about what is morally true, then Beckstead’s arguments (if successful) have something to say about what is morally true, and about what you should do if you want to act in morally good ways. If you’re a moral anti-realist but you think your intuitive judgments are data about what you value — or about what you would value if you had more time to think about your values and how to resolve the contradictions among them — then Beckstead’s arguments (if successful) have something to say about what you value, and about what you should do if you want to help achieve what you value.)) But the value produced by our future depends on our development trajectory. If humanity destroys itself with powerful technologies in the 21st century, then nearly all that future value is lost. And if we survive but develop along a trajectory dominated by conflict and poor decisions, then the future could be much less good than if our trajectory is dominated by altruism and wisdom. Moreover, some of our actions today can have “ripple effects” ((Karnofsky calls these “flow-through effects.”)) which determine the trajectory of human development, because many outcomes are path-dependent. Hence, actions which directly or indirectly precipitate particular trajectory changes (e.g. mitigating existential risks) can have vastly more value (in expectation) than actions with merely proximate benefits (e.g. saving the lives of 20 wild animals). Beckstead calls this the “rough future-shaping argument.”

If we accept the normative assumptions lurking behind this argument (e.g. risk neutrality; see Beckstead’s dissertation), then the far future is enormously valuable (if it goes at least as well on average as the past century), and existential risk reduction is much more important than producing proximate benefits (e.g. global health, poverty reduction) or speeding up development (which could in fact increase existential risks, and even if it doesn’t, has lower expected value than existential risk reduction).

However, Beckstead’s conclusion is not necessarily that existential risk reduction should be our global priority, because

there may be other ways to have a large, persistent effect on the far future without reducing existential risk… Some persistent changes in values and social norms could make the future [some fraction] better or worse… Sure, succeeding in preventing an existential catastrophe would be better than making a smaller trajectory change, but creating a small positive trajectory change may be significantly easier.

Instead, Beckstead’s arguments suggest that “what matters most for shaping the far future is producing positive trajectory changes and avoiding negative ones.” Existential risk reduction is one important kind of positive trajectory change that could turn out to be the intervention with the highest expected value.

One important clarification is in order. It could turn out to be that working toward proximate benefits or development acceleration does more good than “direct” efforts for trajectory change, if working toward proximate benefits or development acceleration turns out to have major ripple effects which produce important trajectory change. For example, perhaps an “ordinary altruistic effort” like solving India’s iodine deficiency problem would cause there to be thousands of “extra” world-class elite thinkers two generations from now, which could increase humanity’s chances of intelligently navigating the crucial 21st century and spreading to the stars. (I don’t think this is likely; I suggest it merely for illustration.)

For the sake of argument, suppose you agree with Beckstead’s core thesis that “what matters most (in expectation) is that we do what is best (in expectation) for the general trajectory along which our descendants develop.” Suppose you also think, as I do, that machine superintelligence is probably inevitable. ((See Bostrom (forthcoming) for an extended argument. Perhaps the most likely defeater for machine superintelligence is that global catastrophe may halt scientific progress before human-level AI is created.))

In that case, you might think that Friendly AI research is a uniquely foreseeable and impactful way to shape the far future in an enormously positive way, because “our effects on the far future must almost entirely pass through our effects on the development of machine superintelligence.” All other developing trends might be overridden by the overwhelming effectiveness of machine superintelligence — and specifically, by the values that were (explicitly or implicitly, directly or indirectly) written into the machine superintelligence(s).

If that’s right, our situation is a bit like sending an interstellar probe to colonize distant solar systems before they recede beyond the cosmological horizon and can thus never be reached from Earth again due to the expansion of the universe. Anything on Earth that doesn’t affect the content of the probe will have no impact on those solar systems. (See also this comment.)

Potential defeaters

The rough argument above — in favor of Friendly AI research as an efficient form of effective altruism — deserves to be “fleshed out” in more detail. ((Beckstead, in personal communication, suggested (but didn’t necessarily endorse) the following formalization of the rough argument sketched in the main text of the blog post: “(1) To a first approximation, the future of humanity is all that matters. (2) To a much greater extent than anything else, the future of humanity is highly sensitive to how machine intelligence unfolds. (3) Therefore, there is a very strong presumption in favor of working on any project which makes machine intelligence unfold in a better way. (4) FAI research is the most promising route to making machine intelligence unfold in a better way. (5) Therefore, there is a very strong presumption in favor of doing FAI research.” Beckstead (2013) examines the case for (1). Bostrom (forthcoming), in large part, examines the case for (2). Premise (3) informally follows from (1) and (2), and the conclusion (5) informally follows from (3) and (4). Premise (4) appears to me to be the most dubious part of the argument, and the least explored in the extant literature.))

Potential defeaters should also be examined:

Perhaps we ought to reject one or more of the normative assumptions behind Beckstead’s rough future-shaping argument.
Perhaps it’s not true that “our effects on the far future must almost entirely pass through our effects on the development of machine superintelligence.”
Perhaps Friendly AI research is not (today) a particularly efficient way to positively affect the development of machine superintelligence. Competing interventions may include: (1) AI risk strategy research, (2) improving technological forecasting, (3) improving science in general, (4) improving and spreading effective altruism and rationality, and (5) many others.

In future blog posts, members of the effective altruist community (including myself) will expand on the original argument and examine potential defeaters.

Acknowledgements

My thanks to those who provided feedback on this post: Carl Shulman, Nick Beckstead, Jonah Sinick, and Eliezer Yudkowsky.

Browse

Friendly AI Research as Effective Altruism

An argument for Friendly AI research as effective altruism

Potential defeaters

Acknowledgements

Categories