# AI and Effective Altruism

MIRI is a research nonprofit specializing in a poorly-explored set of problems in theoretical computer science. GiveDirectly is a cash transfer service that gives money to poor households in East Africa. What kind of conference would bring together representatives from such disparate organizations — alongside policy analysts, philanthropists, philosophers, and many more?

Effective Altruism Global, which is beginning its Oxford session in a few hours, is that kind of conference. Effective altruism (EA) is a diverse community of do-gooders with a common interest in bringing the tools of science to bear on the world’s biggest problems. EA organizations like GiveDirectly, the Centre for Effective Altruism, and the charity evaluator GiveWell have made a big splash by calling for new standards of transparency and humanitarian impact in the nonprofit sector.

What is MIRI’s connection to effective altruism? In what sense is safety research in artificial intelligence “altruism,” and why do we assign a high probability to this being a critically important area of computer science in the coming decades? I’ll give quick answers to each of those questions below.

#### MIRI and effective altruism

Why is MIRI associated with EA? In large part because effective altruists and MIRI use the same kind of criteria in deciding what work to prioritize.

MIRI’s mission, to develop the formal tools needed to make smarter-than-human AI systems useful and safe, comes from our big-picture view that scientific and technological advances will be among the largest determiners of human welfare, as they have been historically. Automating intellectual labor is therefore likely to be a uniquely high-impact line of research — both for good and for ill. (See Four Background Claims.) Which open problems we work on then falls out of our efforts to identify tractable and neglected theoretical prerequisites for aligning the goals of AI systems with our values. (See MIRI’s Approach.)

Daniel Dewey, Nick Bostrom, Elon Musk, Nate Soares, and Stuart Russell
discuss AI risk at the EA Global conference. Photo by Robbie Shade.

MIRI is far from the only group that uses criteria like these to identify important cause areas and interventions, and these groups have found that banding together is a useful way to have an even larger impact. Because members of these groups aren’t permanently wedded to a single cause area, and because we assign a lot of value to our common outlook in its own right, we can readily share resources and work together to promote the many exciting ideas that are springing out from this outlook. Hence the effective altruist community.

One example of this useful exchange was MIRI’s previous Executive Director, Luke Muehlhauser, leaving MIRI in June to investigate nutrition science and other areas for potential philanthropic opportunities under the Open Philanthropy Project, an offshoot of GiveWell.1 In turn, OpenPhil has helped fund a large AI grants program that MIRI participated in.

GiveWell/OpenPhil staff have given us extremely useful critical feedback in the past, and we’ve had a number of conversations with them over the years (1, 2, 3, 4, 5). Although they work on a much broader range of topics than MIRI does and they don’t share all of our views, their interest in finding interventions that are “important, tractable and relatively uncrowded” has led them to pick out AI as an important area to investigate for reasons that overlap with MIRI’s. (See OpenPhil’s March update on global catastrophic risk and their newly released overview document on potential risks from advanced artificial intelligence.)

Most EAs work on areas other than AI risk, and MIRI’s approach is far from the only plausible way to have an outsized impact on human welfare. Because we attempt to base our decisions on broadly EA considerations, however — and therefore end up promoting EA-like philosophical commitments when we explain the reasoning behind our research approach — we’ve ended up forming strong ties to many other people with an interest in identifying high-impact humanitarian interventions.

#### High-stakes and high-probability risks

A surprisingly common misconception about EA cause areas is that they break down into three groups: high-probability crises afflicting the global poor; medium-probability crises afflicting non-human animals; and low-probability global catastrophes. The assumption (for example, in Dylan Matthews’ recent Vox article) is that this is the argument for working on AI safety or biosecurity: there’s a very small chance of disaster occurring, but disaster would be so terrible if it did occur that it’s worth investigating just in case.

This misunderstands MIRI’s position — and, I believe, the position of people interested in technological risk at the Future of Humanity Institute and a number of other organizations. We believe that existential risk from misaligned autonomous AI systems is high-probability if we do nothing to avert it, and we base our case for MIRI on that view; if we thought that the risks from AI were very unlikely to arise, we would deprioritize AI alignment research in favor of other urgent research projects.

As a result, we expect EAs who strongly disagree with us about the likely future trajectory of the field of AI to work on areas other than AI risk. We don’t think EAs should donate to MIRI “just in case,” and we reject arguments based on “Pascal’s Mugging.” (“Pascal’s Mugging” is the name MIRI researchers coined for decision-making that mistakenly focuses on infinitesimally small probabilities of superexponentially vast benefits.)2

As Stuart Russell writes, “Improving decision quality, irrespective of the utility function chosen, has been the goal of AI research – the mainstream goal on which we now spend billions per year, not the secret plot of some lone evil genius.” Thousands of person-hours are pouring into research to increase the general capabilities of AI systems, with the aim of building systems that can outperform humans in arbitrary cognitive tasks.

We don’t know when such efforts will succeed, but we expect them to succeed eventually — possibly in the next few decades, and quite plausibly during this century. Shoring up safety guarantees for autonomous AI systems would allow us to reap many more of the benefits from advances in AI while significantly reducing the probability of a global disaster over the long term.

MIRI’s mission of making smarter-than-human AI technology reliably beneficial is ambitious, but it’s ambitious in the fashion of goals like “prevent global warming” or “abolish factory farming.” Working toward such goals usually means making incremental progress that other actors can build on: more like setting aside \$x of each month’s paycheck for a child’s college fund than like buying a series of once-off \$x lottery tickets.

A particular \$100 is unlikely to make a large once-off impact on your child’s career prospects, but it can still be a wise investment. No single charity working against global warming is going to solve the entire problem, but that doesn’t make charitable donations useless. Although MIRI is a small organization, our work represents early progress toward more robust, transparent, and beneficial AI systems, which can then be built on by other groups and integrated into AI system design.3

Rather than saying that AI-mediated catastrophes are high-probability and stopping there, though, I would say that such catastrophes are high-probability conditional on AI research continuing on its current trajectory. Disaster isn’t necessarily high-probability if the field of AI shifts to include alignment work along with capabilities work among its key focuses.

It’s because we consider AI disasters neither unlikely nor unavoidable that we think technical work in this area is important. From the perspective of aspiring effective altruists, the most essential risks to work on will be ones that are highly likely to occur in the near future if we do nothing, but substantially less likely to occur if we work on the problem and get existing research communities and scientific institutions involved.
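One rough way to write down the criterion in the preceding paragraph is sketched below. This is an illustrative formalization only, not a model that appears in MIRI's or anyone else's published work; the symbol $\Delta$ and the event labels "no work" and "work" are assumptions introduced here for the sake of the example.

$$
\Delta \;=\; P(\text{disaster} \mid \text{no work}) \;-\; P(\text{disaster} \mid \text{work})
$$

On this reading, the risks most worth working on are those where $P(\text{disaster} \mid \text{no work})$ is high and $\Delta$ is also large. Because $P(\text{disaster} \mid \text{work}) \ge 0$, we always have $\Delta \le P(\text{disaster} \mid \text{no work})$, so a risk that is very unlikely even when nothing is done can only ever offer a very small $\Delta$. That is one way to see why the "just in case" framing is a different argument from the one made here.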

Principles like these apply outside the domain of AI, and although MIRI is currently the only organization specializing in long-term technical research on AI alignment, we’re one of a large and growing number of organizations that attempt to put these underlying EA principles into practice in one fashion or another. And to that extent, although effective altruists disagree about the best way to improve the world, we ultimately find ourselves on the same team.

1. Although effective altruism is sometimes divided into separate far-future, animal welfare, global poverty, and “meta” cause areas, this has always been a somewhat artificial division. Toby Ord, the founder of the poverty relief organization Giving What We Can, is one of the leading scholars studying existential risk and holds a position at the Future of Humanity Institute. David Pearce, one of the strongest proponents of animal activism within EA, is best known for his futurism. Peter Singer is famous for his early promotion of global poverty causes as well as his promotion of animal welfare. And Anna Salamon, the Executive Director of the “meta”-focused Center for Applied Rationality, is a former MIRI researcher.
2. Quoting MIRI senior researcher Eliezer Yudkowsky in 2013:

I abjure, refute, and disclaim all forms of Pascalian reasoning and multiplying tiny probabilities by large impacts when it comes to existential risk. We live on a planet with upcoming prospects of, among other things, human intelligence enhancement, molecular nanotechnology, sufficiently advanced biotechnology, brain-computer interfaces, and of course Artificial Intelligence in several guises. If something has only a tiny chance of impacting the fate of the world, there should be something with a larger probability of an equally huge impact to worry about instead. […]

To clarify, “Don’t multiply tiny probabilities by large impacts” is something that I apply to large-scale projects and lines of historical probability. On a very large scale, if you think FAI [Friendly AI] stands a serious chance of saving the world, then humanity should dump a bunch of effort into it, and if nobody’s dumping effort into it then you should dump more effort than currently into it. On a smaller scale, to compare two x-risk mitigation projects in demand of money, you need to estimate something about marginal impacts of the next added effort (where the common currency of utilons should probably not be lives saved, but “probability of an ok outcome”, i.e., the probability of ending up with a happy intergalactic civilization). In this case the average marginal added dollar can only account for a very tiny slice of probability, but this is not Pascal’s Wager. Large efforts with a success-or-failure criterion are rightly, justly, and unavoidably going to end up with small marginally increased probabilities of success per added small unit of effort. It would only be Pascal’s Wager if the whole route-to-an-OK-outcome were assigned a tiny probability, and then a large payoff used to shut down further discussion of whether the next unit of effort should go there or to a different x-risk.

3. Nick Bostrom made a similar point at EA Global: that AI is an important cause even though any one individual’s actions are unlikely to make a decisive difference. In a panel on artificial superintelligence, Bostrom said that he thought people had a “low” (as opposed to “high” or “medium”) probability of making a difference on AI risk, which Matthews and a number of others appear to have taken to mean that Bostrom thinks AI is a speculative cause area. When I asked Bostrom about his intended meaning myself, however, he elaborated:

The point I was making in the EA global comment was the probability that you (for any ‘you’ in the audience) will save the world from an AI catastrophe is very small, not that the probability of AI catastrophe is very small. Thus working on AI risk is similar to volunteering for a presidential election campaign.

• Sarah Markham

The optimisation of utility functions is analogous to situational ethics in which relevant markers of good/bad effect are identified and maximised/minimised. It might be useful to consider adaptive utility functions with environment specific parameters together with a capacity for experiential learning, for example after event analysis of the relationship between parameters and outcomes given actual outcome(s).