AGI Impact Experts and Friendly AI Experts

MIRI’s mission is “to ensure that the creation of smarter-than-human intelligence has a positive impact.” A central strategy for achieving this mission is to find and train what one might call “AGI impact experts” and “Friendly AI experts.”

AGI impact experts develop skills related to predicting technological development (e.g. building computational models of AI development or reasoning about intelligence explosion microeconomics), predicting AGI’s likely impact on society, and identifying which interventions are most likely to increase humanity’s chances of safely navigating the creation of AGI. For overviews, see Bostrom & Yudkowsky (2013); Muehlhauser & Salamon (2013).

Friendly AI experts develop skills useful for the development of mathematical architectures that can enable AGIs to be trustworthy (or “human-friendly”). This work is carried out at MIRI research workshops and in various publications, e.g. Christiano et al. (2013); Hibbard (2013). Note that the term “Friendly AI” was selected (in part) to avoid the suggestion that we understand the subject very well — a phrase like “Ethical AI” might sound like the kind of thing one can learn a lot about by looking it up in an encyclopedia, but our present understanding of trustworthy AI is too impoverished for that.

Now, what do we mean by “expert”?

Reliably superior performance on representative tasks

An expert is “a person with a high degree of skill in or knowledge of a certain subject.” Some domains (e.g. chess) provide objective measures of expertise, while other domains rely on peer recognition (e.g. philosophy). However, as Ericsson (2006) notes:

people recognized by their peers as experts do not always display superior performance on domain-related tasks. Sometimes they are no better than novices even on tasks that are central to the expertise, such as selecting stocks with superior future value, treatment of psychotherapy patients, and forecasts.

Thus, we should specify that the kind of expertise we want in AGI impact experts and Friendly AI experts is what Ericsson (2006) calls “Expertise as Reliably Superior Performance on Representative Tasks” (RSPRT). It won’t do humanity much good to have a bunch of peer-credentialed “AGI impact experts” who aren’t really any better than laypeople at predicting AGI outcomes, or a bunch of “Friendly AI experts” who aren’t much good at generating new FAI-relevant math results.

As an example of expertise as RSPRT, consider chess. Do chess ratings reliably track RSPRT? Yes, they do. For example, chess ratings are highly correlated with the ability to select the best move for presented chess positions (de Groot 1978; Ericsson & Lehmann 1996; Van der Maas & Wagenmakers 2005).

Similar methods have been used to confirm “expertise as RSPRT” in medicine (Ericsson 2004, 2007), sport (Côté et al. 2012), Scrabble (Tuffiash et al. 2007), and music (Lehmann & Grüber 2006).

So, what are some “representative tasks” for which AGI impact experts and FAI experts should demonstrate superior performance?

Scholastic expertise

At the very least, we’d hope AGI impact experts and Friendly AI experts would have a kind of scholastic expertise in AGI impact and Friendly AI. That is, they should know what the basic debates are about, which arguments and counter-arguments are often given, and who gives them. Generally, experts in everything from Zoroastrian theology to theoretical time travel at least have this kind of expertise.

For example, Nick Bostrom has researched AGI impact on and off for more than a decade, and has written extensively on the subject. Both in conversation and through his writings, Bostrom demonstrates pretty solid scholastic expertise in AGI impact.

AI researchers, in contrast, do not tend to be familiar with the basic debates, arguments, and counterarguments related to AGI impact. (Why would they be? That’s not their job.) Thus, it’s hard to see much value in, say, the projections about AGI impact from the AAAI Presidential Panel on Long-Term AI Futures, which included no participants with known scholastic expertise in AGI impact — and only one participant who is (barely) involved in the broader machine ethics community (Alan Mackworth).

But maybe we shouldn’t place any value on the opinions of those who do have scholastic expertise in the subject, either. Maybe those with scholastic expertise can’t reliably demonstrate superior performance on anything more practical than merely knowing which arguments and counterarguments are in play.

Ideally, AGI impact experts and FAI experts should do more than demonstrate scholastic expertise. What other examples of RSPRT expertise should be relevant to both AGI impact experts and FAI experts?

Sensitivity to evidence

In general, humans don’t accurately update their beliefs in response to medium-sized bits of evidence, as a perfectly rational agent would. That’s why we need science, where our method is to “amass such an enormous mountain of evidence that… scientists cannot ignore it.”

But there usually aren’t “mountains” of evidence available when testing hypotheses about the design of future technologies and their likely impact. As explained elsewhere: “The less evidence you have, or the harder it is to interpret, the more rationality you need to get the right answer. (As likelihood ratios get smaller, your priors need to be better and your updates more accurate.)”
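The parenthetical about likelihood ratios can be made concrete with Bayes’ rule in odds form: posterior odds equal prior odds times the likelihood ratio. The following is a minimal sketch; the function name `posterior_probability` and the example numbers are illustrative assumptions, not something taken from the quoted source.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    `prior` is P(H) before seeing the evidence.
    `likelihood_ratio` is P(evidence | H) / P(evidence | not H).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical numbers: the same weak evidence (likelihood ratio of 2)
    # applied to three different priors.
    for prior in (0.01, 0.10, 0.50):
        posterior = posterior_probability(prior, 2.0)
        print(f"prior={prior:.2f} -> posterior={posterior:.3f}")
```

When the likelihood ratio is close to 1, the posterior stays close to the prior, so an error in the prior carries over almost unchanged to the conclusion. That is one way to read the claim that weaker evidence demands better priors and more accurate updates.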

Can human rationality be improved? Based on a couple of decades of “debiasing” research (Larrick 2004), my guess is that it probably can, but we haven’t tried very hard yet.

Why think there is low-hanging fruit in the field of rationality training? Very few people, if any, put as much effort into improving their rationality as our best musicians and athletes put into improving their musical and athletic abilities. The best musicians practice 4 hours per day over many years (Ericsson et al. 1993); champion swimmer Michael Phelps spent 3-6 hours per day in the pool; Sun Microsystems co-founder Bill Joy practiced programming 10 hours per day in college (Gladwell 2008, p. 46); and during one period, chess champion Bobby Fischer reportedly practiced chess 14 hours a day. But who spends 4-10 hours per day doing calibration training or building up good rationality habits?

Ideally, both AGI impact experts and Friendly AI experts would train good rationality habits so as to increase their sensitivity to evidence, so that they can reason productively about future technologies without first needing to amass (unavailable) “mountains of evidence.”

What might an FAI expert look like?

Next, let’s look at the specific skills needed for FAI expertise in particular. Clearly, such experts must be able to generate new results in math. And luckily, math research skill is more easily measurable and “objective” than, say, psychology or philosophy research skill.

What other kinds of expertise might we want in an FAI expert?

Yudkowsky described an FAI expert like this:

A Friendly AI [expert] is somebody who specializes in seeing the correspondence of mathematical structures to What Happens in the Real World. It’s somebody who looks at Hutter’s specification of AIXI and reads the actual equations… and sees, “Oh, this AI will try to gain control of its reward channel,” as well as numerous subtler issues like, “This AI presumes a Cartesian boundary separating itself from the environment; it may drop an anvil on its own head.” Similarly, working on TDT means e.g. looking at a mathematical specification of decision theory, and seeing “Oh, this is vulnerable to blackmail” and coming up with a mathematical counter-specification of an AI that isn’t so vulnerable to blackmail.

…If you want to have a sensible discussion about which AI designs are safer, there are specialized skills you can apply to that discussion, [such as the skill described above,] as built up over years of study and practice by someone who specializes in answering that sort of question.

Let me give some examples of people who, as Yudkowsky put it, “specialize in seeing the correspondence of mathematical structures to What Happens in the Real World.” (In particular, we’re interested in the consequences of mathematical objects with a kind of “general intelligence,” not so much the real world consequences of narrow-domain algorithms like Stuxnet.) To the extent that AGI behavior can be modeled with mathematics, this is a crucial skill.

Yudkowsky read Hutter’s specification of AIXI and saw “Oh, this AI will try to gain control of its reward channel” and “the AI presumes a Cartesian boundary separating itself from the environment; it may drop an anvil on its own head,” but he didn’t write down technical demonstrations of these facts.

Laurent Orseau (AgroParisTech) and Mark Ring (IDSIA) independently demonstrated those problems (AIXI-like agents hacking their own reward channels, and the challenge of the Cartesian boundary) in Ring & Orseau (2011) and Orseau & Ring (2011). They also worked toward formalizing the latter problem in Orseau & Ring (2012), as did Bill Hibbard (University of Wisconsin) in Hibbard (2012).

Examples of this kind of work from MIRI’s research fellows or research associates include Dewey (2011), de Blanc (2011), and Yudkowsky (2010).

This skill may be difficult to measure objectively, but that is true of many of the skills that university administrators (or recruiters for hedge funds and technology companies) try to identify in mathematical researchers. And yet these groups have much success in locating the best and brightest. So perhaps there is some hope for identifying people with this skill.

There are other tasks on which FAI experts should demonstrate “reliably superior performance.” For example, they must be able to formalize philosophical concepts. Here again there is no standard measure for the skill, but we have many past examples from which to learn. The last century was a pretty productive one for turning previously mysterious philosophical concepts into formal ones. See Kolmogorov (1965) on complexity and simplicity, Solomonoff (1964a, 1964b) on induction, Von Neumann and Morgenstern (1947) on rationality, Shannon (1948) on information, and Tennenholtz’s development of “program equilibrium” (for an overview, see Wooldridge 2012).

Readers interested in developing Friendly AI expertise should consider taking the courses (or reading the textbooks) listed in Course Recommendations for MIRI Researchers.

What might an AGI impact expert look like?

To begin, AGI impact experts should demonstrate reliably superior performance at forecasting technological progress, especially AI progress.

Unfortunately, we haven’t yet discovered reliable methods for successful long-term technological forecasting (Muehlhauser & Salamon 2012), and both experts and laypeople are particularly bad at predicting AI (Armstrong & Sotala 2012). The price-performance formulation of Moore’s Law has been surprisingly robust across time, but one cannot predict specific technologies from this trend without making additional assumptions that are (in most cases) less robust than Moore’s Law. Famous tech forecaster Ray Kurzweil claims good accuracy, but these claims are probably overstated.

None of this should be surprising: good forecasting performance seems to depend (among other things) on regular feedback on one’s predictions, and quick feedback isn’t available when making long-term forecasts.

Luckily, there are many opportunities for forecasters to improve their performance. Horowitz & Tetlock (2012), based on their own empirical research and prediction training, offer some advice on the subject:

Explicit quantification: “The best way to become a better-calibrated appraiser of long-term futures is to get in the habit of making quantitative probability estimates that can be objectively scored for accuracy over long stretches of time. Explicit quantification enables explicit accuracy feedback, which enables learning.”
Signposting the future: Thinking through specific scenarios can be useful if those scenarios “come with clear diagnostic signposts that policymakers can use to gauge whether they are moving toward or away from one scenario or another… Falsifiable hypotheses bring high-flying scenario abstractions back to Earth.”
Leveraging aggregation: “the average forecast is often more accurate than the vast majority of the individual forecasts that went into computing the average…. [Forecasters] should also get into the habit that some of the better forecasters in [an IARPA forecasting tournament called ACE] have gotten into: comparing their predictions to group averages, weighted-averaging algorithms, prediction markets, and financial markets.”
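To illustrate what “objectively scored” probability estimates and comparison against a group average might look like in practice, here is a minimal sketch using the Brier score (mean squared error between probability forecasts and 0/1 outcomes). The Brier score is one common scoring rule, chosen here as an assumption; the quoted advice does not name a specific rule. The forecaster names and all numbers are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    Lower is better; always answering 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally many forecasts and outcomes, at least one")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def average_forecast(individual_forecasts):
    """Unweighted mean across forecasters, computed question by question."""
    return [sum(question) / len(question) for question in zip(*individual_forecasts)]


if __name__ == "__main__":
    # Hypothetical data: three forecasters, four resolved yes/no questions.
    outcomes = [1, 0, 1, 0]
    forecasters = {
        "forecaster_a": [0.9, 0.4, 0.6, 0.1],
        "forecaster_b": [0.6, 0.1, 0.9, 0.5],
        "forecaster_c": [0.7, 0.3, 0.4, 0.2],
    }
    for name, forecasts in forecasters.items():
        print(name, round(brier_score(forecasts, outcomes), 4))
    group = average_forecast(list(forecasters.values()))
    print("group average", round(brier_score(group, outcomes), 4))
```

Under this scoring rule, the averaged forecast can never score worse than the mean of the individual scores, because squared error is convex. It can still lose to the single best forecaster, which is why comparing oneself against the group average is a useful habit rather than a guarantee.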

Armstrong & Sotala (2012) add that it can be helpful to decompose the phenomena into many parts and make predictions about each of the parts, as feedback may be available for at least some of the parts. This is the approach to AI prediction taken by The Uncertain Future (Rayhawk et al. 2009).
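One simple way to sketch decomposition is to treat a forecast as a chain of parts, each estimated conditional on the parts before it, and multiply the estimates together. This is only an illustration under that chain assumption; it is not a description of how The Uncertain Future works, and the part names and probabilities below are placeholders.

```python
import math


def chain_probability(conditional_probabilities):
    """Probability that every part in a chain holds.

    Each entry is assumed to be P(part_i | all earlier parts hold), so the
    product is the probability of the whole conjunction.
    """
    for p in conditional_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(conditional_probabilities)


if __name__ == "__main__":
    # Placeholder parts and estimates, purely for illustration.
    parts = {
        "part A holds": 0.8,
        "part B holds, given A": 0.5,
        "part C holds, given A and B": 0.6,
    }
    for name, p in parts.items():
        print(f"{name}: {p}")
    print("whole scenario:", round(chain_probability(list(parts.values())), 4))
```

The practical benefit is that each part can be scored separately as feedback arrives, so a forecaster can learn which of the component estimates was off even when the overall question remains unresolved.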

Armstrong & Sotala also make a distinction between “grind” — lots of hard work and money — and “insight” — entirely new unexpected ideas. Grind is moderately easy to predict, while insight is difficult to forecast. Grind predictions could be more reliable to the extent that a phenomenon is mostly about grind, and doesn’t require new conceptual breakthroughs. For example, while AI appears to be an “insight” technology, whole brain emulation may be largely a “grind” technology, and therefore more easily predicted.

There is much more to say about the skills needed for AGI impact expertise, but for now I leave the reader with the examples above, and also the following passage from Bostrom (1997):

[Recently] a starring role has developed on the intellectual stage for which the actor is still wanting. This is the role of the generalised scientist, or the polymath, who has insights into many areas of science and the ability to use these insights to work out solutions to those more complicated problems which are usually considered too difficult for scientists and are therefore either consigned to politicians and the popular press, or just ignored. The sad thing is that ignoring these problems won’t make them go away, and… some of them are challenges to the very survival of intelligent life.

[One such problem is] superintelligence… [which] takes on practical urgency when many experts think that we will soon have the ability to create superintelligence.

What questions could [this discipline] deal with? Well, questions like: How much would the predictive power for various fields increase if we increase the processing speed of a human-like mind a million times? If we extend the short-term or long-term memory? If we increase the neural population and the connection density? What other capacities would a superintelligence have? …Can we know anything about the motivation of a superintelligence? Would it be feasible to preprogram it to be good or philanthropic, or would such rules be hard to reconcile with the flexibility of its cognitive processes? Would a superintelligence, given the desire to do so, be able to outwit humans into promoting its own aims even if we had originally taken strict precautions to avoid being manipulated? Could one use one superintelligence to control another? …How would our human self-perception and aspirations change if [we] were forced to abdicate the throne of wisdom…? How would we individuate between superminds if they could communicate and fuse and subdivide with enormous speed? Will a notion of personal identity still apply to such interconnected minds? …Could we then be able to compete with the superintelligences, if we were accelerated and augmented with extra memory etc., or would such profound reorganisation be necessary that we would no longer feel we were humans? Would that matter?

Maybe these are not the right questions to ask, but they are at least a start.

Concluding thoughts

MIRI exists entirely to host such experts and enable their research, and FHI and CSER share that focus among others. All three organizations are funding-limited, but to some degree they are also person-limited, because there are so few people in the world actively developing AGI impact expertise or Friendly AI expertise. The goal of this post is to light the path for those who may want to contribute to this important research program.

Notes

My thanks to Carl Shulman, Kaj Sotala, Eliezer Yudkowsky, Louie Helm, and Benjamin Noble for their helpful comments.
