Four Articles Added to Research Page

 |   |  Papers

Four older articles have been added to our research page.

The first is the early draft of Christiano et al.’s “Definability of ‘Truth’ in Probabilistic Logic” previously discussed here and here. The draft was last updated on April 2, 2013.

The second paper is a cleaned-up version of an article originally published in December 2012 by Luke Muehlhauser and Chris Williamson to Less Wrong: “Ideal Advisor Theories and Personal CEV.”

The third and fourth papers were originally published by Bill Hibbard in the AGI 2012 Conference Proceedings: “Avoiding Unintended AI Behaviors” and “Decision Support for Safe AI Design.” Hibbard wrote these articles before he became a MIRI research associate, but he gave us permission to include them on our research page because (1) he became a MIRI research associate during the AGI-12 conference at which the articles were published, (2) the articles were partly inspired by a public dialogue with Luke Muehlhauser, and (3) the articles build on MIRI’s paper “Intelligence Explosion and Machine Ethics.”

As mentioned in our December 2012 newsletter, “Avoiding Unintended AI Behaviors” was awarded MIRI’s $1000 Turing Prize for Best AGI Safety Paper. The prize was awarded in honor of Alan Turing, who not only discovered some of the key ideas of machine intelligence, but also grasped its importance, writing that “…it seems probable that once [human-level machine thinking] has started, it would not take long to outstrip our feeble powers… At some stage therefore we should have to expect the machines to take control…”

When Will AI Be Created?

 |   |  Analysis

Strong AI appears to be the topic of the week. Kevin Drum at Mother Jones thinks AIs will be as smart as humans by 2040. Karl Smith at Forbes and “M.S.” at The Economist seem to roughly concur with Drum on this timeline. Moshe Vardi, the editor-in-chief of the world’s most-read computer science magazine, predicts that “by 2045 machines will be able to do if not any work that humans can do, then a very significant fraction of the work that humans can do.”

But predicting AI is more difficult than many people think.

To explore these difficulties, let’s start with a 2009 conversation between MIRI researcher Eliezer Yudkowsky and MIT computer scientist Scott Aaronson, author of the excellent Quantum Computing Since Democritus. Early in that dialogue, Yudkowsky asked:

It seems pretty obvious to me that at some point in [one to ten decades] we’re going to build an AI smart enough to improve itself, and [it will] “foom” upward in intelligence, and by the time it exhausts available avenues for improvement it will be a “superintelligence” [relative] to us. Do you feel this is obvious?

Aaronson replied:

The idea that we could build computers that are smarter than us… and that those computers could build still smarter computers… until we reach the physical limits of what kind of intelligence is possible… that we could build things that are to us as we are to ants — all of this is compatible with the laws of physics… and I can’t find a reason of principle that it couldn’t eventually come to pass…

The main thing we disagree about is the time scale… a few thousand years [before AI] seems more reasonable to me.

Those two estimates — several decades vs. “a few thousand years” — have wildly different policy implications.

If there’s a good chance that AI will replace humans at the steering wheel of history in the next several decades, then we’d better put our gloves on and get to work making sure that this event has a positive rather than negative impact. But if we can be pretty confident that AI is thousands of years away, then we needn’t worry about AI for now, and we should focus on other global priorities. Thus it appears that “When will AI be created?” is a question with high value of information for our species.

Let’s take a moment to review the forecasting work that has been done, and see what conclusions we might draw about when AI will likely be created.


Advise MIRI with Your Domain-Specific Expertise

 |   |  News

MIRI currently has a few dozen volunteer advisors on a wide range of subjects, but we need more! If you’d like to help MIRI pursue its mission more efficiently, please sign up to be a MIRI advisor.

If you sign up, we will occasionally ask you questions, or send you early drafts of upcoming writings for feedback.

We don’t always want technical advice (“Well, you can do that with a relativized arithmetical hierarchy…”); often, we just want to understand how different groups of experts respond to our writing (“The tone of this paragraph rubs me the wrong way because…”).

At the moment, we are most in need of advisors on the following subjects:

Even if you don’t have much time to help, please sign up! We will of course respect your own limits on availability.

Five theses, two lemmas, and a couple of strategic implications

 |   |  Analysis

MIRI’s primary concern about self-improving AI isn’t so much that it might be created by ‘bad’ actors rather than ‘good’ actors in the global sphere; rather, most of our concern lies in remedying the situation in which no one knows at all how to create a self-modifying AI with known, stable preferences. (This is why we see the main problem in terms of doing research and encouraging others to perform relevant research, rather than trying to stop ‘bad’ actors from creating AI.)

This, and a number of other basic strategic views, can be summed up as a consequence of 5 theses about purely factual questions about AI, and 2 lemmas we think are implied by them, as follows:

Intelligence explosion thesis. A sufficiently smart AI will be able to realize large, reinvestable cognitive returns from things it can do on a short timescale, like improving its own cognitive algorithms or purchasing/stealing lots of server time. The intelligence explosion will hit very high levels of intelligence before it runs out of things it can do on a short timescale. See: Chalmers (2010); Muehlhauser & Salamon (2013); Yudkowsky (2013).

Orthogonality thesis. Mind design space is huge enough to contain agents with almost any set of preferences, and such agents can be instrumentally rational about achieving those preferences, and have great computational power. For example, mind design space theoretically contains powerful, instrumentally rational agents which act as expected paperclip maximizers and always consequentialistically choose the option which leads to the greatest number of expected paperclips. See: Bostrom (2012); Armstrong (2013).

Convergent instrumental goals thesis. Most utility functions will generate a subset of instrumental goals which follow from most possible final goals. For example, if you want to build a galaxy full of happy sentient beings, you will need matter and energy, and the same is also true if you want to make paperclips. This thesis is why we’re worried about very powerful entities even if they have no explicit dislike of us: “The AI does not love you, nor does it hate you, but you are made of atoms it can use for something else.” Note though that by the Orthogonality Thesis you can always have an agent which explicitly, terminally prefers not to do any particular thing — an AI which does love you will not want to break you apart for spare atoms. See: Omohundro (2008); Bostrom (2012).

Complexity of value thesis. It takes a large chunk of Kolmogorov complexity to describe even idealized human preferences. That is, what we ‘should’ do is a computationally complex mathematical object even after we take the limit of reflective equilibrium (judging your own thought processes) and other standard normative theories. A superintelligence with a randomly generated utility function would not do anything we see as worthwhile with the galaxy, because it is unlikely to accidentally hit on final preferences for having a diverse civilization of sentient beings leading interesting lives. See: Yudkowsky (2011); Muehlhauser & Helm (2013).

Fragility of value thesis. Getting a goal system 90% right does not give you 90% of the value, any more than correctly dialing 9 out of 10 digits of my phone number will connect you to somebody who’s 90% similar to Eliezer Yudkowsky. There are multiple dimensions for which eliminating that dimension of value would eliminate almost all value from the future. For example, an alien species which shared almost all of human value, except that their parameter setting for “boredom” was much lower, might devote most of their computational power to replaying a single peak, optimal experience over and over again with slightly different pixel colors (or the equivalent thereof). Friendly AI is more like a satisficing threshold than something where we’re trying to eke out successive 10% improvements. See: Yudkowsky (2009, 2011).

These five theses seem to imply two important lemmas:

Indirect normativity. Programming a self-improving machine intelligence to implement a grab-bag of things-that-seem-like-good-ideas will lead to a bad outcome, regardless of how good the apple pie and motherhood sounded. E.g., if you give the AI a final goal to “make people happy” it’ll just turn people’s pleasure centers up to maximum. “Indirectly normative” is Bostrom’s term for an AI that calculates the ‘right’ thing to do via, e.g., looking at human beings and modeling their decision processes and idealizing those decision processes (e.g. what you would want if you knew everything the AI knew and understood your own decision processes, reflective equilibria, ideal advisor theories, and so on), rather than being told a direct set of ‘good ideas’ by the programmers. Indirect normativity is how you deal with Complexity and Fragility. If you can succeed at indirect normativity, then small variances in essentially good intentions may not matter much; that is, if two different projects do indirect normativity correctly, but one project has 20% nicer and kinder researchers, we could still hope that the end results would be of around equal expected value. See: Muehlhauser & Helm (2013).

Large bounded extra difficulty of Friendliness. You can build a Friendly AI (by the Orthogonality Thesis), but you need a lot of work and cleverness to get the goal system right. Probably more importantly, the rest of the AI needs to meet a higher standard of cleanness in order for the goal system to remain invariant through a billion sequential self-modifications. Any AI smart enough to do clean self-modification will tend to do so regardless, but the problem is that an intelligence explosion might get started with AIs substantially less smart than that: for example, with AIs that rewrite themselves using genetic algorithms or other such means that don’t preserve a set of consequentialist preferences. In this case, building a Friendly AI could mean that our AI has to be smarter about self-modification than the minimal AI that could undergo an intelligence explosion. See: Yudkowsky (2008) and Yudkowsky (2013).

These lemmas in turn have two major strategic implications:

  1. We have a lot of work to do on things like indirect normativity and stable self-improvement. At this stage a lot of this work looks really foundational — that is, we can’t describe how to do these things using infinite computing power, let alone finite computing power.  We should get started on this work as early as possible, since basic research often takes a lot of time.
  2. There needs to be a Friendly AI project that has some sort of boost over competing projects which don’t live up to a (very) high standard of Friendly AI work — a project which can successfully build a stable-goal-system self-improving AI, before a less-well-funded project hacks together a much sloppier self-improving AI.  Giant supercomputers may be less important to this than being able to bring together the smartest researchers (see the open question posed in Yudkowsky 2013) but the required advantage cannot be left up to chance.  Leaving things to default means that projects less careful about self-modification would have an advantage greater than casual altruism is likely to overcome.

AGI Impact Experts and Friendly AI Experts

 |   |  Analysis

MIRI’s mission is “to ensure that the creation of smarter-than-human intelligence has a positive impact.” A central strategy for achieving this mission is to find and train what one might call “AGI impact experts” and “Friendly AI experts.”

AGI impact experts develop skills related to predicting technological development (e.g. building computational models of AI development or reasoning about intelligence explosion microeconomics), predicting AGI’s likely impact on society, and identifying which interventions are most likely to increase humanity’s chances of safely navigating the creation of AGI. For overviews, see Bostrom & Yudkowsky (2013); Muehlhauser & Salamon (2013).

Friendly AI experts develop skills useful for the development of mathematical architectures that can enable AGIs to be trustworthy (or “human-friendly”). This work is carried out at MIRI research workshops and in various publications, e.g. Christiano et al. (2013); Hibbard (2013). Note that the term “Friendly AI” was selected (in part) to avoid the suggestion that we understand the subject very well — a phrase like “Ethical AI” might sound like the kind of thing one can learn a lot about by looking it up in an encyclopedia, but our present understanding of trustworthy AI is too impoverished for that.

Now, what do we mean by “expert”?



“Intelligence Explosion Microeconomics” Released

 |   |  Papers

MIRI’s new, 93-page technical report by Eliezer Yudkowsky, “Intelligence Explosion Microeconomics,” has now been released. The report explains one of the open problems of our research program. Here’s the abstract:

I. J. Good’s thesis of the ‘intelligence explosion’ is that a sufficiently advanced machine intelligence could build a smarter version of itself, which could in turn build an even smarter version of itself, and that this process could continue enough to vastly exceed human intelligence. As Sandberg (2010) correctly notes, there are several attempts to lay down return-on-investment formulas intended to represent sharp speedups in economic or technological growth, but very little attempt has been made to deal formally with I. J. Good’s intelligence explosion thesis as such.

I identify the key issue as returns on cognitive reinvestment – the ability to invest more computing power, faster computers, or improved cognitive algorithms to yield cognitive labor which produces larger brains, faster brains, or better mind designs. There are many phenomena in the world which have been argued as evidentially relevant to this question, from the observed course of hominid evolution, to Moore’s Law, to the competence over time of machine chess-playing systems, and many more. I go into some depth on the sort of debates which then arise on how to interpret such evidence. I propose that the next step forward in analyzing positions on the intelligence explosion would be to formalize return-on-investment curves, so that each stance can say formally which possible microfoundations they hold to be falsified by historical observations already made. More generally, I pose multiple open questions of ‘returns on cognitive reinvestment’ or ‘intelligence explosion microeconomics’. Although such questions have received little attention thus far, they seem highly relevant to policy choices affecting the outcomes for Earth-originating intelligent life.
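To make the idea of a "return-on-investment curve" concrete, here is a minimal Python sketch. It is not taken from the report: the growth law dI/dt = I^k, the function name `simulate_reinvestment`, and all parameter values are assumptions chosen purely for illustration. The exponent k stands in for how strongly each unit of capability pays off when reinvested in further improvement.

```python
def simulate_reinvestment(returns_exponent, steps=50, dt=0.1, initial=1.0, cap=1e12):
    """Euler-integrate the toy growth law dI/dt = I ** returns_exponent.

    `I` is an abstract "capability level" (a hypothetical quantity, not a
    measure defined in the report). Values are clipped at `cap` so that the
    accelerating case stays finite.
    """
    level = initial
    trajectory = [level]
    for _ in range(steps):
        level = min(level + dt * level ** returns_exponent, cap)
        trajectory.append(level)
    return trajectory


# Three assumed regimes of returns on reinvestment.
diminishing = simulate_reinvestment(0.5)   # k < 1: polynomial growth
linear = simulate_reinvestment(1.0)        # k = 1: exponential growth
accelerating = simulate_reinvestment(1.5)  # k > 1: runaway growth, hits the cap

for name, run in [("k=0.5", diminishing), ("k=1.0", linear), ("k=1.5", accelerating)]:
    print(f"{name}: final level {run[-1]:.3g}")
```

Under these assumptions the three exponents give qualitatively different trajectories from the same starting point, which is the kind of distinction a formal curve would let competing positions state and test against historical observations.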

The preferred place for public discussion of this research is here. There is also a private mailing list for technical discussants, which you can apply to join here.

“Singularity Hypotheses” Published

 |   |  Papers

Singularity Hypotheses: A Scientific and Philosophical Assessment has now been published by Springer, in hardcover and ebook forms.

The book contains 20 chapters about the prospect of machine superintelligence, including 4 chapters by MIRI researchers and research associates.

“Intelligence Explosion: Evidence and Import” (pdf) by Luke Muehlhauser and (previous MIRI researcher) Anna Salamon reviews

the evidence for and against three claims: that (1) there is a substantial chance we will create human-level AI before 2100, that (2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an “intelligence explosion,” and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it. We conclude with recommendations for increasing the odds of a controlled intelligence explosion relative to an uncontrolled intelligence explosion.

“Intelligence Explosion and Machine Ethics” (pdf) by Luke Muehlhauser and Louie Helm discusses the challenges of formal value systems for use in AI:

Many researchers have argued that a self-improving artificial intelligence (AI) could become so vastly more powerful than humans that we would not be able to stop it from achieving its goals. If so, and if the AI’s goals differ from ours, then this could be disastrous for humans. One proposed solution is to program the AI’s goal system to want what we want before the AI self-improves beyond our capacity to control it. Unfortunately, it is difficult to specify what we want. After clarifying what we mean by “intelligence,” we offer a series of “intuition pumps” from the field of moral philosophy for our conclusion that human values are complex and difficult to specify. We then survey the evidence from the psychology of motivation, moral psychology, and neuroeconomics that supports our position. We conclude by recommending ideal preference theories of value as a promising approach for developing a machine ethics suitable for navigating an intelligence explosion or “technological singularity.”

“Friendly Artificial Intelligence” by Eliezer Yudkowsky is a shortened version of Yudkowsky (2008).

Finally, “Artificial General Intelligence and the Human Mental Model” (pdf) by Roman Yampolskiy and (MIRI research associate) Joshua Fox reviews the dangers of anthropomorphizing machine intelligences:

When the first artificial general intelligences are built, they may improve themselves to far-above-human levels. Speculations about such future entities are already affected by anthropomorphic bias, which leads to erroneous analogies with human minds. In this chapter, we apply a goal-oriented understanding of intelligence to show that humanity occupies only a tiny portion of the design space of possible minds. This space is much larger than what we are familiar with from the human example; and the mental architectures and goals of future superintelligences need not have most of the properties of human minds. A new approach to cognitive science and philosophy of mind, one not centered on the human example, is needed to help us understand the challenges which we will face when a power greater than us emerges.

The book also includes brief, critical responses to most chapters, including responses written by Eliezer Yudkowsky and (previous MIRI staffer) Michael Anissimov.

Altair’s Timeless Decision Theory Paper Published

 |   |  Papers

During his time as a research fellow for MIRI, Alex Altair wrote a paper on Timeless Decision Theory (TDT) that has now been published: “A Comparison of Decision Algorithms on Newcomblike Problems.”

Altair’s paper is both more succinct and more precise in its formulation of TDT than Yudkowsky’s earlier paper “Timeless Decision Theory.” Thus, Altair’s paper should serve as a handy introduction to TDT for philosophers, computer scientists, and mathematicians, while Yudkowsky’s paper remains required reading for anyone interested in developing TDT further, because it covers more ground than Altair’s paper.

Altair’s abstract reads:

When formulated using Bayesian networks, two standard decision algorithms (Evidential Decision Theory and Causal Decision Theory) can be shown to fail systematically when faced with aspects of the prisoner’s dilemma and so-called “Newcomblike” problems. We describe a new form of decision algorithm, called Timeless Decision Theory, which consistently wins on these problems.
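For readers unfamiliar with Newcomblike problems, here is a rough Python sketch of the classic Newcomb's problem, showing how an evidential and a causal expected-value calculation can recommend different actions. It is not Altair's Bayesian-network formulation and it does not implement TDT. The payoffs ($1,000,000 and $1,000), the 99% predictor accuracy, and every function name are assumptions taken from the usual textbook statement of the problem. Note that this sketch covers only the case where the causal calculation goes wrong; the problems on which the evidential calculation fails are not modeled here.

```python
BOX_B_PRIZE = 1_000_000  # opaque box, filled only if one-boxing was predicted
BOX_A_PRIZE = 1_000      # transparent box, always filled
ACTIONS = ["one-box", "two-box"]


def payoff(action, box_b_full):
    """Money received for an action, given whether the opaque box is full."""
    total = BOX_B_PRIZE if box_b_full else 0
    if action == "two-box":
        total += BOX_A_PRIZE
    return total


def edt_value(action, accuracy):
    """Evidential value: treat the action as evidence about the prediction."""
    p_full = accuracy if action == "one-box" else 1 - accuracy
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_value(action, prior_full):
    """Causal value: the box contents are fixed, so the same prior applies
    to both actions regardless of which one is chosen."""
    return prior_full * payoff(action, True) + (1 - prior_full) * payoff(action, False)


def best_action(value_fn, param):
    """Return the action with the highest value under the given calculation."""
    return max(ACTIONS, key=lambda action: value_fn(action, param))


print("EDT chooses:", best_action(edt_value, 0.99))
print("CDT chooses:", best_action(cdt_value, 0.5))
```

In this setup the causal calculation favors two-boxing for any prior, because taking both boxes always adds the $1,000, even though agents who one-box against an accurate predictor end up with far more money on average.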

We may submit to a journal later, but we’ve published the current version to our website so that readers won’t need to wait two years (from submission to acceptance to publication) to read it.

For a gentle introduction to the entire field of normative decision theory (including TDT), see Muehlhauser and Williamson’s Decision Theory FAQ.