2014 in review

MIRI Strategy

It’s time for my review of MIRI in 2014.[1] A post about our next strategic plan will follow in the next couple of months, and I’ve included some details about ongoing projects at the end of this review.


2014 Summary

Since early 2013, MIRI’s core goal has been to help create a new field of research devoted to the technical challenges of getting good outcomes from future AI agents with highly general capabilities, including the capability to recursively self-improve.[2]

Launching a new field has been a team effort. In 2013, MIRI decided to focus on its comparative advantage in defining open problems and making technical progress on them. We’ve been fortunate to coordinate with other actors in this space (FHI, CSER, FLI, and others) who have leveraged their comparative advantages in conducting public outreach, building coalitions, pitching the field to grantmakers, interfacing with policymakers, and more.[3]

MIRI began 2014 with several open problems identified, and with some progress made toward solving them, but with very few people available to do the work. Hence, most of our research program effort in 2014 was aimed at attracting new researchers to the field and making it easier for them to learn the material and contribute. This was the primary motivation for our new technical agenda overview, the MIRIx program, our new research guide, and more (see below). Nick Bostrom’s Superintelligence was also quite helpful for explaining why this field of research should exist in the first place.

Today the field is much larger and healthier than it was at the beginning of 2014. MIRI now has four full-time technical researchers instead of just one. Around 85 people have attended one or more MIRIx workshops. So many promising researchers have expressed interest in our technical research that ~25 of them have already confirmed their interest and availability to attend a MIRI introductory workshop this summer. That count mostly excludes people who have attended past MIRI workshops, and we have not yet sent out all the invitations. Moreover, we now know several researchers who are plausible MIRI hires in the next 1-2 years.

I am extremely grateful to MIRI’s donors, without whom this progress would have been impossible.

The rest of this post provides a more detailed summary of our activities in 2014.

  1. This year’s annual review is shorter than last year’s 5-part review of 2013, in part because 2013 was an unusually complicated focus-shifting year, and in part because, in retrospect, last year’s 5-part review simply took more effort to produce than it was worth. Also, because we recently finished switching to accrual accounting, I can now more easily provide annual reviews of each calendar year rather than of a March-through-February period. As such, this review of calendar year 2014 will overlap a bit with what was reported in the previous annual review (of March 2013 through February 2014). 
  2. Clearly there are forecasting and political challenges as well, and there are technical challenges related to ensuring good outcomes from nearer-term AI systems, but MIRI has chosen to specialize in the technical challenges of aligning superintelligence with human interests. See also: Friendly AI research as effective altruism and Why MIRI? 
  3. Obviously, the division of labor was more complex than I’ve described here. For example, FHI produced some technical research progress in 2014, and MIRI did some public outreach. 

New report: “An Introduction to Löb’s Theorem in MIRI Research”

News

Today we publicly release a new technical report by Patrick LaVictoire, titled “An Introduction to Löb’s Theorem in MIRI Research.” The report’s introduction begins:

This expository note is devoted to answering the following question: why do many MIRI research papers cite a 1955 theorem of Martin Löb, and indeed, why does MIRI focus so heavily on mathematical logic? The short answer is that this theorem illustrates the basic kind of self-reference involved when an algorithm considers its own output as part of the universe, and it is thus germane to many kinds of research involving self-modifying agents, especially when formal verification is involved or when we want to cleanly prove things in model problems. For a longer answer, well, welcome!

I’ll assume you have some background doing mathematical proofs and writing computer programs, but I won’t assume any background in mathematical logic beyond knowing the usual logical operators, nor that you’ve even heard of Löb’s Theorem before.

If you’d like to discuss the article, please do so here.

Introducing the Intelligent Agent Foundations Forum

News

Today we are proud to publicly launch the Intelligent Agent Foundations Forum (RSS), a forum devoted to technical discussion of the research problems outlined in MIRI’s technical agenda overview, along with similar research problems.

Patrick’s welcome post explains:

Broadly speaking, the topics of this forum concern the difficulties of value alignment: the problem of how to ensure that machine intelligences of various levels adequately understand and pursue the goals that their developers actually intended, rather than getting stuck on some proxy for the real goal or failing in other unexpected (and possibly dangerous) ways. As these failure modes are more devastating the farther we advance in building machine intelligences, MIRI’s goal is to work today on the foundations of goal systems and architectures that would work even when the machine intelligence has general creative problem-solving ability beyond that of its developers, and has the ability to modify itself or build successors.

The forum has been privately active for several months, so many interesting articles have already been posted, including:

Also see How to contribute.



Rationality: From AI to Zombies

News

Between 2006 and 2009, senior MIRI researcher Eliezer Yudkowsky wrote several hundred essays for the blogs Overcoming Bias and Less Wrong, collectively called “the Sequences.” With two days remaining until Yudkowsky concludes his other well-known rationality book, Harry Potter and the Methods of Rationality, we are releasing around 340 of his original blog posts as a series of six books, collected in one ebook volume under the title Rationality: From AI to Zombies.

Yudkowsky’s writings on rationality, which were previously scattered in a constellation of blog posts, have been cleaned up, organized, and collected together for the first time. This new version of the Sequences should serve as a more accessible long-form introduction to formative ideas behind MIRI, CFAR, and substantial parts of the rationalist and effective altruist communities.

While the books’ central focus is on applying probability theory and the sciences of mind to personal dilemmas and philosophical controversies, a considerable range of topics is covered. The six books explore rationality theory and applications from multiple angles:

I. Map and Territory. A lively introduction to the Bayesian conception of rational belief in cognitive science, and how it differs from other kinds of belief.

II. How to Actually Change Your Mind. A guide to overcoming confirmation bias and motivated cognition.

III. The Machine in the Ghost. A collection of essays on the general topic of minds, goals, and concepts.

IV. Mere Reality. Essays on science and the physical world, as they relate to rational inference.

V. Mere Goodness. A wide-ranging discussion of human values and ethics.

VI. Becoming Stronger. An autobiographical account of Yudkowsky’s philosophical mistakes, followed by a discussion of self-improvement and group rationality.

These essays are packaged together as a single electronic text, making it easier to investigate links between essays and search for keywords. The ebook is available on a pay-what-you-want basis, and on Amazon.com for $4.99. In the coming months, we will also be releasing print versions of these six books, and Castify will be releasing the official audiobook version.

Bill Hibbard on Ethical Artificial Intelligence

Conversations

Bill Hibbard is an Emeritus Senior Scientist at the University of Wisconsin-Madison Space Science and Engineering Center, currently working on issues of AI safety and unintended behaviors. He has a BA in Mathematics and MS and PhD in Computer Sciences, all from the University of Wisconsin-Madison. He is the author of Super-Intelligent Machines, “Avoiding Unintended AI Behaviors,” “Decision Support for Safe AI Design,” and “Ethical Artificial Intelligence.” He is also principal author of the Vis5D, Cave5D, and VisAD open source visualization systems.

Luke Muehlhauser: You recently released a self-published book, Ethical Artificial Intelligence, which “combines several peer reviewed papers and new material to analyze the issues of ethical artificial intelligence.” Most of the book is devoted to the kind of exploratory engineering in AI that you and I described in a recent CACM article, such that you mathematically analyze the behavioral properties of classes of future AI agents, e.g. utility-maximizing agents.

Many AI scientists have the intuition that such early, exploratory work is very unlikely to pay off when we are so far from building an AGI, and don’t know what an AGI will look like. For example, Michael Littman wrote:

…proposing specific mechanisms for combatting this amorphous threat [of AGI] is a bit like trying to engineer airbags before we’ve thought of the idea of cars. Safety has to be addressed in context and the context we’re talking about is still absurdly speculative.

How would you defend the value of the kind of work you do in Ethical Artificial Intelligence to Littman and others who share his skepticism?

Fallenstein talk for APS March Meeting 2015

News

MIRI researcher Benja Fallenstein recently delivered an invited talk at the March 2015 meeting of the American Physical Society in San Antonio, Texas. Their talk was one of four in a special session on artificial intelligence.

Fallenstein’s title was “Beneficial Smarter-than-human Intelligence: the Challenges and the Path Forward.” The slides are available here. Abstract:

Today, human-level machine intelligence is still in the domain of futurism, but there is every reason to expect that it will be developed eventually. A generally intelligent agent as smart or smarter than a human, and capable of improving itself further, would be a system we’d need to design for safety from the ground up: There is no reason to think that such an agent would be driven by human motivations like a lust for power; but almost any goals will be easier to meet with access to more resources, suggesting that most goals an agent might pursue, if they don’t explicitly include human welfare, would likely put its interests at odds with ours, by incentivizing it to try to acquire the physical resources currently being used by humanity. Moreover, since we might try to prevent this, such an agent would have an incentive to deceive its human operators about its true intentions, and to resist interventions to modify it to make it more aligned with humanity’s interests, making it difficult to test and debug its behavior. This suggests that in order to create a beneficial smarter-than-human agent, we will need to face three formidable challenges: How can we formally specify goals that are in fact beneficial? How can we create an agent that will reliably pursue the goals that we give it? And how can we ensure that this agent will not try to prevent us from modifying it if we find mistakes in its initial version? In order to become confident that such an agent behaves as intended, we will not only want to have a practical implementation that seems to meet these challenges, but to have a solid theoretical understanding of why it does so. In this talk, I will argue that even though human-level machine intelligence does not exist yet, there are foundational technical research questions in this area which we can and should begin to work on today. 
For example, probability theory provides a principled framework for representing uncertainty about the physical environment, which seems certain to be helpful to future work on beneficial smarter-than-human agents, but standard probability theory assumes omniscience about logical facts; no similar principled framework for representing uncertainty about the outputs of deterministic computations exists as yet, even though any smarter-than-human agent will certainly need to deal with uncertainty of this type. I will discuss this and other examples of ongoing foundational work.

Stuart Russell of UC Berkeley also gave a talk at this session, about the long-term future of AI.

March 2015 newsletter

Newsletters



Machine Intelligence Research Institute

Research updates

News updates

Other news

As always, please don’t hesitate to let us know if you have any questions or comments.

Luke Muehlhauser
Executive Director



Davis on AI capability and motivation

Analysis

In a review of Superintelligence, NYU computer scientist Ernest Davis voices disagreement with a number of claims he attributes to Nick Bostrom: that “intelligence is a potentially infinite quantity with a well-defined, one-dimensional value,” that a superintelligent AI could “easily resist and outsmart the united efforts of eight billion people” and achieve “virtual omnipotence,” and that “though achieving intelligence is more or less easy, giving a computer an ethical point of view is really hard.”

These are all stronger than Bostrom’s actual claims. For example, Bostrom never characterizes building a generally intelligent machine as “easy.” Nor does he say that intelligence can be infinite or that it can produce “omnipotence.” Humans’ intelligence and accumulated knowledge give us a decisive advantage over chimpanzees, even though our power is limited in important ways. An AI need not be magical or all-powerful in order to have the same kind of decisive advantage over humanity.

Still, Davis’ article is one of the more substantive critiques of MIRI’s core assumptions that I have seen, and he addresses several deep issues that directly bear on AI forecasting and strategy. I’ll sketch out a response to his points here.
