New report: “Vingean Reflection: Reliable Reasoning for Self-Improving Agents”

News

Today we release a new technical report by Benja Fallenstein and Nate Soares, “Vingean Reflection: Reliable Reasoning for Self-Improving Agents.” If you’d like to discuss the paper, please do so here.


Today, human-level machine intelligence is in the domain of futurism, but there is every reason to expect that it will be developed eventually. Once artificial agents become able to improve themselves further, they may far surpass human intelligence, making it vitally important to ensure that the result of an “intelligence explosion” is aligned with human interests. In this paper, we discuss one aspect of this challenge: ensuring that the initial agent’s reasoning about its future versions is reliable, even if these future versions are far more intelligent than the current reasoner. We refer to reasoning of this sort as Vingean reflection.

A self-improving agent must reason about the behavior of its smarter successors in abstract terms, since if it could predict their actions in detail, it would already be as smart as them. This is called the Vingean principle, and we argue that theoretical work on Vingean reflection should focus on formal models that reflect this principle. However, the framework of expected utility maximization, commonly used to model rational agents, fails to do so. We review a body of work which instead investigates agents that use formal proofs to reason about their successors. While it is unlikely that real-world agents would base their behavior entirely on formal proofs, this appears to be the best currently available formal model of abstract reasoning, and work in this setting may lead to insights applicable to more realistic approaches to Vingean reflection.

This is the 4th of six new major reports which describe and motivate MIRI’s current research agenda at a high level.

An improved “AI Impacts” website

News

Recently, MIRI received a targeted donation to improve the AI Impacts website, initially created by frequent MIRI collaborator Paul Christiano and part-time MIRI researcher Katja Grace. Collaborating with Paul and Katja, we ported the old content to a more robust and navigable platform, and made some improvements to the content. You can see the result at

As explained in the site’s introductory blog post,

AI Impacts is premised on two ideas (at least!):

  • The details of the arrival of human-level artificial intelligence matter
    Seven years to prepare is very different from seventy years to prepare. A weeklong transition is very different from a decade-long transition. Brain emulations require different preparations than do synthetic AI minds. Etc.
  • Available data and reasoning can substantially educate our guesses about these details
    We can track progress in AI subfields. We can estimate the hardware represented by the human brain. We can detect the effect of additional labor on software progress. Etc.

Our goal is to assemble relevant evidence and considerations, and to synthesize reasonable views on questions such as when AI will surpass human-level capabilities, how rapid development will be at that point, what advance notice we might expect, and what kinds of AI are likely to reach human-level capabilities first.

The meat of the website is in its articles. Here are two examples to start with:

New report: “Questions of reasoning under logical uncertainty”

News

Today we release a new technical report by Nate Soares and Benja Fallenstein, “Questions of reasoning under logical uncertainty.” If you’d like to discuss the paper, please do so here.


A logically uncertain reasoner would be able to reason as if they know both a programming language and a program, without knowing what the program outputs. Most practical reasoning involves some logical uncertainty, but no satisfactory theory of reasoning under logical uncertainty yet exists. A better theory of reasoning under logical uncertainty is needed in order to develop the tools necessary to construct highly reliable artificial reasoners. This paper introduces the topic, discusses a number of historical results, and describes a number of open problems.
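As a rough illustration of the situation the abstract describes, here is a minimal sketch in Python. It is not taken from the report. The reasoner knows the language and the full source of `is_prime`, yet before running it can only hold a credence about what it returns. The function names and the choice of a prime-density heuristic as the "prior" are assumptions made for this example.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division: the program whose output the reasoner is uncertain about."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def credence_before_running(n: int) -> float:
    """Hypothetical credence that is_prime(n) returns True, without running it.

    Uses the prime number theorem as a heuristic: the density of primes near n
    is roughly 1 / ln(n), and knowing that n is odd doubles that estimate.
    The parity check is itself a (very cheap) computation.
    """
    if n < 2:
        return 0.0
    if n == 2:
        return 1.0
    if n % 2 == 0:
        return 0.0
    return min(1.0, 2.0 / math.log(n))


def credence_after_running(n: int) -> float:
    """Once the program has been run, the logical uncertainty is resolved."""
    return 1.0 if is_prime(n) else 0.0


if __name__ == "__main__":
    n = 1_000_003
    print("credence before running:", credence_before_running(n))
    print("credence after running: ", credence_after_running(n))
```

The heuristic here is only one ad hoc way to assign such a credence. The report's point is that no satisfactory general theory yet says which assignments are reasonable.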

This is the 3rd of six new major reports which describe and motivate MIRI’s current research agenda at a high level.

Brooks and Searle on AI volition and timelines

Analysis

Nick Bostrom’s concerns about the future of AI have sparked a busy public discussion. His arguments were echoed by leading AI researcher Stuart Russell in “Transcending complacency on superintelligent machines” (co-authored with Stephen Hawking, Max Tegmark, and Frank Wilczek), and a number of journalists, scientists, and technologists have subsequently chimed in. Given the topic’s complexity, I’ve been surprised by the positivity and thoughtfulness of most of the coverage (some overused clichés aside).

Unfortunately, what most people probably take away from these articles is ‘Stephen Hawking thinks AI is scary!’, not the chains of reasoning that led Hawking, Russell, or others to their present views. When Elon Musk chimes in with his own concerns and cites Bostrom’s book Superintelligence: Paths, Dangers, Strategies, commenters seem to be more interested in immediately echoing or dismissing Musk’s worries than in looking into his source.

The end result is more of a referendum on people’s positive or negative associations with the word ‘AI’ than a debate over Bostrom’s substantive claims. If ‘AI’ calls to mind science fiction dystopias for you, the temptation is to squeeze real AI researchers into your ‘mad scientists poised to unleash an evil robot army’ stereotype. Equally, if ‘AI’ calls to mind your day job testing edge detection algorithms, that same urge to force new data into old patterns makes it tempting to squeeze Bostrom and Hawking into the ‘naïve technophobes worried about the evil robot uprising’ stereotype.

Thus roboticist Rodney Brooks’ recent blog post “Artificial intelligence is a tool, not a threat” does an excellent job dispelling common myths about the cutting edge of AI, and philosopher John Searle’s review of Superintelligence draws out some important ambiguities in our concepts of subjectivity and mind; but both writers scarcely intersect with Bostrom’s (or Russell’s, or Hawking’s) ideas. Both pattern-match Bostrom to the nearest available ‘evil robot panic’ stereotype, and stop there.

Brooks and Searle don’t appreciate how new the arguments in Superintelligence are. In the interest of making it easier to engage with these important topics, and less appealing to force the relevant technical and strategic questions into the model of decades-old debates, I’ll address three of the largest misunderstandings one might come away with after seeing Musk, Searle, Brooks, and others’ public comments: conflating present and future AI risks, conflating risk severity with risk imminence, and conflating risk from autonomous algorithmic decision-making with risk from human-style antisocial dispositions.


Matthias Troyer on Quantum Computers

Conversations


Dr. Matthias Troyer is a professor of Computational Physics at ETH Zürich. Before that, he completed university studies in “Technischer Physik” at the Johannes Kepler Universität Linz, Austria, as well as a Diploma in Physics and an interdisciplinary PhD thesis at ETH Zürich.

His research interests and experience focus on high-performance scientific simulations on architectures, quantum lattice models, and relativistic and quantum systems. Troyer is known for leading the research team of the D-Wave One Computer System. He was awarded an Assistant Professorship by the Swiss National Science Foundation.

Luke Muehlhauser: Your tests of D-Wave’s (debated) quantum computer have gotten much attention recently. Our readers can get up to speed on that story via your arxiv preprint, its coverage at Scott Aaronson’s blog, and Will Bourne’s article for Inc. For now, though, I’d like to ask you about some other things.

If you’ll indulge me, I’ll ask you to put on a technological forecasting hat for a bit, and respond to a question I also asked Ronald de Wolf: “What is your subjective probability that we’ll have a 500-qubit quantum computer, which is uncontroversially a quantum computer, within the next 20 years? And, how do you reason about a question like that?”

Matthias Troyer: In order to have an uncontroversial quantum computer as you describe it we will need to take three steps. First we need to have at least ONE qubit that is long term stable. The next step is to couple two such qubits, and the final step is to scale to more qubits.

The hardest step is the first one, obtaining a single long-term stable qubit. Given intrinsic decoherence mechanisms that cannot be avoided in any real device, such a qubit will have to be built from many (hundreds to thousands of) physical qubits. These physical qubits will each have a finite coherence time, but they will be coupled in such a way (using error correcting codes) as to jointly generate one long-term stable “logical” qubit. These error correction codes require the physical qubits to be better than a certain threshold quality. Recently, qubits have started to approach these thresholds, and I am thus confident that within the next 5-10 years one will be able to couple them to form a long-term stable logical qubit.

Coupling two qubits is something that will happen on the same time scale. The remaining challenge will thus be to scale to your target size of e.g. 500 qubits. This may be a big engineering challenge but I do not see any fundamental stumbling block given that enough resources are invested. I am confident that this can be achieved in less than ten years once we have a single logical qubit. Overall I am thus very confident that a 500-qubit quantum computer will exist in 20 years.
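Editorial aside: the threshold idea Troyer mentions can be illustrated with a classical analogy. The sketch below is not from the interview and does not model real quantum codes. It assumes a simple repetition code in which one logical bit is stored in `n` physical bits, each flipped independently with probability `p`, and decoded by majority vote. The function name and the error model are assumptions for illustration. In this toy model the threshold is `p = 0.5`; the thresholds of actual quantum error correcting codes are far stricter.

```python
from math import comb


def logical_error_rate(p: float, n: int) -> float:
    """Probability that a majority vote over n noisy copies gives the wrong bit.

    Each copy flips independently with probability p. The vote fails when
    more than half of the copies have flipped. n must be odd to avoid ties.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    # Below the toy threshold, adding physical bits helps;
    # above it, adding physical bits makes the logical bit worse.
    for p in (0.1, 0.6):
        rates = [logical_error_rate(p, n) for n in (1, 3, 5, 7)]
        print(f"p = {p}:", [round(r, 4) for r in rates])
```

With physical bits better than the threshold, the logical error rate falls as more bits are added, which is the sense in which many imperfect components can jointly form one long-term stable logical unit.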


January 2015 Newsletter

Newsletters


Machine Intelligence Research Institute

Thanks to the generosity of 80+ donors, we completed our winter 2014 matching challenge, raising $200,000 for our research program. Many, many thanks to all who contributed!

Research Updates

News Updates

Other Updates

  • Eric Horvitz has provided initial funding for a 100-year Stanford program to study the social impacts of artificial intelligence. The white paper lists 18 example research areas, two of which amount to what Nick Bostrom calls the superintelligence control problem, MIRI’s research focus. No word yet on how soon anyone funded through this program will study open questions relevant to superintelligence control.

As always, please don’t hesitate to let us know if you have any questions or comments.

Luke Muehlhauser
Executive Director


Our new technical research agenda overview

News

Today we release a new overview of MIRI’s technical research agenda, “Aligning Superintelligence with Human Interests: A Technical Research Agenda,” by Nate Soares and Benja Fallenstein. The preferred place to discuss this report is here.

The report begins:

The characteristic that has enabled humanity to shape the world is not strength, not speed, but intelligence. Barring catastrophe, it seems clear that progress in AI will one day lead to the creation of agents meeting or exceeding human-level general intelligence, and this will likely lead to the eventual development of systems which are “superintelligent” in the sense of being “smarter than the best human brains in practically every field” (Bostrom 2014)…

…In order to ensure that the development of smarter-than-human intelligence has a positive impact on humanity, we must meet three formidable challenges: How can we create an agent that will reliably pursue the goals it is given? How can we formally specify beneficial goals? And how can we ensure that this agent will assist and cooperate with its programmers as they improve its design, given that mistakes in the initial version are inevitable?

This agenda discusses technical research that is tractable today, which the authors think will make it easier to confront these three challenges in the future. Sections 2 through 4 motivate and discuss six research topics that we think are relevant to these challenges. Section 5 discusses our reasons for selecting these six areas in particular.

We call a smarter-than-human system that reliably pursues beneficial goals “aligned with human interests” or simply “aligned.” To become confident that an agent is aligned in this way, a practical implementation that merely seems to meet the challenges outlined above will not suffice. It is also necessary to gain a solid theoretical understanding of why that confidence is justified. This technical agenda argues that there is foundational research approachable today that will make it easier to develop aligned systems in the future, and describes ongoing work on some of these problems.

This report also refers to six key supporting papers which go into more detail for each major research problem area:

  1. Corrigibility
  2. Toward idealized decision theory
  3. Questions of reasoning under logical uncertainty
  4. Vingean reflection: reliable reasoning for self-improving agents
  5. Formalizing two problems of realistic world-models
  6. The value learning problem