A superintelligent machine would not automatically act as intended: it would act as programmed, but the fit between human intentions and formal specification could be poor. We discuss methods by which a system could be constructed to learn what to value. We highlight open problems specific to inductive value learning (from labeled training data), and raise a number of questions about the construction of systems which model the preferences of their operators and act accordingly.
This is the last of six new major reports which describe and motivate MIRI’s current research agenda at a high level.
Today we release a new technical report by Nate Soares, “Formalizing two problems of realistic world models.” If you’d like to discuss the paper, please do so here.
An intelligent agent embedded within the real world must reason about an environment which is larger than the agent, and learn how to achieve goals in that environment. We discuss attempts to formalize two problems: one of induction, where an agent must use sensory data to infer a universe which embeds (and computes) the agent, and one of interaction, where an agent must learn to achieve complex goals in the universe. We review related problems formalized by Solomonoff and Hutter, and explore challenges that arise when attempting to formalize analogous problems in a setting where the agent is embedded within the environment.
This is the 5th of six new major reports which describe and motivate MIRI’s current research agenda at a high level.
Today we release a new technical report by Benja Fallenstein and Nate Soares, “Vingean Reflection: Reliable Reasoning for Self-Improving Agents.” If you’d like to discuss the paper, please do so here.
Today, human-level machine intelligence is in the domain of futurism, but there is every reason to expect that it will be developed eventually. Once artificial agents become able to improve themselves further, they may far surpass human intelligence, making it vitally important to ensure that the result of an “intelligence explosion” is aligned with human interests. In this paper, we discuss one aspect of this challenge: ensuring that the initial agent’s reasoning about its future versions is reliable, even if these future versions are far more intelligent than the current reasoner. We refer to reasoning of this sort as Vingean reflection.
A self-improving agent must reason about the behavior of its smarter successors in abstract terms, since if it could predict their actions in detail, it would already be as smart as them. This is called the Vingean principle, and we argue that theoretical work on Vingean reflection should focus on formal models that reflect this principle. However, the framework of expected utility maximization, commonly used to model rational agents, fails to do so. We review a body of work which instead investigates agents that use formal proofs to reason about their successors. While it is unlikely that real-world agents would base their behavior entirely on formal proofs, this appears to be the best currently available formal model of abstract reasoning, and work in this setting may lead to insights applicable to more realistic approaches to Vingean reflection.
This is the 4th of six new major reports which describe and motivate MIRI’s current research agenda at a high level.
Recently, MIRI received a targeted donation to improve the AI Impacts website initially created by frequent MIRI collaborator Paul Christiano and part-time MIRI researcher Katja Grace. Collaborating with Paul and Katja, we ported the old content to a more robust and navigable platform, and made some improvements to the content. You can see the result at AIImpacts.org.
As explained in the site’s introductory blog post,
AI Impacts is premised on two ideas (at least!):
- The details of the arrival of human-level artificial intelligence matter
Seven years to prepare is very different from seventy years to prepare. A weeklong transition is very different from a decade-long transition. Brain emulations require different preparations than do synthetic AI minds. Etc.
- Available data and reasoning can substantially educate our guesses about these details
We can track progress in AI subfields. We can estimate the hardware represented by the human brain. We can detect the effect of additional labor on software progress. Etc.
Our goal is to assemble relevant evidence and considerations, and to synthesize reasonable views on questions such as when AI will surpass human-level capabilities, how rapid development will be at that point, what advance notice we might expect, and what kinds of AI are likely to reach human-level capabilities first.
The meat of the website is in its articles. Here are two examples to start with:
A logically uncertain reasoner would be able to reason as if they know both a programming language and a program, without knowing what the program outputs. Most practical reasoning involves some logical uncertainty, but no satisfactory theory of reasoning under logical uncertainty yet exists. A better theory of reasoning under logical uncertainty is needed in order to develop the tools necessary to construct highly reliable artificial reasoners. This paper introduces the topic, discusses a number of historical results, and describes a number of open problems.
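As a toy illustration (mine, not from the report), consider a short program whose output is fully determined by its text, yet unknown to a bounded reasoner who has not run it. A logically uncertain reasoner knows the language and the program, but before computing might assign the two possible outputs probability 0.5 each:

```python
import hashlib

# The output below is logically determined by the program text, but a
# bounded reasoner who has not run the computation cannot know it.
# A natural prior over the two possible answers is 0.5 each.
def first_hex_digit_is_even(s: str) -> bool:
    digit = hashlib.sha256(s.encode()).hexdigest()[0]
    return int(digit, 16) % 2 == 0

prior = 0.5                                  # credence before computing
answer = first_hex_digit_is_even("hello")    # uncertainty collapses to 0 or 1
```

Probability theory as usually formalized offers no guidance about the prior here, since conditional on the axioms of arithmetic the answer already has probability 0 or 1; that gap is what a theory of logical uncertainty must fill.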
This is the 3rd of six new major reports which describe and motivate MIRI’s current research agenda at a high level.
Nick Bostrom’s concerns about the future of AI have sparked a busy public discussion. His arguments were echoed by leading AI researcher Stuart Russell in “Transcending complacency on superintelligent machines” (co-authored with Stephen Hawking, Max Tegmark, and Frank Wilczek), and a number of journalists, scientists, and technologists have subsequently chimed in. Given the topic’s complexity, I’ve been surprised by the positivity and thoughtfulness of most of the coverage (some overused clichés aside).
Unfortunately, what most people probably take away from these articles is ‘Stephen Hawking thinks AI is scary!’, not the chains of reasoning that led Hawking, Russell, or others to their present views. When Elon Musk chimes in with his own concerns and cites Bostrom’s book Superintelligence: Paths, Dangers, Strategies, commenters seem to be more interested in immediately echoing or dismissing Musk’s worries than in looking into his source.
The end result is more of a referendum on people’s positive or negative associations with the word ‘AI’ than a debate over Bostrom’s substantive claims. If ‘AI’ calls to mind science fiction dystopias for you, the temptation is to squeeze real AI researchers into your ‘mad scientists poised to unleash an evil robot army’ stereotype. Equally, if ‘AI’ calls to mind your day job testing edge detection algorithms, that same urge to force new data into old patterns makes it tempting to squeeze Bostrom and Hawking into the ‘naïve technophobes worried about the evil robot uprising’ stereotype.
Thus roboticist Rodney Brooks’ recent blog post “Artificial intelligence is a tool, not a threat” does an excellent job dispelling common myths about the cutting edge of AI, and philosopher John Searle’s review of Superintelligence draws out some important ambiguities in our concepts of subjectivity and mind; but both writers scarcely intersect with Bostrom’s (or Russell’s, or Hawking’s) ideas. Both pattern-match Bostrom to the nearest available ‘evil robot panic’ stereotype, and stop there.
Brooks and Searle don’t appreciate how new the arguments in Superintelligence are. In the interest of making it easier to engage with these important topics, and less appealing to force the relevant technical and strategic questions into the model of decades-old debates, I’ll address three of the largest misunderstandings one might come away with after seeing Musk, Searle, Brooks, and others’ public comments: conflating present and future AI risks, conflating risk severity with risk imminence, and conflating risk from autonomous algorithmic decision-making with risk from human-style antisocial dispositions.
Dr. Matthias Troyer is a professor of Computational Physics at ETH Zürich. Before that, he completed university studies in "Technischer Physik" at the Johannes Kepler Universität Linz, Austria, as well as a Diploma in Physics and an interdisciplinary PhD thesis at ETH Zürich.
His research interests and experience focus on high-performance scientific simulations on modern computing architectures, quantum lattice models, and relativistic and quantum systems. Troyer is known for leading the research team that tested the D-Wave One computer system. He was awarded an Assistant Professorship by the Swiss National Science Foundation.
Luke Muehlhauser: Your tests of D-Wave’s (debated) quantum computer have gotten much attention recently. Our readers can get up to speed on that story via your arxiv preprint, its coverage at Scott Aaronson’s blog, and Will Bourne’s article for Inc. For now, though, I’d like to ask you about some other things.
If you’ll indulge me, I’ll ask you to put on a technological forecasting hat for a bit, and respond to a question I also asked Ronald de Wolf: “What is your subjective probability that we’ll have a 500-qubit quantum computer, which is uncontroversially a quantum computer, within the next 20 years? And, how do you reason about a question like that?”
Matthias Troyer: In order to have an uncontroversial quantum computer as you describe it we will need to take three steps. First we need to have at least ONE qubit that is long term stable. The next step is to couple two such qubits, and the final step is to scale to more qubits.
The hardest step is the first one, obtaining a single long-term stable qubit. Given intrinsic decoherence mechanisms that cannot be avoided in any real device, such a qubit will have to be built from many (hundreds to thousands) of physical qubits. These physical qubits will each have a finite coherence time, but they will be coupled in such a way (using error-correcting codes) as to jointly generate one long-term stable "logical" qubit. These error-correcting codes require the physical qubits to be better than a certain threshold quality. Recently, qubits have started to approach these thresholds, and I am thus confident that within the next 5-10 years one will be able to couple them to form a long-term stable logical qubit.
Coupling two qubits is something that will happen on the same time scale. The remaining challenge will thus be to scale to your target size of e.g. 500 qubits. This may be a big engineering challenge, but I do not see any fundamental stumbling block given that enough resources are invested. I am confident that this can be achieved in less than ten years once we have a single logical qubit. Overall I am thus very confident that a 500-qubit quantum computer will exist in 20 years.
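To make the "hundreds to thousands of physical qubits per logical qubit" estimate concrete, here is some illustrative arithmetic using the standard surface-code heuristic (the scaling law, threshold value, and error rates below are common textbook assumptions, not figures from Troyer):

```python
# Toy surface-code overhead estimate. Standard heuristic: the logical error
# rate of a distance-d patch scales as p_L ~ A * (p / p_th)**((d+1)//2),
# with physical error rate p, threshold p_th ~ 1e-2, and prefactor A ~ 0.1.
# All numbers here are illustrative assumptions.
def logical_error_rate(p, d, p_th=1e-2, a=0.1):
    return a * (p / p_th) ** ((d + 1) // 2)

def physical_qubits_per_logical(d):
    # A distance-d surface-code patch uses roughly 2 * d**2 physical qubits.
    return 2 * d * d

# Find the smallest (odd) code distance reaching a target logical error
# rate, given physical qubits somewhat below threshold.
p, target = 5e-4, 1e-12
d = 3
while logical_error_rate(p, d) > target:
    d += 2  # surface-code distances are odd

overhead = physical_qubits_per_logical(d)   # physical qubits per logical qubit
total = 500 * overhead                      # for a 500-logical-qubit machine
```

With these assumed numbers the required distance comes out to d = 17, i.e. a few hundred physical qubits per logical qubit and a few hundred thousand in total, consistent with the "hundreds to thousands" overhead Troyer describes.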