Today we release a new overview of MIRI’s technical research agenda, “Aligning Superintelligence with Human Interests: A Technical Research Agenda,” by Nate Soares and Benja Fallenstein. The preferred place to discuss this report is here.
The report begins:
The characteristic that has enabled humanity to shape the world is not strength, not speed, but intelligence. Barring catastrophe, it seems clear that progress in AI will one day lead to the creation of agents meeting or exceeding human-level general intelligence, and this will likely lead to the eventual development of systems which are “superintelligent” in the sense of being “smarter than the best human brains in practically every field” (Bostrom 2014)…
…In order to ensure that the development of smarter-than-human intelligence has a positive impact on humanity, we must meet three formidable challenges: How can we create an agent that will reliably pursue the goals it is given? How can we formally specify beneficial goals? And how can we ensure that this agent will assist and cooperate with its programmers as they improve its design, given that mistakes in the initial version are inevitable?
This agenda discusses technical research that is tractable today, which the authors think will make it easier to confront these three challenges in the future. Sections 2 through 4 motivate and discuss six research topics that we think are relevant to these challenges. Section 5 discusses our reasons for selecting these six areas in particular.
We call a smarter-than-human system that reliably pursues beneficial goals “aligned with human interests” or simply “aligned.” To become confident that an agent is aligned in this way, a practical implementation that merely seems to meet the challenges outlined above will not suffice. It is also necessary to gain a solid theoretical understanding of why that confidence is justified. This technical agenda argues that there is foundational research approachable today that will make it easier to develop aligned systems in the future, and describes ongoing work on some of these problems.
This report also refers to six key supporting papers that go into more detail on each major research problem area:
- Corrigibility
- Toward idealized decision theory
- Questions of reasoning under logical uncertainty
- Vingean reflection: reliable reasoning for self-improving agents
- Formalizing two problems of realistic world-models
- The value learning problem
Update July 15, 2016: Our overview paper is scheduled to appear in 2017 in the Springer anthology The Technological Singularity: Managing the Journey, under the new title "Agent Foundations for Aligning Machine Intelligence with Human Interests." The new title is intended to help distinguish this agenda from another research agenda we'll be working on in parallel with the agent foundations agenda: "Value Alignment for Advanced Machine Learning Systems."