New paper: “Alignment for advanced machine learning systems”
MIRI’s research to date has focused on the problems that we laid out in our late 2014 research agenda, and in particular on formalizing optimal reasoning for bounded, reflective decision-theoretic agents embedded in their environment. Our research team has since grown considerably, and we have made substantial progress on this agenda, including a major breakthrough in logical uncertainty that we will be announcing in the coming weeks.
Today we are announcing a new research agenda, “Alignment for advanced machine learning systems.” Going forward, about half of our time will be spent on this new agenda, with the other half spent on our previous agenda. The abstract reads:
We survey eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? We focus on two major technical obstacles to AI alignment: the challenge of specifying the right kinds of objective functions, and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective function does not line up perfectly with the intentions of the designers.
Open problems surveyed in this research proposal include: How can we train reinforcement learners to take actions that are more amenable to meaningful assessment by intelligent overseers? What kinds of objective functions incentivize a system to “not have an overly large impact” or “not have many side effects”? We discuss these questions, related work, and potential directions for future research, with the goal of highlighting relevant research topics in machine learning that appear tractable today.
Co-authored by Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch, our new report discusses eight new lines of research (previously summarized here). Below, I’ll explain the rationale behind these problems, as well as how they tie in to our old research agenda and to the new “Concrete problems in AI safety” agenda spearheaded by Dario Amodei and Chris Olah of Google Brain.