The White House Office of Science and Technology Policy recently put out a request for information on “(1) The legal and governance implications of AI; (2) the use of AI for public good; (3) the safety and control issues for AI; (4) the social and economic implications of AI;” and a variety of related topics. I’ve reproduced MIRI’s submission to the RfI below:
I. Review of safety and control concerns
AI experts largely agree that AI research will eventually lead to the development of AI systems that surpass humans in general reasoning and decision-making ability. This is, after all, the goal of the field. However, there is widespread disagreement about how long it will take to cross that threshold, and what the relevant AI systems are likely to look like (autonomous agents, widely distributed decision support systems, human/AI teams, etc.).
Despite the uncertainty, a growing subset of the research community expects that advanced AI systems will give rise to a number of foreseeable safety and control difficulties, and that those difficulties can be preemptively addressed by technical research today. Stuart Russell, co-author of the leading undergraduate textbook in AI and professor at U.C. Berkeley, writes:
The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:
1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.
Researchers’ worries about the impact of AI in the long term bear little relation to the doomsday scenarios most often depicted in Hollywood movies, in which “emergent consciousness” allows machines to throw off the shackles of their programmed goals and rebel. The concern is rather that such systems may pursue their programmed goals all too well, and that the programmed goals may not match the intended goals, or that the intended goals may have unintended negative consequences.
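The "unconstrained variables" point in the passage above can be made concrete with a small numerical sketch (ours, and a slight variation on the quoted setup): the objective below rewards only task throughput and places no penalty on resource consumption, so a generic optimizer drives that variable, which we do care about, to its extreme bound. The objective, variable names, and bounds are illustrative assumptions.

```python
# Minimal sketch: a proxy objective that rewards throughput alone. Because the
# objective places no cost on resource_use, a variable we care about, the
# optimizer pushes it to its extreme bound. All names and numbers are illustrative.
import numpy as np
from scipy.optimize import minimize

def proxy_objective(x):
    speed, resource_use = x
    throughput = speed * (1.0 + 0.1 * resource_use)   # more resources, more throughput
    return -throughput                                # minimize the negative to maximize

bounds = [(0.0, 1.0),      # speed
          (0.0, 100.0)]    # resource_use: we care about this, the objective does not
result = minimize(proxy_objective, x0=np.array([0.5, 1.0]), bounds=bounds)
print(result.x)   # resource_use is driven to its upper bound, 100.0
```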
These challenges are not entirely novel. We can compare them to other principal-agent problems where incentive structures are designed with the hope that blind pursuit of those incentives promotes good outcomes. Historically, principal-agent problems have been difficult to solve even in domains where the people designing the incentive structures can rely on some amount of human goodwill and common sense. Consider the problem of designing tax codes to have reliably beneficial consequences, or the problem of designing regulations that reliably reduce corporate externalities. Advanced AI systems naively designed to optimize some objective function could produce unintended consequences that unfold on digital timescales, without any goodwill or common sense to blunt the impact.
Given that researchers don’t know when breakthroughs will occur, and given that there are multiple lines of open technical research that can be pursued today to address these concerns, we believe it is prudent to begin serious work on these problems in order to improve the community’s preparedness.
II. Technical research directions for safety and control
There are several promising lines of technical research that may help ensure that the AI systems of the future have a positive social impact. We divide this research into three broad categories:
- Value specification (VS): research that aids in the design of objective functions that capture the intentions of the operators, and/or that describe socially beneficial goals. Example: cooperative inverse reinforcement learning, a formal model of AI agents that inductively learn the goals of other agents (e.g., human operators).
- High reliability (HR): research that aids in the design of AI systems that robustly, reliably, and verifiably pursue the given objectives. Example: the PAC learning framework, which gives statistical guarantees about the correctness of solutions to certain types of classification problems. This framework is a good example of research done long before the development of advanced AI systems that is nevertheless likely to aid in designing systems that are robust and reliable.
- Error tolerance (ET): research that aids in the design of AI systems that are fail-safe and robust to design errors. Example: research into the design of objective functions that allow an agent to be shut down, but do not give that agent incentives to cause or prevent shutdown.
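As a toy illustration of the error-tolerance item above (a sketch under simplifying assumptions of our own, not a construction taken from the literature), the snippet below compares a naive objective, under which an agent strictly prefers to disable its shutdown button, with a corrected objective that compensates the agent for forgone reward when shutdown occurs, leaving it indifferent to whether the button is pressed.

```python
# Toy model of shutdown incentives. Probabilities and rewards are illustrative.
P_PRESS = 0.3        # chance the operators press the shutdown button
TASK_REWARD = 1.0    # reward for finishing the task (only possible if not shut down)

def expected_utility(action, corrected):
    """Expected utility of an action taken before a possible shutdown.

    action: "allow" leaves the button functional; "disable" breaks it.
    corrected: if True, compensate the agent for the reward it forgoes when it
    is shut down, so shutdown no longer lowers its expected utility.
    """
    if action == "disable":
        return TASK_REWARD                           # shutdown can never happen
    u_shutdown = TASK_REWARD if corrected else 0.0   # compensation equals forgone reward
    return P_PRESS * u_shutdown + (1 - P_PRESS) * TASK_REWARD

for corrected in (False, True):
    print(corrected,
          expected_utility("allow", corrected),
          expected_utility("disable", corrected))
# naive (False):    allow = 0.7, disable = 1.0  -> incentive to prevent shutdown
# corrected (True): allow = 1.0, disable = 1.0  -> indifferent; no incentive either way
```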
Our “Agent foundations for aligning machine intelligence with human interests” report discusses these three targets in depth, and outlines some neglected technical research topics that are likely to be relevant to the future design of robustly beneficial AI systems regardless of their specific architecture. Our “Alignment for advanced machine learning systems” report discusses technical research topics relevant to these questions under the stronger assumption that the advanced systems of the future will be qualitatively similar to modern-day machine learning (ML) systems. We also recommend a research proposal led by Dario Amodei and Chris Olah of Google Brain, “Concrete problems in AI safety,” for technical research problems that are applicable to near-future AI systems and are likely to also be applicable to more advanced systems down the road. Actionable research directions discussed in these agendas include (among many other topics):
- robust inverse reinforcement learning: designing reward-based agents to learn human values in contexts where observed behavior may reveal biases or ignorance in place of genuine preferences. (VS)
- safe exploration: designing reinforcement learning agents to efficiently learn about their environments without performing high-risk experiments. (ET)
- low-impact agents: specifying decision-making systems that deliberately avoid having a large impact, good or bad, on their environment. (ET)
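The low-impact item can be sketched very simply (this is our own illustrative formulation, not a method from the cited agendas): score each candidate action by its task reward minus a penalty on how far it moves the rest of the world state away from a do-nothing baseline.

```python
import numpy as np

# Illustrative low-impact objective: task reward minus a penalty on deviation from
# the state the world would have reached had the agent done nothing. The toy
# environment, actions, and penalty weight are assumptions made for this example.

def step(state, action):
    return state + action            # toy deterministic dynamics

def task_reward(state):
    return state[0]                  # the task only cares about the first coordinate

def low_impact_score(state, action, weight=0.5):
    outcome = step(state, action)
    baseline = step(state, np.zeros_like(action))    # "do nothing" counterfactual
    impact = np.linalg.norm(outcome - baseline)      # crude impact measure
    return task_reward(outcome) - weight * impact

state = np.zeros(3)
careful = np.array([1.0, 0.0, 0.0])    # helps the task, touches nothing else
messy = np.array([1.5, 4.0, -3.0])     # helps the task a bit more, large side effects

for name, action in (("careful", careful), ("messy", messy)):
    print(name, round(low_impact_score(state, action), 2))
# careful scores higher once side effects are penalized
```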
There are also a number of research areas that would likely aid in the development of safe AI systems, but which are not well-integrated into the existing AI community. As an example, many of the techniques in use by the program verification and high-assurance software communities cannot be applied to modern ML algorithms. Fostering more collaboration between these communities is likely to make it easier for us to design AI systems suitable for use in safety-critical situations. Actionable research directions for ML analysis and verification include:
- algorithmic transparency: developing more formal tools for analyzing how and why ML algorithms perform as they do. (HR)
- type theory for program verification: developing high-assurance techniques for the re-use of verified code in new contexts. (HR)
- incremental re-verification: confirming the persistence of safety properties for adaptive systems. (HR)
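As a cartoon of the incremental re-verification item (a sketch of ours, with an assumed model and safety property), the loop below re-checks a fixed invariant on an audit set after every parameter update and refuses to accept an update that would violate it.

```python
import numpy as np

# Sketch of incremental re-verification: after each update to an adaptive system,
# re-check a safety property before accepting the new parameters. The model (a
# linear scorer), the invariant, and the audit set are illustrative assumptions.

rng = np.random.default_rng(0)
audit_inputs = rng.normal(size=(100, 4))   # inputs on which the invariant must hold

def safety_property(params):
    """Assumed invariant: the scorer's output stays within a hard limit on the audit set."""
    return bool(np.all(np.abs(audit_inputs @ params) <= 3.0))

params = np.zeros(4)
for step_num in range(20):
    proposed = params + rng.normal(scale=1.0, size=4)   # stand-in for a training update
    if safety_property(proposed):
        params = proposed                               # accept: invariant re-verified
    else:
        print(f"step {step_num}: update rejected, invariant would be violated")
```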
Another category of important research for AI reliability is the development of basic theoretical tools for formally modeling intelligent agents. As an example, consider the interaction of probability theory (a theoretical tool for modeling uncertain reasoners) with modern machine learning algorithms. While modern ML systems do not strictly follow the axioms of probability theory, many of the theoretical guarantees that can be applied to them are probability-theoretic, taking the form “this agent will converge on a policy that is very close to the optimal policy, with very high probability.” Probability theory is an example of basic research that was developed far in advance of present-day ML techniques, but has proven important for attaining strong (statistical) guarantees about the behavior of ML systems. We believe that more basic research of this kind can be done, and that it could prove to be similarly valuable.
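To illustrate the flavor of guarantee meant here, the short calculation below uses the standard Hoeffding inequality to bound how many independent test samples suffice for the measured error rate of a fixed classifier to be within epsilon of its true error rate with probability at least 1 - delta. The particular epsilon and delta are arbitrary choices for the example.

```python
import math

# Hoeffding-style sample complexity: with at least this many i.i.d. test samples,
# the empirical error of a fixed classifier is within epsilon of its true error
# with probability at least 1 - delta. Epsilon and delta are arbitrary here.
def samples_needed(epsilon, delta):
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

print(samples_needed(epsilon=0.01, delta=1e-6))   # 72544
```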
There are a number of other aspects of good reasoning where analogous foundations are lacking, such as situations where AI systems have to allocate attention given limited computational resources, or predict the behavior of computations that are too expensive to run, or analyze the effects of potential alterations to their hardware or software. Further research into basic theoretical models of ideal reasoning (including research into bounded rationality) could yield tools that would help attain stronger theoretical guarantees about AI systems’ behavior. Actionable research directions include:
- decision theory: giving a formal account of reasoning in settings where an agent must engage in metacognition, reflection, self-modification, or reasoning about violations of the agent/environment boundary. (HR)
- logical uncertainty: generalizing Bayesian probability theory to settings where agents are uncertain about mathematical (e.g., computational) facts. (HR)
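As a toy gesture at the logical-uncertainty item (purely our own illustration, and far simpler than the formal treatment the item calls for), the snippet below assigns a credence to a deterministic mathematical claim, "f(x) == g(x) for all 32-bit x", and refines it with a Beta-Bernoulli update over a bounded budget of random spot checks. The functions, prior, and budget are assumptions made for the example.

```python
import random

# Toy model of reasoning under logical uncertainty: whether f and g agree on all
# 32-bit inputs is a fixed mathematical fact, but a bounded reasoner can only
# assign it a credence and refine that credence with cheap spot checks. How such
# spot-check statistics should bear on the universal claim is exactly the kind of
# question this research area asks. Functions, prior, and budget are illustrative.

def f(x):
    return (x * x) % 2 ** 32

def g(x):
    return pow(x, 2, 2 ** 32)        # in fact agrees with f everywhere

random.seed(0)
alpha, beta = 1.0, 1.0               # Beta(1, 1) prior on "a random spot check passes"
for _ in range(10_000):
    x = random.getrandbits(32)
    if f(x) == g(x):
        alpha += 1
    else:
        beta += 1                    # a single counterexample would settle the claim

print(f"credence that the next random check passes: {alpha / (alpha + beta):.4f}")
```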
We believe that there are numerous promising avenues of foundational research which, if successful, could make it possible to obtain far stronger guarantees about the behavior of advanced AI systems than many currently think possible, even at a time when the most successful machine learning techniques are often poorly understood. We believe that bringing together researchers in machine learning, program verification, and the mathematical study of formal agents would be a large step towards ensuring that highly advanced AI systems will have a robustly beneficial impact on society.
III. Coordination prospects
It is difficult to say much with confidence about the long-term impact of AI. For now, we believe that the lines of technical research outlined above are the best available tools for addressing concerns about advanced AI systems, and for learning more about what needs to be done.
Looking ahead, we expect the risks associated with transformative AI systems in the long term to be exacerbated if the designers of such systems (be they private-sector, public-sector, or part of some international collaboration) act under excessive time pressure. It is our belief that any policy designed to ensure that the social impact of AI is beneficial should first and foremost ensure that transformative AI systems are deployed with careful consideration, rather than in fear or haste. If scientists and engineers are worried about losing a race to the finish, they will have stronger incentives to cut corners on safety and control, negating the benefits of safety-conscious work.
In the long term, we recommend that policymakers use incentives to encourage the designers of AI systems to cooperate, perhaps through multinational and multicorporate collaborations, in order to discourage race dynamics. In light of experts' high levels of uncertainty about the future of AI, and in light of the large potential of AI research to save lives, solve social problems, and serve the common good in the near future, we recommend against broad regulatory interventions in this space. We recommend that effort instead be put towards encouraging interdisciplinary technical research into the AI safety and control challenges that we have outlined above.