White House submissions and report on AI safety

In May, the White House Office of Science and Technology Policy (OSTP) announced “a new series of workshops and an interagency working group to learn more about the benefits and risks of artificial intelligence.” The OSTP hosted a June Workshop on Safety and Control for AI (videos are available), along with three other workshops, and issued a general request for information on AI, to which MIRI sent a primary submission.

The OSTP has now released a report summarizing its conclusions, “Preparing for the Future of Artificial Intelligence,” and the result is very promising. The OSTP acknowledges the ongoing discussion about AI risk, and recommends “investing in research on longer-term capabilities and how their challenges might be managed”:

General AI (sometimes called Artificial General Intelligence, or AGI) refers to a notional future AI system that exhibits apparently intelligent behavior at least as advanced as a person across the full range of cognitive tasks. A broad chasm seems to separate today’s Narrow AI from the much more difficult challenge of General AI. Attempts to reach General AI by expanding Narrow AI solutions have made little headway over many decades of research. The current consensus of the private-sector expert community, with which the NSTC Committee on Technology concurs, is that General AI will not be achieved for at least decades.[14]

People have long speculated on the implications of computers becoming more intelligent than humans. Some predict that a sufficiently intelligent AI could be tasked with developing even better, more intelligent systems, and that these in turn could be used to create systems with yet greater intelligence, and so on, leading in principle to an “intelligence explosion” or “singularity” in which machines quickly race far ahead of humans in intelligence.[15]

In a dystopian vision of this process, these super-intelligent machines would exceed the ability of humanity to understand or control. If computers could exert control over many critical systems, the result could be havoc, with humans no longer in control of their destiny at best and extinct at worst. This scenario has long been the subject of science fiction stories, and recent pronouncements from some influential industry leaders have highlighted these fears.

A more positive view of the future held by many researchers sees instead the development of intelligent systems that work well as helpers, assistants, trainers, and teammates of humans, and are designed to operate safely and ethically.

The NSTC Committee on Technology’s assessment is that long-term concerns about super-intelligent General AI should have little impact on current policy. The policies the Federal Government should adopt in the near-to-medium term if these fears are justified are almost exactly the same policies the Federal Government should adopt if they are not justified. The best way to build capacity for addressing the longer-term speculative risks is to attack the less extreme risks already seen today, such as current security, privacy, and safety risks, while investing in research on longer-term capabilities and how their challenges might be managed. Additionally, as research and applications in the field continue to mature, practitioners of AI in government and business should approach advances with appropriate consideration of the long-term societal and ethical questions – in addition to just the technical questions – that such advances portend. Although prudence dictates some attention to the possibility that harmful superintelligence might someday become possible, these concerns should not be the main driver of public policy for AI.

Later, the report discusses “methods for monitoring and forecasting AI developments”:

One potentially useful line of research is to survey expert judgments over time. As one example, a survey of AI researchers found that 80 percent of respondents believed that human-level General AI will eventually be achieved, and half believed it is at least 50 percent likely to be achieved by the year 2040. Most respondents also believed that General AI will eventually surpass humans in general intelligence.[50] While these particular predictions are highly uncertain, as discussed above, such surveys of expert judgment are useful, especially when they are repeated frequently enough to measure changes in judgment over time. One way to elicit frequent judgments is to run “forecasting tournaments” such as prediction markets, in which participants have financial incentives to make accurate predictions.[51] Other research has found that technology developments can often be accurately predicted by analyzing trends in publication and patent data.[52] […]

When asked during the outreach workshops and meetings how government could recognize milestones of progress in the field, especially those that indicate the arrival of General AI may be approaching, researchers tended to give three distinct but related types of answers:

1. Success at broader, less structured tasks: In this view, the transition from present Narrow AI to an eventual General AI will occur by gradually broadening the capabilities of Narrow AI systems so that a single system can cover a wider range of less structured tasks. An example milestone in this area would be a housecleaning robot that is as capable as a person at the full range of routine housecleaning tasks.

2. Unification of different “styles” of AI methods: In this view, AI currently relies on a set of separate methods or approaches, each useful for different types of applications. The path to General AI would involve a progressive unification of these methods. A milestone would involve finding a single method that is able to address a larger domain of applications that previously required multiple methods.

3. Solving specific technical challenges, such as transfer learning: In this view, the path to General AI does not lie in progressive broadening of scope, nor in unification of existing methods, but in progress on specific technical grand challenges, opening up new ways forward. The most commonly cited challenge is transfer learning, which has the goal of creating a machine learning algorithm whose result can be broadly applied (or transferred) to a range of new applications.

The report also discusses the open problems outlined in “Concrete Problems in AI Safety” and cites the MIRI paper “The Errors, Insights and Lessons of Famous AI Predictions – and What They Mean for the Future.”

In related news, Barack Obama recently answered some questions about AI risk and Nick Bostrom’s Superintelligence in a Wired interview. After saying that “we’re still a reasonably long way away” from general AI and that his directive to his national security team is to worry more about near-term security concerns, Obama adds:

Now, I think, as a precaution — and all of us have spoken to folks like Elon Musk who are concerned about the superintelligent machine — there’s some prudence in thinking about benchmarks that would indicate some general intelligence developing on the horizon. And if we can see that coming, over the course of three decades, five decades, whatever the latest estimates are — if ever, because there are also arguments that this thing’s a lot more complicated than people make it out to be — then future generations, or our kids, or our grandkids, are going to be able to see it coming and figure it out.

There were also a number of interesting responses to the OSTP request for information. Since this document is long and unedited, I’ve sampled some of the responses pertaining to AI safety and long-term AI outcomes below. (Note that MIRI isn’t necessarily endorsing the responses by non-MIRI sources below, and a number of these excerpts are given important nuance by the surrounding text we’ve left out; if a response especially interests you, we recommend reading the original for added context.)


Respondent 77: JoEllen Lukavec Koester, GoodAI

[…] At GoodAI we are investigating suitable meta-objectives that would allow an open-ended, unsupervised evolution of the AGI system as well as guided learning – learning by imitating human experts and other forms of supervised learning. Some of these meta-objectives will be hard-coded from the start, but the system should be also able to learn and improve them on its own, that is, perform meta-learning, such that it learns to learn better in the future.

Teaching the AI system small skills using fine-grained, gradual learning from the beginning will allow us to have more control over the building blocks it will use later to solve novel problems. The system’s behaviour can therefore be more predictable. In this way, we can imprint some human thinking biases into the system, which will be useful for the future value alignment, one of the important aspects of AI safety. […]

Respondent 84: Andrew Critch, MIRI

[…] When we develop powerful reasoning systems deserving of the name “artificial general intelligence (AGI)”, we will need value alignment and/or control techniques that stand up to powerful optimization processes yielding what might appear as “creative” or “clever” ways for the machine to work around our constraints. Therefore, in training the scientists who will eventually develop it, more emphasis is needed on a “security mindset”: namely, to really know that a system will be secure, you need to search creatively for ways in which it might fail. Lawmakers and computer security professionals learn this lesson naturally, from experience with intelligent human adversaries finding loopholes in their control systems. In cybersecurity, it is common to devote a large fraction of R&D time toward actually trying to break into one’s own security system, as a way of finding loopholes.

In my estimation, machine learning researchers currently have less of this inclination than is needed for the safe long-term development of AGI. This can be attributed in part to how the field of machine learning has advanced rapidly of late: via a successful shift of attention toward data-driven (“machine learning”) rather than theoretically-driven (“good old fashioned AI”, “statistical learning theory”) approaches. In data science, it’s often faster to just build something and see what happens than to try to reason from first principles to figure out in advance what will happen. While useful at present, of course we should not approach the final development of super-intelligent machines with the same try-it-and-see methodology, and it makes sense to begin developing a theory now that can be used to reason about a super-intelligent machine in advance of its operation, even in testing phases. […]

Respondent 90: Ian Goodfellow, OpenAI

[…] Over the very long term, it will be important to build AI systems which understand and are aligned with their users’ values. We will need to develop techniques to build systems that can learn what we want and how to help us get it without needing specific rules. Researchers are beginning to investigate this challenge; public funding could help the community address the challenge early rather than trying to react to serious problems after they occur. […]

Respondent 94: Manuel Beltran, Boeing

[…] Advances in picking apart the brain will ultimately lead to, at best, partial brain emulation, at worst, whole brain emulation. If we can already model parts of the brain with software, neuromorphic chips, and artificial implants, the path to greater brain emulation is pretty well set. Unchecked, brain emulation will exasperate the Intellectual Divide to the point of enabling the emulation of the smartest, richest, and most powerful people. While not obvious, this will allow these individuals to scale their influence horizontally across time and space. This is not the vertical scaling that an AGI, or Superintelligence can achieve, but might be even more harmful to society because the actual intelligence of these people is limited, biased, and self-serving. Society must prepare for and mitigate the potential for the Intellectual Divide.

(5) The most pressing, fundamental questions in AI research, common to most or all scientific fields include the questions of ethics in pursuing an AGI. While the benefits of narrow AI are self-evident and should not be impeded, an AGI has dubious benefits and ominous consequences. There needs to be long term engagement on the ethical implications of an AGI, human brain emulation, and performance enhancing brain implants. […]

The AGI research community speaks of an AI that will far surpass human intellect. It is not clear how such an entity would assess its creators. Without meandering into the philosophical debates about how such an entity would benefit or harm humanity, one of the mitigations proposed by proponents of an AGI is that the AGI would be taught to “like” humanity. If there is machine learning to be accomplished along these lines, then the AGI research community requires training data that can be used for teaching the AGI to like humanity. This is a long term need that will overshadow all other activity and has already proven to be very labor intensive as we have seen from the first prototype AGI, Dr. Kristinn R. Thórisson’s Aera S1 at Reykjavik University in Iceland.

Respondent 97: Nick Bostrom, Future of Humanity Institute

[… W]e would like to highlight four “shovel ready” research topics that hold special promise for addressing long term concerns:

Scalable oversight: How can we ensure that learning algorithms behave as intended when the feedback signal becomes sparse or disappears? (See Christiano 2016). Resolving this would enable learning algorithms to behave as if under close human oversight even when operating with increased autonomy.

Interruptibility: How can we avoid the incentive for an intelligent algorithm to resist human interference in an attempt to maximise its future reward? (See our recent progress in collaboration with Google Deepmind in (Orseau & Armstrong 2016).) Resolving this would allow us to ensure that even high capability AI systems can be halted in an emergency.

Reward hacking: How can we design machine learning algorithms that avoid destructive solutions by taking their objective very literally? (See Ring & Orseau, 2011). Resolving this would prevent algorithms from finding unintended shortcuts to their goal (for example, by causing problems in order to get rewarded for solving them).

Value learning: How can we infer the preferences of human users automatically without direct feedback, especially if these users are not perfectly rational? (See Hadfield-Menell et al. 2016 and FHI’s approach to this problem in Evans et al. 2016). Resolving this would alleviate some of the problems above caused by the difficulty of precisely specifying robust objective functions. […]

Respondent 103: Tim Day, the Center for Advanced Technology and Innovation at the U.S. Chamber of Commerce

[…] AI operates within the parameters that humans permit. Hypothetical fears of rogue AI are based on the idea that machines can obtain sentience—a will and consciousness of its own. These suspicions fundamentally misunderstand what Artificial Intelligence is. AI is not a mechanical mystery, rather a human-designed technology that can detect and respond to errors and patterns depending on its operating algorithms and the data set presented to it. It is, however, necessary to scrutinize the way humans, whether through error or malicious intent, can wield AI harmfully. […]

Respondent 104: Alex Kozak, X [formerly Google X]

[…] More broadly, we generally agree that the research topics identified in “Concrete Problems in AI Safety,” a joint publication between Google researchers and others in the industry, are the right technical challenges for innovators to keep in mind in order to develop better and safer real-world products: avoiding negative side effects (e.g. avoiding systems disturbing their environment in pursuit of their goals), avoiding reward hacking (e.g. cleaning robots simply covering up messes rather than cleaning them), creating scalable oversight (i.e. creating systems that are independent enough not to need constant supervision), enabling safe exploration (i.e. limiting the range of exploratory actions a system might take to a safe domain), and creating robustness from distributional shift (i.e. creating systems that are capable of operating well outside their training environment). […]

Respondent 105: Stephen Smith, AAAI

[…] Research is urgently needed to develop and modify AI methods to make them safer and more robust. A discipline of AI Safety Engineering should be created and research in this area should be funded. This field can learn much by studying existing practices in safety engineering in other engineering fields, since loss of control of AI systems is no different from loss of control of other autonomous or semi-autonomous systems. […]

There are two key issues with control of autonomous systems: speed and scale. AI-based autonomy makes it possible for systems to make decisions far faster and on a much broader scale than humans can monitor those decisions. In some areas, such as high speed trading in financial markets, we have already witnessed an “arms race” to make decisions as quickly as possible. This is dangerous, and government should consider whether there are settings where decision-making speed and scale should be limited so that people can exercise oversight and control of these systems.

Most AI researchers are skeptical about the prospects of “superintelligent AI”, as put forth in Nick Bostrom’s recent book and reinforced over the past year in the popular media in commentaries by other prominent individuals from non-AI disciplines. Recent AI successes in narrowly structured problems (e.g., IBM’s Watson, Google DeepMind’s Alpha GO program) have led to the false perception that AI systems possess general, transferrable, human-level intelligence. There is a strong need for improving communication to the public and to policy makers about the real science of AI and its immediate benefits to society. AI research should not be curtailed because of false perceptions of threat and potential dystopian futures. […]

As we move toward applying AI systems in more mission critical types of decision-making settings, AI systems must consistently work according to values aligned with prospective human users and society. Yet it is still not clear how to embed ethical principles and moral values, or even professional codes of conduct, into machines. […]

Respondent 111: Ryan Hagemann, Niskanen Center

[…] AI is unlikely to herald the end times. It is not clear at this point whether a runaway malevolent AI, for example, is a real-world possibility. In the absence of any quantifiable risk along these lines government officials should refrain from framing discussions of AI in alarming terms that suggest that there is a known, rather than entirely speculative, risk. Fanciful doomsday scenarios belong in science fiction novels and high-school debate clubs, not in serious policy discussions about an existing, mundane, and beneficial technology. Ours is already “a world filled with narrowly-tailored artificial intelligence that no one recognizes. As the computer scientist John McCarthy once said: ‘As soon as it works, no one calls it AI anymore.’”

The beneficial consequences of advanced AI are on the horizon and potentially profound. A sampling of these possible benefits include: improved diagnostics and screening for autism; disease prevention through genomic pattern recognition; bridging the genotype-phenotype divide in genetics, allowing scientists to glean a clearer picture of the relationship between genetics and disease, which could introduce a wave of more effective personalized medical care; the development of new ways for the sight- and hearing-impaired to experience sight and sound. To be sure, many of these developments raise certain practical, safety, and ethical concerns. But there are already serious efforts underway by the private ventures developing these AI applications to anticipate and responsibly address these, as well as more speculative, concerns.

Consider OpenAI, “a non-profit artificial intelligence research company.” OpenAI’s goal “is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.” AI researchers are already thinking deeply and carefully about AI decision-making mechanisms in technologies like driverless cars, despite the fact that many of the most serious concerns about how autonomous AI agents make value-based choices are likely many decades out. Efforts like these showcase how the private sector and leading technology entrepreneurs are ahead of the curve when it comes to thinking about some of the more serious implications of developing true artificial general intelligence (AGI) and artificial superintelligence (ASI). It is important to note, however, that true AGI or ASI are unlikely to materialize in the near-term, and the mere possibility of their development should not blind policymakers to the many ways in which artificial narrow intelligence (ANI) has already improved the lives of countless individuals the world over. Virtual personal assistants, such as Siri and Cortana, or advanced search algorithms, such as Google’s search engine, are good examples of already useful applications of narrow AI. […]

The Future of Life Institute has observed that “our civilization will flourish as long as we win the race between the growing power of technology and the wisdom with which we manage it. In the case of AI technology … the best way to win that race is not to impede the former, but to accelerate the latter, by supporting AI safety research.” Government can play a positive and productive role in ensuring the best economic outcomes from developments in AI by promoting consumer education initiatives. By working with private sector developers, academics, and nonprofit policy specialists government agencies can remain constructively engaged in the AI dialogue, while not endangering ongoing developments in this technology.

Respondent 119: Sven Koenig, ACM Special Interest Group on Artificial Intelligence

[…] The public discourse around safety and control would benefit from demystifying AI. The media often concentrates on the big successes or failures of AI technologies, as well as scenarios conjured up in science fiction stories, and features the opinions of celebrity non-experts about future developments of AI technologies. As a result, parts of the public have developed a fear of AI systems developing superhuman intelligence, whereas most experts agree that AI technologies currently work well only in specialized domains, and notions of “superintelligences” and “technological singularity” that will result in AI systems developing super-human, broadly intelligent behavior is decades away and might never be realized. AI technologies have made steady progress over the years, yet there seem to be waves of exaggerated optimism and pessimism about what they can do. Both are harmful. For example, an exaggerated belief in their capabilities can result in AI systems being used (perhaps carelessly) in situations where they should not, potentially failing to fulfil expectations or even cause harm. The unavoidable disappointment can result in a backlash against AI research, and consequently fewer innovations. […]

Respondent 124: Huw Price, University of Cambridge, UK

[…] 3. In his first paper[1] Good tries to estimate the economic value of an ultraintelligent machine. Looking for a benchmark for productive brainpower, he settles impishly on John Maynard Keynes. He notes that Keynes’ value to the economy had been estimated at 100 thousand million British pounds, and suggests that the machine might be good for a million times that – a mega-Keynes, as he puts it.

4. But there’s a catch. “The sign is uncertain” – in other words, it is not clear whether this huge impact would be negative or positive: “The machines will create social problems, but they might also be able to solve them, in addition to those that have been created by microbes and men.” Most of all, Good insists that these questions need serious thought: “These remarks might appear fanciful to some readers, but to me they seem real and urgent, and worthy of emphasis outside science fiction.” […]

Respondent 136: Nate Soares, MIRI

[…] Researchers’ worries about the impact of AI in the long term bear little relation to the doomsday scenarios most often depicted in Hollywood movies, in which “emergent consciousness” allows machines to throw off the shackles of their programmed goals and rebel. The concern is rather that such systems may pursue their programmed goals all too well, and that the programmed goals may not match the intended goals, or that the intended goals may have unintended negative consequences. […]

We believe that there are numerous promising avenues of foundational research which, if successful, could make it possible to get very strong guarantees about the behavior of advanced AI systems — stronger than many currently think is possible, in a time when the most successful machine learning techniques are often poorly understood. We believe that bringing together researchers in machine learning, program verification, and the mathematical study of formal agents would be a large step towards ensuring that highly advanced AI systems will have a robustly beneficial impact on society. […]

In the long term, we recommend that policymakers make use of incentives to encourage designers of AI systems to work together cooperatively, perhaps through multinational and multicorporate collaborations, in order to discourage the development of race dynamics. In light of high levels of uncertainty about the future of AI among experts, and in light of the large potential of AI research to save lives, solve social problems, and serve the common good in the near future, we recommend against broad regulatory interventions in this space. We recommend that effort instead be put towards encouraging interdisciplinary technical research into the AI safety and control challenges that we have outlined above.

Respondent 145: Andrew Kim, Google Inc.

[…] No system is perfect, and errors will emerge. However, advances in our technical capabilities will expand our ability to meet these challenges.

To that end, we believe that solutions to these problems can and should be grounded in rigorous engineering research to provide the creators of these systems with approaches and tools they can use to tackle these problems. “Concrete Problems in AI Safety”, a recent paper from our researchers and others, takes this approach in questions around safety. We also applaud the work of researchers who – along with researchers like Moritz Hardt at Google – are looking at short-term questions of bias and discrimination. […]

Respondent 149: Anthony Aguirre, Future of Life Institute

[…S]ocietally beneficial values alignment of AI is not automatic. Crucially, AI systems are designed not just to enact a set of rules, but rather to accomplish a goal in ways that the programmer does not explicitly specify in advance. This leads to an unpredictability that can [lead] to adverse consequences. As AI pioneer Stuart Russell explains, “No matter how excellently an algorithm maximizes, and no matter how accurate its model of the world, a machine’s decisions may be ineffably stupid, in the eyes of an ordinary human, if its utility function is not well aligned with human values.” (2015).

Since humans rely heavily on shared tacit knowledge when discussing their values, it seems likely that attempts to represent human values formally will often leave out significant portions of what we think is important. This is addressed by the classic stories of the genie in the lantern, the sorcerer’s apprentice, and Midas’ touch. Fulfilling the letter of a goal with something far afield from the spirit of the goal like this is known as “perverse instantiation” (Bostrom [2014]). This can occur because the system’s programming or training has not explored some relevant dimensions that we really care about (Russell 2014). These are easy to miss because they are typically taken for granted by people, and even trying with a lot of effort and a lot of training data, people cannot reliably think of what they’ve forgotten to think about.

The complexity of some AI systems in the future (and even now) is likely to exceed human understanding, yet as these systems become more effective we will have efficiency pressures to be increasingly dependent on them, and to cede control to them. It becomes increasingly difficult to specify a set of explicit rules that is robustly in accord with our values, as the domain approaches a complex open world model, operates in the (necessarily complex) real world, and/or as tasks and environments become so complex as to exceed the capacity or scalability of human oversight[.] Thus more sophisticated approaches will be necessary to ensure that AI systems accomplish the goals they are given without adverse side effects. See references Russell, Dewey, and Tegmark (2015), Taylor (2016), and Amodei et al. for research threads addressing these issues. […]

We would argue that a “virtuous cycle” has now taken hold in AI research, where both public and private R&D leads to systems of significant economic value, which underwrites and incentivizes further research. This cycle can leave insufficiently funded, however, research on the wider implications of, safety of, ethics of, and policy implications of, AI systems that are outside the focus of corporate or even many academic research groups, but have a compelling public interest. FLI helped to develop a set of suggested “Research Priorities for Robust and Beneficial Artificial Intelligence” along these lines (available at http://futureoflife.org/data/documents/research_priorities.pdf); we also support AI safety-relevant research agendas from MIRI (https://intelligence.org/files/TechnicalAgenda.pdf) and as suggested in Amodei et al. (2016). We would advocate for increased funding of research in the areas described by all of these agendas, which address problems in the following research topics: abstract reasoning about superior agents, ambiguity identification, anomaly explanation, computational humility or non-self-centered world models, computational respect or safe exploration, computational sympathy, concept geometry, corrigibility or scalable control, feature identification, formal verification of machine learning models and AI systems, interpretability, logical uncertainty modeling, metareasoning, ontology identification/refactoring/alignment, robust induction, security in learning source provenance, user modeling, and values modeling. […]


It’s exciting to see substantive discussion of AGI’s impact on society by the White House. The policy recommendations regarding AGI strike us as reasonable, and we expect these developments to help inspire a much more in-depth and sustained conversation about the future of AI among researchers in the field.