Fundamental Difficulties in Aligning Advanced AI


What is it?: A talk by Eliezer Yudkowsky given at New York University on October 14, 2016 for the Ethics of Artificial Intelligence conference. The conference was hosted by the NYU Center for Mind, Brain and Consciousness in conjunction with the NYU Center for Bioethics.

Video: Ethics of AI: “AI and Human Values” section (37 minutes in)

Slides: Without transitions / With transitions

Abstract: What could possibly be hard about pointing a smarter-than-human artificial agent in the direction of intuitively intended values, meta-values, or simple tasks? A brief overview of the fundamental difficulties of AI alignment and why you should take them seriously.


Notes / references / resources for learning more.

Some materials relevant to topics raised at the conference:

  • Coherent Extrapolated Volition as an alignment target for the second AI ever built. A powerful AI that you are very confident you can align on a complicated target should be asked to do what everyone in the human species would mostly agree on wanting, if we knew everything the AI knew and thought as fast as the AI did.
  • Extrapolated volition as a metaethical account of normativity.
  • Intelligence Explosion Microeconomics: the case for fast, large capability gains.

The best general introductions to the topic of smarter-than-human artificial intelligence are plausibly Nick Bostrom’s Superintelligence and Stuart Armstrong’s Smarter Than Us. For a much shorter introduction, see Yudkowsky’s recent guest post on EconLog.

The shutdown-button problem is discussed in Soares et al.’s “Corrigibility,” and was first studied by Stuart Armstrong.

The Optimizer’s Curse was observed by James E. Smith and Robert Winkler.

For more about orthogonal final goals and convergent instrumental strategies, see Bostrom’s “The Superintelligent Will” (also reproduced in Superintelligence). Benson-Tilsen and Soares’ “Formalizing Convergent Instrumental Goals” provides a toy model.

The smile maximizer fable is discussed more fully in Soares’ “The Value Learning Problem.” See also the Arbital pages on Edge Instantiation, Context Disaster, and Nearest Unblocked Strategy.

See the MIRI FAQ and GiveWell’s report on potential risks from advanced AI for quick explanations of why AI is likely to be able to surpass human cognitive capabilities, among other topics. Bensinger’s When AI Accelerates AI notes general reasons to expect capability speedup, while “Intelligence Explosion Microeconomics” delves into the specific question of whether self-modifying AI is likely to result in accelerating AI progress.

Muehlhauser notes the analogy between computer security and AI alignment research in AI Risk and the Security Mindset.

MIRI’s technical research agenda summarizes many of the field’s core open problems.


Email contact@intelligence.org if you have any questions, and see intelligence.org/get-involved for information about opportunities to collaborate on AI alignment projects.