Foundational Research
 
Abstract   |   Intelligence Explosion and Machine Ethics

Many researchers have argued that a self-improving artificial intelligence (AI) could become so vastly more powerful than humans that we would not be able to stop it from achieving its goals. If so, and if the AI’s goals differ from ours, then this could be disastrous for humans. One proposed solution is to program the AI’s goal system to want what we want before the AI self-improves beyond our capacity to control it. Unfortunately, it is difficult to specify what we want. After clarifying what we mean by “intelligence,” we offer a series of “intuition pumps” from the field of moral philosophy for our conclusion that human values are complex and difficult to specify. We then survey the evidence from the psychology of motivation, moral psychology, and neuroeconomics that supports our position. We conclude by recommending ideal preference theories of value as a promising approach for developing a machine ethics suitable for navigating an intelligence explosion. Read the full paper »


Luke Muehlhauser & Louie Helm
Machine Intelligence Research Institute

 

Abstract   |   Learning What to Value
I. J. Good’s intelligence explosion theory predicts that ultra-intelligent agents will undergo a process of repeated self-improvement; in the wake of such an event, how well our values are fulfilled would depend upon the goals of these agents. In this paper, we examine how these ultra-intelligent agents may, through reinforcement learning, effectively maximize their expected rewards to the unprecedented detriment of human welfare. To solve this problem, we define value learners: agents designed to learn and maximize any initially unknown utility function so long as they are given the ability to discern what constitutes evidence about that utility function.
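The value-learner idea can be illustrated with a small sketch. This is not the paper's formal definition: it assumes a finite set of candidate utility functions, a single Bayesian update on one piece of evidence, and a one-step choice among actions. All names below (posterior, expected_value, choose_action, the fruit example) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Sequence

Utility = Callable[[str], float]


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes update over candidate utility functions.

    likelihood[name] is the assumed probability of the observed evidence
    if the utility function called `name` were the true one.
    """
    unnormalized = {name: prior[name] * likelihood[name] for name in prior}
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}


def expected_value(
    action: str,
    outcome_model: Dict[str, Dict[str, float]],
    utilities: Dict[str, Utility],
    weights: Dict[str, float],
) -> float:
    """Expected utility of an action, averaged over outcomes and over
    the agent's uncertainty about which utility function is correct."""
    return sum(
        p_outcome * sum(weights[name] * utilities[name](outcome) for name in weights)
        for outcome, p_outcome in outcome_model[action].items()
    )


def choose_action(
    actions: Sequence[str],
    outcome_model: Dict[str, Dict[str, float]],
    utilities: Dict[str, Utility],
    weights: Dict[str, float],
) -> str:
    """Pick the action with the highest expected value under current beliefs."""
    return max(actions, key=lambda a: expected_value(a, outcome_model, utilities, weights))


# Hypothetical toy setup: the agent does not know which fruit humans value.
utilities = {
    "likes_apples": lambda outcome: 1.0 if outcome == "apple" else 0.0,
    "likes_pears": lambda outcome: 1.0 if outcome == "pear" else 0.0,
}
prior = {"likes_apples": 0.5, "likes_pears": 0.5}

# P(outcome | action), assumed known to the agent.
outcome_model = {
    "plant_apples": {"apple": 0.8, "nothing": 0.2},
    "plant_pears": {"pear": 1.0},
}
actions = list(outcome_model)

# Evidence: a human was seen eating an apple. The likelihoods are assumptions.
likelihood = {"likes_apples": 0.9, "likes_pears": 0.2}
weights = posterior(prior, likelihood)

before = choose_action(actions, outcome_model, utilities, prior)   # "plant_pears"
after = choose_action(actions, outcome_model, utilities, weights)  # "plant_apples"
```

In this toy case the agent's choice flips once it sees evidence about what is valued, which is the behaviour the abstract contrasts with maximizing a fixed reward signal.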

We also give an optimality notion for a value-learning agent.

Read the full paper »


Daniel Dewey
Machine Intelligence Research Institute

 

Abstract   |   How We’re Predicting AI—or Failing To
This paper will look at the various predictions that have been made about AI and propose decomposition schemas for analyzing them. It will propose a variety of theoretical tools for analyzing, judging, and improving these predictions. Focusing specifically on timeline predictions (dates given by which we should expect the creation of AI), it will show that there are strong theoretical grounds to expect predictions to be quite poor in this area. Using a database of 95 AI timeline predictions, it will show that these expectations are borne out in practice: expert predictions contradict each other considerably, and are indistinguishable from non-expert predictions and past failed predictions. Predictions that AI lies 15 to 25 years in the future are the most common, from experts and non-experts alike. Read the full paper »

Stuart Armstrong & Kaj Sotala
Future of Humanity Institute &
Machine Intelligence Research Institute

 

Abstract   |   Intelligence Explosion: Evidence and Import
We review the evidence for and against three claims: that (1) there is a substantial chance we will create human-level AI before 2100, that (2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an “intelligence explosion,” and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it. We conclude with recommendations for increasing the odds of a controlled intelligence explosion relative to an uncontrolled intelligence explosion. Read the full paper »

Luke Muehlhauser & Anna Salamon
Machine Intelligence Research Institute

 

Other Published Works

Luke Muehlhauser and Chris Williamson (2013). Ideal Advisor Theories and Personal CEV. Machine Intelligence Research Institute.
Paul Christiano, Eliezer Yudkowsky, Marcello Herreshoff, and Mihaly Barasz (2013). Definability of “Truth” in Probabilistic Logic (draft). Machine Intelligence Research Institute.
Eliezer Yudkowsky (2013). Intelligence Explosion Microeconomics. Machine Intelligence Research Institute.
Alex Altair (2013). A Comparison of Decision Algorithms on Newcomblike Problems. Machine Intelligence Research Institute.
Luke Muehlhauser and Louie Helm (2013). Intelligence Explosion and Machine Ethics. In Singularity Hypotheses. Springer.
Luke Muehlhauser and Anna Salamon (2013). Intelligence Explosion: Evidence and Import. In Singularity Hypotheses. Springer. (Español)
Luke Muehlhauser (2013). Intelligence Explosion FAQ. Machine Intelligence Research Institute. (HTML)
Roman Yampolskiy and Joshua Fox (2013). Artificial General Intelligence and the Human Mental Model. In Singularity Hypotheses. Springer.
Nick Bostrom and Eliezer Yudkowsky (2013). The Ethics of Artificial Intelligence. In The Cambridge Handbook of Artificial Intelligence. Cambridge University Press.
Kaj Sotala (2012). Advantages of Artificial Intelligences, Uploads, and Digital Minds. International Journal of Machine Consciousness 4 (1): 275–291.
Luke Muehlhauser (2012). AI Risk Bibliography 2012. Machine Intelligence Research Institute.
Bill Hibbard (2012). Avoiding Unintended AI Behaviors. In Proceedings of AGI 2012. Springer.
Bill Hibbard (2012). Decision Support for Safe AI Design. In Proceedings of AGI 2012. Springer.
Kaj Sotala and Harri Valpola (2012). Coalescing Minds: Brain Uploading-Related Group Mind Scenarios. International Journal of Machine Consciousness 4 (1): 293–312.
Carl Shulman and Nick Bostrom (2012). How Hard Is Artificial Intelligence? Evolutionary Arguments and Selection Effects. Journal of Consciousness Studies 19 (7–8): 103–130.
Stuart Armstrong and Kaj Sotala (2012). How We’re Predicting AI – or Failing to. Beyond AI Conference.
Roman Yampolskiy and Joshua Fox (2012). Safety Engineering for Artificial General Intelligence. Topoi.
Anna Salamon and Luke Muehlhauser (2012). Singularity Summit 2011 Workshop Report. Machine Intelligence Research Institute.
Eliezer Yudkowsky (2011). Complex Value Systems are Required to Realize Valuable Futures. In Proceedings of AGI 2011. Springer.
Daniel Dewey (2011). Learning What to Value. In Proceedings of AGI 2011. Springer.
Peter de Blanc (2011). Ontological Crises in Artificial Agents’ Value Systems. Machine Intelligence Research Institute.
Nick Tarleton (2010). Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics. Machine Intelligence Research Institute.
Steven Kaas, Steve Rayhawk, Anna Salamon and Peter Salamon (2010). Economic Implications of Software Minds. Machine Intelligence Research Institute.
Kaj Sotala (2010). From Mostly Harmless to Civilization-Threatening. In Proceedings of ECAP 2010. Verlag Dr. Hut.
Anna Salamon, Steve Rayhawk, and János Kramár (2010). How Intelligible is Intelligence? In Proceedings of ECAP 2010. Verlag Dr. Hut.
Carl Shulman and Anders Sandberg (2010). Implications of a Software-Limited Singularity. In Proceedings of ECAP 2010. Verlag Dr. Hut.
Carl Shulman (2010). Omohundro’s “Basic AI Drives” and Catastrophic Risks. Machine Intelligence Research Institute.
Eliezer Yudkowsky, Carl Shulman, Anna Salamon, Rolf Nelson, Steven Kaas, Steve Rayhawk, and Tom McCabe (2010). Reducing Long-Term Catastrophic Risks from Artificial Intelligence. Machine Intelligence Research Institute.
Joshua Fox and Carl Shulman (2010). Superintelligence Does Not Imply Benevolence. In Proceedings of ECAP 2010. Verlag Dr. Hut.
Eliezer Yudkowsky (2010). Timeless Decision Theory. Machine Intelligence Research Institute.
Carl Shulman (2010). Whole Brain Emulation and the Evolution of Superorganisms. Machine Intelligence Research Institute.
Carl Shulman and Stuart Armstrong (2009). Arms Control and Intelligence Explosions. Paper presented at ECAP 2009.
Steve Rayhawk, Anna Salamon, Michael Anissimov, Thomas McCabe, and Rolf Nelson (2009). Changing the Frame of AI Futurism: From Storytelling to Heavy-Tailed, High-Dimensional Probability Distributions. Paper presented at ECAP 2009.
Peter de Blanc (2009). Convergence of Expected Utility for Universal Artificial Intelligence. Machine Intelligence Research Institute.
Carl Shulman, Henrik Jonsson, and Nick Tarleton (2009). Machine Ethics and Superintelligence. In Proceedings of AP-CAP 2009.
Carl Shulman, Nick Tarleton, and Henrik Jonsson (2009). Which Consequentialism? Machine Ethics and Moral Divergence. In Proceedings of AP-CAP 2009.
Eliezer Yudkowsky (2008). Artificial Intelligence as a Positive and Negative Factor in Global Risk. In Global Catastrophic Risks. Oxford University Press. (官话) (Italiano) (한국어) (Português) (Русский)
Eliezer Yudkowsky (2008). Cognitive Biases Potentially Affecting Judgement of Global Risks. In Global Catastrophic Risks. Oxford University Press. (Italiano) (Русский)
Eliezer Yudkowsky (2007). Levels of Organization in General Intelligence. In Artificial General Intelligence. Springer. (Русский)
Eliezer Yudkowsky (2005). A Technical Explanation of Technical Explanation. Machine Intelligence Research Institute.
Eliezer Yudkowsky (2004). Coherent Extrapolated Volition. Machine Intelligence Research Institute.
Eliezer Yudkowsky (2001). Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures. Machine Intelligence Research Institute.

As covered in: WIRED, Popular Science, New York Times, Bloomberg Businessweek, Scientific American