All MIRI Publications

Articles


 

2020–2021

S Garrabrant. 2021. “Temporal Inference with Finite Factored Sets.” arXiv:2109.11513 [cs.AI].
S Garrabrant, D Herrmann, and J Lopez-Wild. 2021. “Cartesian Frames.” arXiv:2109.10996 [math.CT].
E Hubinger. 2020. “An Overview of 11 Proposals for Building Safe Advanced AI.” arXiv:2012.07532 [cs.LG].

 

2019

A Demski and S Garrabrant. 2019. “Embedded Agency.” arXiv:1902.09469 [cs.AI].
E Hubinger, C van Merwijk, V Mikulik, J Skalse, and S Garrabrant. 2019. “Risks from Learned Optimization in Advanced Machine Learning Systems.” arXiv:1906.01820 [cs.AI].
V Kosoy. 2019. “Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help.” Presented at the Safe Machine Learning workshop at ICLR.

 

2018

S Armstrong and S Mindermann. 2018. “Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents.” In Advances in Neural Information Processing Systems 31.
D Manheim and S Garrabrant. 2018. “Categorizing Variants of Goodhart’s Law.” arXiv:1803.04585 [cs.AI].

 

2017

R Carey. 2018. “Incorrigibility in the CIRL Framework.” arXiv:1709.06275 [cs.AI]. Paper presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.
S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2017. “A Formal Approach to the Problem of Logical Non-Omniscience.” Paper presented at the 16th conference on Theoretical Aspects of Rationality and Knowledge.
K Grace, J Salvatier, A Dafoe, B Zhang, and O Evans. 2017. “When Will AI Exceed Human Performance? Evidence from AI Experts.” arXiv:1705.08807 [cs.AI].
V Kosoy. 2017. “Forecasting Using Incomplete Models.” arXiv:1705.04630 [cs.LG].
N Soares and B Levinstein. 2020. “Cheating Death in Damascus.” The Journal of Philosophy 117(5):237–266. Previously presented at the 14th Annual Formal Epistemology Workshop.
E Yudkowsky and N Soares. 2017. “Functional Decision Theory: A New Theory of Instrumental Rationality.” arXiv:1710.05060 [cs.AI].

 

2016

T Benson-Tilsen and N Soares. 2016. “Formalizing Convergent Instrumental Goals.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
A Critch. 2019. “A Parametric, Resource-Bounded Generalization of Löb’s Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory.” arXiv:1602.04184 [cs.GT]. The Journal of Symbolic Logic 84(4):1368–1381. Previously published as “Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents.”
S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2016. “Logical Induction.” arXiv:1609.03543 [cs.AI].
S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2016. “Logical Induction (Abridged).” MIRI technical report 2016–2.
S Garrabrant, B Fallenstein, A Demski, and N Soares. 2016. “Inductive Coherence.” arXiv:1604.05288 [cs.AI]. Previously published as “Uniform Coherence.”
S Garrabrant, N Soares, and J Taylor. 2016. “Asymptotic Convergence in Online Learning with Unbounded Delays.” arXiv:1604.05280 [cs.LG].
V Kosoy and A Appel. 2020. “Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm.” arXiv:1608.04112 [cs.CC]. Forthcoming in Journal of Applied Logics.
J Leike, J Taylor, and B Fallenstein. 2016. “A Formal Solution to the Grain of Truth Problem.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.
L Orseau and S Armstrong. 2016. “Safely Interruptible Agents.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.
K Sotala. 2016. “Defining Human Values for Value Learners.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
J Taylor. 2016. “Quantilizers: A Safer Alternative to Maximizers for Limited Optimization.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
J Taylor, E Yudkowsky, P LaVictoire, and A Critch. 2016. “Alignment for Advanced Machine Learning Systems.” MIRI technical report 2016–1.

 

2015

B Fallenstein and R Kumar. 2015. “Proof-Producing Reflection for HOL: With an Application to Model Polymorphism.” In Interactive Theorem Proving: 6th International Conference, ITP 2015, Nanjing, China, August 24–27, 2015, Proceedings. Springer.
B Fallenstein and N Soares. 2015. “Vingean Reflection: Reliable Reasoning for Self-Improving Agents.” MIRI technical report 2015–2.
B Fallenstein, N Soares, and J Taylor. 2015. “Reflective Variants of Solomonoff Induction and AIXI.” In Proceedings of AGI 2015. Springer. Previously published as MIRI technical report 2015–8.
B Fallenstein, J Taylor, and P Christiano. 2015. “Reflective Oracles: A Foundation for Classical Game Theory.” arXiv:1508.04145 [cs.AI]. Previously published as MIRI technical report 2015–7. Published in abridged form as “Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence” in Proceedings of LORI 2015.
S Garrabrant, S Bhaskar, A Demski, J Garrabrant, G Koleszarik, and E Lloyd. 2016. “Asymptotic Logical Uncertainty and the Benford Test.” arXiv:1510.03370 [cs.LG]. Paper presented at the Ninth Conference on Artificial General Intelligence. Previously published as MIRI technical report 2015–11.
K Grace. 2015. “The Asilomar Conference: A Case Study in Risk Mitigation.” MIRI technical report 2015–9.
K Grace. 2015. “Leó Szilárd and the Danger of Nuclear Weapons: A Case Study in Risk Mitigation.” MIRI technical report 2015–10.
P LaVictoire. 2015. “An Introduction to Löb’s Theorem in MIRI Research.” MIRI technical report 2015–6.
N Soares. 2015. “Aligning Superintelligence with Human Interests: An Annotated Bibliography.” MIRI technical report 2015–5.
N Soares. 2015. “Formalizing Two Problems of Realistic World-Models.” MIRI technical report 2015–3.
N Soares. 2018. “The Value Learning Problem.” In Artificial Intelligence Safety and Security. Chapman and Hall. Previously presented at the IJCAI 2016 Ethics for Artificial Intelligence workshop, and published earlier as MIRI technical report 2015–4.
N Soares and B Fallenstein. 2015. “Questions of Reasoning under Logical Uncertainty.” MIRI technical report 2015–1.
N Soares and B Fallenstein. 2015. “Toward Idealized Decision Theory.” arXiv:1507.01986 [cs.AI]. Previously published as MIRI technical report 2014–7. Published in abridged form as “Two Attempts to Formalize Counterpossible Reasoning in Deterministic Settings” in Proceedings of AGI 2015.
K Sotala. 2015. “Concept Learning for Safe Autonomous AI.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop.

 

2014

S Armstrong, K Sotala, and S Ó hÉigeartaigh. 2014. “The Errors, Insights and Lessons of Famous AI Predictions – and What They Mean for the Future.” Journal of Experimental & Theoretical Artificial Intelligence 26 (3): 317–342.
M Bárász, P Christiano, B Fallenstein, M Herreshoff, P LaVictoire, and E Yudkowsky. 2014. “Robust Cooperation on the Prisoner’s Dilemma: Program Equilibrium via Provability Logic.” arXiv:1401.5577 [cs.GT].
T Benson-Tilsen. 2014. “UDT with Known Search Order.” MIRI technical report 2014–4.
N Bostrom and E Yudkowsky. 2018. “The Ethics of Artificial Intelligence.” In Artificial Intelligence Safety and Security. Chapman and Hall. Previously published in The Cambridge Handbook of Artificial Intelligence (2014).
P Christiano. 2014. “Non-Omniscience, Probabilistic Inference, and Metamathematics.” MIRI technical report 2014–3.
B Fallenstein. 2014. “Procrastination in Probabilistic Logic.” Working paper.
B Fallenstein and N Soares. 2014. “Problems of Self-Reference in Self-Improving Space-Time Embedded Intelligence.” In Proceedings of AGI 2014. Springer.
B Fallenstein and N Stiennon. 2014. “‘Loudness’: On Priors over Preference Relations.” Brief technical note.
P LaVictoire, B Fallenstein, E Yudkowsky, M Bárász, P Christiano, and M Herreshoff. 2014. “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem.” Paper presented at the AAAI 2014 Multiagent Interaction without Prior Coordination Workshop.
L Muehlhauser and N Bostrom. 2014. “Why We Need Friendly AI.” Think 13 (36): 42–47.
L Muehlhauser and B Hibbard. 2014. “Exploratory Engineering in AI.” Communications of the ACM 57 (9): 32–34.
C Shulman and N Bostrom. 2014. “Embryo Selection for Cognitive Enhancement: Curiosity or Game-Changer?” Global Policy 5 (1): 85–92.
N Soares. 2014. “Tiling Agents in Causal Graphs.” MIRI technical report 2014–5.
N Soares and B Fallenstein. 2014. “Botworld 1.1.” MIRI technical report 2014–2.
N Soares and B Fallenstein. 2017. “Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda.” In The Technological Singularity: Managing the Journey. Springer. Previously published as MIRI technical report 2014–8 under the name “Aligning Superintelligence with Human Interests: A Technical Research Agenda.”
N Soares, B Fallenstein, E Yudkowsky, and S Armstrong. 2015. “Corrigibility.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop. Previously published as MIRI technical report 2014–6.
E Yudkowsky. 2014. “Distributions Allowing Tiling of Staged Subjective EU Maximizers.” MIRI technical report 2014–1.

 

2013

A Altair. 2013. “A Comparison of Decision Algorithms on Newcomblike Problems.” Working paper. MIRI.
S Armstrong, N Bostrom, and C Shulman. 2015. “Racing to the Precipice: A Model of Artificial Intelligence Development.” AI & Society (DOI 10.1007/s00146-015-0590-7): 1–6. Previously published as Future of Humanity Institute technical report 2013–1.
P Christiano, E Yudkowsky, M Herreshoff, and M Bárász. 2013. “Definability of ‘Truth’ in Probabilistic Logic.” Draft. MIRI.
B Fallenstein. 2013. “The 5-and-10 Problem and the Tiling Agents Formalism.” MIRI technical report 2013–9.
B Fallenstein. 2013. “Decreasing Mathematical Strength in One Formalization of Parametric Polymorphism.” Brief technical note. MIRI.
B Fallenstein. 2013. “An Infinitely Descending Sequence of Sound Theories Each Proving the Next Consistent.” MIRI technical report 2013–6.
B Fallenstein and A Mennen. 2013. “Predicting AGI: What Can We Say When We Know So Little?” Working paper. MIRI.
K Grace. 2013. “Algorithmic Progress in Six Domains.” MIRI technical report 2013–3.
J Hahn. 2013. “Scientific Induction in Probabilistic Metamathematics.” MIRI technical report 2013–4.
L Muehlhauser. 2013. “Intelligence Explosion FAQ.” Working paper. MIRI.
L Muehlhauser and L Helm. 2013. “Intelligence Explosion and Machine Ethics.” In Singularity Hypotheses. Springer.
L Muehlhauser and A Salamon. 2013. “Intelligence Explosion: Evidence and Import.” In Singularity Hypotheses. Springer.
L Muehlhauser and C Williamson. 2013. “Ideal Advisor Theories and Personal CEV.” Working paper. MIRI.
N Soares. 2013. “Fallenstein’s Monster.” MIRI technical report 2013–7.
K Sotala and R Yampolskiy. 2014. “Responses to Catastrophic AGI Risk: A Survey.” Physica Scripta 90 (1): 1–33. Previously published as MIRI technical report 2013–2.
N Stiennon. 2013. “Recursively-Defined Logical Theories Are Well-Defined.” MIRI technical report 2013–8.
R Yampolskiy and J Fox. 2013. “Artificial General Intelligence and the Human Mental Model.” In Singularity Hypotheses. Springer.
R Yampolskiy and J Fox. 2013. “Safety Engineering for Artificial General Intelligence.” Topoi 32 (2): 217–226.
E Yudkowsky. 2013. “Intelligence Explosion Microeconomics.” MIRI technical report 2013–1.
E Yudkowsky. 2013. “The Procrastination Paradox.” Brief technical note. MIRI.
E Yudkowsky and M Herreshoff. 2013. “Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.” Draft. MIRI.

 

2012

S Armstrong and K Sotala. 2012. “How We’re Predicting AI – or Failing To.” In Beyond AI: Artificial Dreams. Pilsen: University of West Bohemia.
B Hibbard. 2012. “Avoiding Unintended AI Behaviors.” In Proceedings of AGI 2012. Springer.
B Hibbard. 2012. “Decision Support for Safe AI Design.” In Proceedings of AGI 2012. Springer.
L Muehlhauser. 2012. “AI Risk Bibliography 2012.” Working paper. MIRI.
A Salamon and L Muehlhauser. 2012. “Singularity Summit 2011 Workshop Report.” Working paper. MIRI.
C Shulman and N Bostrom. 2012. “How Hard Is Artificial Intelligence? Evolutionary Arguments and Selection Effects.” Journal of Consciousness Studies 19 (7–8): 103–130.
K Sotala. 2012. “Advantages of Artificial Intelligences, Uploads, and Digital Minds.” International Journal of Machine Consciousness 4 (1): 275–291.
K Sotala and H Valpola. 2012. “Coalescing Minds: Brain Uploading-Related Group Mind Scenarios.” International Journal of Machine Consciousness 4 (1): 293–312.

 

2011

P de Blanc. 2011. “Ontological Crises in Artificial Agents’ Value Systems.” arXiv:1105.3821 [cs.AI].
D Dewey. 2011. “Learning What to Value.” In Proceedings of AGI 2011. Springer.
E Yudkowsky. 2011. “Complex Value Systems Are Required to Realize Valuable Futures.” In Proceedings of AGI 2011. Springer.

 

2010

J Fox and C Shulman. 2010. “Superintelligence Does Not Imply Benevolence.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
S Kaas, S Rayhawk, A Salamon, and P Salamon. 2010. “Economic Implications of Software Minds.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
A Salamon, S Rayhawk, and J Kramár. 2010. “How Intelligible Is Intelligence?” In Proceedings of ECAP 2010. Verlag Dr. Hut.
C Shulman. 2010. “Omohundro’s ‘Basic AI Drives’ and Catastrophic Risks.” Working paper. MIRI.
C Shulman. 2010. “Whole Brain Emulation and the Evolution of Superorganisms.” Working paper. MIRI.
C Shulman and A Sandberg. 2010. “Implications of a Software-Limited Singularity.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
K Sotala. 2010. “From Mostly Harmless to Civilization-Threatening.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
E Yudkowsky. 2010. “Timeless Decision Theory.” Working paper. MIRI.
E Yudkowsky, C Shulman, A Salamon, R Nelson, S Kaas, S Rayhawk, and T McCabe. 2010. “Reducing Long-Term Catastrophic Risks from Artificial Intelligence.” Working paper. MIRI.

 

2001–2009

P de Blanc. 2009. “Convergence of Expected Utility for Universal Artificial Intelligence.” arXiv:0907.5598 [cs.AI].
S Rayhawk, A Salamon, M Anissimov, T McCabe, and R Nelson. 2009. “Changing the Frame of AI Futurism: From Storytelling to Heavy-Tailed, High-Dimensional Probability Distributions.” Paper presented at ECAP 2009.
C Shulman and S Armstrong. 2009. “Arms Control and Intelligence Explosions.” Paper presented at ECAP 2009.
C Shulman, H Jonsson, and N Tarleton. 2009. “Machine Ethics and Superintelligence.” In Proceedings of AP-CAP 2009. University of Tokyo.
C Shulman, N Tarleton, and H Jonsson. 2009. “Which Consequentialism? Machine Ethics and Moral Divergence.” In Proceedings of AP-CAP 2009. University of Tokyo.
E Yudkowsky. 2008. “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” In Global Catastrophic Risks. Oxford University Press. Published in abridged form as “Friendly Artificial Intelligence” in Singularity Hypotheses.
E Yudkowsky. 2008. “Cognitive Biases Potentially Affecting Judgment of Global Risks.” In Global Catastrophic Risks. Oxford University Press.
E Yudkowsky. 2007. “Levels of Organization in General Intelligence.” In Artificial General Intelligence (Cognitive Technologies). Springer.
E Yudkowsky. 2004. “Coherent Extrapolated Volition.” Working paper. MIRI.
 


Books


Inadequate Equilibria: Where and How Civilizations Get Stuck

E Yudkowsky (2017)

When should you think that you may be able to do something unusually well? When you’re trying to outperform in a given area, it’s important that you have a sober understanding of your relative competencies. The story only ends there, however, if you’re fortunate enough to live in an adequate civilization.

Eliezer Yudkowsky’s Inadequate Equilibria is a sharp and lively guidebook for anyone questioning when and how they can know better, and do better, than the status quo. Freely mixing debates on the foundations of rational decision-making with tips for everyday life, Yudkowsky explores the central question of when we can (and can’t) expect to spot systemic inefficiencies, and exploit them.


Rationality: From AI to Zombies

E Yudkowsky (2015)

When human brains try to do things, they can run into some very strange problems. Self-deception, confirmation bias, magical thinking—it sometimes seems our ingenuity is boundless when it comes to shooting ourselves in the foot.

Map and Territory and the rest of the Rationality: From AI to Zombies series ask what a “martial art” of rationality would look like. In this series, Eliezer Yudkowsky explains the findings of cognitive science and the ideas of naturalistic philosophy that provide a useful background for understanding MIRI’s research and for approaching ambitious problems in general.


Smarter Than Us: The Rise of Machine Intelligence

S Armstrong (2014)

What happens when machines become smarter than humans? Humans steer the future not because we’re the strongest or the fastest but because we’re the smartest. When machines become smarter than humans, we’ll be handing them the steering wheel. What promises—and perils—will these powerful machines present? Stuart Armstrong’s book navigates these questions with clarity and wit.


Facing the Intelligence Explosion

L Muehlhauser (2013)

Sometime this century, machines will surpass human levels of intelligence and ability. This event—the “intelligence explosion”—will be the most important event in our history, and navigating it wisely will be the most important thing we can ever do.

Luminaries from Alan Turing and I. J. Good to Bill Joy and Stephen Hawking have warned us about this. Why do we think Hawking and company are right, and what can we do about it?

Facing the Intelligence Explosion is Muehlhauser’s attempt to answer these questions.


The Hanson-Yudkowsky AI-Foom Debate

R Hanson and E Yudkowsky (2013)

In late 2008, economist Robin Hanson and AI theorist Eliezer Yudkowsky conducted an online debate about the future of artificial intelligence, and in particular about whether generally intelligent AIs will be able to improve their own capabilities very quickly (a.k.a. “foom”). James Miller and Carl Shulman also contributed guest posts to the debate.

The original debate took place in a long series of blog posts, which are collected here. This book also includes a transcript of a 2011 in-person debate between Hanson and Yudkowsky on this subject, a summary of the debate written by Kaj Sotala, and a 2013 technical report on AI takeoff dynamics (“intelligence explosion microeconomics”) written by Yudkowsky.