The following is an incomplete list of papers released by the Machine Intelligence Research Institute that have been substantially edited since they were first put online. Papers are listed by the year they were originally released.

 


 

2019

 

Risks from Learned Optimization in Advanced Machine Learning Systems

Authors: Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant.

See arXiv for differences between v1 (Jun 5, 2019) and v2 (Jun 11, 2019).

 

Embedded Agency

Authors: Abram Demski and Scott Garrabrant.

See arXiv for differences between v1 (Feb 25, 2019), v2 (Aug 25, 2020), and v3 (Oct 6, 2020).

This paper is based on a series of 2018 slides and blog posts with more detailed notes on changes.

 


 

2017

 

Cheating Death in Damascus

Authors: Benjamin A. Levinstein and Nate Soares. (Authors for v1: Nate Soares and Benjamin A. Levinstein.)

 


 

2016

 

Logical Induction

Authors: Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor.

See arXiv for differences between v1 (Sep 12, 2016), v2 (Sep 19, 2016), v3 (Oct 2, 2016), and v4 (Dec 13, 2017).

 

Logical Induction (Abridged)

(Title for v1: “Logical Induction: Abridged version, early draft.”)

Authors: Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor.

 

Safely Interruptible Agents

Authors: Laurent Orseau and Stuart Armstrong.

 


 

2015

 

Asymptotic Logical Uncertainty and the Benford Test

Authors: Scott Garrabrant, Tsvi Benson-Tilsen, Siddharth Bhaskar, Abram Demski, Joanna Garrabrant, George Koleszarik, and Evan Lloyd. (Authors for v1: Scott Garrabrant, Siddharth Bhaskar, Abram Demski, Joanna Garrabrant, George Koleszarik, and Evan Lloyd.)

 

The Value Learning Problem

Author: Nate Soares.

  • v1 — January 29, 2015: MIRI technical report 2015–4.
  • v2 — March 5, 2016: Edited and subsequently presented at the IJCAI-16 Ethics for Artificial Intelligence workshop. Reprinted in 2018 in Artificial Intelligence Safety and Security.

 

Formalizing Two Problems of Realistic World Models

Author: Nate Soares.

 


 

2014

 

Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda

(Title for v1 and v2: “Aligning Superintelligence with Human Interests: A Technical Research Agenda.”)

Authors: Nate Soares and Benya Fallenstein. (Authors for v1 and v2: Nate Soares and Benja Fallenstein.)

 


 

2012

 

How We’re Predicting AI – or Failing To

Authors: Stuart Armstrong and Kaj Sotala.

  • v1 — November 5, 2012: Published in Beyond AI: Artificial Dreams.
  • v2 — October 3, 2017: A note was added to the draft warning readers that the original findings were based on a dataset error.

 


 

2010

 

Timeless Decision Theory

Author: Eliezer Yudkowsky.