Revisions

The following is an incomplete list of papers released by the Machine Intelligence Research Institute that have been substantially edited since they were first put online. Papers are listed based on the year they were originally released.

2019

Risks from Learned Optimization in Advanced Machine Learning Systems

Authors: Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant.

See arXiv for differences between v1 (Jun 5, 2019) and v2 (Jun 11, 2019).

Embedded Agency

Authors: Abram Demski and Scott Garrabrant.

See arXiv for differences between v1 (Feb 25, 2019), v2 (Aug 25, 2020), and v3 (Oct 6, 2020).

This paper is based on a series of 2018 slides and blog posts with more detailed notes on changes.

2017

Cheating Death in Damascus

Authors: Benjamin A. Levinstein and Nate Soares. (Authors for v1: Nate Soares and Benjamin A. Levinstein.)

v1 — March 18, 2017: Presented at the Formal Epistemology Workshop (FEW) 2017.
v2 — November 25, 2019: Edited for The Journal of Philosophy 117:5. Also available on the JPhil website.

2016

Logical Induction

Authors: Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor.

See arXiv for differences between v1 (Sep 12, 2016), v2 (Sep 19, 2016), v3 (Oct 2, 2016), and v4 (Dec 13, 2017).

Logical Induction (Abridged)

(Title for v1: “Logical Induction: Abridged version, early draft.”)

Authors: Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor.

v1 — August 6, 2016: Draft circulated online.
v2 — September 12, 2016: MIRI technical report 2016–2.
v3 — September 12, 2016: Edited.
v4 — September 19, 2016: Edited.
v5 — November 30, 2020: Edited.

Safely Interruptible Agents

Authors: Laurent Orseau and Stuart Armstrong.

v1 — June 1, 2016: Presented at the 32nd Conference on Uncertainty in Artificial Intelligence. Also available on the UAI website.
v2 — October 28, 2016: Non-UAI copy edited.

2015

Asymptotic Logical Uncertainty and the Benford Test

Authors: Scott Garrabrant, Tsvi Benson-Tilsen, Siddharth Bhaskar, Abram Demski, Joanna Garrabrant, George Koleszarik, and Evan Lloyd. (Authors for v1: Scott Garrabrant, Siddharth Bhaskar, Abram Demski, Joanna Garrabrant, George Koleszarik, and Evan Lloyd.)

v1 — October 12, 2015: MIRI technical report 2015–11; arXiv:1510.03370 [cs.LG]. Also available on the MIRI website.
v2 — June 12, 2016: Edited for the AGI-16 conference.

The Value Learning Problem

Author: Nate Soares.

v1 — January 29, 2015: MIRI technical report 2015–4.
v2 — March 5, 2016: Edited and subsequently presented at the IJCAI-16 Ethics for Artificial Intelligence workshop. Reprinted in 2018 in Artificial Intelligence Safety and Security.

Formalizing Two Problems of Realistic World Models

Author: Nate Soares.

v1 — January 22, 2015: MIRI technical report 2015–3.
v2 — June 17, 2016: Edited.

2014

Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda

(Title for v1 and v2: “Aligning Superintelligence with Human Interests: A Technical Research Agenda.”)

Authors: Nate Soares and Benya Fallenstein. (Authors for v1 and v2: Nate Soares and Benja Fallenstein.)

v1 — December 23, 2014: MIRI technical report 2014–8.
v2 — June 25, 2015: Edited for The Technological Singularity: Managing the Journey (published May 2017) and put online early.
v3 — July 15, 2016: Edited and renamed (to distinguish this agenda from the “Alignment for Advanced Machine Learning Systems” agenda). Changes incorporated into the Technological Singularity version.

2012

How We’re Predicting AI – or Failing To

Authors: Stuart Armstrong and Kaj Sotala.

v1 — November 5, 2012: Published in Beyond AI: Artificial Dreams.
v2 — October 3, 2017: The original findings were based on a dataset error. A note was added to the draft to warn the reader about this.

2010

Timeless Decision Theory

Author: Eliezer Yudkowsky.

v1 — November 12, 2010: Working paper.
v2 — May 4, 2018: Edited.