Embedded Agency is a write-up by Abram Demski and Scott Garrabrant, available on the AI Alignment Forum. There’s also a shorter version of the post as a hand-drawn sequence, and a lightly rewritten version on arXiv.

Embedded Agency was first released in 2018, with the arXiv version following in early 2019. In August 2020, Demski and Garrabrant substantially updated all versions.

We’ve included links and references below, listed in the order they come up in the relevant topic/section.

General

( Text Introduction  —  Illustrated Introduction  ———  MIRI Blog Afterword  —  LessWrong Afterword )

Further reading: “Security Mindset and Ordinary Paranoia”; “Agent Foundations for Aligning Machine Intelligence with Human Interests”

Decision Theory

( Text Version  —  Illustrated Version )

Embedded World-Models

( Text Version  —  Illustrated Version )

Further reading: “The Problem with AIXI”

Robust Delegation

( Text Version  —  Illustrated Version )

Further reading: “Problem of Fully Updated Deference”

Subsystem Alignment

( Text Version  —  Illustrated Version )