Embedded Agency is a write-up by Abram Demski and Scott Garrabrant, available on the AI Alignment Forum. There’s also a shorter version of the post as a hand-drawn sequence, and a lightly rewritten version on arXiv.

Embedded Agency was first released in 2018, with the arXiv version following in early 2019. In August 2020, Demski and Garrabrant substantially updated all versions.

We’ve included links and references below, listed in the order they appear in each section.




( Text Introduction  —  Illustrated Introduction  ———  MIRI Blog Afterword  —  LessWrong Afterword )


Further reading: “Security Mindset and Ordinary Paranoia”; “Agent Foundations for Aligning Machine Intelligence with Human Interests”



Decision Theory

( Text Version  —  Illustrated Version )




Embedded World-Models

( Text Version  —  Illustrated Version )


Further reading: “The Problem with AIXI”



Robust Delegation

( Text Version  —  Illustrated Version )


Further reading: “Problem of Fully Updated Deference”



Subsystem Alignment

( Text Version  —  Illustrated Version )