Embedded Agency

Embedded Agency is a write-up by Abram Demski and Scott Garrabrant, available on the AI Alignment Forum. There’s also a shorter version of the post as a hand-drawn sequence, and a lightly rewritten version on arXiv.

Embedded Agency was first released in 2018, with the arXiv version following in early 2019. In August 2020, Demski and Garrabrant substantially updated all versions.

We’ve included links and references below, grouped by topic and listed in the order they come up in each section.

General

Text Introduction | Illustrated Introduction | MIRI Blog Afterword | LessWrong Afterword

Further reading: “Security Mindset and Ordinary Paranoia”; “Agent Foundations for Aligning Machine Intelligence with Human Interests”

Decision Theory

Text Version | Illustrated Version

Embedded World-Models

Text Version | Illustrated Version

Further reading: “The Problem with AIXI”

Robust Delegation

Text Version | Illustrated Version

Further reading: “Problem of Fully Updated Deference”

Subsystem Alignment

Text Version | Illustrated Version