Embedded Agency is a write-up by Abram Demski and Scott Garrabrant, available on the AI Alignment Forum. There’s also a shorter version of the post as a hand-drawn sequence, and a lightly rewritten version on arXiv.

Embedded Agency was first released in 2018, with the arXiv version following in early 2019. In August 2020, Demski and Garrabrant substantially updated all versions.

Further reading: “Security Mindset and Ordinary Paranoia”; “Agent Foundations for Aligning Machine Intelligence with Human Interests”



Decision Theory

Embedded World-Models

Further reading: “The Problem with AIXI”



Robust Delegation

Further reading: “Problem of Fully Updated Deference”



Subsystem Alignment

(Text Version / Illustrated Version)