Embedded Agency is a write-up by Abram Demski and Scott Garrabrant, available on the AI Alignment Forum. There’s also a shorter version of the post as a hand-drawn sequence, and a lightly rewritten version on arXiv.
We’ve included links and references below, listed in the order they come up in the relevant topic/section.
- Marcus Hutter. 2012. “One Decade of Universal Artificial Intelligence.” In Theoretical Foundations of Artificial General Intelligence 4.
- Nate Soares. 2017. “Ensuring Smarter-Than-Human Intelligence Has A Positive Outcome.” MIRI Blog.
- Eliezer Yudkowsky. 2018. “The Rocket Alignment Problem.” MIRI Blog.
- Eliezer Yudkowsky and Nate Soares. 2017. “Functional Decision Theory: A New Theory of Instrumental Rationality.” arXiv:1710.05060 [cs.AI].
- Scott Garrabrant. 2017. “Two Major Obstacles for Logical Inductor Decision Theory.” Intelligent Agent Foundations Forum.
- Patrick LaVictoire. 2015. An Introduction to Löb’s Theorem in MIRI Research. MIRI technical report 2015–6.
- Rob Bensinger. 2017. “Decisions Are For Making Bad Outcomes Inconsistent.” MIRI Blog.
- Wei Dai. 2009. “Towards a New Decision Theory.” Less Wrong.
- Vladimir Nesov. 2009. “Counterfactual Mugging.” Less Wrong.
- Abram Demski. 2018. “Toward a New Technical Explanation of Technical Explanation.” Less Wrong.
- Nate Soares. 2015. Formalizing Two Problems of Realistic World-Models. MIRI technical report 2015–3.
- Jan Leike. 2016. Nonparametric General Reinforcement Learning. PhD thesis, Australian National University.
- Laurent Orseau and Mark Ring. 2012. “Space-Time Embedded Intelligence.” In Artificial General Intelligence, 5th International Conference. Springer.
- Benja Fallenstein, Jessica Taylor, and Paul Christiano. 2015. “Reflective Oracles: A Foundation for Classical Game Theory.” arXiv:1508.04145 [cs.AI].
- Jan Leike, Jessica Taylor, and Benya Fallenstein. 2016. “A Formal Solution to the Grain of Truth Problem.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.
- Nate Soares and Benja Fallenstein. 2015. Questions of Reasoning under Logical Uncertainty. MIRI technical report 2015–1.
- Abram Demski. 2018. “An Untrollable Mathematician Illustrated.” Less Wrong.
- Eliezer Yudkowsky. 2017. “Coherent Decisions Imply Consistent Utilities.” Arbital.
- Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor. 2016. “Logical Induction.” arXiv:1609.03543 [cs.AI].
- Eliezer Yudkowsky. 2015. “Ontology Identification.” Arbital.
- Peter de Blanc. 2011. “Ontological Crises in Artificial Agents’ Value Systems.” arXiv:1105.3821 [cs.AI].
- Caspar Oesterheld. 2017. “Naturalized Induction – A Challenge for Evidential and Causal Decision Theory.” Less Wrong.
- Rob Bensinger. 2013. “Building Phenomenological Bridges.” Less Wrong.
- Thomas Nagel. 1986. The View from Nowhere. Oxford University Press.
- Stuart Armstrong and Sören Mindermann. 2017. “Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents.” arXiv:1712.05812 [cs.AI].
- Benja Fallenstein and Nate Soares. 2015. Vingean Reflection: Reliable Reasoning for Self-Improving Agents. MIRI technical report 2015–2.
- Eliezer Yudkowsky and Marcello Herreshoff. 2013. “Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.” Draft.
- David Manheim and Scott Garrabrant. 2018. “Categorizing Variants of Goodhart’s Law.” arXiv:1803.04585 [cs.AI].
- Nate Soares. 2015/2018. “The Value Learning Problem.” In Artificial Intelligence Safety and Security. Chapman and Hall.
- Nate Soares, Benja Fallenstein, Eliezer Yudkowsky, and Stuart Armstrong. 2014/2015. “Corrigibility.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop.
- Paul Christiano. 2016. “The Informed Oversight Problem.” AI Alignment.
- Dylan Hadfield-Menell, Stuart Russell, Pieter Abbeel, and Anca Dragan. 2016. “Cooperative Inverse Reinforcement Learning.” In Advances in Neural Information Processing Systems (NIPS) 29.
- Scott Garrabrant. 2017. “Logical Updatelessness as a Robust Delegation Problem.” Less Wrong.
- Eliezer Yudkowsky. 2015. “Complexity of Value.” Arbital.
- Scott Garrabrant. 2018. “Optimization Amplifies.” Less Wrong.
- Charles Goodhart. 1981. “Problems of Monetary Management: The UK Experience.” In Inflation, Depression, and Economic Policy in the West. Rowman & Littlefield.
- James Smith and Robert Winkler. 2006. “The Optimizer’s Curse: Skepticism and Postdecision Surprise in Decision Analysis.” In Management Science 52:3.
- Jessica Taylor. 2016. “Quantilizers: A Safer Alternative to Maximizers for Limited Optimization.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
- Daniel Dewey. 2011. “Learning What to Value.” In Proceedings of AGI 2011. Springer.
- Abram Demski. 2017. “Stable Pointers to Value: An Agent Embedded in Its Own Utility Function.” Intelligent Agent Foundations Forum.
- Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, and Shane Legg. 2017. “Reinforcement Learning with a Corrupted Reward Channel.” In Proceedings of the 26th International Joint Conference on Artificial Intelligence.
- Paul Christiano, Buck Shlegeris, and Dario Amodei. 2018. “Supervising Strong Learners by Amplifying Weak Experts.” arXiv:1810.08575 [cs.LG].
- Eliezer Yudkowsky. 2017. “Non-Adversarial Principle.” Arbital.
- Scott Garrabrant. 2018. “Robustness to Scale.” Less Wrong.
- Eliezer Yudkowsky. 2015. “Omnipotence Test for AI Safety.” Arbital.
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Nets.” In Advances in Neural Information Processing Systems (NIPS) 27.
- Eliezer Yudkowsky. 2016. “Optimization Daemons.” Arbital.
- Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Forthcoming. “The Inner Alignment Problem.” Draft.
- Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. 2016. “Concrete Problems in AI Safety.” arXiv:1606.06565 [cs.AI].