Updates
- New research posts: Simplified Preferences Needed, Simplified Preferences Sufficient; Smoothmin and Personal Identity; Example Population Ethics: Ordered Discounted Utility; A Theory of Human Values; and A Concrete Proposal for Adversarial IDA.
- MIRI has received new grants from the Open Philanthropy Project and the Berkeley Existential Risk Initiative.
News and links
- From the DeepMind safety team and Alex Turner: Designing Agent Incentives to Avoid Side Effects.
- From Wei Dai: Three Ways That "Sufficiently Optimized Agents Appear Coherent" Can Be False; What's Wrong With These Analogies for Understanding Informed Oversight and IDA?; and The Main Sources of AI Risk?
- Other recent write-ups: Issa Rice's Comparison of Decision Theories; Paul Christiano's More Realistic Tales of Doom; and Linda Linsefors' The Game Theory of Blackmail.
- OpenAI's Geoffrey Irving describes AI safety via debate on FLI's AI Alignment Podcast.
- A webcomic's take on AI x-risk concepts: Seed.