Blog

Author: Rob Bensinger

July 2019 Newsletter

Hubinger et al.'s “Risks from Learned Optimization in Advanced Machine Learning Systems”, one of our new core resources on the alignment problem, is now available on arXiv, the AI Alignment Forum, and LessWrong. In other news, we received an Ethereum...

New paper: “Risks from learned optimization”

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant have a new paper out: “Risks from learned optimization in advanced machine learning systems.” The paper’s abstract: We analyze the type of learned optimization that occurs when a...

June 2019 Newsletter

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant have released the first two (of five) posts on “mesa-optimization”: The goal of this sequence is to analyze the type of learned optimization that occurs when a learned...

May 2019 Newsletter

Updates A new paper from MIRI researcher Vanessa Kosoy, presented at the ICLR SafeML workshop this week: "Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help." New research posts: Learning "Known" Information When the Information is Not Actually Known; Defeating Goodhart and the...

New paper: “Delegative reinforcement learning”

MIRI Research Associate Vanessa Kosoy has written a new paper, “Delegative reinforcement learning: Learning to avoid traps with a little help.” Kosoy will be presenting the paper at the ICLR 2019 SafeML workshop in two weeks. The abstract reads: Most...

April 2019 Newsletter

Updates New research posts: Simplified Preferences Needed, Simplified Preferences Sufficient; Smoothmin and Personal Identity; Example Population Ethics: Ordered Discounted Utility; A Theory of Human Values; A Concrete Proposal for Adversarial IDA MIRI has received a set of new grants from the Open Philanthropy Project and the Berkeley...