New paper: “Risks from learned optimization”

Filed under Papers.

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant have a new paper out: “Risks from learned optimization in advanced machine learning systems.” The paper’s abstract: We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to…

June 2019 Newsletter

Filed under Newsletters.

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant have released the first two (of five) posts on “mesa-optimization”: The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as…

May 2019 Newsletter

Filed under Newsletters.

Updates: A new paper from MIRI researcher Vanessa Kosoy, presented at the ICLR SafeML workshop this week: “Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help.” New research posts: Learning “Known” Information When the Information is Not Actually Known; Defeating Goodhart and the “Closest Unblocked Strategy” Problem; Reinforcement Learning with Imperceptible Rewards. The Long-Term Future Fund has announced twenty-three new…

New paper: “Delegative reinforcement learning”

Filed under Papers.

MIRI Research Associate Vanessa Kosoy has written a new paper, “Delegative reinforcement learning: Learning to avoid traps with a little help.” Kosoy will be presenting the paper at the ICLR 2019 SafeML workshop in two weeks. The abstract reads: Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps…

April 2019 Newsletter

Filed under Newsletters.

Updates: New research posts: Simplified Preferences Needed, Simplified Preferences Sufficient; Smoothmin and Personal Identity; Example Population Ethics: Ordered Discounted Utility; A Theory of Human Values; A Concrete Proposal for Adversarial IDA. MIRI has received a set of new grants from the Open Philanthropy Project and the Berkeley Existential Risk Initiative. News and links: From the DeepMind safety team and Alex Turner: Designing Agent Incentives…

New grants from the Open Philanthropy Project and BERI

Filed under News.

I’m happy to announce that MIRI has received two major new grants: A two-year grant totaling $2,112,500 from the Open Philanthropy Project. A $600,000 grant from the Berkeley Existential Risk Initiative. The Open Philanthropy Project’s grant was awarded as part of the first round of grants recommended by their new committee for effective altruism support:…

March 2019 Newsletter

Filed under Newsletters.

Want to be in the reference class “people who solve the AI alignment problem”? We now have a guide on how to get started, based on our experience of what tends to make research groups successful. (Also on the AI Alignment Forum.) Other updates: Demski and Garrabrant’s introduction to MIRI’s agent foundations research, “Embedded Agency,” is…

A new field guide for MIRIx

Filed under News.

We’ve just released a field guide for MIRIx groups, and for other people who want to get involved in AI alignment research. MIRIx is a program where MIRI helps cover basic expenses for outside groups that want to work on open problems in AI safety. You can start your own group or find information on…