Updates: A new paper from MIRI researcher Vanessa Kosoy, presented at the ICLR SafeML workshop this week: "Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help." New research posts: Learning "Known" Information When the Information is Not Actually Known; Defeating Goodhart and the...
Updates: New research posts: Simplified Preferences Needed, Simplified Preferences Sufficient; Smoothmin and Personal Identity; Example Population Ethics: Ordered Discounted Utility; A Theory of Human Values; A Concrete Proposal for Adversarial IDA. MIRI has received a set of new grants from the Open Philanthropy Project and the Berkeley...
Want to be in the reference class "people who solve the AI alignment problem"? We now have a guide on how to get started, based on our experience of what tends to make research groups successful. (Also on the AI...
Updates: Ramana Kumar and Scott Garrabrant argue that the AGI safety community should begin prioritizing "approaches that work well in the absence of human models": "[T]o the extent that human modelling is a good idea, it is important to do...
Our December fundraiser was a success, with 348 donors contributing just over $950,000. Supporters leveraged a variety of matching opportunities, including employer matching programs, WeTrust Spring’s Ethereum-matching campaign, Facebook’s Giving Tuesday event, and professional poker players Dan Smith, Aaron Merchak,...
Edward Kmett has joined the MIRI team! Edward is a prominent Haskell developer who popularized the use of lenses for functional programming, and currently maintains many of the libraries surrounding the Haskell core libraries. I'm also happy to announce another...