MIRI Updates
“So far as I can presently estimate, now that we’ve had AlphaGo and a couple of other maybe/maybe-not shots across the bow, and seen a huge explosion of effort invested into machine learning and an enormous flood of papers, we...
What is the function of a fire alarm? One might think that the function of a fire alarm is to provide you with important evidence about a fire existing, allowing you to change your policy accordingly and exit...
Research updates: “Incorrigibility in the CIRL Framework”, a new paper by MIRI assistant researcher Ryan Carey, responds to Hadfield-Menell et al.’s “The Off-Switch Game”. New at IAFF: The Three Levels of Goodhart’s Curse; Conditioning on Conditionals; Stable Pointers to Value:...
MIRI assistant research fellow Ryan Carey has a new paper out discussing situations where good performance in Cooperative Inverse Reinforcement Learning (CIRL) tasks fails to imply that software agents will assist or cooperate with programmers. The paper, titled “Incorrigibility in...
Research updates: We presented “A Formal Approach to the Problem of Logical Non-Omniscience”, our work on logical induction, at the 16th Conference on Theoretical Aspects of Rationality and Knowledge. New at IAFF: Smoking Lesion Steelman; “Like This World, But…”; Jessica...
A number of major mid-year MIRI updates: we received our largest donation to date, $1.01 million from an Ethereum investor! Our research priorities have also shifted somewhat, reflecting the addition of four new full-time researchers (Marcello Herreshoff, Sam Eisenstat, Tsvi...