MIRI Updates
MIRI senior researcher Eliezer Yudkowsky and executive director Nate Soares have a new introductory paper out on decision theory: “Functional decision theory: A new theory of instrumental rationality.” Abstract: This paper describes and motivates a new decision theory known as...
AlphaGo Zero uses 4 TPUs, is built entirely out of neural nets with no handcrafted features, doesn’t pretrain against expert games or anything else human, reaches a superhuman level after 3 days of self-play, and is the strongest version of...
“So far as I can presently estimate, now that we’ve had AlphaGo and a couple of other maybe/maybe-not shots across the bow, and seen a huge explosion of effort invested into machine learning and an enormous flood of papers, we...
What is the function of a fire alarm? One might think that the function of a fire alarm is to provide you with important evidence about a fire existing, allowing you to change your policy accordingly and exit...
Research updates “Incorrigibility in the CIRL Framework”: a new paper by MIRI assistant researcher Ryan Carey responds to Hadfield-Menell et al.’s “The Off-Switch Game”. New at IAFF: The Three Levels of Goodhart’s Curse; Conditioning on Conditionals; Stable Pointers to Value:...
MIRI assistant research fellow Ryan Carey has a new paper out discussing situations where good performance in Cooperative Inverse Reinforcement Learning (CIRL) tasks fails to imply that software agents will assist or cooperate with programmers. The paper, titled “Incorrigibility in...