Day: August 31, 2017

New paper: “Incorrigibility in the CIRL Framework”

MIRI assistant research fellow Ryan Carey has a new paper out discussing situations where good performance in Cooperative Inverse Reinforcement Learning (CIRL) tasks fails to imply that software agents will assist or cooperate with programmers. The paper, titled “Incorrigibility in...