MIRI assistant research fellow Ryan Carey has a new paper out discussing situations where good performance in Cooperative Inverse Reinforcement Learning (CIRL) tasks fails to imply that software agents will assist or cooperate with programmers. The paper, titled “Incorrigibility in...