New paper: “Incorrigibility in the CIRL Framework”

Filed under Papers.

MIRI assistant research fellow Ryan Carey has a new paper out discussing situations where good performance in Cooperative Inverse Reinforcement Learning (CIRL) tasks fails to imply that software agents will assist or cooperate with programmers. The paper, titled “Incorrigibility in the CIRL Framework,” lays out four scenarios in which CIRL violates the four conditions for…

Response to Cegłowski on superintelligence

Filed under Analysis.

Web developer Maciej Cegłowski recently gave a talk on AI safety (video, text) arguing that we should be skeptical of the standard assumptions that go into working on this problem, and doubly skeptical of the extreme-sounding claims, attitudes, and policies these premises appear to lead to. I’ll give my reply to each of these points…