MIRI Updates
Misgeneralization as a misnomer
Here are two different ways an AI can turn out unfriendly: You somehow build an AI that cares about “making people happy”. In training, it tells people jokes and buys people flowers and offers people an ear when they need one....
Pausing AI Developments Isn’t Enough. We Need to Shut it All Down
(Published in TIME on March 29.) An open letter published today calls for “all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.” This 6-month moratorium would be better...
Truth and Advantage: Response to a draft of “AI safety seems hard to measure”
Status: This was a response to a draft of Holden’s cold take “AI safety seems hard to measure”. It sparked a further discussion, which Holden recently posted a summary of. The follow-up discussion ended up focusing on some issues in...
Deep Deceptiveness
Meta This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn’t discussed (or at least I don’t recognize it) in the recent plans written...
Yudkowsky on AGI risk on the Bankless podcast
Eliezer gave a very frank overview of his take on AI two weeks ago on the cryptocurrency show Bankless: I’ve posted a transcript of the show and a follow-up Q&A below. Thanks to Andrea_Miotti, remember, and vonk for help posting...
Comments on OpenAI’s "Planning for AGI and beyond"
Sam Altman shared me on a draft of his OpenAI blog post Planning for AGI and beyond, and I left some comments, reproduced below with typos fixed and some hyperlinks added. Where the final version of the OpenAI post differs...