Here are two different ways an AI can turn out unfriendly: You somehow build an AI that cares about “making people happy”. In training, it tells people jokes and buys people flowers and offers people an ear when they need one. In deployment (and once it’s more capable), it forcibly puts each human in a separate… Read more »
Posts By: Nate Soares
Truth and Advantage: Response to a draft of “AI safety seems hard to measure”
Status: This was a response to a draft of Holden’s cold take “AI safety seems hard to measure”. It sparked a further discussion, which Holden recently posted a summary of. The follow-up discussion ended up focusing on some issues in AI alignment that I think are underserved, which Holden said were kinda orthogonal to the… Read more »
Deep Deceptiveness
Meta: This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn’t discussed (or at least I don’t recognize it) in the recent plans written up by OpenAI (1,2), by DeepMind’s alignment team, or by Anthropic, and I know of… Read more »
Comments on OpenAI’s “Planning for AGI and beyond”
Sam Altman shared me on a draft of his OpenAI blog post Planning for AGI and beyond, and I left some comments, reproduced below with typos fixed and some hyperlinks added. Where the final version of the OpenAI post differs from the draft, I’ve noted that as well, marking text Sam later cut in red and… Read more »
Focus on the places where you feel shocked everyone’s dropping the ball
Writing down something I’ve found myself repeating in different conversations: If you’re looking for ways to help with the whole “the world looks pretty doomed” business, here’s my advice: look around for places where we’re all being total idiots. Look for places where everyone’s fretting about a problem that some part of you thinks it… Read more »
What I mean by “alignment is in large part about making cognition aimable at all”
(Epistemic status: attempting to clear up a misunderstanding about points I have attempted to make in the past. This post is not intended as an argument for those points.) I have long said that the lion’s share of the AI alignment problem seems to me to be about pointing powerful cognition at anything at all, rather… Read more »
A central AI alignment problem: capabilities generalization, and the sharp left turn
(This post was factored out of a larger post that I (Nate Soares) wrote, with help from Rob Bensinger, who also rearranged some pieces and added some text to smooth things out. I’m not terribly happy with it, but am posting it anyway (or, well, having Rob post it on my behalf while I travel)… Read more »
Visible Thoughts Project and Bounty Announcement
(Update Jan. 12, 2022: We released an FAQ last month, with more details. Last updated Jan. 7.) (Update Jan. 19, 2022: We now have an example of a successful partial run, which you can use to inform how you do your runs. Details.) (Update Mar. 14, 2023: As of now the limited $20,000 prizes are… Read more »