Misgeneralization as a misnomer

Filed under Analysis.

Here are two different ways an AI can turn out unfriendly: You somehow build an AI that cares about “making people happy”. In training, it tells people jokes and buys people flowers and offers people an ear when they need one. In deployment (and once it’s more capable), it forcibly puts each human in a separate…

Deep Deceptiveness

Filed under Analysis.

Meta: This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn’t discussed (or at least I don’t recognize it) in the recent plans written up by OpenAI (1,2), by DeepMind’s alignment team, or by Anthropic, and I know of…

Focus on the places where you feel shocked everyone’s dropping the ball

Filed under Analysis.

Writing down something I’ve found myself repeating in different conversations: If you’re looking for ways to help with the whole “the world looks pretty doomed” business, here’s my advice: look around for places where we’re all being total idiots. Look for places where everyone’s fretting about a problem that some part of you thinks it…

Visible Thoughts Project and Bounty Announcement

Filed under News.

(Update Jan. 12, 2022: We released an FAQ last month, with more details. Last updated Jan. 7.) (Update Jan. 19, 2022: We now have an example of a successful partial run, which you can use to inform how you do your runs. Details.) (Update Mar. 14, 2023: As of now the limited $20,000 prizes are…