Thoughts on the AI Safety Summit company policy requests and responses

Filed under Analysis.

Over the next two days, the UK government is hosting an AI Safety Summit focused on “the safe and responsible development of frontier AI”. They requested that seven companies (Amazon, Anthropic, DeepMind, Inflection, Meta, Microsoft, and OpenAI) “outline their AI Safety Policies across nine areas of AI Safety”. Below, I’ll give my thoughts on the… Read more »

Misgeneralization as a misnomer

Filed under Analysis.

Here are two different ways an AI can turn out unfriendly: You somehow build an AI that cares about “making people happy”. In training, it tells people jokes and buys people flowers and offers people an ear when they need one. In deployment (and once it’s more capable), it forcibly puts each human in a separate… Read more »

Deep Deceptiveness

Filed under Analysis.

Meta: This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn’t discussed (or at least I don’t recognize it) in the recent plans written up by OpenAI (1,2), by DeepMind’s alignment team, or by Anthropic, and I know of… Read more »

Focus on the places where you feel shocked everyone’s dropping the ball

Filed under Analysis.

Writing down something I’ve found myself repeating in different conversations: If you’re looking for ways to help with the whole “the world looks pretty doomed” business, here’s my advice: look around for places where we’re all being total idiots. Look for places where everyone’s fretting about a problem that some part of you thinks it… Read more »