Deep Deceptiveness

Filed under Analysis.

Meta: This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn’t discussed (or at least I don’t recognize it) in the recent plans written up by OpenAI (1,2), by DeepMind’s alignment team, or by Anthropic, and I know of…

Focus on the places where you feel shocked everyone’s dropping the ball

Filed under Analysis.

Writing down something I’ve found myself repeating in different conversations: If you’re looking for ways to help with the whole “the world looks pretty doomed” business, here’s my advice: look around for places where we’re all being total idiots. Look for places where everyone’s fretting about a problem that some part of you thinks it…

Visible Thoughts Project and Bounty Announcement

Filed under News.

(Update Jan. 12, 2022: We released an FAQ last month, with more details. Last updated Jan. 7.) (Update Jan. 19, 2022: We now have an example of a successful partial run, which you can use to inform how you do your runs. Details.) (Update Mar. 14, 2023: As of now the limited $20,000 prizes are…

2018 Update: Our New Research Directions

Filed under MIRI Strategy, News.

For many years, MIRI’s goal has been to resolve enough fundamental confusions around alignment and intelligence to enable humanity to think clearly about technical AI safety risks—and to do this before this technology advances to the point of potential catastrophe. This goal has always seemed to us to be difficult, but possible. Last year, we…