Blog

Author: Nate Soares

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

Status: Vague, sorry. The point seems almost tautological to me, and yet also seems like the correct answer to the people going around saying “LLMs turned out to be not very want-y, when are the people who expected ‘agents’ going...

Thoughts on the AI Safety Summit company policy requests and responses

Over the next two days, the UK government is hosting an AI Safety Summit focused on “the safe and responsible development of frontier AI”. They requested that seven companies (Amazon, Anthropic, DeepMind, Inflection, Meta, Microsoft, and OpenAI) “outline their AI...

AI as a science, and three obstacles to alignment strategies

AI used to be a science. In the old days (back when AI didn’t work very well), people were attempting to develop a working theory of cognition. Those scientists didn’t succeed, and those days are behind us. For most people...

Misgeneralization as a misnomer

Here are two different ways an AI can turn out unfriendly: You somehow build an AI that cares about “making people happy”. In training, it tells people jokes and buys people flowers and offers people an ear when they need one....

Truth and Advantage: Response to a draft of “AI safety seems hard to measure”

Status: This was a response to a draft of Holden’s cold take “AI safety seems hard to measure”. It sparked a further discussion, which Holden recently posted a summary of. The follow-up discussion ended up focusing on some issues in...

Deep Deceptiveness

Meta: This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn’t discussed (or at least I don’t recognize it) in the recent plans written...