Subsystem Alignment

Filed under Analysis.

You want to figure something out, but you don’t know how to do that yet. You have to somehow break up the task into sub-computations. There is no atomic act of “thinking”; intelligence must be built up of non-intelligent parts. The agent being made of parts is part of what made counterfactuals hard, since…

Embedded World-Models

Filed under Analysis.

An agent which is larger than its environment can: hold an exact model of the environment in its head; think through the consequences of every potential course of action; and, if it doesn’t know the environment perfectly, hold every possible way the environment could be in its head, as is the case with Bayesian…

Embedded Agents

Filed under Analysis.

Suppose you want to build a robot to achieve some real-world goal for you, a goal that requires the robot to learn for itself and figure out a lot of things that you don’t already know. There’s a complicated engineering problem here. But there’s also a problem of figuring out what it even means to…

New paper: “Categorizing variants of Goodhart’s Law”

Filed under Papers.

Goodhart’s Law states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” However, this is not a single phenomenon. In Goodhart Taxonomy, I proposed that there are (at least) four different mechanisms through which proxy measures break when you optimize for them: Regressional, Extremal, Causal, and…