Future of Humanity Institute Research Fellow Jan Leike and MIRI Research Fellows Jessica Taylor and Benya Fallenstein have just presented new results at UAI 2016 that resolve a longstanding open problem in game theory: “**A formal solution to the grain of truth problem**.”

Game theorists have techniques for specifying agents that eventually do well on iterated games against other agents, so long as their beliefs contain a “grain of truth” — nonzero prior probability assigned to the actual game they’re playing. Getting that grain of truth was previously an unsolved problem in multiplayer games, because agents can run into infinite regresses when they try to model agents that are modeling them in turn. This result shows how to break that loop: by means of reflective oracles.

In the process, Leike, Taylor, and Fallenstein provide a rigorous and general foundation for the study of multi-agent dilemmas. This work provides a surprising and somewhat satisfying basis for approximate Nash equilibria in repeated games, folding a variety of problems in decision and game theory into a common framework.

The paper’s abstract reads:

A Bayesian agent acting in a multi-agent environment learns to predict the other agents’ policies if its prior assigns positive probability to them (in other words, its prior contains a

grain of truth). Finding a reasonably large class of policies that contains the Bayes-optimal policies with respect to this class is known as thegrain of truth problem. Only small classes are known to have a grain of truth and the literature contains several related impossibility results.In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class. When the environment is unknown, Bayes-optimal agents may fail to act optimally even asymptotically. However, agents based on Thompson sampling converge to play ε-Nash equilibria in arbitrary unknown computable multi-agent environments. While these results are purely theoretical, we show that they can be computationally approximated arbitrarily closely.

Traditionally, when modeling computer programs that model the properties of other programs (such as when modeling an agent reasoning about a game), the first program is assumed to have access to an oracle (such as a halting oracle) that can answer arbitrary questions about the second program. This works, but it doesn’t help with modeling agents that can reason about *each other*.

While a halting oracle can predict the behavior of any isolated Turing machine, it cannot predict the behavior of another Turing machine that has access to a halting oracle. If this were possible, the second machine could use its oracle to figure out what the first machine-oracle pair thinks it will do, at which point it can do the opposite, setting up a liar paradox scenario. For analogous reasons, two agents with similar resources, operating in real-world environments without any halting oracles, cannot perfectly predict each other in full generality.

Game theorists know how to build formal models of asymmetric games between a weaker player and a stronger player, where the stronger player understands the weaker player’s strategy but not vice versa. For the reasons above, however, games between agents of similar strength have resisted full formalization. As a consequence of this, game theory has until now provided no method for *designing* agents that perform well on complex iterated games containing other agents of similar strength.

Usually, the way to build an ideal agent is to have the agent consider a large list of possible policies, predict how the world would respond to each policy, and then choose the best policy by some metric. However, in multi-player games, if your agent considers a big list of policies that both it and the opponent might play, then the best policy for the opponent is usually some alternative policy that was not in your list. (And if you add that policy to your list, then the new best policy for the opponent to play is now a new alternative that wasn’t in the list, and so on.)

This is the grain of truth problem, first posed by Kalai and Lehrer in 1993: define a class of policies that is large enough to be interesting and realistic, and for which *the best response to an agent that considers that policy class is inside the class*.^{1}

Taylor and Fallenstein have developed a formalism that enables a solution: *reflective* oracles capable of answering questions about agents with access to equivalently powerful oracles. Leike has led work on demonstrating that this formalism can solve the grain of truth problem, and in the process shows that the Bayes-optimal policy generally does not converge to a Nash equilibrium. Thompson sampling, however, does converge to a Nash equilibrium — a result that comes out of another paper presented at UAI 2016, Leike, Lattimore, Orseau, and Hutter’s “Thompson sampling is asymptotically optimal in general environments.”

The key feature of reflective oracles is that they avoid diagonalization and paradoxes by randomizing in the relevant cases.^{2} This allows agents with access to a reflective oracle to consistently reason about the behavior of arbitrary agents that also have access to a reflective oracle, which in turn makes it possible to model agents that converge to Nash equilibria by their own faculties (rather than by fiat or assumption).

This framework can be used to, e.g., define games between multiple copies of AIXI. As originally formulated, AIXI cannot entertain hypotheses about its own existence, or about the existence of similarly powerful agents; classical Bayes-optimal agents must be larger and more intelligent than their environments. With access to a reflective oracle, however, Fallenstein, Soares, and Taylor have shown that AIXI can meaningfully entertain hypotheses about itself and copies of itself while avoiding diagonalization.

The other main novelty of this paper is that reflective oracles turn out to be limit-computable, and so allow for approximation by anytime algorithms. The reflective oracles paradigm is therefore likely to be quite valuable for investigating game-theoretic questions involving generally intelligent agents that can understand and model each other.^{3}

#### Sign up to get updates on new MIRI technical results

*Get notified every time a new technical paper is published.*

- It isn’t hard to solve the grain of truth problem for policy classes that are very small. Consider a prisoner’s dilemma where the only strategies the other player can select take the form “cooperate until the opponent defects, then defect forever” or “cooperate
*n*times in a row (or until the opponent defects, whichever happens first) and then defect forever.” Leike, Taylor, and Fallenstein note:The Bayes-optimal behavior is to cooperate until the posterior belief that the other agent defects in the time step after the next is greater than some constant (depending on the discount function) and then defect afterwards.

But this is itself a strategy in the class under consideration. If both players are Bayes-optimal, then both will have a grain of truth (i.e., their actual strategy is assigned nonzero probability by the other player) “and therefore they converge to a Nash equilibrium: either they both cooperate forever or after some finite time they both defect forever.”

Slightly expanding the list of policies an agent might deploy, however, can make it hard to find a policy class that contains a grain of truth. For example, if “tit for tat” is added to the policy class, then, depending on the prior, the grain of truth may be lost. In this case, if the first agent thinks that the second agent is very likely “always defect” but maybe “tit for tat,” then the best policy might be something like “defect until they cooperate, then play tit for tat,” but this policy is not in the policy class. The question resolved by this paper is how to find priors that contain a grain of truth for much richer policy classes. ↩

- Specifically, reflective oracles output 1 if a specified machine returns 1 with probability greater than a specified probability
*p*, and they output 0 if the probability that the machine outputs 0 is greater than 1-*p*. When the probability is exactly*p*, however—or the machine has some probability of not halting, and*p*hits this probability mass—the oracle can output 0, 1, or randomize between the two. This allows reflective oracles to avoid probabilistic versions of the liar paradox: any attempt to ask the reflective oracle an unanswerable question will yield a meaningless placeholder answer. ↩ - Thanks to Tsvi Benson-Tilsen, Chana Messinger, Nate Soares, and Jan Leike for helping draft this announcement. ↩