MIRI Research Fellow Andrew Critch has written a new paper on cooperation between software agents in the Prisoner’s Dilemma, available on arXiv: “Parametric bounded Löb’s theorem and robust cooperation of bounded agents.” The abstract reads:

Löb’s theorem and Gödel’s theorem make predictions about the behavior of systems capable of self-reference with unbounded computational resources with which to write and evaluate proofs. However, in the real world, systems capable of self-reference will have limited memory and processing speed, so in this paper we introduce an effective version of Löb’s theorem which is applicable given such bounded resources. These results have powerful implications for the game theory of bounded agents who are able to write proofs about themselves and one another, including the capacity to out-perform classical Nash equilibria and correlated equilibria, attaining mutually cooperative program equilibrium in the Prisoner’s Dilemma. Previous cooperative program equilibria studied by Tennenholtz and Fortnow have depended on tests for program equality, a fragile condition, whereas “Löbian” cooperation is much more robust and agnostic of the opponent’s implementation.

Tennenholtz (2004) showed that cooperative equilibria exist in the Prisoner’s Dilemma between agents with transparent source code. This suggested that a number of results in classical game theory, where it is a commonplace that mutual defection is rational, might fail to generalize to settings where agents have strong guarantees about each other’s conditional behavior.
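Here is a minimal sketch of the program-equality idea. It assumes a toy setup in which each agent is a Python source string defining a function `move(my_source, opponent_source)` that returns "C" (cooperate) or "D" (defect). The names `CLIQUE_BOT`, `CLIQUE_BOT_RENAMED`, `DEFECT_BOT`, `run`, and `play` are hypothetical illustrations, not Tennenholtz's formalism. The agent cooperates exactly when its opponent's source text matches its own. This makes mutual cooperation stable between copies, but a behaviorally identical program with one renamed variable is treated as a stranger.

```python
# Hypothetical toy setup: each agent is source text defining
# move(my_source, opponent_source) -> "C" or "D".

# Cooperate only on an exact textual match with our own source.
CLIQUE_BOT = '''
def move(my_source, opponent_source):
    return "C" if opponent_source == my_source else "D"
'''

# Same decision rule, but with renamed parameters, so the source text differs.
CLIQUE_BOT_RENAMED = '''
def move(me, them):
    return "C" if them == me else "D"
'''

# Always defect, regardless of the opponent.
DEFECT_BOT = '''
def move(my_source, opponent_source):
    return "D"
'''


def run(agent_src: str, opponent_src: str) -> str:
    """Load an agent's source and ask it for a move against the opponent."""
    namespace: dict = {}
    exec(agent_src, namespace)
    return namespace["move"](agent_src, opponent_src)


def play(a_src: str, b_src: str) -> tuple:
    """One round of the Prisoner's Dilemma with mutually visible source code."""
    return run(a_src, b_src), run(b_src, a_src)


print(play(CLIQUE_BOT, CLIQUE_BOT))          # ('C', 'C')
print(play(CLIQUE_BOT, DEFECT_BOT))          # ('D', 'D')
print(play(CLIQUE_BOT, CLIQUE_BOT_RENAMED))  # ('D', 'D')
```

The last line shows the fragility the abstract refers to: two programs that would each cooperate with a copy of themselves still defect against each other because their texts differ.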

Tennenholtz’s version of program equilibrium, however, only established that rational cooperation was possible between agents with identical source code. Patrick LaVictoire and other researchers at MIRI supplied the additional result that more robust cooperation was possible between non-computable agents, and that the outcomes of such games can be determined efficiently. However, some readers objected to the infinitary nature of these methods (for example, their use of halting oracles) and worried that not all of the results would carry over to finite computations.

Critch’s report demonstrates that robust cooperative equilibria exist for bounded agents. In the process, Critch proves a new generalization of Löb’s theorem, and therefore of Gödel’s second incompleteness theorem. This parametric version of Löb’s theorem holds for proofs that can be written out in *n* or fewer characters, where the parameter *n* can be set to any number. For more background on the result’s significance, see LaVictoire’s “Introduction to Löb’s theorem in MIRI research.”
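For reference, here is the classical statement being generalized, in standard notation. $T$ is a sufficiently strong formal theory such as Peano arithmetic, and $\Box P$ abbreviates "$P$ is provable in $T$". Setting $P$ to $\bot$ (falsehood) shows why a generalization of Löb's theorem also generalizes Gödel's second incompleteness theorem. The bounded operator $\Box_n$ is shorthand for the post's "proofs that can be written out in $n$ or fewer characters". The precise hypotheses of Critch's parametric theorem are given in the paper and are not reproduced here.

```latex
\begin{align*}
&\textbf{L\"ob's theorem:} && \text{if } T \vdash \Box P \rightarrow P, \text{ then } T \vdash P. \\
&\textbf{G\"odel II } (P = \bot): && T \vdash \neg\Box\bot
  \;\Longrightarrow\; T \vdash \Box\bot \rightarrow \bot
  \;\Longrightarrow\; T \vdash \bot, \\
& && \text{so a consistent } T \text{ cannot prove its own consistency } \neg\Box\bot. \\
&\textbf{Bounded provability:} && \Box_n P \;:\Longleftrightarrow\; P \text{ has a proof in } T \text{ of at most } n \text{ characters.}
\end{align*}
```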

The new Löb result shows that bounded agents face obstacles to self-referential reasoning similar to those faced by unbounded agents, and can also reap some of the same benefits. Importantly, this result will likely allow us to discuss many other self-referential phenomena using finitary examples rather than infinitary ones.
