News and links
MIRI Research Fellow Andrew Critch has written a new paper on cooperation between software agents in the Prisoner’s Dilemma, available on arXiv: “Parametric bounded Löb’s theorem and robust cooperation of bounded agents.” The abstract reads:
Löb’s theorem and Gödel’s theorem make predictions about the behavior of systems capable of self-reference with unbounded computational resources with which to write and evaluate proofs. However, in the real world, systems capable of self-reference will have limited memory and processing speed, so in this paper we introduce an effective version of Löb’s theorem which is applicable given such bounded resources. These results have powerful implications for the game theory of bounded agents who are able to write proofs about themselves and one another, including the capacity to out-perform classical Nash equilibria and correlated equilibria, attaining mutually cooperative program equilibrium in the Prisoner’s Dilemma. Previous cooperative program equilibria studied by Tennenholtz and Fortnow have depended on tests for program equality, a fragile condition, whereas “Löbian” cooperation is much more robust and agnostic of the opponent’s implementation.
Tennenholtz (2004) showed that cooperative equilibria exist in the Prisoner’s Dilemma between agents with transparent source code. This suggested that a number of results in classical game theory, where it is a commonplace that mutual defection is rational, might fail to generalize to settings where agents have strong guarantees about each other’s conditional behavior.
Tennenholtz’s version of program equilibrium, however, only established that rational cooperation was possible between agents with identical source code. Patrick LaVictoire and other researchers at MIRI supplied the additional result that more robust cooperation was possible between non-computable agents, and that it is possible to efficiently determine the outcomes of such games. However, some readers objected to the infinitary nature of the methods (for example, the use of halting oracles) and worried that not all of the results would carry over to finite computations.
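The equality-test style of program equilibrium described above can be illustrated with a minimal sketch. It is not Tennenholtz's formal construction: the agent names (CLIQUE_BOT, CLIQUE_BOT_VARIANT, DEFECT_BOT) and the helpers run and play are hypothetical, and programs are modeled simply as Python source strings that define an act function. The sketch shows why a test for program equality is fragile: a copy of the agent that differs only in a comment is treated as an enemy.

```python
# Illustrative assumption: each agent is a source string defining
# act(my_source, opponent_source) -> "C" (cooperate) or "D" (defect).

CLIQUE_BOT = '''
def act(my_source, opponent_source):
    # Cooperate only with an exact textual copy of myself.
    return "C" if opponent_source == my_source else "D"
'''

# Same behavior as CLIQUE_BOT, but the source text differs by one comment.
CLIQUE_BOT_VARIANT = '''
def act(my_source, opponent_source):
    # Same logic, different comment.
    return "C" if opponent_source == my_source else "D"
'''

DEFECT_BOT = '''
def act(my_source, opponent_source):
    return "D"
'''


def run(source, opponent_source):
    """Execute an agent's source and ask it for a move against the opponent."""
    namespace = {}
    exec(source, namespace)  # defines act() inside the fresh namespace
    return namespace["act"](source, opponent_source)


def play(source_a, source_b):
    """Return the pair of moves when each agent can read the other's source."""
    return run(source_a, source_b), run(source_b, source_a)


if __name__ == "__main__":
    print(play(CLIQUE_BOT, CLIQUE_BOT))          # identical source
    print(play(CLIQUE_BOT, DEFECT_BOT))          # unexploitable by a defector
    print(play(CLIQUE_BOT, CLIQUE_BOT_VARIANT))  # equality test is fragile
```

In this sketch, two exact copies cooperate and a defector cannot exploit the agent, but the agent also defects against the behaviorally identical variant. That brittleness is what the more robust results discussed here aim to avoid.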
Critch’s report demonstrates that robust cooperative equilibria exist for bounded agents. In the process, Critch proves a new generalization of Löb’s theorem, and therefore of Gödel’s second incompleteness theorem. This parametric version of Löb’s theorem holds for proofs that can be written out in n or fewer characters, where the parameter n can be set to any number. For more background on the result’s significance, see LaVictoire’s “Introduction to Löb’s theorem in MIRI research.”
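For reference, the classical (unbounded) theorem being generalized can be stated as follows, together with the standard one-line argument for why it implies Gödel's second incompleteness theorem. This is only the textbook statement for Peano Arithmetic, with the box read as "is provable in PA"; the precise bounded, parametric statement and its conditions are given in Critch's paper and are not reproduced here.

```latex
% Löb's theorem for Peano Arithmetic (PA), where \Box P abbreviates
% "P is provable in PA":
\[
  \text{if } \mathrm{PA} \vdash \Box P \rightarrow P,
  \text{ then } \mathrm{PA} \vdash P .
\]

% Gödel's second incompleteness theorem as the special case P = \bot.
% Since \neg\Box\bot is the same formula as \Box\bot \rightarrow \bot:
\[
  \mathrm{PA} \vdash \neg\Box\bot
  \;\Longrightarrow\;
  \mathrm{PA} \vdash \Box\bot \rightarrow \bot
  \;\Longrightarrow\;
  \mathrm{PA} \vdash \bot .
\]
% So if PA proves its own consistency, PA is inconsistent.
```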
The new Löb result shows that bounded agents face obstacles to self-referential reasoning similar to those faced by unbounded agents, and can also reap some of the same benefits. Importantly, this lemma will likely allow us to discuss many other self-referential phenomena going forward using finitary examples rather than infinite ones.
I’m happy to announce that Malo Bourgon, formerly a program management analyst at MIRI, has taken on a new leadership role as our chief operating officer.
As MIRI’s second-in-command, Malo will be taking over a lot of the hands-on work of coordinating our day-to-day activities: supervising our ops team, planning events, managing our finances, and overseeing internal systems. He’ll also be assisting me in organizational strategy and outreach work.
Prior to joining MIRI, Malo studied electrical, software, and systems engineering at the University of Guelph in Ontario. His professional interests included climate change mitigation, and during his master’s, he worked on a project to reduce waste through online detection of inefficient electric motors. Malo started working for us shortly after completing his master’s in early 2012, which makes him MIRI’s longest-standing team member next to Eliezer Yudkowsky.
The Machine Intelligence Research Institute is accepting applicants to two summer programs: a three-week AI robustness and reliability colloquium series (co-run with the Oxford Future of Humanity Institute), and a two-week fellows program focused on helping new researchers contribute to MIRI’s technical agenda (co-run with the Center for Applied Rationality).
The Colloquium Series on Robust and Beneficial AI (CSRBAI), running from May 27 to June 18, is a new gathering of top researchers in academia and industry to tackle the kinds of technical questions featured in the Future of Life Institute’s long-term AI research priorities report and project grants, including transparency, error-tolerance, and preference specification in software systems.
The goal of the event is to spark new conversations and collaborations between safety-conscious AI scientists with a variety of backgrounds and research interests. Attendees will be invited to give and attend talks at MIRI’s Berkeley, California offices during Wednesday/Thursday/Friday colloquia, to participate in hands-on Saturday/Sunday workshops, and to drop by for open discussion days:
Scheduled speakers include Stuart Russell (May 27), UC Berkeley Professor of Computer Science and co-author of Artificial Intelligence: A Modern Approach; Tom Dietterich (May 27), AAAI President and OSU Director of Intelligent Systems; and Bart Selman (June 3), Cornell Professor of Computer Science.
Apply here to attend any portion of the event, as well as to propose a talk or discussion topic:
The 2016 MIRI Summer Fellows program, running from June 19 to July 3, doubles as a workshop for developing new problem-solving skills and mathematical intuitions, and a crash course on MIRI’s active research projects.
This is a smaller and more focused version of the Summer Fellows program we ran last year, which resulted in multiple new hires for us. As such, the program also functions as a high-intensity research retreat where MIRI staff and potential collaborators can get to know each other and work together on important open problems in AI. Apply here to attend the program:
Both programs are free of charge. MIRI Summer Fellows program participants receive free room and board, CSRBAI participants receive free lunches and dinners, and select attendees are eligible for additional partial accommodations and travel assistance. For additional information, see the CSRBAI event page and the MIRI Summer Fellows event page.
The Machine Intelligence Research Institute (MIRI) is accepting applications for a full-time research fellow to develop theorem provers with self-referential capabilities, beginning by implementing a strongly typed language within that very language. The goal of this research project will be to help us understand autonomous systems that can prove theorems about systems with similar deductive capabilities. Applicants should have experience programming in functional programming languages, with a preference for languages with dependent types, such as Agda, Coq, or Lean.
MIRI is a mathematics and computer science research institute specializing in long-term AI safety and robustness work. Our offices are in Berkeley, California, near the UC Berkeley campus.
Scientific American writer John Horgan recently interviewed MIRI’s senior researcher and co-founder, Eliezer Yudkowsky. The email interview touched on a wide range of topics, from politics and religion to existential risk and Bayesian models of rationality.
Although Eliezer isn’t speaking in an official capacity in the interview, a number of the questions discussed are likely to be interesting to people who follow MIRI’s work. We’ve reproduced the full interview below.
John Horgan: When someone at a party asks what you do, what do you tell her?
Eliezer Yudkowsky: Depending on the venue: “I’m a decision theorist”, or “I’m a cofounder of the Machine Intelligence Research Institute”, or if it wasn’t that kind of party, I’d talk about my fiction.
John: What’s your favorite AI film and why?
Eliezer: AI in film is universally awful. Ex Machina is as close to being an exception to this rule as it is realistic to ask.
John: Is college overrated?
Eliezer: It’d be very surprising if college were underrated, given the social desirability bias of endorsing college. So far as I know, there’s no reason to disbelieve the economists who say that college has mostly become a positional good, and that previous efforts to increase the volume of student loans just increased the cost of college and the burden of graduate debt.
John: Why do you write fiction?
Eliezer: To paraphrase Wondermark, “Well, first I tried not making it, but then that didn’t work.”
Beyond that, nonfiction conveys knowledge and fiction conveys experience. If you want to understand a proof of Bayes’s Rule, I can use diagrams. If I want you to feel what it is to use Bayesian reasoning, I have to write a story in which some character is doing that.
MIRI Research Associate Kaj Sotala recently presented a new paper, “Defining Human Values for Value Learners,” at the AAAI-16 AI, Society and Ethics workshop.
The abstract reads:
Hypothetical “value learning” AIs learn human values and then try to act according to those values. The design of such AIs, however, is hampered by the fact that there exists no satisfactory definition of what exactly human values are. After arguing that the standard concept of preference is insufficient as a definition, I draw on reinforcement learning theory, emotion research, and moral psychology to offer an alternative definition. In this definition, human values are conceptualized as mental representations that encode the brain’s value function (in the reinforcement learning sense) by being imbued with a context-sensitive affective gloss. I finish with a discussion of the implications that this hypothesis has on the design of value learners.
Economic treatments of agency standardly assume that preferences encode some consistent ordering over world-states revealed in agents’ choices. Real-world preferences, however, have structure that is not always captured in economic models. A person can have conflicting preferences about whether to study for an exam, for example, and the choice they end up making may depend on complex, context-sensitive psychological dynamics, rather than on a simple comparison of two numbers representing how much one wants to study or not study.
Sotala argues that our preferences are better understood in terms of evolutionary theory and reinforcement learning. Humans evolved to pursue activities that are likely to lead to certain outcomes — outcomes that tended to improve our ancestors’ fitness. We prefer those outcomes, even if they no longer actually maximize fitness; and we also prefer events that we have learned tend to produce such outcomes.
Affect and emotion, on Sotala’s account, psychologically mediate our preferences. We enjoy and desire states that are highly rewarding in our evolved reward function. Over time, we also learn to enjoy and desire states that seem likely to lead to high-reward states. On this view, our preferences function to group together events that lead in expectation to similarly rewarding outcomes for similar reasons; and over our lifetimes we come to inherently value states that lead to high reward, instead of just valuing such states instrumentally. Rather than directly mapping onto our rewards, our preferences map onto our expectation of rewards.
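The distinction between rewards and expected rewards can be illustrated with a small temporal-difference learning sketch. This is standard TD(0) value estimation from reinforcement learning, not Sotala's model. The three-state chain, the function name td0_chain_values, and all parameter values are assumptions chosen for illustration. Only the final transition is rewarded, yet the earlier states acquire value because they reliably lead to it.

```python
def td0_chain_values(n_states=3, reward=1.0, gamma=0.9, alpha=0.1, episodes=2000):
    """Estimate state values on a deterministic chain 0 -> 1 -> ... -> terminal.

    Only the transition into the terminal state pays a reward. TD(0) moves
    each state's value toward (reward + gamma * value of the next state).
    """
    # One extra slot for the terminal state, whose value stays at 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for state in range(n_states):
            next_state = state + 1
            r = reward if next_state == n_states else 0.0
            td_error = r + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
    return values[:n_states]


if __name__ == "__main__":
    values = td0_chain_values()
    # Hypothetical labels for the three states, for readability only.
    for name, value in zip(["early step", "later step", "final step"], values):
        print(f"{name}: {value:.3f}")
```

With these settings the learned values approach 0.81, 0.9, and 1.0: the unrewarded early states end up valued in proportion to the discounted reward they predict, which is the sense in which preferences can track expected reward rather than reward itself.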
Sotala proposes that value learning systems informed by this model of human psychology could more reliably reconstruct human values. On this model, for example, we can expect human preferences to change as we find new ways to move toward high-reward states. New experiences can change which states my emotions categorize as “likely to lead to reward,” and they can thereby modify which states I enjoy and desire. Value learning systems that take these facts about humans’ psychological dynamics into account may be better equipped to take our likely future preferences into account, rather than optimizing for our current preferences alone.