We’ve uploaded the final set of videos from our recent Colloquium Series on Robust and Beneficial AI (CSRBAI) at the MIRI office, co-hosted with the Future of Humanity Institute. Here is a full list of the CSRBAI talks with public video or slides:
- Stuart Russell (UC Berkeley) — AI: The Story So Far (slides)
- Alan Fern (Oregon State University) — Toward Recognizing and Explaining Uncertainty (slides 1, slides 2)
- Francesca Rossi (IBM Research) — Moral Preferences (slides)
- Tom Dietterich (Oregon State University) — Issues Concerning AI Transparency (slides)
- Stefano Ermon (Stanford) — Probabilistic Inference and Accuracy Guarantees (slides)
- Paul Christiano (UC Berkeley) — Training an Aligned Reinforcement Learning Agent
- Jim Babcock — The AGI Containment Problem (slides)
- Bart Selman (Cornell) — Non-Human Intelligence (slides)
- Jessica Taylor (MIRI) — Alignment for Advanced Machine Learning Systems
- Dylan Hadfield-Menell (UC Berkeley) — The Off-Switch: Designing Corrigible, yet Functional, Artificial Agents (slides)
- Bas Steunebrink (IDSIA) — About Understanding, Meaning, and Values (slides)
- Jan Leike (Future of Humanity Institute) — General Reinforcement Learning (slides)
- Tom Everitt (Australian National University) — Avoiding Wireheading with Value Reinforcement Learning (slides)
- Michael Wellman (University of Michigan) — Autonomous Agents in Financial Markets: Implications and Risks (slides)
- Stefano Albrecht (UT Austin) — Learning to Distinguish Between Belief and Truth (slides)
- Stuart Armstrong (Future of Humanity Institute) — Reduced Impact AI and Other Alternatives to Friendliness (slides)
- Andrew Critch (MIRI) — Robust Cooperation of Bounded Agents
For a recap of talks from the earlier weeks at CSRBAI, see my previous blog posts on transparency, robustness and error tolerance, and preference specification. The last set of talks was part of the week focused on Agent Models and Multi-Agent Dilemmas:
Michael Wellman, Professor of Computer Science and Engineering at the University of Michigan, spoke about the implications and risks of autonomous agents in financial markets (slides). Abstract:
Design for robust and beneficial AI is a topic for the future, but also of more immediate concern for the leading edge of autonomous agents emerging in many domains today. One area where AI is already ubiquitous is on financial markets, where a large fraction of trading is routinely initiated and conducted by algorithms. Models and observational studies have given us some insight on the implications of AI traders for market performance and stability. Design and regulation of market environments given the presence of AIs may also yield lessons for dealing with autonomous agents more generally.
Stefano Albrecht, a Postdoctoral Fellow in the Department of Computer Science at the University of Texas at Austin, spoke about “learning to distinguish between belief and truth” (slides). Abstract:
Intelligent agents routinely build models of other agents to facilitate the planning of their own actions. Sophisticated agents may also maintain beliefs over a set of alternative models. Unfortunately, these methods usually do not check the validity of their models during the interaction. Hence, an agent may learn and use incorrect models without ever realising it. In this talk, I will argue that robust agents should have both abilities: to construct models of other agents and contemplate the correctness of their models. I will present a method for behavioural hypothesis testing along with some experimental results. The talk will conclude with open problems and a possible research agenda.
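As a loose illustration of the setup Albrecht describes (this sketch is not from the talk; the toy interface, in which each hypothesized opponent model maps an observed state to a distribution over the opponent’s actions, is my own assumption), the snippet below maintains a Bayesian belief over candidate models and separately scores how well the best candidate explains the observed behaviour. A low score signals that the whole model set may be wrong, not merely uncertain:

```python
import math
from typing import Callable, Dict, List, Tuple

# A hypothesized opponent model: maps an observed state to a
# probability distribution over the opponent's possible actions.
Model = Callable[[str], Dict[str, float]]

def update_beliefs(prior: Dict[str, float], models: Dict[str, Model],
                   state: str, action: str) -> Dict[str, float]:
    """One Bayesian update of the belief over opponent models."""
    posterior = {name: prior[name] * models[name](state).get(action, 1e-9)
                 for name in models}
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

def avg_log_likelihood(model: Model, history: List[Tuple[str, str]]) -> float:
    """Average log-likelihood of the observed actions under one model.
    If even the best model scores poorly, no hypothesis fits the data."""
    return sum(math.log(model(s).get(a, 1e-9)) for s, a in history) / len(history)

if __name__ == "__main__":
    models: Dict[str, Model] = {
        "always_cooperate": lambda s: {"C": 0.95, "D": 0.05},
        "always_defect":    lambda s: {"C": 0.05, "D": 0.95},
    }
    belief = {"always_cooperate": 0.5, "always_defect": 0.5}
    history = [("s0", "C"), ("s1", "D"), ("s2", "C"), ("s3", "D")]  # alternating play

    for state, action in history:
        belief = update_beliefs(belief, models, state, action)
    print(belief)  # stays near 50/50: belief alone never flags that both models are wrong

    best = max(models, key=lambda name: avg_log_likelihood(models[name], history))
    if avg_log_likelihood(models[best], history) < math.log(0.25):
        print("Neither hypothesis explains the data well; revise the model set.")
```

The point of the second check is exactly the distinction the talk draws: the posterior tells us which hypothesis is relatively more plausible, while the likelihood test asks whether any of them is actually consistent with the opponent’s observed behaviour.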
Stuart Armstrong, from the Future of Humanity Institute in Oxford, spoke about “reduced impact AI” (slides). Abstract:
This talk will look at some of the ideas developed to create safe AI without solving the problem of friendliness. It will focus first on “reduced impact AI”, AIs designed to have little effect on the world – but from whom high impact can nevertheless be extracted. It will then delve into the new idea of AIs designed to have preferences over their own virtual worlds only, and look at the advantages – and limitations – of using indifference as a tool of AI control.
Lastly, Andrew Critch, a MIRI research fellow, spoke about robust cooperation in bounded agents. This talk is based on the paper “Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents.” Talk abstract:
The first interaction between a pair of agents who might destroy each other can resemble a one-shot prisoner’s dilemma. Consider such a game where each player is an algorithm with read-access to its opponent’s source code. Tennenholtz (2004) introduced an agent which cooperates iff its opponent’s source code is identical to its own, thus sometimes achieving mutual cooperation while remaining unexploitable in general. However, precise equality of programs is a fragile cooperative criterion. Here, I will exhibit a new and more robust cooperative criterion, inspired by ideas of LaVictoire, Barasz and others (2014), using a new theorem in provability logic for bounded reasoners.
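To see why exact program equality is a fragile cooperative criterion, here is a minimal Python sketch (not from the talk; the names and the string-based cooperate/defect interface are mine) of a Tennenholtz-style agent that cooperates only when the opponent’s source text matches its own exactly:

```python
import inspect

COOPERATE, DEFECT = "C", "D"

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent's source code is byte-for-byte identical to ours.

    This mirrors the spirit of Tennenholtz's (2004) construction: mutual
    cooperation against exact copies, no exploitation in general, but any
    superficial difference in the opponent's program text (a renamed
    variable, an added comment) breaks cooperation.
    """
    my_source = inspect.getsource(clique_bot)
    return COOPERATE if opponent_source == my_source else DEFECT

if __name__ == "__main__":
    my_source = inspect.getsource(clique_bot)
    print(clique_bot(my_source))                # "C": exact copy
    print(clique_bot(my_source + "  # no-op"))  # "D": same behaviour, different text
```

Critch’s criterion replaces this syntactic equality check with a proof-based one: roughly, cooperate if a bounded proof search establishes that the opponent will cooperate back, which tolerates superficial differences between the two programs.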