Research Workshops
July 20–22, 2018 – Berkeley, California
2nd Workshop on Approaches in AI Alignment
CHAI Participants
Jordan Alexander
Lawrence Chan
James Drain
Aaron Tucker
Alex Turner
Unaffiliated Participants
Alex Gunning
MIRI Participants
Alex Appel
Daniel Demski
Evan Hubinger
Linda Linsefors
Alex Mennen
David Simmons
Alex Zhu
This weekend workshop brought together research interns from MIRI and UC Berkeley’s Center for Human-Compatible AI (CHAI) to discuss conceptual foundations and open problems in AI safety research.
November 18–19, 2017 – Berkeley, California
1st Workshop on Approaches in AI Alignment
Tsvi Benson-Tilsen (MIRI)
Paul Christiano (OpenAI)
Andrew Critch (UC Berkeley)
Wei Dai (independent)
Abram Demski (MIRI)
Sam Eisenstat (MIRI)
Scott Garrabrant (MIRI)
Richard Mallah (FLI, Cambridge Semantics)
Andreas Stuhlmüller (Stanford)
Jessica Taylor (independent)
This weekend workshop brought together researchers interested in understanding and exploring the intersection between MIRI’s Agent Foundations research agenda and Paul Christiano’s research.
April 1–2, 2017 – Berkeley, California
4th Workshop on Machine Learning and AI Safety
Tsvi Benson-Tilsen (MIRI)
Paul Christiano (OpenAI)
Andrew Critch (UC Berkeley)
Wei Dai (independent)
Abram Demski (MIRI)
Sam Eisenstat (MIRI)
Scott Garrabrant (MIRI)
Richard Mallah (FLI, Cambridge Semantics)
Andreas Stuhlmüller (Stanford)
Jessica Taylor (independent)
This workshop brought together researchers with machine learning backgrounds to work on long-term AI safety problems that can be modeled in current machine learning systems and frameworks, for instance those described in “Concrete Problems in AI Safety” and “Alignment for Advanced Machine Learning Systems”.
This workshop was funded in part by a grant from the Artificial Intelligence Journal.
March 25–26, 2017 – Berkeley, California
Workshop on Agent Foundations and AI Safety
Alexander Appel (University of Nevada Reno)
Michael Dennis (UC Berkeley)
Sam Eisenstat (Google)
Matt Frank
Scott Garrabrant (MIRI)
Juan David Gil (MIT)
Patrick LaVictoire (MIRI)
Eliana Lorch (Thiel Fellow)
Eli Sennesh
Harry Slatyer (Google)
Alex Zhu
This two-day weekend workshop brought together researchers with interests in long-term theoretical AI safety research. The workshop covered the context and content of current AI safety research agendas and projects (with a focus on MIRI’s Agent Foundations technical agenda). It was geared toward researchers with technical backgrounds who had not previously worked extensively with MIRI.
December 1–3, 2016 – Berkeley, California
3rd Workshop on Machine Learning and AI Safety
Ryan Carey (MIRI)
Cameron Freer (Gamalon and Borelian)
Scott Garrabrant (MIRI)
Marcello Herreshoff (Google)
Patrick LaVictoire (MIRI)
Moshe Looks (Google)
Jeremy Nixon (Spark)
Anand Srinivasan (AlphaSheets)
Jessica Taylor (MIRI)
Eliezer Yudkowsky (MIRI)
This small three-day workshop brought together researchers with machine learning backgrounds to work on long-term AI safety problems that can be modeled in current machine learning systems and frameworks, for instance those described in “Concrete Problems in AI Safety” and “Alignment for Advanced Machine Learning Systems”.
Topics included zero-shot learning using shared embeddings, the differences between quantilization and regularization, generative adversarial networks and Goodhart’s Law, and mathematical formalizations of conservative concept learning.
November 11–13, 2016 – Berkeley, California
9th Workshop on Logic, Probability, and Reflection
Tsvi Benson-Tilsen (UC Berkeley)
Ryan Carey (MIRI)
Andrew Critch (MIRI)
Abram Demski (USC)
Sam Eisenstat (UC Berkeley)
Benya Fallenstein (MIRI)
Jack Gallagher
Scott Garrabrant (MIRI)
Marcello Herreshoff (Google)
Patrick LaVictoire (MIRI)
Nisan Stiennon (Google)
Jessica Taylor (MIRI)
Alex Zhu (MIT)
Participants at this three-day workshop — most of them veterans of past workshops — worked on a variety of problems related to MIRI’s Agent Foundations technical agenda.
Topics included safe exploration in rich domains, the difference between predicting a human and predicting HCH, and decision theories resulting from other decision theories self-modifying.
October 21–23, 2016 – Berkeley, California
2nd Workshop on Machine Learning and AI Safety
Ryan Carey (MIRI)
Sarah Constantin
Scott Garrabrant (MIRI)
Marcello Herreshoff (Google)
Patrick LaVictoire (MIRI)
William Saunders (Google)
Jessica Taylor (MIRI)
Eliezer Yudkowsky (MIRI)
This small three-day workshop brought together researchers with machine learning backgrounds to work on long-term AI safety problems that can be modeled in current machine learning systems and frameworks, for instance those described in “Concrete Problems in AI Safety” and “Alignment for Advanced Machine Learning Systems”.
Topics included concept learning with different ontologies, problems for Task AGI, censored representations, and conservative concepts.
August 26–28, 2016 – Berkeley, California
1st Workshop on Machine Learning and AI Safety
Paul Christiano (UC Berkeley)
Daniel Filan (UC Berkeley)
Cameron Freer (Gamalon and Borelian)
Dylan Hadfield-Menell (UC Berkeley)
Victoria Krakovna (Harvard)
Janos Kramar (University of Montreal)
Patrick LaVictoire (MIRI)
Jelena Luketina (University of Montreal)
Richard Mallah (FLI, Cambridge Semantics)
Jessica Taylor (MIRI)
Eliezer Yudkowsky (MIRI)
This three-day workshop brought together researchers with machine learning backgrounds to work on long-term AI safety problems that can be modeled in current machine learning systems and frameworks, for instance those described in “Concrete Problems in AI Safety” and “Alignment for Advanced Machine Learning Systems”.
Topics included learning human-interpretable and causal models of the environment; engineering cost functions based on impact measures to disincentivize side effects; designing robust metrics for the quality of a purported explanation of a plan; and developing a formal model of Goodhart’s Law which yields mild optimization.
June 17, 2016 – Berkeley, California
CSRBAI Workshop on Agent Models and Multi-Agent Dilemmas
Twenty participants attended from institutions including:
- USC Institute for Creative Technologies
- Carleton University
- Future of Humanity Institute
- Carnegie Mellon University
- Harvard
- Oxford University
- University College London
- Australian National University
- UC Berkeley
- UT Austin
- Princeton University
- Columbia University
The Colloquium Series on Robust and Beneficial AI (CSRBAI) included a series of workshops to facilitate conversations and collaborations among people interested in a range of approaches to the technical challenges of AI robustness and reliability.
The fourth workshop of CSRBAI focused on designing agents that behave well in their environments, taking into account the effects of the agent’s own actions on the environment and on other agents within it.
June 11–12, 2016 – Berkeley, California
CSRBAI Workshop on Preference Specification
Twenty participants attended from institutions including:
- Australian National University
- University College London
- Center for the Study of Existential Risk
- University of Oxford
- Future of Humanity Institute
- Carnegie Mellon University
- The Swiss AI Lab IDSIA
- UC Berkeley
- Brown University
- University of Montreal
- USC Institute for Creative Technologies
The third workshop of CSRBAI focused on the topic of preference specification for highly capable AI systems, in which the perennial problem of wanting code to “do what I mean, not what I said” becomes increasingly challenging.
June 4–5, 2016 – Berkeley, California
CSRBAI Workshop on Robustness and Error-Tolerance
Fourteen participants attended from institutions including:
- University College London
- Center for the Study of Existential Risk
- Future of Humanity Institute
- Carnegie Mellon University
- Australian National University
- UC Berkeley
- The Swiss AI Lab IDSIA
- Cornell University
- USC Institute for Creative Technologies
The second workshop of CSRBAI focused on the topic of robustness and error-tolerance in AI systems, and how to ensure that when AI systems fail, they fail gracefully and detectably.
May 28–29, 2016 – Berkeley, California
CSRBAI Workshop on Transparency
Twenty participants attended from institutions including:
- Oregon State University
- Australian National University
- Future of Humanity Institute
- Carnegie Mellon University
- IBM Research
- Montreal Institute for Learning Algorithms
- Google Research
- Stanford University
- UC Berkeley
- University College London
- Harvard
- Future of Life Institute
The first workshop of CSRBAI focused on the topic of transparency in AI systems, and how we can increase transparency while maintaining capabilities.
April 1–3, 2016 – Berkeley, California
Self-Reference, Type Theory, and Formal Verification
- Miëtek Bak (Least Fixed)
- Benya Fallenstein (MIRI)
- Jack Gallagher (Gallabytes)
- Jason Gross (MIT)
- Ramana Kumar (Cambridge)
- Patrick LaVictoire (MIRI)
- Daniel Selsam (Stanford)
- Nathaniel Thomas (Stanford)
Participants worked on questions of self-reference in type theory and automated theorem provers, with the goal of studying systems that model themselves.
August 28–30, 2015 – Berkeley, California
3rd Introductory Workshop on Logical Decision Theory
- Holger Dell (Saarland University)
- Owain Evans (MIT)
- Benya Fallenstein (MIRI)
- Benjamin Fox (Israel Defense Forces)
- Patrick LaVictoire (MIRI)
- Jonathan Lee (Cambridge)
- Ben Levinstein (Oxford)
- Jelena Luketina (Aalto)
- David Steinberg (U Maryland)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)
This was the sixth in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.
The topic of this workshop was decision theory, and projects begun at the workshop are discussed in the following post: Proof Length and Logical Counterfactuals Revisited
August 7–9, 2015 – Berkeley, California
2nd Introductory Workshop on Logical Uncertainty
- Pedro Carvalho (Instituto Superior Técnico)
- Adele Dewey-Lopez (SEED Platform Inc.)
- Benya Fallenstein (MIRI)
- John Fox (Oxford)
- Robert Krzyzanowski (UIC)
- Patrick LaVictoire (MIRI)
- Michele Reilly (Turing Inc.)
- Nate Soares (MIRI)
- Nathaniel Thomas (Stanford)
- Michael Westmoreland (Denison)
- Eliezer Yudkowsky (MIRI)
This was the fifth in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.
The topic of this workshop was logical uncertainty, and projects begun at the workshop are discussed in the following post: What’s logical coherence for anyway?
June 26–28, 2015 – Berkeley, California
1st Introductory Workshop on Vingean Reflection
- Siddharth Bhaskar (UCLA)
- Justin Brody (Goucher College)
- Abram Demski (USC)
- Benya Fallenstein (MIRI)
- Roko Jelavić (Ericsson)
- Seth Kurtenbach (U Missouri)
- Patrick LaVictoire (MIRI)
- Kenneth Presting (Renaissance Computing Institute)
- Jess Riedel (Perimeter Institute)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)
This was the fourth in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.
The topic of this workshop was Vingean reflection, and projects begun at the workshop are discussed in the following posts:
June 12–14, 2015 – Berkeley, California
2nd Introductory Workshop on Logical Decision Theory
- Manav Bhushan (Oxford)
- Paul Crowley (Google)
- Benya Fallenstein (MIRI)
- Preston Greene (NTU)
- Jason Gross (MIT)
- Nick Hay (UC Berkeley)
- Victoria Krakovna (Harvard)
- Patrick LaVictoire (MIRI)
- Jan Leike (Australian National University)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)
This was the third in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.
The topic of this workshop was decision theory, and projects begun at the workshop are discussed in the following post: Fixed point theorem in the finite and infinite case
May 29–31, 2015 – Berkeley, California
1st Introductory Workshop on Logical Uncertainty
- Sarah Constantin (Yale)
- Benya Fallenstein (MIRI)
- Jacob Hilton (University of Leeds)
- Vanessa Kosoy (Metaqube)
- Janos Kramar (Independent)
- Patrick LaVictoire (MIRI)
- Shivaram Lingamneni (UC Berkeley)
- Quinn Maurmann (Quidsi)
- Nate Soares (MIRI)
- Charlie Steiner (Independent)
- Eliezer Yudkowsky (MIRI)
This was the second in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.
The topic of this workshop was logical uncertainty, and projects begun at the workshop are discussed in the following posts:
May 4–6, 2015 – Berkeley, California
1st Introductory Workshop on Logical Decision Theory
- Sam Eisenstat (Twitter)
- Benya Fallenstein (MIRI)
- Scott Garrabrant (UCLA)
- George Hotz (Vicarious)
- Patrick LaVictoire (MIRI)
- Evan Lloyd (UCLA)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)
- Sebastien Zany (Independent)
This was the first in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.
The topic of this workshop was decision theory, and projects begun at the workshop are discussed in the following posts:
May 3–11, 2014 – Berkeley, California
7th Workshop on Logic, Probability, and Reflection
- Mihály Bárász (Google)
- Paul Christiano (UC Berkeley)
- Benya Fallenstein (Bristol U)
- Marcello Herreshoff (Google)
- Patrick LaVictoire (Quixey)
- Nate Soares (Google)
- Nisan Stiennon (Stanford)
- Qiaochu Yuan (UC Berkeley)
- Eliezer Yudkowsky (MIRI)
Participants at this workshop — all of them veterans of past workshops — worked on a variety of problems related to Friendly AI. The first tech report from this workshop is available here.
December 14–20, 2013 – Berkeley, California
6th Workshop on Logic, Probability, and Reflection
- Nate Ackerman (Harvard)
- John Baez (UC Riverside)
- Paul Christiano (UC Berkeley)
- Benya Fallenstein (Bristol U)
- Cameron Freer (MIT)
- Jeremy Hahn (Harvard)
- Wojtek Moczydlowski (Google)
- Michele Reilly (independent)
- Will Sawin (Princeton)
- Nate Soares (Google)
- Nisan Stiennon (Stanford)
- Gregory Wheeler (LMU Munich)
- Eliezer Yudkowsky (MIRI)
Participants at this workshop focused on the Löbian obstacle, probabilistic logic, and the intersection of logic and probability more generally. The results of this workshop are described here. See photos from the workshop here.
November 23–29, 2013 – Oxford, UK
5th Workshop on Logic, Probability, and Reflection
- Stuart Armstrong (Oxford)
- Mihály Bárász (Google)
- Catrin Campbell-Moore (LMU Munich)
- Daniel Dewey (Oxford)
- Benya Fallenstein (Bristol U)
- Jacob Hilton (Oxford)
- Ramana Kumar (Cambridge)
- Jan Leike (U Freiburg)
- Bas Steunebrink (IDSIA)
- Gregory Wheeler (LMU Munich)
- Eliezer Yudkowsky (MIRI)
Participants at this workshop investigated problems related to reflective agents, probabilistic logic, and priors over logical statements / the logical omniscience problem. Some results from this workshop were developed further at the December 2013 workshop and described here.
September 7–13, 2013 – Berkeley, California
4th Workshop on Logic, Probability, and Reflection
- Paul Christiano (UC Berkeley)
- Wei Dai (independent)
- Gary Drescher (independent)
- Kenny Easwaran (USC)
- Cameron Freer (MIT)
- Patrick LaVictoire (Quixey)
- Ilya Shpitser (U Southampton)
- Vladimir Slepnev (Google)
- Nisan Stiennon (Stanford)
- Andreas Stuhlmüller (MIT & Stanford)
- Eliezer Yudkowsky (MIRI)
This workshop focused on a variety of open problems related to normative decision theory. Participants brainstormed “well-posed problems” in the area, built on LaVictoire et al.’s Löbian cooperation work, made some progress on formalizing updateless decision theory, and formulated additional toy problems such as the Ultimate Newcomb’s Problem.
These results are still being written up in various forms.
July 8–14, 2013 – Berkeley, California
3rd Workshop on Logic, Probability, and Reflection
- Andrew Critch (PhD, UC Berkeley)
- Abram Demski (USC)
- Benya Fallenstein (Bristol U)
- Marcello Herreshoff (Google)
- Jonathan Lee (Cambridge)
- Will Sawin (Princeton)
- Qiaochu Yuan (UC Berkeley)
- Eliezer Yudkowsky (MIRI)
This workshop focused on a variety of issues related to the Löbian obstacle for self-modifying systems, and to Demski’s earlier work on logical prior probability. The primary result was a proof that a probability distribution which performs scientific induction on Π1 statements, converging to probability 1 on the true ones, can assign limiting probability zero to some true Π2 statements. This result is still being written up, but it has been discussed briefly in a blog post by Demski. Other bits of progress were developed at further workshops and described here.
April 3–24, 2013 – Berkeley, California
2nd Workshop on Logic, Probability, and Reflection
- Andrew Critch (PhD, UC Berkeley)
- Abram Demski (USC)
- Benya Fallenstein (Bristol U)
- Marcello Herreshoff (Google)
- Jonathan Lee (Cambridge)
- Will Sawin (Princeton)
- Qiaochu Yuan (UC Berkeley)
- Eliezer Yudkowsky (MIRI)
This three-week workshop addressed multiple open research problems simultaneously. First, participants found an improved version of the reflection principle discovered in the previous workshop, though this progress is still being written up. Second, participants improved upon earlier work by LaVictoire, resulting in the paper “Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic.” Third, participants improved upon Benya Fallenstein’s parametric polymorphism approach to tackling the Löbian obstacle for self-modifying systems.