# Research Workshops

#### April 1-2, 2017 – Berkeley, California

## 4th Workshop on Machine Learning and AI Safety

- Anand Srinivasan (MIT, AlphaSheets)
- Anthony Rose (Uber, Texas A&M)
- Daniel Hendrycks
- Eric Langlois
- Ethan Perez (Rice University)
- Holden Lee
- Jenna Kainic (NYU)
- Lawrence Chan (University of Pennsylvania)
- Michael Cohen (Noodle.ai)
- Robert Krzyzanowski (University of Illinois)
- Long Ouyang
- Luke Grecki (Shopkeep)
- Monica Gates (UC Berkeley)
- Jonathan Krause
- Jessica Taylor (MIRI)
- Ryan Carey (MIRI)
- Patrick LaVictoire (MIRI)

This workshop brought together researchers with machine learning backgrounds to work on long-term AI safety problems that can be modeled in current machine learning systems and frameworks, for instance those described in “Concrete Problems in AI Safety” and “Alignment for Advanced Machine Learning Systems”.

This workshop was funded in part by a grant from the Artificial Intelligence Journal.

#### March 25–26, 2017 – Berkeley, California

## Workshop on Agent Foundations and AI Safety

- Matt Frank
- Juan David Gil (MIT)
- Alexander Appel (University of Nevada Reno)
- Eli Sennesh
- Harry Slatyer (Google)
- Sam Eisenstat (Google)
- Alex Zhu
- Eliana Lorch (Thiel Fellow)
- Michael Dennis (UC Berkeley)
- Patrick LaVictoire (MIRI)
- Scott Garrabrant (MIRI)

This two-day weekend workshop brought together researchers with interests in long-term theoretical AI safety research. The workshop covered the context and content of current AI safety research agendas and projects (with a focus on MIRI’s Agent Foundations technical agenda). It was geared for researchers who have technical backgrounds and who have not previously worked extensively with MIRI.

#### December 1-3, 2016 – Berkeley, California

## 3rd Workshop on Machine Learning and AI Safety

- Anand Srinivasan (AlphaSheets)
- Cameron Freer (Gamalon and Borelian)
- Eliezer Yudkowsky (MIRI)
- Jeremy Nixon (Spark)
- Jessica Taylor (MIRI)
- Marcello Herreshoff (Google)
- Moshe Looks (Google)
- Patrick LaVictoire (MIRI)
- Ryan Carey (MIRI)
- Scott Garrabrant (MIRI)

This small three-day workshop brought together researchers with machine learning backgrounds to work on long-term AI safety problems that can be modeled in current machine learning systems and frameworks, for instance those described in “Concrete Problems in AI Safety” and “Alignment for Advanced Machine Learning Systems”.

Topics included zero-shot learning using shared embeddings, the differences between quantilization and regularization, generative adversarial networks and Goodhart’s Law, and mathematical formalizations of conservative concept learning.

#### November 11-13, 2016 – Berkeley, California

## 9th Workshop on Logic, Probability, and Reflection

- Abram Demski (USC)
- Alex Zhu (MIT)
- Andrew Critch (MIRI)
- Benya Fallenstein (MIRI)
- Jack Gallagher
- Jessica Taylor (MIRI)
- Marcello Herreshoff (Google)
- Nisan Stiennon (Google)
- Patrick LaVictoire (MIRI)
- Ryan Carey (MIRI)
- Sam Eisenstat (UC Berkeley)
- Scott Garrabrant (MIRI)
- Tsvi Benson-Tilsen (UC Berkeley)

Participants at this three-day workshop — most of them veterans of past workshops — worked on a variety of problems related to MIRI’s Agent Foundations technical agenda.

Topics included safe exploration in rich domains, the difference between predicting a human and predicting HCH, and decision theories resulting from other decision theories self-modifying.

#### October 21-23, 2016 – Berkeley, California

## 2nd Workshop on Machine Learning and AI Safety

- Jessica Taylor (MIRI)
- Patrick LaVictoire (MIRI)
- Ryan Carey (MIRI)
- Scott Garrabrant (MIRI)
- Eliezer Yudkowsky (MIRI)
- Sarah Constantin
- Marcello Herreshoff (Google)
- William Saunders (Google)

This small three-day workshop brought together researchers with machine learning backgrounds to work on long-term AI safety problems that can be modeled in current machine learning systems and frameworks, for instance those described in “Concrete Problems in AI Safety” and “Alignment for Advanced Machine Learning Systems”.

Topics included concept learning with different ontologies, problems for Task AGI, censored representations, and conservative concepts.

#### August 26-28, 2016 – Berkeley, California

## 1st Workshop on Machine Learning and AI Safety

- Cameron Freer (Gamalon and Borelian)
- Daniel Filan (UC Berkeley)
- Dylan Hadfield-Menell (UC Berkeley)
- Eliezer Yudkowsky (MIRI)
- Janos Kramar (University of Montreal)
- Jelena Luketina (University of Montreal)
- Jessica Taylor (MIRI)
- Patrick LaVictoire (MIRI)
- Paul Christiano (UC Berkeley)
- Richard Mallah (FLI, Cambridge Semantics)
- Victoria Krakovna (Harvard)

This three-day workshop brought together researchers with machine learning backgrounds to work on long-term AI safety problems that can be modeled in current machine learning systems and frameworks, for instance those described in “Concrete Problems in AI Safety” and “Alignment for Advanced Machine Learning Systems”.

Topics included learning human-interpretable and causal models of the environment; engineering cost functions based on impact measures to disincentivize side effects; designing robust metrics for the quality of a purported explanation of a plan; and developing a formal model of Goodhart’s Law which yields mild optimization.

#### August 12-14, 2016 – Berkeley, California

## 8th Workshop on Logic, Probability, and Reflection

- Sam Eisenstat (UC Berkeley)
- Tsvi Benson-Tilsen (UC Berkeley)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)
- Benya Fallenstein (MIRI)
- Patrick LaVictoire (MIRI)
- Jessica Taylor (MIRI)
- Andrew Critch (MIRI)
- Scott Garrabrant (MIRI)

Participants at this workshop — all of them veterans of past workshops — worked on a variety of problems related to MIRI’s Agent Foundations technical agenda, with a focus on decision theory and the formal construction of logical counterfactuals.

#### June 17, 2016 – Berkeley, California

## CSRBAI Workshop on Agent Models and Multi-Agent Dilemmas

- USC Institute for Creative Technologies
- Carleton University
- Future of Humanity Institute
- Carnegie Mellon University
- Harvard
- Oxford University

- University College London
- Australian National University
- UC Berkeley
- UT Austin
- Princeton University
- Columbia University

The Colloquium Series on Robust and Beneficial AI included a series of workshops to facilitate conversations and collaborations between people interested in a number of different approaches to the technical challenges associated with AI robustness and reliability.

The fourth workshop of CSRBAI focused on designing agents that behave well in their environments while accounting for the effects of their own actions on the environment and on other agents within it.

#### June 11-12, 2016 – Berkeley, California

## CSRBAI Workshop on Preference Specification

- Australian National University
- University College London
- Center for the Study of Existential Risk
- University of Oxford
- Future of Humanity Institute
- Carnegie Mellon University

- The Swiss AI Lab IDSIA
- Australian National University
- UC Berkeley
- Brown University
- University of Montreal
- USC Institute for Creative Technologies

The Colloquium Series on Robust and Beneficial AI included a series of workshops to facilitate conversations and collaborations between people interested in a number of different approaches to the technical challenges associated with AI robustness and reliability.

The third workshop of CSRBAI focused on the topic of preference specification for highly capable AI systems, in which the perennial problem of wanting code to “do what I mean, not what I said” becomes increasingly challenging.

#### June 4-5, 2016 – Berkeley, California

## CSRBAI Workshop on Robustness and Error-Tolerance

- University College London
- Center for the Study of Existential Risk
- Future of Humanity Institute
- Carnegie Mellon University

- Australian National University
- UC Berkeley
- The Swiss AI Lab IDSIA
- Cornell University
- USC Institute for Creative Technologies

The Colloquium Series on Robust and Beneficial AI included a series of workshops to facilitate conversations and collaborations between people interested in a number of different approaches to the technical challenges associated with AI robustness and reliability.

The second workshop of CSRBAI focused on the topic of robustness and error-tolerance in AI systems, and how to ensure that when AI systems fail, they fail gracefully and detectably.

#### May 28-29, 2016 – Berkeley, California

## CSRBAI Workshop on Transparency

- Oregon State University
- Australian National University
- Future of Humanity Institute
- Carnegie Mellon University
- IBM Research
- Montreal Institute for Learning Algorithms

- Google Research
- Stanford University
- UC Berkeley
- University College London
- Harvard
- Future of Life Institute

The first workshop of CSRBAI focused on the topic of transparency in AI systems, and how we can increase transparency while maintaining capabilities.

#### April 1-3, 2016 – Berkeley, California

## Self-Reference, Type Theory, and Formal Verification

- Miëtek Bak (Least Fixed)
- Benya Fallenstein (MIRI)
- Jack Gallagher (Gallabytes)
- Jason Gross (MIT)
- Ramana Kumar (Cambridge)
- Patrick LaVictoire (MIRI)
- Daniel Selsam (Stanford)
- Nathaniel Thomas (Stanford)

Participants worked on questions of self-reference in type theory and automated theorem provers, with the goal of studying systems that model themselves.

#### August 28-30, 2015 – Berkeley, California

## 3rd Introductory Workshop on Logical Decision Theory

- Holger Dell (Saarland University)
- Owain Evans (MIT)
- Benja Fallenstein (MIRI)
- Benjamin Fox (Israel Defense Forces)
- Patrick LaVictoire (MIRI)
- Jonathan Lee (Cambridge)
- Ben Levinstein (Oxford)
- Jelena Luketina (Aalto)
- David Steinberg (U Maryland)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)

This was the sixth in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.

The topic of this workshop was decision theory, and projects begun at the workshop are discussed in the following post: “Proof Length and Logical Counterfactuals Revisited.”

#### August 7–9, 2015 – Berkeley, California

## 2nd Introductory Workshop on Logical Uncertainty

- Pedro Carvalho (Instituto Superior Técnico)
- Adele Dewey-Lopez (SEED Platform Inc.)
- Benja Fallenstein (MIRI)
- John Fox (Oxford)
- Robert Krzyzanowski (UIC)
- Patrick LaVictoire (MIRI)
- Michele Reilly (Turing Inc.)
- Nate Soares (MIRI)
- Nathaniel Thomas (Stanford)
- Michael Westmoreland (Denison)
- Eliezer Yudkowsky (MIRI)

This was the fifth in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.

The topic of this workshop was logical uncertainty, and projects begun at the workshop are discussed in the following post: “What’s logical coherence for anyway?”

#### June 26–28, 2015 – Berkeley, California

## 1st Introductory Workshop on Vingean Reflection

- Siddharth Bhaskar (UCLA)
- Justin Brody (Goucher College)
- Abram Demski (USC)
- Benja Fallenstein (MIRI)
- Roko Jelavić (Ericsson)
- Seth Kurtenbach (U Missouri)
- Patrick LaVictoire (MIRI)
- Kenneth Presting (Renaissance Computing Institute)
- Jess Riedel (Perimeter Institute)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)

This was the fourth in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.

The topic of this workshop was Vingean reflection, and projects begun at the workshop are discussed in the following posts:

#### June 12–14, 2015 – Berkeley, California

## 2nd Introductory Workshop on Logical Decision Theory

- Manav Bhushan (Oxford)
- Paul Crowley (Google)
- Benja Fallenstein (MIRI)
- Preston Greene (NTU)
- Jason Gross (MIT)
- Nick Hay (UC Berkeley)
- Victoria Krakovna (Harvard)
- Patrick LaVictoire (MIRI)
- Jan Leike (Australian National University)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)

This was the third in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.

The topic of this workshop was decision theory, and projects begun at the workshop are discussed in the following post: “Fixed point theorem in the finite and infinite case.”

#### May 29–31, 2015 – Berkeley, California

## 1st Introductory Workshop on Logical Uncertainty

- Sarah Constantin (Yale)
- Benja Fallenstein (MIRI)
- Jacob Hilton (University of Leeds)
- Vadim Kosoy (Metaqube)
- Janos Kramar (Independent)
- Patrick LaVictoire (MIRI)
- Shivaram Lingamneni (UC Berkeley)
- Quinn Maurmann (Quidsi)
- Nate Soares (MIRI)
- Charlie Steiner (Independent)
- Eliezer Yudkowsky (MIRI)

This was the second in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.

The topic of this workshop was logical uncertainty, and projects begun at the workshop are discussed in the following posts:

#### May 4–6, 2015 – Berkeley, California

## 1st Introductory Workshop on Logical Decision Theory

- Sam Eisenstat (Twitter)
- Benja Fallenstein (MIRI)
- Scott Garrabrant (UCLA)
- George Hotz (Vicarious)
- Patrick LaVictoire (MIRI)
- Evan Lloyd (UCLA)
- Nate Soares (MIRI)
- Eliezer Yudkowsky (MIRI)
- Sebastien Zany (Independent)

This was the first in a series of introductory workshops, where MIRI brought together researchers with different backgrounds, discussed open problems in one of the technical agenda topics, and began projects and collaborations in that area.

The topic of this workshop was decision theory, and projects begun at the workshop are discussed in the following posts:

#### May 3–11, 2014 – Berkeley, CA

## 7th Workshop on Logic, Probability, and Reflection

- Mihály Bárász (Google)
- Paul Christiano (UC Berkeley)
- Benja Fallenstein (Bristol U)
- Marcello Herreshoff (Google)
- Patrick LaVictoire (Quixey)
- Nate Soares (Google)
- Nisan Stiennon (Stanford)
- Qiaochu Yuan (UC Berkeley)
- Eliezer Yudkowsky (MIRI)

Participants at this workshop — all of them veterans of past workshops — worked on a variety of problems related to Friendly AI. The first tech report from this workshop is available here.

#### December 14–20, 2013 – Berkeley, CA

## 6th Workshop on Logic, Probability, and Reflection

- Nate Ackerman (Harvard)
- John Baez (UC Riverside)
- Paul Christiano (UC Berkeley)
- Benja Fallenstein (Bristol U)
- Cameron Freer (MIT)
- Jeremy Hahn (Harvard)
- Wojtek Moczydlowski (Google)
- Michele Reilly (independent)
- Will Sawin (Princeton)
- Nate Soares (Google)
- Nisan Stiennon (Stanford)
- Gregory Wheeler (LMU Munich)
- Eliezer Yudkowsky (MIRI)

Participants at this workshop focused on the Löbian obstacle, probabilistic logic, and the intersection of logic and probability more generally. The results of this workshop are described here. See photos from the workshop here.

#### November 23-29, 2013 – Oxford, UK

## 5th Workshop on Logic, Probability, and Reflection

- Stuart Armstrong (Oxford)
- Mihály Bárász (Google)
- Catrin Campbell-Moore (LMU Munich)
- Daniel Dewey (Oxford)
- Benja Fallenstein (Bristol U)
- Jacob Hilton (Oxford)
- Ramana Kumar (Cambridge)
- Jan Leike (U Freiburg)
- Bas Steunebrink (IDSIA)
- Gregory Wheeler (LMU Munich)
- Eliezer Yudkowsky (MIRI)

Participants at this workshop investigated problems related to reflective agents, probabilistic logic, and priors over logical statements / the logical omniscience problem. Some results from this workshop were developed further at the December 2013 workshop and described here.

#### September 7-13, 2013 – Berkeley, CA

## 4th Workshop on Logic, Probability, and Reflection

- Paul Christiano (UC Berkeley)
- Wei Dai (independent)
- Gary Drescher (independent)
- Kenny Easwaran (USC)
- Cameron Freer (MIT)
- Patrick LaVictoire (Quixey)
- Ilya Shpitser (U Southampton)
- Vladimir Slepnev (Google)
- Nisan Stiennon (Stanford)
- Andreas Stuhlmüller (MIT & Stanford)
- Eliezer Yudkowsky (MIRI)

This workshop focused on a variety of open problems related to normative decision theory. Participants brainstormed “well-posed problems” in the area, built on LaVictoire et al.’s Löbian cooperation work, made some progress on formalizing updateless decision theory, and formulated additional toy problems such as the Ultimate Newcomb’s Problem.

These results are still being written up in various forms.

#### July 8-14, 2013 – Berkeley, CA

## 3rd Workshop on Logic, Probability, and Reflection

- Andrew Critch (PhD, UC Berkeley)
- Abram Demski (USC)
- Benja Fallenstein (Bristol U)
- Marcello Herreshoff (Google)
- Jonathan Lee (Cambridge)
- Will Sawin (Princeton)
- Qiaochu Yuan (UC Berkeley)
- Eliezer Yudkowsky (MIRI)

This workshop focused on a variety of issues related to the Löbian obstacle for self-modifying systems, and to Demski’s earlier work on logical prior probability. The primary result was a proof that attempting to create a probability distribution which performs scientific induction on Π₁ statements, converging to probability 1 for the true versions of such statements, can create zero limiting probabilities assigned to true Π₂ statements. This result is still being written up, but it has been discussed briefly in a blog post by Demski. Other bits of progress were developed at further workshops and described here.

#### April 3-24, 2013 – Berkeley, CA

## 2nd Workshop on Logic, Probability, and Reflection

- Stuart Armstrong (Oxford)
- Mihály Bárász (Google)
- Paul Christiano (UC Berkeley)
- Andrew Critch (PhD, UC Berkeley)
- Daniel Dewey (Oxford)
- Benja Fallenstein (Bristol U)
- Marcello Herreshoff (Google)
- Patrick LaVictoire (U Wisconsin)
- Jacob Steinhardt (Stanford)
- Jessica Taylor (Stanford)
- Qiaochu Yuan (UC Berkeley)
- Eliezer Yudkowsky (MIRI)

This three-week workshop addressed multiple open research problems simultaneously. First, participants found an improved version of the reflection principle discovered in the previous workshop, though this progress is still being written up. Second, participants improved upon earlier work by LaVictoire, resulting in the paper “Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic.” Third, participants improved upon Benja Fallenstein’s parametric polymorphism approach to tackling the Löbian obstacle for self-modifying systems.

#### November 11-18, 2012 – Berkeley, CA

## 1st Workshop on Logic, Probability, and Reflection

- Mihály Bárász (Google)
- Paul Christiano (UC Berkeley)
- Marcello Herreshoff (Google)
- Eliezer Yudkowsky (MIRI)

This workshop pursued one line of attack on the Löbian obstacle for self-modifying systems. The primary result of this workshop was a non-constructive “loophole” in Tarski’s undefinability of truth (via a fixed point theorem), which was later written up in draft form as “Definability of Truth in Probabilistic Logic” (see discussions here, here, and here).