May 2015 Newsletter
We recently released two new papers on reflective oracles and agents.
The first is “Reflective oracles: A foundation for classical game theory,” by Benja Fallenstein, Jessica Taylor, and Paul Christiano.
Classical game theory treats players as special—a description of a game contains a full, explicit enumeration of all players—even though in the real world, “players” are no more fundamentally special than rocks or clouds. It isn’t trivial to find a decision-theoretic foundation for game theory in which an agent’s co-players are a non-distinguished part of the agent’s environment. Attempts to model both players and the environment as Turing machines, for example, fail for standard diagonalization reasons.
In this paper, we introduce a “reflective” type of oracle, which is able to answer questions about the outputs of oracle machines with access to the same oracle. These oracles avoid diagonalization by answering some queries randomly. We show that machines with access to a reflective oracle can be used to define rational agents using causal decision theory. These agents model their environment as a probabilistic oracle machine, which may contain other agents as a non-distinguished part.
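As a toy illustration (not taken from the paper, and using hypothetical names oracle and D), the Python sketch below shows the kind of self-referential query a reflective oracle must handle, and why randomized answers dissolve the diagonalization problem: a machine that asks the oracle about its own output and then does the opposite can only be handled consistently if the oracle answers the borderline query with a coin flip.

```python
import random

def oracle(machine_name, p):
    """Toy stand-in for a reflective oracle (hypothetical, for illustration only).

    A reflective oracle answers 1 if Pr[machine outputs 1] > p and 0 if
    Pr[machine outputs 1] < p; when the probability is exactly p it is
    allowed to answer randomly.  Here we hard-code the one consistent
    answer pattern for the diagonal machine D below: D outputs 1 with
    probability exactly 1/2, so the oracle may answer the query (D, 1/2)
    with a fair coin flip.
    """
    assert (machine_name, p) == ("D", 0.5), "toy oracle only handles one query"
    return random.randint(0, 1)

def D():
    """Diagonal machine: asks the oracle about itself, then does the opposite.

    A deterministic oracle has no consistent answer here (the usual
    diagonalization argument).  With the randomized answer above, D outputs
    1 with probability exactly 1/2, making the oracle's behavior consistent.
    """
    answer = oracle("D", 0.5)  # "Do I output 1 with probability > 1/2?"
    return 1 - answer

if __name__ == "__main__":
    samples = [D() for _ in range(100_000)]
    print("empirical Pr[D() == 1] =", sum(samples) / len(samples))  # close to 0.5
```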
We show that if such agents interact, they will play a Nash equilibrium, with the randomization in mixed strategies coming from the randomization in the oracle’s answers. This can be seen as providing a foundation for classical game theory in which players aren’t special.
The second paper develops these ideas in the context of Solomonoff induction and Marcus Hutter’s AIXI. It is “Reflective variants of Solomonoff induction and AIXI,” by Benja Fallenstein, Nate Soares, and Jessica Taylor.
Solomonoff induction and AIXI model their environment as an arbitrary Turing machine, but are themselves uncomputable. This fails to capture an essential property of real-world agents, which cannot be more powerful than the environment they are embedded in; for example, AIXI cannot accurately model game-theoretic scenarios in which its opponent is another instance of AIXI.
In this paper, we define reflective variants of Solomonoff induction and AIXI, which are able to reason about environments containing other, equally powerful reasoners. To do so, we replace Turing machines by probabilistic oracle machines (stochastic Turing machines with access to an oracle). We then use reflective oracles, which answer questions of the form, “is the probability that oracle machine M outputs 1 greater than p, when run on this same oracle?” Diagonalization can be avoided by allowing the oracle to answer randomly if this probability is equal to p; given this provision, reflective oracles can be shown to exist. We show that reflective Solomonoff induction and AIXI can themselves be implemented as oracle machines with access to a reflective oracle, making it possible for them to model environments that contain reasoners as powerful as themselves.
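Written out, the informal query above corresponds to a reflection condition of roughly the following form (a paraphrase of the description in this summary, not the paper's exact formalization), where M^O denotes the probabilistic oracle machine M run with access to the oracle O:

```latex
% Reflection condition (paraphrased): for an oracle machine M and a
% rational probability threshold p,
\[
O(M, p) =
\begin{cases}
1 & \text{if } \Pr\!\left[M^{O} \text{ outputs } 1\right] > p,\\[4pt]
0 & \text{if } \Pr\!\left[M^{O} \text{ outputs } 1\right] < p,\\[4pt]
\text{either answer (possibly at random)} & \text{if } \Pr\!\left[M^{O} \text{ outputs } 1\right] = p.
\end{cases}
\]
```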
MIRI recently sponsored Oxford researcher Stuart Armstrong to take a solitary retreat and brainstorm new ideas for AI control. This brainstorming generated 16 new control ideas, of varying usefulness and polish. During the past month, he has described each new idea, and linked those descriptions from his index post: New(ish) AI control ideas.
He also named each AI control idea, and then drew a picture to represent (very roughly) how the new ideas related to each other. In the picture below, an arrow Y→X can mean “X depends on Y”, “Y is useful for X”, “X complements Y on this problem” or “Y inspires X.” The underlined ideas are the ones Stuart currently judges to be most important or developed.
Previously, Stuart developed the AI control idea of utility indifference, which plays a role in MIRI’s paper Corrigibility (Stuart is a co-author). He also developed anthropic decision theory and some ideas for reduced impact AI and oracle AI. He has contributed to the strategy and forecasting challenges of ensuring good outcomes from advanced AI, e.g. in Racing to the Precipice and How We’re Predicting AI — or Failing To. MIRI previously contracted him to write a short book introducing the superintelligence control challenge to a popular audience, Smarter Than Us.
It’s time for my review of MIRI in 2014. A post about our next strategic plan will follow in the next couple of months, and I’ve included some details about ongoing projects at the end of this review.
Since early 2013, MIRI’s core goal has been to help create a new field of research devoted to the technical challenges of getting good outcomes from future AI agents with highly general capabilities, including the capability to recursively self-improve.
Launching a new field has been a team effort. In 2013, MIRI decided to focus on its comparative advantage in defining open problems and making technical progress on them. We’ve been fortunate to coordinate with other actors in this space — FHI, CSER, FLI, and others — who have leveraged their comparative advantages in conducting public outreach, building coalitions, pitching the field to grantmakers, interfacing with policymakers, and more.
MIRI began 2014 with several open problems identified, and with some progress made toward solving them, but with very few people available to do the work. Hence, most of our research program effort in 2014 was aimed at attracting new researchers to the field and making it easier for them to learn the material and contribute. This was the primary motivation for our new technical agenda overview, the MIRIx program, our new research guide, and more (see below). Nick Bostrom’s Superintelligence was also quite helpful for explaining why this field of research should exist in the first place.
Today the field is much larger and healthier than it was at the beginning of 2014. MIRI now has four full-time technical researchers instead of just one. Around 85 people have attended one or more MIRIx workshops. So many promising researchers have expressed interest in our technical research that roughly 25 have already confirmed their interest and availability to attend a MIRI introductory workshop this summer; that count largely excludes people who have attended past MIRI workshops, and we haven’t yet sent out all the invitations. Moreover, there are now several researchers we know who are plausible MIRI hires in the next 1-2 years.
I am extremely grateful to MIRI’s donors, without whom this progress would have been impossible.
The rest of this post provides a more detailed summary of our activities in 2014.
Today we publicly release a new technical report by Patrick LaVictoire, titled “An Introduction to Löb’s Theorem in MIRI Research.” The report’s introduction begins:
This expository note is devoted to answering the following question: why do many MIRI research papers cite a 1955 theorem of Martin Löb, and indeed, why does MIRI focus so heavily on mathematical logic? The short answer is that this theorem illustrates the basic kind of self-reference involved when an algorithm considers its own output as part of the universe, and it is thus germane to many kinds of research involving self-modifying agents, especially when formal verification is involved or when we want to cleanly prove things in model problems. For a longer answer, well, welcome!
I’ll assume you have some background doing mathematical proofs and writing computer programs, but I won’t assume any background in mathematical logic beyond knowing the usual logical operators, nor that you’ve even heard of Löb’s Theorem before.
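For reference, here is the standard statement of Löb’s Theorem (not quoted from the report), with \(\Box P\) abbreviating the arithmetized claim “P is provable in Peano Arithmetic”:

```latex
% Löb's Theorem (1955): for any sentence P,
\[
\text{if } \ \mathrm{PA} \vdash \big(\Box P \rightarrow P\big), \ \text{ then } \ \mathrm{PA} \vdash P.
\]
```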
If you’d like to discuss the article, please do so here.
Today we are proud to publicly launch the Intelligent Agent Foundations Forum (RSS), a forum devoted to technical discussion of the research problems outlined in MIRI’s technical agenda overview, along with similar research problems.
Patrick’s welcome post explains:
Broadly speaking, the topics of this forum concern the difficulties of value alignment: the problem of how to ensure that machine intelligences of various levels adequately understand and pursue the goals that their developers actually intended, rather than getting stuck on some proxy for the real goal or failing in other unexpected (and possibly dangerous) ways. As these failure modes are more devastating the farther we advance in building machine intelligences, MIRI’s goal is to work today on the foundations of goal systems and architectures that would work even when the machine intelligence has general creative problem-solving ability beyond that of its developers, and has the ability to modify itself or build successors.
The forum has been privately active for several months, so many interesting articles have already been posted, including:
Also see How to contribute.
Between 2006 and 2009, senior MIRI researcher Eliezer Yudkowsky wrote several hundred essays for the blogs Overcoming Bias and Less Wrong, collectively called “the Sequences.” With two days remaining until Yudkowsky concludes his other well-known rationality book, Harry Potter and the Methods of Rationality, we are releasing around 340 of his original blog posts as a series of six books, collected in one ebook volume under the title Rationality: From AI to Zombies.
Yudkowsky’s writings on rationality, which were previously scattered in a constellation of blog posts, have been cleaned up, organized, and collected together for the first time. This new version of the Sequences should serve as a more accessible long-form introduction to formative ideas behind MIRI, CFAR, and substantial parts of the rationalist and effective altruist communities.
While the books’ central focus is on applying probability theory and the sciences of mind to personal dilemmas and philosophical controversies, a considerable range of topics is covered. The six books explore rationality theory and applications from multiple angles:
I. Map and Territory. A lively introduction to the Bayesian conception of rational belief in cognitive science, and how it differs from other kinds of belief.
II. How to Actually Change Your Mind. A guide to overcoming confirmation bias and motivated cognition.
III. The Machine in the Ghost. A collection of essays on the general topic of minds, goals, and concepts.
IV. Mere Reality. Essays on science and the physical world, as they relate to rational inference.
V. Mere Goodness. A wide-ranging discussion of human values and ethics.
VI. Becoming Stronger. An autobiographical account of Yudkowsky’s philosophical mistakes, followed by a discussion of self-improvement and group rationality.
These essays are packaged together as a single electronic text, making it easier to investigate links between essays and search for keywords. The ebook is available on a pay-what-you-want basis (link), and on Amazon.com for $4.99 (link). In the coming months, we will also be releasing print versions of these six books, and Castify will be releasing the official audiobook version.