The post May 2016 Newsletter appeared first on Machine Intelligence Research Institute.
Research updates
General updates
News and links

The post A new MIRI research program with a machine learning focus appeared first on Machine Intelligence Research Institute.
MIRI’s research in general can be viewed as a response to Stuart Russell’s question for artificial intelligence researchers: “What if we succeed?” There appear to be a number of theoretical prerequisites for designing advanced AI systems that are robust and reliable, and our research aims to develop them early.
Our general research agenda is agnostic about when AI systems are likely to match and exceed humans in general reasoning ability, and about whether or not such systems will resemble present-day machine learning (ML) systems. Recent years’ impressive progress in deep learning suggests that relatively simple neural-network-inspired approaches can be very powerful and general. For that reason, we are making an initial inquiry into a more specific subquestion: “What if techniques similar in character to present-day work in ML succeed in creating AGI?”
Much of this work will be aimed at improving our high-level theoretical understanding of task-directed AI. Unlike what Nick Bostrom calls “sovereign AI,” which attempts to optimize the world in long-term and large-scale ways, task AI is limited to performing instructed tasks of limited scope, satisficing rather than maximizing. Our hope is that investigating task AI from an ML perspective will shed light on both the feasibility of task AI and the tractability of early safety work on advanced supervised, unsupervised, and reinforcement learning systems.
To this end, we will begin by investigating eight relevant technical problems:
1. Inductive ambiguity detection.
How can we design a general methodology for ML systems (such as classifiers) to identify when the classification of a test instance is underdetermined by training data?
For example: If an ambiguity-detecting classifier is designed to distinguish images of tanks from images of non-tanks, and the training set only contains images of tanks on cloudy days and non-tanks on sunny days, this classifier ought to detect that the classification of an image of a tank on a sunny day is ambiguous, and pose some query for its operators to disambiguate it and avoid errors.
While past and current work in active learning, and statistical learning theory more broadly, has made progress towards this goal, more work is necessary to establish realistic statistical bounds on the error rates and query rates of real-world systems in advance of their deployment in complex environments.
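The underdetermination problem above can be illustrated with a deliberately tiny version-space model. The two-feature "tank" dataset and the hypothesis class below are invented for illustration only: when several hypotheses fit the training data perfectly but disagree on a test input, the system should report ambiguity rather than guess.

```python
import itertools

# Toy "tank detector": each image is two binary features.
# Feature 0: contains a tank shape; feature 1: sky is sunny.
# Training data only pairs tanks with cloudy skies and
# non-tanks with sunny skies, so the label is underdetermined.
train = [((1, 0), 1), ((1, 0), 1), ((0, 1), 0), ((0, 1), 0)]

# Hypothesis class: predict according to a single feature,
# possibly negated (a stand-in for a richer model class).
def hypotheses():
    for f, neg in itertools.product([0, 1], [False, True]):
        yield lambda x, f=f, neg=neg: (1 - x[f]) if neg else x[f]

# Version space: all hypotheses consistent with the training data.
version_space = [h for h in hypotheses()
                 if all(h(x) == y for x, y in train)]

def classify(x):
    preds = {h(x) for h in version_space}
    return preds.pop() if len(preds) == 1 else "ambiguous"

print(classify((1, 0)))  # tank, cloudy: all surviving hypotheses agree
print(classify((1, 1)))  # tank, sunny: hypotheses disagree -> "ambiguous"
```

Here both "predict by tank shape" and "predict by cloudy sky" fit the training set, so a tank on a sunny day receives conflicting predictions, which is exactly the situation where a query to the operators is warranted.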
2. Informed oversight.
How might we train a reinforcement learner to output both an action and a “report” comprising information to help a human evaluate its action?
For example: If a human is attempting to train a reinforcement learner to output original stories, then in evaluating the story, the human will want to know some information about the story (such as whether it has been plagiarized from another story) that may be difficult to determine by looking at the story itself.
3. Safe training procedures for human-imitators.
How might we design an ML system that imitates humans performing some task that involves rich outputs (such as answering questions in natural language), to the best of the ML system’s abilities?
While there are existing approaches to imitation learning and generative models, these have some theoretical shortcomings that prevent them from fully solving the general problem. In particular, a generative adversarial model trained on human actions only has an incentive to imitate aspects of the human that the adversary can detect; thus, issues similar to the plagiarism problem from (2) can arise.
4. Conservative concepts.
How might we design a system that, given some positive examples of a concept, can synthesize new instances of the concept without synthesizing edge cases of it?
For example: If we gave the system detailed information about 100 human-created burritos as training data, it should manufacture additional burritos while avoiding edge cases such as extremely small burritos (even though these could still be considered burritos).
By default, most objective functions will lead to such edge cases (say, because small burritos are cheaper to manufacture). Can we develop a general technique for avoiding this problem?
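A minimal sketch of one conservative strategy, with invented burrito features and an invented shrinkage margin: synthesize only within per-feature bounds pulled toward the interior of the observed training range, so that cheap-but-weird edge cases are never produced.

```python
import random

# Toy "conservative burrito synthesizer": each training burrito is a
# feature vector (length_cm, mass_g). The features and numbers here
# are invented for illustration.
training = [(random.gauss(20, 2), random.gauss(450, 40)) for _ in range(100)]

def conservative_bounds(samples, margin=0.25):
    """Per-feature bounds shrunk toward the middle of the observed
    range, so synthesized instances stay away from edge cases."""
    bounds = []
    for feature in zip(*samples):
        lo, hi = min(feature), max(feature)
        pad = margin * (hi - lo)
        bounds.append((lo + pad, hi - pad))
    return bounds

def synthesize(bounds):
    """Draw a new instance uniformly from the shrunken feature box."""
    return tuple(random.uniform(lo, hi) for lo, hi in bounds)

bounds = conservative_bounds(training)
new_burrito = synthesize(bounds)
print(new_burrito)  # well inside the range of observed burritos
```

This box-shrinking rule is of course far too crude for real concepts; it only illustrates the desideratum that the synthesizer's outputs stay comfortably inside the support of the training examples.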
5. Specifying environmental goals using sensory data.
The goals of most ML systems are defined in terms of sensory data, such as discounted reward over time. While these sensory goals can be useful proxies for environmental goals, they are not identical to them: tricking your own sensors into perceiving a strawberry in the room is not the same as actually having one there. How might we create systems that directly pursue useful goals in their environment, such as “make there be a strawberry in this room”, in a way that is robust to opportunities for the system to interfere with the proxies we used to specify the goal?
6. Low-impact agents.
We would prefer a highly intelligent task-oriented system to generally avoid creating large unintended-by-us side effects in pursuit of its goal, and also to notify us of any large impacts that might result from achieving its goal. For example, if we ask it to build a house for a homeless family, it should know implicitly that it should avoid destroying nearby houses for materials — a large side effect — while still having the desirable large follow-on effect of improving the family’s socioeconomic situation. For any specific task we can specify ad hoc cost functions for side effects like nearby house destruction, but since we cannot always anticipate such costs in advance, we want a quantitative understanding of how to generally limit an AI’s side effects (without also limiting its ability to have large positive intended impacts).
7. Mild optimization.
In the event that the low-impact criteria we employ in (6) are flawed or missing some key considerations, we would like some way to tell our system “don’t try too hard” or “don’t exercise too much optimization power”, since intuitively, large impacts are more likely to result from large efforts (although the correspondence is imperfect).
Less capable AI systems tend to be less likely to have large unintended side effects, and we would like to know how to make even highly capable systems safe in a similar way, by having them apply less “effort”. With a satisfactory encoding of our notion of “effort”, we could then program a system to impose absolute limits on its total effort, or to use only as much effort as is needed to achieve its objective with reasonably high probability, thus potentially limiting impacts that we did not manage to measure in (6).
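One published formalization of "don't try too hard" is Jessica Taylor's quantilizer, which samples from the top q fraction of actions under a base distribution instead of maximizing outright. Below is a minimal sketch of that idea, with an invented one-dimensional action space; it is an illustration of the concept, not an implementation from the research agenda.

```python
import random

def quantilize(base_sample, utility, q=0.1, n=1000):
    """Sample n actions from the base distribution and return a
    uniformly random action from the top q fraction by utility,
    rather than the single utility-maximizing action."""
    actions = sorted((base_sample() for _ in range(n)),
                     key=utility, reverse=True)
    return random.choice(actions[:max(1, int(q * n))])

# Toy setting (invented): actions are "effort levels" in [0, 1];
# utility increases with effort, but so might unmodeled side effects.
action = quantilize(random.random, utility=lambda a: a, q=0.1)
print(action)  # a high, but not necessarily maximal, effort level
```

Because the returned action is drawn from a broad base distribution rather than chosen by an argmax, extreme actions that score well only by exploiting flaws in the utility function are selected far less often than under pure maximization.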
8. Averting instrumental pressures.
How might we design a system that is not motivated to pursue certain convergent instrumental goals — such as gaining additional resources — even when pursuing these goals would help it achieve its main objective?
In particular, we may wish to build a system that has no incentive to cause or prevent its own shutdown/suspension. This relates to (6) and (7) in that instrumental pressures like “ensure my continued operation” can incentivize large impacts/efforts. However, this is a distinct agenda item because it may be possible to completely eliminate certain instrumental incentives in a way that would apply even before solutions to (6) and (7) would take effect.
Having identified these topics of interest, we expect our work on this agenda to be timely. The idea of “robust and beneficial” AI has recently received increased attention as a result of the new wave of breakthroughs in machine learning. The theoretical work in this project has more obvious connections to the leading paradigms in AI and ML than, for example, our recent work in logical uncertainty or in game theory, and therefore lends itself better to collaborations with AI/ML researchers in the near future.
Thanks to Eliezer Yudkowsky and Paul Christiano for seeding many of the initial ideas for these research directions, to Patrick LaVictoire, Andrew Critch, and other MIRI researchers for helping develop these ideas, and to Chris Olah, Dario Amodei, and Jacob Steinhardt for valuable discussion.
The post New papers dividing logical uncertainty into two subproblems appeared first on Machine Intelligence Research Institute.
Solutions to each subproblem are now available in two new papers, based on work spearheaded by Scott Garrabrant: “Uniform coherence” and “Asymptotic convergence in online learning with unbounded delays.”^{1}
To give some background on the problem: Modern probability theory models reasoners’ empirical uncertainty, their uncertainty about the state of a physical environment, e.g., “What’s behind this door?” However, it can’t represent reasoners’ logical uncertainty, their uncertainty about statements like “this Turing machine halts” or “the twin prime conjecture has a proof that is less than a gigabyte long.”^{2}
Roughly speaking, if you give a classical probability distribution variables for statements that could be deduced in principle, then the axioms of probability theory force you to put probability either 0 or 1 on those statements, because you’re not allowed to assign positive probability to contradictions. In other words, modern probability theory assumes that all reasoners know all the consequences of all the things they know, even if deducing those consequences is intractable.
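A toy propositional example makes the forcing concrete (the two atoms and the mixture weights below are arbitrary): a coherent distribution is a mixture over consistent truth assignments, so every tautology automatically receives probability 1 and every contradiction probability 0; only logically contingent sentences can take intermediate values.

```python
from itertools import product

# A coherent distribution over sentences built from atoms A, B is a
# mixture over the consistent truth assignments ("possible worlds").
atoms = ("A", "B")
worlds = list(product([False, True], repeat=len(atoms)))
weights = [0.125, 0.125, 0.25, 0.5]  # an arbitrary coherent mixture

def prob(sentence):
    """sentence: a function from (A, B) truth values to bool."""
    return sum(w for w, world in zip(weights, worlds) if sentence(*world))

print(prob(lambda a, b: a or not a))   # tautology: forced to 1.0
print(prob(lambda a, b: a and not a))  # contradiction: forced to 0.0
print(prob(lambda a, b: a))            # contingent: can lie in between
```

No choice of mixture weights can give the tautology anything other than probability 1, which is exactly the sense in which classical probability theory leaves no room for uncertainty about deducible statements.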
We want a generalization of probability theory that allows us to model reasoners that have uncertainty about statements that they have not yet evaluated. Furthermore, we want to understand how to assign “reasonable” probabilities to claims that are too expensive to evaluate.
Imagine an agent considering whether to use quicksort or mergesort to sort a particular dataset. They might know that quicksort typically runs faster than mergesort, but that doesn’t necessarily apply to the current dataset. They could in principle figure out which one uses fewer resources on this dataset, by running both of them and comparing, but that would defeat the purpose. Intuitively, they have a fair bit of knowledge that bears on the claim “quicksort runs faster than mergesort on this dataset,” but modern probability theory can’t tell us which information they should use and how.^{3}
What does it mean for a reasoner to assign “reasonable probabilities” to claims that they haven’t computed, but could compute in principle? Without probability theory to guide us, we’re reduced to using intuition to identify properties that seem desirable, and then investigating which ones are possible. Intuitively, there are at least two properties we would want logically non-omniscient reasoners to exhibit:
1. They should be able to notice patterns in what is provable about claims, even before they can prove or disprove the claims themselves. For example, consider the claims “this Turing machine outputs an odd number” and “this Turing machine outputs an even number.” A good reasoner thinking about those claims should eventually recognize that they are mutually exclusive, and assign them probabilities that sum to at most 1, even before they can run the relevant Turing machine.
2. They should be able to notice patterns in sentence classes that are true with a certain frequency. For example, they should assign roughly 10% probability to “the 10^{100}th digit of pi is a 7” absent any other information about the digit, after observing (but not proving) that digits of pi tend to be uniformly distributed.
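This base-rate heuristic is easy to demonstrate empirically. The sketch below (an illustration of the intuition, not of either paper's algorithm) computes a moderate number of digits of pi via Machin's formula in fixed-point integer arithmetic, then uses the observed frequency of a digit as the probability assigned to a digit that is out of computational reach.

```python
def pi_digits(n):
    """First n+1 decimal digits of pi ("3" plus n decimals), via
    Machin's formula pi = 16*atan(1/5) - 4*atan(1/239), computed in
    fixed-point integer arithmetic with 10 guard digits."""
    scale = 10 ** (n + 10)
    def atan_inv(x):
        # atan(1/x) = 1/x - 1/(3x^3) + 1/(5x^5) - ...
        total, term, k = 0, scale // x, 1
        while term:
            total += term // k if k % 4 == 1 else -(term // k)
            term //= x * x
            k += 2
        return total
    return str((16 * atan_inv(5) - 4 * atan_inv(239)) // 10 ** 10)

digits = pi_digits(1000)
# Empirical base rate for the digit 7 among the first 1000 decimals:
freq = digits[1:].count("7") / 1000
print(freq)  # close to the uniform rate of 0.1
```

Having observed a frequency near 0.1 on the digits it can afford to compute, the reasoner adopts that rate as its probability for "the 10^{100}th digit of pi is a 7", which it cannot compute.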
MIRI’s work on logical uncertainty this past year can be very briefly summed up as “we figured out how to get these two properties individually, but found that it is difficult to get both at once.”
“Uniform coherence,” which I co-authored with Garrabrant, Benya Fallenstein, and Abram Demski, shows how to get the first property. The abstract reads:
While probability theory is normally applied to external environments, there has been some recent interest in probabilistic modeling of the outputs of computations that are too expensive to run. Since mathematical logic is a powerful tool for reasoning about computer programs, we consider this problem from the perspective of integrating probability and logic.
Recent work on assigning probabilities to mathematical statements has used the concept of coherent distributions, which satisfy logical constraints such as the probability of a sentence and its negation summing to one. Although there are algorithms which converge to a coherent probability distribution in the limit, this yields only weak guarantees about finite approximations of these distributions. In our setting, this is a significant limitation: Coherent distributions assign probability one to all statements provable in a specific logical theory, such as Peano Arithmetic, which can prove what the output of any terminating computation is; thus, a coherent distribution must assign probability one to the output of any terminating computation.
To model uncertainty about computations, we propose to work with approximations to coherent distributions. We introduce uniform coherence, a strengthening of coherence that provides appropriate constraints on finite approximations, and propose an algorithm which satisfies this criterion.
Given a series of provably mutually exclusive sentences, or a series of sentences where each provably implies the next, a uniformly coherent predictor’s probabilities eventually start respecting this pattern. This is true even if the predictor hasn’t been able to prove that the pattern holds yet; if it would be possible in principle to eventually prove each instance of the pattern, then the uniformly coherent predictor will start recognizing it “before too long,” in a specific technical sense, even if the proofs themselves are very long.
“Asymptotic convergence in online learning with unbounded delays,” which I co-authored with Garrabrant and Jessica Taylor, describes an algorithm with the second property. The abstract reads:
We study the problem of predicting the results of computations that are too expensive to run, via the observation of the results of smaller computations. We model this as an online learning problem with delayed feedback, where the length of the delay is unbounded, which we study mainly in a stochastic setting. We show that in this setting, consistency is not possible in general, and that optimal forecasters might not have average regret going to zero. However, it is still possible to give algorithms that converge asymptotically to Bayes-optimal predictions, by evaluating forecasters on specific sparse independent subsequences of their predictions. We give an algorithm that does this, which converges asymptotically on good behavior, and give very weak bounds on how long it takes to converge. We then relate our results back to the problem of predicting large computations in a deterministic setting.
The first property is about recognizing patterns about logical relationships between claims — saying “claim A implies claim B, so my probability on B must be at least my probability on A.” By contrast, the second property is about recognizing frequency patterns between similar claims — saying “I lack the resources to tell whether this claim is true, but 90% of similar claims have been true, so the base rate is 90%” (where part of the problem is figuring out what counts as a “similar claim”).
In this technical report, we model the latter task as an online learning problem, where a predictor observes the behavior of many small computations and has to predict the behavior of large computations. We give an algorithm that eventually assigns the “right” probabilities to every predictable subsequence of observations, in a specific technical sense.
Each paper is interesting in its own right, but for us, the exciting result is that we have teased apart and formalized two separate notions of what counts as “good reasoning” under logical uncertainty, both of which are compelling.
Furthermore, our approaches to formalizing these two notions are very different. “Uniform coherence” frames the problem in the traditional “unify logic with probability” setting, whereas “Asymptotic convergence in online learning with unbounded delays” fits more naturally into the online machine learning framework. The methods we found for solving the first problem don’t appear to help with the second problem, and vice versa. In fact, the two isolated solutions appear quite difficult to reconcile. The problem that these two papers leave open is: Can we get one algorithm that satisfies both properties at once?
The post April 2016 Newsletter appeared first on Machine Intelligence Research Institute.
Research updates
General updates
News and links

The post New paper on bounded Löb and robust cooperation of bounded agents appeared first on Machine Intelligence Research Institute.
Löb’s theorem and Gödel’s theorem make predictions about the behavior of systems capable of self-reference with unbounded computational resources with which to write and evaluate proofs. However, in the real world, systems capable of self-reference will have limited memory and processing speed, so in this paper we introduce an effective version of Löb’s theorem which is applicable given such bounded resources. These results have powerful implications for the game theory of bounded agents who are able to write proofs about themselves and one another, including the capacity to outperform classical Nash equilibria and correlated equilibria, attaining mutually cooperative program equilibrium in the Prisoner’s Dilemma. Previous cooperative program equilibria studied by Tennenholtz and Fortnow have depended on tests for program equality, a fragile condition, whereas “Löbian” cooperation is much more robust and agnostic of the opponent’s implementation.
Tennenholtz (2004) showed that cooperative equilibria exist in the Prisoner’s Dilemma between agents with transparent source code. This suggested that a number of results in classical game theory, where it is a commonplace that mutual defection is rational, might fail to generalize to settings where agents have strong guarantees about each other’s conditional behavior.
Tennenholtz’s version of program equilibrium, however, only established that rational cooperation was possible between agents with identical source code. Patrick LaVictoire and other researchers at MIRI supplied the additional result that more robust cooperation was possible between non-computable agents, and that it is possible to efficiently determine the outcomes of such games. However, some readers objected to the infinitary nature of the methods (for example, the use of halting oracles) and worried that not all of the results would carry over to finite computations.
Critch’s report demonstrates that robust cooperative equilibria exist for bounded agents. In the process, Critch proves a new generalization of Löb’s theorem, and therefore of Gödel’s second incompleteness theorem. This parametric version of Löb’s theorem holds for proofs that can be written out in n or fewer characters, where the parameter n can be set to any number. For more background on the result’s significance, see LaVictoire’s “Introduction to Löb’s theorem in MIRI research.”
The new Löb result shows that bounded agents face obstacles to selfreferential reasoning similar to those faced by unbounded agents, and can also reap some of the same benefits. Importantly, this lemma will likely allow us to discuss many other selfreferential phenomena going forward using finitary examples rather than infinite ones.
The post MIRI has a new COO: Malo Bourgon appeared first on Machine Intelligence Research Institute.
As MIRI’s second-in-command, Malo will be taking over a lot of the hands-on work of coordinating our day-to-day activities: supervising our ops team, planning events, managing our finances, and overseeing internal systems. He’ll also be assisting me in organizational strategy and outreach work.
Prior to joining MIRI, Malo studied electrical, software, and systems engineering at the University of Guelph in Ontario. His professional interests included climate change mitigation, and during his master’s, he worked on a project to reduce waste through online detection of inefficient electric motors. Malo started working for us shortly after completing his master’s in early 2012, which makes him MIRI’s longest-standing team member next to Eliezer Yudkowsky.
Until now, I’ve generally thought of Malo as our secret weapon — a smart, practical efficiency savant. While Luke Muehlhauser (our previous executive director) provided the vision and planning that transformed us into a mature research organization, Malo was largely responsible for the implementation. Behind the scenes, nearly every system or piece of software MIRI uses has been put together by Malo, or in a joint effort by Malo and Alex Vermeer — a close friend of Malo’s from the University of Guelph who now works as a MIRI program management analyst. Malo’s past achievements at MIRI include:
More recently, Malo has begun representing MIRI in meetings with philanthropic organizations, government agencies, and for-profit AI groups.
Malo has been an invaluable asset to MIRI, and I’m thrilled to have him take on more responsibilities here. As one positive consequence, this will free up more of my time to work on strategy, recruiting, fundraising, and research.
In other news, MIRI’s head of communications, Rob Bensinger, has been promoted to the role of research communications manager. He continues to be the best person to contact at MIRI if you have general questions about our work and mission.
Lastly, Katja Grace, the primary contributor to the AI Impacts project, has been promoted to our list of research staff. (Katja is not part of our core research team, and works on questions related to AI strategy and forecasting rather than on our technical research agenda.)
My thanks and heartfelt congratulations to Malo, Rob, and Katja for all the work they’ve done, and all they continue to do.