News and links
In our last strategy update (August 2016), Nate wrote that MIRI’s priorities were to make progress on our agent foundations agenda and begin work on our new “Alignment for Advanced Machine Learning Systems” agenda, to collaborate and communicate with other researchers, and to grow our research and ops teams.
Since then, senior staff at MIRI have reassessed their views on how far off artificial general intelligence (AGI) is and concluded that shorter timelines are more likely than they previously thought. A few lines of recent evidence point in this direction, such as:
There’s no consensus among MIRI researchers on how long timelines are, and our aggregated estimate puts medium-to-high probability on scenarios in which the research community hasn’t developed AGI by, e.g., 2035. On average, however, research staff now assign moderately higher probability to AGI’s being developed before 2035 than we did a year or two ago. This has a few implications for our strategy:
1. Our relationships with current key players in AGI safety and capabilities play a larger role in our strategic thinking. Short-timeline scenarios reduce the expected number of important new players who will enter the space before we hit AGI, and increase how much influence current players are likely to have.
2. Our research priorities are somewhat different, since shorter timelines change which research paths are likely to pay off before we hit AGI, and also concentrate our probability mass more on scenarios where AGI shares various features with present-day machine learning systems.
Both updates represent directions we’ve already been trending in for various reasons. However, we’re moving in these two directions more quickly and confidently than we were last year. As an example, Nate is spending less time on staff management and other administrative duties than in the past (having handed these off to MIRI COO Malo Bourgon) and less time on broad communications work (having delegated a fair amount of this to me), allowing him to spend more time on object-level research, research prioritization work, and more targeted communications.
I’ll lay out what these updates mean for our plans in more concrete detail below.
The Machine Intelligence Research Institute is looking for highly capable software engineers to directly support our AI alignment research efforts, with a focus on projects related to machine learning. We’re seeking engineers with strong programming skills who are passionate about MIRI’s mission and looking for challenging and intellectually engaging work.
While our goal is to hire full-time, we are initially looking for paid interns. Successful internships may then transition into staff positions.
The start time for interns is flexible, but we’re aiming for May or June. We will likely run several batches of internships, so if you are interested but unable to start in the next few months, do still apply. The length of the internship is flexible, but we’re aiming for 2–3 months.
Examples of the kinds of work you’ll do during the internship:
For MIRI, the benefit of this program is that it’s a great way to get to know you and assess you for a potential hire. For applicants, the benefits are that this is an excellent opportunity to get your hands dirty and level up your machine learning skills, and to get to the cutting edge of the AI safety field, with a potential to stay in a full-time engineering role after the internship concludes.
Our goal is to trial many more people than we expect to hire, so our threshold for keeping on engineers long-term as full staff will be higher than for accepting applicants to our internship.
Some qualities of the ideal candidate:
We strive to make working at MIRI a rewarding experience.
MIRI is an equal opportunity employer. We are committed to making employment decisions based on merit and value. This commitment includes complying with all federal, state, and local laws. We desire to maintain a work environment free of harassment or discrimination due to sex, race, religion, color, creed, national origin, sexual orientation, citizenship, physical or mental disability, marital status, familial status, ethnicity, ancestry, status as a victim of domestic violence, age, or any other status protected by federal, state, or local laws.
I recently gave a talk at Google on the problem of aligning smarter-than-human AI with operators’ goals:
The talk was inspired by “AI Alignment: Why It’s Hard, and Where to Start,” and serves as an introduction to the subfield of alignment research in AI. A modified transcript follows.
Talk outline (slides):
Nate Soares’ recent decision theory paper with Ben Levinstein, “Cheating Death in Damascus,” prompted some valuable questions and comments from an acquaintance (anonymized here). I’ve put together edited excerpts from the commenter’s email below, with Nate’s responses.
The discussion concerns functional decision theory (FDT), a newly proposed alternative to causal decision theory (CDT) and evidential decision theory (EDT). Where EDT says “choose the most auspicious action” and CDT says “choose the action that has the best effects,” FDT says “choose the output of one’s decision algorithm that has the best effects across all instances of that algorithm.”
FDT usually behaves similarly to CDT. In a one-shot prisoner’s dilemma between two agents who know they are following FDT, however, FDT parts ways with CDT and prescribes cooperation, on the grounds that each agent runs the same decision-making procedure, and that therefore each agent is effectively choosing for both agents at once.
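To make the contrast concrete, here is a minimal Python sketch of the two decision rules in a one-shot prisoner’s dilemma. It is a toy illustration rather than the formalism from the paper: the payoff numbers and the function names `cdt_choice` and `fdt_choice` are assumptions chosen for this example, and the FDT rule shown only covers the special case where both agents are known to run the same procedure.

```python
# Assumed, illustrative payoffs (not from the paper):
# (my_move, their_move) -> my utility. "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}
MOVES = ("C", "D")


def cdt_choice(opponent_move):
    """CDT-style rule: hold the opponent's move fixed and pick the best reply."""
    return max(MOVES, key=lambda my_move: PAYOFF[(my_move, opponent_move)])


def fdt_choice():
    """FDT-style rule for two agents known to run this same procedure.

    Whatever the procedure outputs, both instances output it, so only the
    outcomes (C, C) and (D, D) are compared.
    """
    return max(MOVES, key=lambda move: PAYOFF[(move, move)])


if __name__ == "__main__":
    for assumed in MOVES:
        print(f"CDT, opponent assumed to play {assumed}: {cdt_choice(assumed)}")
    print(f"FDT, both agents share the procedure: {fdt_choice()}")
```

With these assumed payoffs, the CDT-style rule defects whichever move it holds fixed for the opponent, while the FDT-style rule cooperates because it treats its output as setting both agents’ moves at once.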
Below, Nate provides some of his own perspective on why FDT generally achieves higher utility than CDT and EDT. Some of the stances he sketches out here are stronger than the assumptions needed to justify FDT, but should shed some light on why researchers at MIRI think FDT can help resolve a number of longstanding puzzles in the foundations of rational action.
Anonymous: This is great stuff! I’m behind on reading loads of papers and books for my research, but this came across my path and hooked me, which speaks highly of how interesting the content is and of the sense that this paper is making progress.
My general take is that you are right that these kinds of problems need to be specified in more detail. However, my guess is that once you do so, game theorists would get the right answer. Perhaps that’s what FDT is: it’s an approach to clarifying ambiguous games that leads to a formalism where people like Pearl and myself can use our standard approaches to get the right answer.
I know there’s a lot of inertia in the “decision theory” language, so probably it doesn’t make sense to change. But if there were no such sunk costs, I would recommend a different framing. It’s not that people’s decision theories are wrong; it’s that they are unable to correctly formalize problems in which there are high-performance predictors. You show how to do that, using the idea of intervening on (i.e., choosing between putative outputs of) the algorithm, rather than intervening on actions. Everything else follows from a sufficiently precise and non-contradictory statement of the decision problem.
Probably the easiest move this line of work could make to ease this knee-jerk response of mine in defense of mainstream Bayesian game theory is to just be clear that CDT is not meant to capture mainstream Bayesian game theory. Rather, it is a model of one response to a class of problems not normally considered and for which existing approaches are ambiguous.
Nate Soares: I don’t take this view myself. My view is more like: When you add accurate predictors to the Rube Goldberg machine that is the universe — which can in fact be done — the future of that universe can be determined by the behavior of the algorithm being predicted. The algorithm that we put in the “thing-being-predicted” slot can do significantly better if its reasoning on the subject of which actions to output respects the universe’s downstream causal structure (which is something CDT and FDT do, but which EDT neglects), and it can do better again if its reasoning also respects the world’s global logical structure (which is done by FDT alone).
We don’t know exactly how to respect this wider class of dependencies in general yet, but we do know how to do it in many simple cases. While FDT agrees with modern decision theory and game theory in many simple situations, its prescriptions do seem to differ in non-trivial applications.
The main case where we can easily see that FDT is not just a better tool for formalizing game theorists’ traditional intuitions is in prisoner’s dilemmas. Game theory is pretty adamant about the fact that it’s rational to defect in a one-shot PD, whereas two FDT agents facing off in a one-shot PD will cooperate.
In particular, classical game theory employs a “common knowledge of shared rationality” assumption which, when you look closely at it, cashes out more or less as “common knowledge that all parties are using CDT and this axiom.” Game theory where common knowledge of shared rationality is defined to mean “common knowledge that all parties are using FDT and this axiom” gives substantially different results, such as cooperation in one-shot PDs.
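The classical result Nate refers to can be checked mechanically. The sketch below enumerates the pure-strategy Nash equilibria of a one-shot prisoner’s dilemma by testing whether either player could gain from a unilateral deviation. The payoff table and the helper names `is_nash` and `pure_nash_equilibria` are assumptions made for illustration. The code models only the standard best-response analysis, not the FDT-based notion of shared rationality.

```python
from itertools import product

# Assumed, illustrative payoffs (not from the source):
# (row_move, col_move) -> (row utility, col utility).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")


def is_nash(row, col):
    """True if neither player gains by unilaterally switching moves."""
    row_ok = all(PAYOFFS[(row, col)][0] >= PAYOFFS[(alt, col)][0] for alt in MOVES)
    col_ok = all(PAYOFFS[(row, col)][1] >= PAYOFFS[(row, alt)][1] for alt in MOVES)
    return row_ok and col_ok


def pure_nash_equilibria():
    """List every pure-strategy profile that passes the deviation check."""
    return [profile for profile in product(MOVES, MOVES) if is_nash(*profile)]


if __name__ == "__main__":
    print("Pure-strategy Nash equilibria:", pure_nash_equilibria())
    print("Payoffs at mutual cooperation:", PAYOFFS[("C", "C")])
    print("Payoffs at mutual defection:", PAYOFFS[("D", "D")])
```

Under these assumed payoffs, mutual defection is the only profile that survives the unilateral-deviation test, even though both players do better under mutual cooperation. That gap is what the choice of rationality axiom determines.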
Our newest publication, “Cheating Death in Damascus,” makes the case for functional decision theory, our general framework for thinking about rational choice and counterfactual reasoning.
In other news, our research team is expanding! Sam Eisenstat and Marcello Herreshoff, both previously at Google, join MIRI this month.
MIRI’s research team is growing! I’m happy to announce that we’ve hired two new research fellows to contribute to our work on AI alignment: Sam Eisenstat and Marcello Herreshoff, both from Google.
Sam Eisenstat studied pure mathematics at the University of Waterloo, where he carried out research in mathematical logic. His previous work was on the automatic construction of deep learning models at Google.
Sam’s research focus is on questions relating to the foundations of reasoning and agency, and he is especially interested in exploring analogies between current theories of logical uncertainty and Bayesian reasoning. He has also done work on decision theory and counterfactuals. His past work with MIRI includes “Asymptotic Decision Theory,” “A Limit-Computable, Self-Reflective Distribution,” and “A Counterexample to an Informal Conjecture on Proof Length and Logical Counterfactuals.”
Marcello Herreshoff studied at Stanford, receiving a B.S. in Mathematics with Honors and getting two honorable mentions in the Putnam Competition, the world’s most highly regarded university-level math competition. Marcello then spent five years as a software engineer at Google, gaining a background in machine learning.
Marcello is one of MIRI’s earliest research collaborators, and attended our very first research workshop alongside Eliezer Yudkowsky, Paul Christiano, and Mihály Bárász. Marcello has worked with us in the past to help produce results such as “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem,” “Definability of Truth in Probabilistic Logic,” and “Tiling Agents for Self-Modifying AI.” His research interests include logical uncertainty and the design of reflective agents.
Sam and Marcello will be starting with us in the first two weeks of April. This marks the beginning of our first wave of new research fellowships since 2015, though we more recently added Ryan Carey to the team on an assistant research fellowship (in mid-2016).
We have additional plans to expand our research team in the coming months, and will soon be hiring for a more diverse set of technical roles at MIRI — details forthcoming!
It’s time again for my annual review of MIRI’s activities. In this post I’ll provide a summary of what we did in 2016, see how our activities compare to our previously stated goals and predictions, and reflect on how our strategy this past year fits into our mission as an organization. We’ll be following this post up in April with a strategic update for 2017.
After doubling the size of the research team in 2015, we slowed our growth in 2016 and focused on integrating the new additions into our team, making research progress, and writing up a backlog of existing results.
2016 was a big year for us on the research front, with our new researchers making some of the most notable contributions. Our biggest news was Scott Garrabrant’s logical inductors framework, which represents, by a significant margin, our largest step forward to date on the problem of logical uncertainty. We additionally released “Alignment for Advanced Machine Learning Systems” (AAMLS), a new technical agenda spearheaded by Jessica Taylor.
We also spent this last year engaging more heavily with the wider AI community, e.g., through the month-long Colloquium Series on Robust and Beneficial Artificial Intelligence we co-ran with the Future of Humanity Institute, and through talks and participation in panels at many events throughout the year.