News and links
Our winter fundraising drive has concluded. Thank you all for your support!
Through the month of December, 175 distinct donors gave a total of $351,298. Between this fundraiser and our summer fundraiser, which brought in $630k, we’ve seen a surge in our donor base; our previous fundraisers over the past five years had brought in on average $250k (in the winter) and $340k (in the summer). We additionally received about $170k in 2015 grants from the Future of Life Institute, and $150k in other donations.
In all, we’ve taken in about $1.3M in grants and contributions in 2015, up from our $1M average over the previous five years. As a result, we’re entering 2016 with a team of six full-time researchers and over a year of runway.
Our next big push will be to close the gap between our new budget and our annual revenue. In order to sustain our current growth plans — which are aimed at expanding to a team of approximately ten full-time researchers — we’ll need to begin consistently taking in close to $2M per year by mid-2017.
I believe this is an achievable goal, though it will take some work. It will be even more valuable if we can overshoot this goal and begin extending our runway and further expanding our research program. On the whole, I’m very excited to see what this new year brings.
In addition to our fundraiser successes, we’ve begun seeing new grant-winning success. In collaboration with Stuart Russell at UC Berkeley, we’ve won a $75,000 grant from the Berkeley Center for Long-Term Cybersecurity. The bulk of the grant will go to funding a new postdoctoral position at UC Berkeley under Stuart Russell. The postdoc will collaborate with Russell and MIRI Research Fellow Patrick LaVictoire on the problem of AI corrigibility, as described in the grant proposal:
Consider a system capable of building accurate models of itself and its human operators. If the system is constructed to pursue some set of goals that its operators later realize will lead to undesirable behavior, then the system will by default have incentives to deceive, manipulate, or resist its operators to prevent them from altering its current goals (as that would interfere with its ability to achieve its current goals). […]
We refer to agents that have no incentives to manipulate, resist, or deceive their operators as “corrigible agents,” using the term as defined by Soares et al. (2015). We propose to study different methods for designing agents that are in fact corrigible.
This postdoctoral position has not yet been filled. Expressions of interest can be emailed to email@example.com using the subject line “UC Berkeley expression of interest.”
News and links
Artificial intelligence capabilities research is aimed at making computer systems more intelligent — able to solve a wider range of problems more effectively and efficiently. We can distinguish this from research specifically aimed at making AI systems at various capability levels safer, or more “robust and beneficial.” In this post, I distinguish three kinds of direct research that might be thought of as “AI safety” work: safety engineering, target selection, and alignment theory.
Imagine a world where humans somehow developed heavier-than-air flight before developing a firm understanding of calculus or celestial mechanics. In a world like that, what work would be needed in order to safely transport humans to the Moon?
In this case, we can say that the main task at hand is one of engineering a rocket and refining fuel such that the rocket, when launched, accelerates upwards and does not explode. The boundary of space can be compared to the boundary between narrowly intelligent and generally intelligent AI. Both boundaries are fuzzy, but have engineering importance: spacecraft and aircraft have different uses and face different constraints.
Paired with this task of developing rocket capabilities is a safety engineering task. Safety engineering is the art of ensuring that an engineered system provides acceptable levels of safety. When it comes to achieving a soft landing on the Moon, there are many different roles for safety engineering to play. One team of engineers might ensure that the materials used in constructing the rocket are capable of withstanding the stress of a rocket launch with significant margin for error. Another might design escape systems that ensure the humans in the rocket can survive even in the event of failure. Another might design life support systems capable of supporting the crew in dangerous environments.
A separate important task is target selection, i.e., picking where on the Moon to land. In the case of a Moon mission, targeting research might entail things like designing and constructing telescopes (if they didn’t exist already) and identifying a landing zone on the Moon. Of course, only so much targeting can be done in advance, and the lunar landing vehicle may need to be designed so that it can alter the landing target at the last minute as new data comes in; this again would require feats of engineering.
Beyond the task of (safely) reaching escape velocity and figuring out where you want to go, there is one more crucial prerequisite for landing on the Moon. This is rocket alignment research, the technical work required to reach the correct final destination. We’ll use this as an analogy to illustrate MIRI’s research focus, the problem of artificial intelligence alignment.
Andrew Critch, one of the new additions to MIRI’s research team, has taken the opportunity of MIRI’s winter fundraiser to write on his personal blog about why he considers MIRI’s work important. Some excerpts:
Since a team of CFAR alumni banded together to form the Future of Life Institute (FLI), organized an AI safety conference in Puerto Rico in January of this year, co-authored the FLI research priorities proposal, and attracted $10MM of grant funding from Elon Musk, a lot of money has moved under the label “AI Safety” in the past year. Nick Bostrom’s Superintelligence was also a major factor in this amazing success story.
A lot of wonderful work is being done under these grants, including a lot of proposals for solutions to known issues with AI safety, which I find extremely heartening. However, I’m worried that if MIRI doesn’t scale at least somewhat to keep pace with all this funding, it just won’t be spent nearly as well as it would have if MIRI were there to help.
We have to remember that AI safety did not become mainstream by a spontaneous collective awakening. It was through years of effort on the part of MIRI and collaborators at FHI struggling to identify unknown unknowns about how AI might surprise us, and struggling further to learn to explain these ideas in enough technical detail that they might be adopted by mainstream research, which is finally beginning to happen.
But what about the parts we’re wrong about? What about the sub-problems we haven’t identified yet, that might end up neglected in the mainstream the same way the whole problem was neglected 5 years ago? I’m glad the AI/ML community is more aware of these issues now, but I want to make sure MIRI can grow fast enough to keep this growing field on track.
Now, you might think that now that other people are “on the issue”, it’ll work itself out. That might be so.
But just because some of MIRI’s conclusions are now being widely adopted widely doesn’t mean its methodology is. The mental movement
“Someone has pointed out this safety problem to me, let me try to solve it!”
is very different from
“Someone has pointed out this safety solution to me, let me try to see how it’s broken!”
And that second mental movement is the kind that allowed MIRI to notice AI safety problems in the first place. Cybersecurity professionals seem to carry out this movement easily: security expert Bruce Schneier calls it the security mindset. The SANS institute calls it red teaming. Whatever you call it, AI/ML people are still more in maker-mode than breaker-mode, and are not yet, to my eye, identifying any new safety problems.
I do think that different organizations should probably try different approaches to the AI safety problem, rather than perfectly copying MIRI’s approach and research agenda. But I think breaker-mode/security mindset does need to be a part of every approach to AI safety. And if MIRI doesn’t scale up to keep pace with all this new funding, I’m worried that the world is just about to copy-paste MIRI’s best-2014-impression of what’s important in AI safety, and leave behind the self-critical methodology that generated these ideas in the first place… which is a serious pitfall given all the unknown unknowns left in the field.
A few months ago, several leaders in the scientific community signed an open letter pushing for oversight into the research and development of artificial intelligence, in order to mitigate the risks and ensure the societal benefit of the advanced technology. Researchers largely agree that AI is likely to begin outperforming humans on most cognitive tasks in this century.
Similarly, I believe we’ll see the promise of human-level AI come to fruition much sooner than we’ve fathomed. Its effects will likely be transformational — for the better if it is used to help improve the human condition, or for the worse if it is used incorrectly.
As AI agents become more capable, it becomes more important to analyze and verify their decisions and goals. MIRI’s focus is on how we can create highly reliable agents that can learn human values and the overarching need for better decision-making processes that power these new technologies.
The past few years has seen a vibrant and growing AI research community. As the space continues to flourish, the need for collaboration will continue to grow as well. Organizations like MIRI that are dedicated to security and safety engineering help fill this need. And, as a nonprofit, its research is free from profit obligations. This independence in research is important because it will lead to safer and more neutral results.
By supporting organizations like MIRI, we’re putting the safeguards in place to make sure that this immensely powerful technology is used for the greater good. For humanity’s benefit, we need to guarantee that AI systems can reliably pursue goals that are aligned with society’s human values. If organizations like MIRI are able to help engineer this level of technological advancement and awareness in AI systems, imagine the endless possibilities of how it can help improve our world. It’s critical that we put the infrastructure in place in order to ensure that AI will be used to make the lives of people better. This is why I’ve donated to MIRI, and why I believe it’s a worthy cause that you should consider as well.
Jed McCaleb created eDonkey, one of the largest file-sharing networks of its time, as well as Mt. Gox, the first Bitcoin exchange. Recognizing that the world’s financial infrastructure is broken and that too many people are left without resources, he cofounded Stellar in 2014. Jed is also an advisor to MIRI.
We’re only 11 days into December, and this month is shaping up to be a momentous one.
On December 3, the University of Cambridge partnered with the University of Oxford, Imperial College London, and UC Berkeley to launch the Leverhulme Centre for the Future of Intelligence. The Cambridge Centre for the Study of Existential Risk (CSER) helped secure initial funding for the new independent center, in the form of a $15M grant to be disbursed over ten years. CSER and Leverhulme CFI plan to collaborate closely, with the latter focusing on AI’s mid- and long-term social impact.
Meanwhile, the Strategic Artificial Intelligence Research Centre (SAIRC) is hiring its first research fellows in machine learning, policy analysis, and strategy research: details. SAIRC will function as an extension of two existing institutions: CSER, and the Oxford-based Future of Humanity Institute. As Luke Muehlhauser has noted, if you’re an AI safety “lurker,” now is an ideal time to de-lurk and get in touch.
MIRI’s research program is also growing quickly, with mathematician Scott Garrabrant joining our core team tomorrow. Our winter fundraiser is in full swing, and multiple matching opportunities have sprung up to bring us within a stone’s throw of our first funding target.
The biggest news, however, is the launch of OpenAI, a new $1 billion research nonprofit staffed with top-notch machine learning experts and co-chaired by Sam Altman and Elon Musk. The OpenAI team describes their mission:
Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as is possible safely.
I’ve been in conversations with Sam Altman and Greg Brockman at OpenAI as their team has come together. They’ve expressed a keen interest in making sure that AI has a positive impact, and we’re looking forward to future collaborations between our teams. I’m excited to see OpenAI joining the space, and I’m optimistic that their entrance will result in promising new AI alignment research in addition to AI capabilities research.
2015 has truly been an astounding year — and I’m eager to see what 2016 holds in store.
MIRI Research Fellow Benya Fallenstein and Research Associate Ramana Kumar have co-authored a new paper on machine reflection, “Proof-producing reflection for HOL with an application to model polymorphism.”
HOL stands for Higher Order Logic, here referring to a popular family of proof assistants based on Church’s type theory. Kumar and collaborators have previously formalized within HOL (specifically, HOL4) what it means for something to be provable in HOL, and what it means for something to be a model of HOL.1 In “Self-formalisation of higher-order logic,” Kumar, Arthan, Myreen, and Owens demonstrated that if something is provable in HOL, then it is true in all models of HOL.
“Proof-producing reflection for HOL” builds on this result by demonstrating a formal correspondence between the model of HOL within HOL (“inner HOL”) and HOL itself (“outer HOL”). Informally speaking, Fallenstein and Kumar show that one can always build an interpretation of terms in inner HOL such that they have the same meaning as terms in outer HOL. The authors then show that if statements of a certain kind are provable in HOL’s model of itself, they are true in (outer) HOL. This correspondence enables the authors to use HOL to implement model polymorphism, the approach to machine self-verification described in Section 6.3 of “Vingean reflection: Reliable reasoning for self-improving agents.”2
This project is motivated by the fact that relatively little hands-on work has been done on modeling formal verification systems in formal verification systems, and especially on modeling them in themselves. Fallenstein notes that focusing only on the mathematical theory of Vingean reflection might make us poorly calibrated about where the engineering difficulties lie for software implementations. In the course of implementing model polymorphism, Fallenstein and Kumar indeed encountered difficulties that were not obvious from past theoretical work, the most important of which arose from HOL’s polymorphism.
Fallenstein and Kumar’s paper was presented at ITP 2015 and can be found online or in the associated conference proceedings. Thanks to a grant by the Future of Life Institute, Kumar and Fallenstein will be continuing their collaboration on this project. Following up on “Proof-producing reflection for HOL,” Kumar and Fallenstein’s next goal will be to develop toy models of agents within HOL proof assistants that reason using model polymorphism.
- Kumar showed that if there is a model of set theory in HOL, there is a model of HOL in HOL. Fallenstein and Kumar additionally show that there is a model of set theory in HOL if a simpler axiom holds. ↩
- For more on the role of logical reasoning in machine reflection, see Fallenstein’s 2013 conversation about self-modifying systems. ↩