News and links
Scientific American writer John Horgan recently interviewed MIRI’s senior researcher and co-founder, Eliezer Yudkowsky. The email interview touched on a wide range of topics, from politics and religion to existential risk and Bayesian models of rationality.
Although Eliezer isn’t speaking in an official capacity in the interview, a number of the questions discussed are likely to be interesting to people who follow MIRI’s work. We’ve reproduced the full interview below.
John Horgan: When someone at a party asks what you do, what do you tell her?
Eliezer Yudkowsky: Depending on the venue: “I’m a decision theorist”, or “I’m a cofounder of the Machine Intelligence Research Institute”, or if it wasn’t that kind of party, I’d talk about my fiction.
John: What’s your favorite AI film and why?
Eliezer: AI in film is universally awful. Ex Machina is as close to being an exception to this rule as it is realistic to ask.
John: Is college overrated?
Eliezer: It’d be very surprising if college were underrated, given the social desirability bias of endorsing college. So far as I know, there’s no reason to disbelieve the economists who say that college has mostly become a positional good, and that previous efforts to increase the volume of student loans just increased the cost of college and the burden of graduate debt.
John: Why do you write fiction?
Eliezer: To paraphrase Wondermark, “Well, first I tried not making it, but then that didn’t work.”
Beyond that, nonfiction conveys knowledge and fiction conveys experience. If you want to understand a proof of Bayes’s Rule, I can use diagrams. If I want you to feel what it is to use Bayesian reasoning, I have to write a story in which some character is doing that.
MIRI Research Associate Kaj Sotala recently presented a new paper, “Defining Human Values for Value Learners,” at the AAAI-16 AI, Society and Ethics workshop.
The abstract reads:
Hypothetical “value learning” AIs learn human values and then try to act according to those values. The design of such AIs, however, is hampered by the fact that there exists no satisfactory definition of what exactly human values are. After arguing that the standard concept of preference is insufficient as a definition, I draw on reinforcement learning theory, emotion research, and moral psychology to offer an alternative definition. In this definition, human values are conceptualized as mental representations that encode the brain’s value function (in the reinforcement learning sense) by being imbued with a context-sensitive affective gloss. I finish with a discussion of the implications that this hypothesis has on the design of value learners.
Economic treatments of agency standardly assume that preferences encode some consistent ordering over world-states revealed in agents’ choices. Real-world preferences, however, have structure that is not always captured in economic models. A person can have conflicting preferences about whether to study for an exam, for example, and the choice they end up making may depend on complex, context-sensitive psychological dynamics, rather than on a simple comparison of two numbers representing how much one wants to study or not study.
Sotala argues that our preferences are better understood in terms of evolutionary theory and reinforcement learning. Humans evolved to pursue activities that are likely to lead to certain outcomes — outcomes that tended to improve our ancestors’ fitness. We prefer those outcomes, even if they no longer actually maximize fitness; and we also prefer events that we have learned tend to produce such outcomes.
Affect and emotion, on Sotala’s account, psychologically mediate our preferences. We enjoy and desire states that are highly rewarding in our evolved reward function. Over time, we also learn to enjoy and desire states that seem likely to lead to high-reward states. On this view, our preferences function to group together events that lead on expectation to similarly rewarding outcomes for similar reasons; and over our lifetimes we come to inherently value states that lead to high reward, instead of just valuing such states instrumentally. Rather than directly mapping onto our rewards, our preferences map onto our expectation of rewards.
Sotala proposes that value learning systems informed by this model of human psychology could more reliably reconstruct human values. On this model, for example, we can expect human preferences to change as we find new ways to move toward high-reward states. New experiences can change which states my emotions categorize as “likely to lead to reward,” and they can thereby modify which states I enjoy and desire. Value learning systems that take these facts about humans’ psychological dynamics into account may be better equipped to take our likely future preferences into account, rather than optimizing for our current preferences alone.
News and links
Our winter fundraising drive has concluded. Thank you all for your support!
Through the month of December, 175 distinct donors gave a total of $351,298. Between this fundraiser and our summer fundraiser, which brought in $630k, we’ve seen a surge in our donor base; our previous fundraisers over the past five years had brought in on average $250k (in the winter) and $340k (in the summer). We additionally received about $170k in 2015 grants from the Future of Life Institute, and $150k in other donations.
In all, we’ve taken in about $1.3M in grants and contributions in 2015, up from our $1M average over the previous five years. As a result, we’re entering 2016 with a team of six full-time researchers and over a year of runway.
Our next big push will be to close the gap between our new budget and our annual revenue. In order to sustain our current growth plans — which are aimed at expanding to a team of approximately ten full-time researchers — we’ll need to begin consistently taking in close to $2M per year by mid-2017.
I believe this is an achievable goal, though it will take some work. It will be even more valuable if we can overshoot this goal and begin extending our runway and further expanding our research program. On the whole, I’m very excited to see what this new year brings.
In addition to our fundraiser successes, we’ve begun seeing new grant-winning success. In collaboration with Stuart Russell at UC Berkeley, we’ve won a $75,000 grant from the Berkeley Center for Long-Term Cybersecurity. The bulk of the grant will go to funding a new postdoctoral position at UC Berkeley under Stuart Russell. The postdoc will collaborate with Russell and MIRI Research Fellow Patrick LaVictoire on the problem of AI corrigibility, as described in the grant proposal:
Consider a system capable of building accurate models of itself and its human operators. If the system is constructed to pursue some set of goals that its operators later realize will lead to undesirable behavior, then the system will by default have incentives to deceive, manipulate, or resist its operators to prevent them from altering its current goals (as that would interfere with its ability to achieve its current goals). […]
We refer to agents that have no incentives to manipulate, resist, or deceive their operators as “corrigible agents,” using the term as defined by Soares et al. (2015). We propose to study different methods for designing agents that are in fact corrigible.
This postdoctoral position has not yet been filled. Expressions of interest can be emailed to email@example.com using the subject line “UC Berkeley expression of interest.”
News and links
Artificial intelligence capabilities research is aimed at making computer systems more intelligent — able to solve a wider range of problems more effectively and efficiently. We can distinguish this from research specifically aimed at making AI systems at various capability levels safer, or more “robust and beneficial.” In this post, I distinguish three kinds of direct research that might be thought of as “AI safety” work: safety engineering, target selection, and alignment theory.
Imagine a world where humans somehow developed heavier-than-air flight before developing a firm understanding of calculus or celestial mechanics. In a world like that, what work would be needed in order to safely transport humans to the Moon?
In this case, we can say that the main task at hand is one of engineering a rocket and refining fuel such that the rocket, when launched, accelerates upwards and does not explode. The boundary of space can be compared to the boundary between narrowly intelligent and generally intelligent AI. Both boundaries are fuzzy, but have engineering importance: spacecraft and aircraft have different uses and face different constraints.
Paired with this task of developing rocket capabilities is a safety engineering task. Safety engineering is the art of ensuring that an engineered system provides acceptable levels of safety. When it comes to achieving a soft landing on the Moon, there are many different roles for safety engineering to play. One team of engineers might ensure that the materials used in constructing the rocket are capable of withstanding the stress of a rocket launch with significant margin for error. Another might design escape systems that ensure the humans in the rocket can survive even in the event of failure. Another might design life support systems capable of supporting the crew in dangerous environments.
A separate important task is target selection, i.e., picking where on the Moon to land. In the case of a Moon mission, targeting research might entail things like designing and constructing telescopes (if they didn’t exist already) and identifying a landing zone on the Moon. Of course, only so much targeting can be done in advance, and the lunar landing vehicle may need to be designed so that it can alter the landing target at the last minute as new data comes in; this again would require feats of engineering.
Beyond the task of (safely) reaching escape velocity and figuring out where you want to go, there is one more crucial prerequisite for landing on the Moon. This is rocket alignment research, the technical work required to reach the correct final destination. We’ll use this as an analogy to illustrate MIRI’s research focus, the problem of artificial intelligence alignment.
Andrew Critch, one of the new additions to MIRI’s research team, has taken the opportunity of MIRI’s winter fundraiser to write on his personal blog about why he considers MIRI’s work important. Some excerpts:
Since a team of CFAR alumni banded together to form the Future of Life Institute (FLI), organized an AI safety conference in Puerto Rico in January of this year, co-authored the FLI research priorities proposal, and attracted $10MM of grant funding from Elon Musk, a lot of money has moved under the label “AI Safety” in the past year. Nick Bostrom’s Superintelligence was also a major factor in this amazing success story.
A lot of wonderful work is being done under these grants, including a lot of proposals for solutions to known issues with AI safety, which I find extremely heartening. However, I’m worried that if MIRI doesn’t scale at least somewhat to keep pace with all this funding, it just won’t be spent nearly as well as it would have if MIRI were there to help.
We have to remember that AI safety did not become mainstream by a spontaneous collective awakening. It was through years of effort on the part of MIRI and collaborators at FHI struggling to identify unknown unknowns about how AI might surprise us, and struggling further to learn to explain these ideas in enough technical detail that they might be adopted by mainstream research, which is finally beginning to happen.
But what about the parts we’re wrong about? What about the sub-problems we haven’t identified yet, that might end up neglected in the mainstream the same way the whole problem was neglected 5 years ago? I’m glad the AI/ML community is more aware of these issues now, but I want to make sure MIRI can grow fast enough to keep this growing field on track.
Now, you might think that now that other people are “on the issue”, it’ll work itself out. That might be so.
But just because some of MIRI’s conclusions are now being widely adopted widely doesn’t mean its methodology is. The mental movement
“Someone has pointed out this safety problem to me, let me try to solve it!”
is very different from
“Someone has pointed out this safety solution to me, let me try to see how it’s broken!”
And that second mental movement is the kind that allowed MIRI to notice AI safety problems in the first place. Cybersecurity professionals seem to carry out this movement easily: security expert Bruce Schneier calls it the security mindset. The SANS institute calls it red teaming. Whatever you call it, AI/ML people are still more in maker-mode than breaker-mode, and are not yet, to my eye, identifying any new safety problems.
I do think that different organizations should probably try different approaches to the AI safety problem, rather than perfectly copying MIRI’s approach and research agenda. But I think breaker-mode/security mindset does need to be a part of every approach to AI safety. And if MIRI doesn’t scale up to keep pace with all this new funding, I’m worried that the world is just about to copy-paste MIRI’s best-2014-impression of what’s important in AI safety, and leave behind the self-critical methodology that generated these ideas in the first place… which is a serious pitfall given all the unknown unknowns left in the field.