# Interview with New MIRI Research Fellow Luke Muehlhauser


### Section One: Friendly AI

Michael Anissimov: You were recently hired as a full-time employee by the Machine Intelligence Research Institute. What is your personal background?

Luke Muehlhauser: I studied psychology in university but quickly found that I learn better and faster as an autodidact. Since then, I’ve consumed many fields of science and philosophy, one at a time, as they were relevant to my interests. I’ve written dozens of articles for my blog and for Less Wrong, and I host a podcast on which I interview leading philosophers and scientists about their work. I also have an interest in the mathematics and cognitive science of human rationality, because I want the research I do to arrive at plausibly correct answers, not just answers that make me feel good.

Michael: Why should we care about artificial intelligence?

Luke: Artificial intelligence is becoming a more powerful technology every year. We now have robots that do original scientific research, and the U.S. military is developing systems for autonomous battlefield robots that make their own decisions. Artificial intelligence will become even more important when it passes human levels of intelligence, at which point it will be able to do many things we care about better than we can — things like curing cancer and preventing disasters.

Michael: Why do you think smarter-than-human artificial intelligence is possible?

Luke: The first reason is scientific. Human intelligence is a product of information processing in a brain made of meat. But meat is not an ideal platform for intelligence; it’s just the first one that evolution happened to produce. Information processing on a faster, more durable, and more flexible platform like silicon should be able to surpass the abilities of an intelligence running on meat if we can figure out which information processing algorithms are required for intelligence – either by looking more closely at which algorithms the brain is using or by gaining new insights in mathematics.

The second reason is historical. Machines have already passed human ability in hundreds of particular tasks: playing chess or Jeopardy, searching through large databanks, and in a recent advance, reading road signs. There is little reason to suspect this trend will stop, unless scientific progress in general stops.

Michael: The mission of the Machine Intelligence Research Institute is “to ensure that the creation of smarter-than-human intelligence benefits society.” How is your research contributing to that mission?

Luke: A smarter-than-human machine intelligence that benefits (rather than harms) society is called “Friendly AI.” My primary research focus is what we call the problem of “friendliness content.” What does it look like for an AI to be “friendly” or to “benefit” society? We all have ideas about what a good world looks like and what a bad world looks like, but when thinking about that in the context of AI you must be very precise, because an AI will only do exactly what it is programmed to do.

If we can figure out how to specify exactly what it would mean for an AI to be “friendly,” then the creation of Friendly AI could be the best thing that ever happened. An advanced artificial intelligence could do science better and faster than we can, and thereby cure cancer, cure diseases, allow for human immortality, prevent disasters, solve the problems of climate change, and allow us to extend our civilization to other planets. A Friendly AI could also discover better economic and political systems that improve conditions for everyone.

Michael: How does the Machine Intelligence Research Institute’s approach to making Friendly AI differ from the concept of Asimov’s laws?

Luke: Asimov’s Three Laws of Robotics for governing robot behavior are widely considered to be inadequate for making sure that intelligent machines bring no harm to humans. In fact, Asimov used his stories to illustrate many of the ways in which those laws could lead to unintended consequences. The Machine Intelligence Research Institute’s approach is very different in that we don’t think that constraints on AI behavior will work in the long run. We need advanced AI to want the same things we want. If the AI wants something different than what we want, it will eventually find a way around whatever constraints we put on it, due to its vastly superior intelligence. But if we can make an AI want the same things we want, then it will be much more effective than we can be at bringing about the kind of world that we want – curing cancer and inventing immortality and so on.

Michael: Why is it necessary to make an AI that “wants the same things we want”?

Luke: A powerful AI that wants something different than we do could be dangerous. For example, suppose the AI’s goal system is programmed to maximize pleasure. That sounds good at first, but if you tell a super-powerful AI to “maximize pleasure,” it might do something like (1) convert most of Earth’s resources into computing machinery, killing all humans in the process, so that it can (2) tile the solar system with as many small digital minds as possible, and (3) have those digital minds run a continuous cycle of the single most pleasurable experience possible. But of course, that’s not what we want! We don’t just value pleasure, we also value things like novelty and exploration. So we need to be very careful when we tell an AI precisely what it means to be “friendly.”

We must be careful not to anthropomorphize. A machine intelligence won’t necessarily have our “common sense,” or our values, or even be sentient. When AI researchers talk about machine intelligence, they only mean to talk about a machine that is good at achieving its goals — whatever they are – in a wide variety of environments. So if you tell an AI to maximize pleasure, it will do exactly that. It’s not going to stop halfway through and “realize” – like a human might – that maximizing pleasure isn’t what was intended, and that it should do something else.

Michael: If dangerous AI were to develop, why couldn’t we just “pull the plug”?

Luke: We might not know that an AI was dangerous until it was too late. An AI with a certain level of intelligence is going to realize that in order to achieve its goals it needs to avoid being turned off, and so it would hide both the level of its own intelligence and its dangerousness.

But the bigger problem is that if some AI development team has already developed an AI that is intelligent enough to be dangerous, then other teams are only a few months or years behind them. You can’t just unplug every dangerous AI that comes along until the end of time. We’ll need to develop a Friendly AI that can ensure safety much better than humans can.

Michael: Why are you and the Machine Intelligence Research Institute focused on artificial intelligence instead of human intelligence enhancement or whole brain emulation?

Luke: Human intelligence enhancement is important, and may be needed to solve some of the harder problems of Friendly AI. Whole brain emulation is a particularly revolutionary kind of human intelligence enhancement that, if invented, could allow us to upload human minds into computers, run them at speeds much faster than is possible with neurons, make backup copies, and allow immortality.

Many researchers think that artificial intelligence will arrive before whole brain emulation does, but predicting the timelines of future technology can be difficult. We are very interested in whole brain emulation, and in fact that was the subject of a presentation our researcher Anna Salamon gave at a recent AI conference. One reason for us to focus on AI for the moment is that there are dozens of open problems in Friendly AI theory that we can make progress on right now without needing the vast computational resources required to make progress in whole brain emulation.

### Section Two: Research Area Questions

Michael: What research areas do you specifically investigate to develop “Friendliness content” for artificial intelligence?

Luke: One relevant area of research is cognitive neuroscience, especially the subfields neuroeconomics and affective neuroscience.

Worlds are “good” or “bad” to us because of our values, and our values are stored in the brain’s neural networks. For decades, we’ve had to infer human values by observing human behavior because the brain has been a “black box” to us. But that can only take us so far because the environments in which we act are highly complex, and that makes it difficult to infer human values merely from behavior. Recently, new technologies like fMRI and TMS and optogenetics have allowed us to look into the black box and watch what the brain does. In fact, we’ve located the specific neurons that seem to encode the brain’s expected subjective value for the possible actions we are considering at a given moment. We’ve also learned a lot about the specific algorithms used by the brain to update how much we value certain things – in fact, they turned out to be a type of algorithm first discovered in computer science, called temporal difference reinforcement learning.
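The temporal difference learning algorithm mentioned above can be sketched in a few lines. This is a toy illustration of the general technique, not the brain's actual implementation; the states, rewards, learning rate, and discount factor are all assumptions made for the example.

```python
# Temporal difference (TD) learning: nudge the value of each state toward
# the reward received plus the discounted value of the next state.
# All numbers here (learning rate, discount, rewards) are illustrative.

def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """One TD(0) update; returns the prediction error."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error

# Toy world: a cue is followed by a rewarded state, then the episode ends.
values = {"cue": 0.0, "reward_state": 0.0, "end": 0.0}
for _ in range(500):
    td_update(values, "cue", "reward_state", reward=0.0)
    td_update(values, "reward_state", "end", reward=1.0)

# values["reward_state"] converges to ~1.0, and values["cue"] to ~0.9,
# the discounted value of what reliably follows the cue.
```

The prediction error computed by `td_update` is the quantity that, in the neuroscience results described above, appears to be carried by reward-related signals in the brain.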

A second relevant area of research is choice modeling and preference elicitation. Economists use a variety of techniques, for example Willingness to Pay measures, to infer human preferences from human behavior. AI researchers also do this, usually for the purposes of designing a piece of software called a decision support system. The human brain doesn’t seem to encode a coherent preference set, so we’ll need to use choice modeling and preference elicitation techniques to extract a coherent preference set from whatever it is that human brains actually do.
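The simplest form of preference elicitation can be sketched as follows. This is my own toy illustration, not a technique from the literature; real choice models (and measures like Willingness to Pay) are considerably more involved.

```python
# Infer a preference ranking from observed pairwise choices by scoring
# each option by how often it was chosen over some alternative.
# The options and observations are illustrative assumptions.

observed_choices = [
    ("apple", "banana"),   # (chosen, rejected)
    ("apple", "cherry"),
    ("banana", "cherry"),
    ("apple", "banana"),
]

score = {"apple": 0, "banana": 0, "cherry": 0}
for chosen, _rejected in observed_choices:
    score[chosen] += 1

# apple chosen 3 times, banana once, cherry never:
# inferred preference is apple > banana > cherry.
ranking = sorted(score, key=score.get, reverse=True)
```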

Other fields relevant to friendliness content theory include value extrapolation, the psychology of concepts, game theory, metaethics, normativity, and machine ethics.

Michael: What is value extrapolation and how is it relevant to Friendly AI theory?

Luke: Most philosophers talk about “ideal preference theories,” but I prefer to call them “value extrapolation algorithms.” If we want to develop Friendly AI, we may not want to just scan human values from our brains and give those same values to an AI. I want to eat salty foods all day, but I kind of wish I didn’t want that, and I certainly don’t want an AI to feed me salty foods all day. Moreover, I would probably change my desires if I knew more and was more rational. I might learn things that would change what I want. And it’s unlikely that the human species has reached the end of moral development. So we don’t want to fix things in place by programming an AI with our current values. We want an AI to extrapolate our values so that it cares about what we would want if we knew more, were more rational, were more morally developed, and so on.

Michael: What is the psychology of concepts and how is it relevant to Friendly AI theory?

Luke: Some researchers think that part of the solution to the friendliness content problem will come from examining our intuitive concept of “ought” or “good,” and using this to inform our picture of what we think a good world would be like, and thus what the goal system of a super-powerful machine should be aimed toward. Philosophers have been examining our intuitive concepts of “ought” or “good” for centuries and made little progress, but perhaps new tools in psychology and neuroscience can help us do this conceptual analysis better than philosophers could from their armchairs.

On the other hand, psychological experiments have been undermining our classical theories about what concepts are, leading some to go so far as to conclude that concepts do not exist in any useful sense. The results of that research program in psychology and philosophy could have profound implications for any approach to friendliness content that depends on an examination of our intuitive concepts of “ought” or “good.”

Michael: What is game theory and how is it relevant to Friendly AI theory?

Luke: Game theory is a highly developed field of mathematics concerned with particular scenarios (“games”) where an agent’s success depends on the choices of others. Its models and discoveries have been applied to business, economics, political science, biology, computer science, and philosophy.

Game theory is relevant to friendliness content because many of our values result from our need to make decisions in scenarios where our success depends on the choices of others. It may also be relevant to value extrapolation algorithms, as the extrapolation process is likely to change the ways in which our values and decisions interact with the values and decisions of others.
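For a concrete sense of what game theory studies, here is the textbook Prisoner's Dilemma. The payoff numbers are the standard ones from the literature, not anything specific to Friendly AI.

```python
# The Prisoner's Dilemma: each agent's payoff depends on both players'
# choices. payoffs[(row_move, col_move)] = (row_payoff, col_payoff).
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
moves = ["cooperate", "defect"]

def best_response(their_move):
    """The move that maximizes my payoff, holding the other's move fixed."""
    return max(moves, key=lambda m: payoffs[(m, their_move)][0])

# A Nash equilibrium is a pair of moves that are best responses to each
# other. Here mutual defection is the only one, even though mutual
# cooperation would pay both players more.
nash = [(a, b) for a in moves for b in moves
        if best_response(b) == a and best_response(a) == b]
```

The gap between the equilibrium outcome and the mutually preferred outcome is exactly the kind of interaction between values and decisions that the paragraph above points to.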

Michael: What is metaethics and how is it relevant to Friendly AI theory?

Luke: Philosophers often divide the field of ethics into three levels. Applied ethics is the study of particular moral questions: How should we treat animals? Is lying ever acceptable? What responsibilities do corporations have concerning the environment? Normative ethics considers the principles by which we make judgments in applied ethics. Do we make one judgment over another based on which action produces the most good? Or should we be following a list of rules and respecting certain rights? Perhaps we should advocate what we would all agree to behind a veil of ignorance that kept us from knowing what our lot in life will be?

Metaethics goes one level deeper. What do terms like “good” and “right” even mean? Do moral facts exist, or is it all relative? Is there such a thing as moral progress? These questions are relevant to friendliness content because presumably, if moral facts exist, we would want an AI to respect them. Even if moral facts do not exist, our moral attitudes are part of what we value, and that is relevant to friendliness content theory.

Michael: What is normativity and how is it relevant to Friendly AI theory?

Luke: Normativity is about norms, and there are many kinds. Prudential norms concern what we ought to do to achieve our goals. Epistemic norms concern how we ought to pursue knowledge. Doxastic norms concern what we ought to believe. Moral norms concern how we ought to behave ethically. And so on.

A classic concern of normativity is the “is-ought gap.” Supposedly, you cannot reason from an “is” statement to an “ought” statement. It doesn’t logically follow from “The man in front of me is suffering” that “I ought to help him.” Actually, it’s trivial to bridge the is-ought gap when it comes to prudential norms. “If you want Y, then you ought to do X,” is just another way of saying “Doing X will increase your chances of attaining Y.” The first sentence contains an “ought” claim, but the second sentence reduces it away into a purely descriptive sentence about the natural world.

Some philosophers think that the “is-ought gap” can be bridged in the same way for epistemic and moral norms. Perhaps “you ought to believe X” just means “If you want true beliefs, then you ought to believe X,” which in turn can be reduced into the purely descriptive statement “Believing X will increase your proportion of true beliefs.”

But is there any other kind of normativity? Are there “categorical” oughts that do not depend on an “If you want X” clause? Naturalists tend to deny this possibility, but perhaps categorical epistemic or moral oughts can be derived from the mathematics of game theory and decision theory, as naturalist Gary Drescher suggests in Good and Real. If so, it may be wise to make sure they are included in friendliness content theory, so that an AI can respect them.

Michael: What is machine ethics and how is it relevant to Friendly AI theory?

Luke: Machine ethics is one of several names for the field that studies two major questions: (1) How can we get machines to behave ethically, and (2) which types of machines can be considered genuine moral agents (in the sense of having rights or moral worth like a human might)? Most of the work in the field so far is relevant only to “narrow AI” machines that are not nearly as intelligent as humans are, but two directions of research that may be useful for Friendly AI are mechanized deontic logic and computational metaethics.

Unfortunately, our understanding of Friendly AI – and of not-yet-invented AI technologies in general – is so primitive that we’re not even sure which fields will turn out to matter. It seems like cognitive neuroscience, preference elicitation, value extrapolation, game theory, and several other fields are relevant to Friendly AI theory, but it might turn out that as we come to understand Friendly AI better, we’ll learn that some research avenues are not relevant. But the only way we can learn that is to continue to make incremental progress in the areas of research that seem to be relevant.

Michael: What are some of those open problems in Friendly AI theory?

Luke: If we think just about the issue of Friendliness content, some of the open questions are: How does the brain choose which few possible actions it will encode expected subjective value for? How does it combine absolute value and probability estimates to make those expected subjective value computations? Where is absolute value stored, and how is it encoded? How can we extract a coherent utility function or preference set from this activity in human brains? Which algorithms should we use to extrapolate these preferences, and why? When extrapolated, will the values of two different humans converge? Will the values of all humans converge? Would the values of all sentient beings converge? Will the details of human cognitive neuroscience matter much, or will such details get “washed out” by the higher-level mathematical structure of value systems and game theory? How can these extrapolated values be implemented in the goal system of an AI?
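The core "expected subjective value" computation those questions refer to can be sketched very simply: weight each outcome's value by its probability and sum. The actions, probabilities, and values below are illustrative assumptions, not figures from neuroscience.

```python
# Expected value: sum of probability-weighted outcome values per action.

def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs for one action."""
    return sum(p * v for p, v in outcomes)

actions = {
    "safe_bet":  [(1.0, 50)],              # a certain payoff of 50
    "long_shot": [(0.1, 400), (0.9, 0)],   # a 10% chance of 400
}

# Choosing by expected value picks the safe bet (50 vs. 40).
best = max(actions, key=lambda a: expected_value(actions[a]))
```

The open questions above ask, in effect, how the brain performs this computation, over which candidate actions, and with what encoding of the value and probability terms.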

Friendliness content is only one area of open problems in Friendly AI theory. There are many other questions. How can an agent make optimal decisions when it is capable of directly editing its own source code, including the source code of the decision mechanism? How can we get an AI to maintain a consistent utility function throughout updates to its ontology? How do we make an AI with preferences about the external world instead of its reward signal? How can we generalize the theory of machine induction – called Solomonoff induction – so that it can use higher-order logics and reason correctly about observation selection effects? How can we approximate such ideal processes so that they are computable?

That’s a start, anyway. 🙂

Michael: In late 2010 the Machine Intelligence Research Institute published “Timeless Decision Theory.” What is timeless decision theory and how is it relevant to Friendly AI theory?

Luke: Decision theory is the study of how to make optimal decisions. We value different things differently, and we are uncertain about which actions will bring about what we value. One of the problems not handled well by traditional decision theories like Evidential decision theory (EDT) and Causal decision theory (CDT) is the problem of logical uncertainty – our uncertainty about mathematical and logical facts, for example what the nth decimal of pi is. One way to think about Timeless decision theory (TDT) is that it’s a step toward a decision theory that can handle logical uncertainty.
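The classic case where EDT and CDT come apart, and which motivates alternatives like TDT, is Newcomb's problem. The sketch below uses the standard payoff numbers; the predictor accuracy and the prior in the CDT calculation are assumptions, and TDT itself is not implemented here.

```python
# Newcomb's problem: an accurate predictor fills an opaque box with
# $1,000,000 only if it predicted you will take just that box; a clear
# box always holds $1,000.

ACCURACY = 0.99  # assumed reliability of the predictor

def edt_value(action):
    """EDT treats the action as evidence about what was predicted."""
    if action == "one-box":
        return ACCURACY * 1_000_000
    return ACCURACY * 1_000 + (1 - ACCURACY) * (1_000_000 + 1_000)

def cdt_value(action, p_million=0.5):
    """CDT reasons that the boxes are already filled either way."""
    return p_million * 1_000_000 + (1_000 if action == "two-box" else 0)

# EDT recommends one-boxing; CDT recommends two-boxing for any p_million.
```

The two theories give conflicting recommendations in the same situation, which is one reason the foundations of decision theory remain an open research area.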

For an AI to be safe, its decision mechanism will have to be somewhat clear and mathematically testable for stability and safety. That probably means it will need to make decisions with decision theory, rather than through a relatively opaque mechanism like a neural network. So we need to solve some fundamental problems in decision theory first, and logical uncertainty is one of them.

Michael: What is reflective decision theory and why is it necessary to Friendly AI?

Luke: Traditional decision theories cannot handle agents that can modify their own source code, including the source code for their decision mechanism. A reflective decision theory is one that can handle such a strongly self-modifying agent. Because an advanced AI will be intelligent enough to modify its own source code, we need to develop a reflective decision theory that will allow us to ensure that the AI will remain Friendly throughout the self-modification and self-improvement process.

Michael: Can you give a concrete example of how your research has made progress towards a solution on one or more open problems in Friendly AI?

Luke: I’ve only just begun working with the Machine Intelligence Research Institute, and making progress on open problems in Friendly AI theory is only one of the many things I do. My first contribution to friendliness content theory was to summarize some very recent advances in neuroeconomics that are relevant to the study of human values. I did that because other researchers in the field were not yet familiar with that material, and I think much of the work in friendliness content theory can be done collaboratively by a broad community of researchers if we are all well-informed.

These results from neuroeconomics appear to be relevant to friendliness content theory, though only time will tell. For example, we’ve learned that expected utility for human actions is encoded cardinally (not ordinally) in the brain, and thus avoids a limiting result from economics called Arrow’s impossibility theorem.
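To see why cardinal (rather than ordinal) encoding matters here, consider a toy contrast. This is my own illustration, not a result from the neuroeconomics work mentioned above: with only ordinal rankings, majority preference over three options can cycle, the kind of pathology behind Arrow's theorem, while cardinal utilities can simply be summed.

```python
# Three agents' ordinal rankings form a Condorcet cycle:
rankings = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

def majority_prefers(x, y):
    """True if a strict majority ranks x above y."""
    wins = sum(1 for r in rankings if r.index(x) < r.index(y))
    return wins > len(rankings) / 2

# A beats B, B beats C, and yet C beats A: no coherent ordinal winner.
cycle = (majority_prefers("A", "B") and majority_prefers("B", "C")
         and majority_prefers("C", "A"))

# Cardinal utilities (consistent with the same rankings) aggregate cleanly:
utilities = [{"A": 3, "B": 1, "C": 0},
             {"A": 0, "B": 3, "C": 1},
             {"A": 2, "B": 0, "C": 3}]
totals = {opt: sum(u[opt] for u in utilities) for opt in "ABC"}
winner = max(totals, key=totals.get)   # "A", with total 5
```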

Michael: Why hasn’t the Machine Intelligence Research Institute produced any concrete artificial intelligence code?

Luke: This is a common confusion. Most of the open problems in Friendly AI theory are in math and philosophy, not in computer programming. Sometimes programmers approach us, offering to work on Friendly AI theory, and we reply: “What we need are mathematicians. Are you brilliant at math?”

As it turns out, the heroes who can save the world are not those with incredible strength or the power of flight. They are mathematicians.

### Section Three: Less Wrong and Rationality

Michael: You originally came to the Machine Intelligence Research Institute’s attention when you gained over 10,000 karma points very quickly on Less Wrong. For those who aren’t familiar with it, can you tell us what Less Wrong is and what its relationship is to the Machine Intelligence Research Institute?

Luke: Less Wrong is a group blog and community devoted to the study of rationality: how to get truer beliefs and make better decisions. The Machine Intelligence Research Institute’s co-founder Eliezer Yudkowsky originally wrote hundreds of articles about rationality for another blog, Overcoming Bias, because he wanted to build a community of people that could think clearly about difficult problems like Friendly AI. Those articles were then used as the seed content for a new website, Less Wrong. I discovered Less Wrong because of my interest in rationality, and eventually started writing articles for the site – many of which became very popular.

Michael: What originally got you interested in rationality?

Luke: I was raised an enthusiastic evangelical Christian, and had a dramatic crisis of faith when I learned a few things about the historical Jesus, science, and philosophy. I was disturbed by how confidently I had believed something that was so thoroughly wrong, and I no longer trusted my intuitions. I wanted to avoid being so wrong again, so I studied the phenomena that allow human brains to be so mistaken — things like confirmation bias and the affect heuristic. I also gained an interest in the mathematics of correct thinking, like Bayesian updating and decision theory.

Michael: You were recently an instructor at the Rationality MiniCamp in Berkeley during the summer. Can you tell us a little about the MiniCamp, what people did there, what you taught, and so on?

Luke: Anna and I put together the minicamp, a one-week camp full of classes and activities about rationality, social effectiveness, and existential risks. Over 20 people stayed in a large house in Berkeley, where we held the classes. Some of them came from as far away as Sweden and England.

Minicamp was a blast, mostly because the people were so great! We are still in contact, still learning and growing.

We taught things like how to update our beliefs using probability theory, how to use the principle of fungibility to better fulfill our goals, and how to use body language and fashion to improve some parts of our lives that math-heads sometimes neglect! We also taught classes on optimal philanthropy (how to get the most bang for your philanthropic buck) and existential risks (risks that could cause human extinction).

Michael: Besides being a website, Less Wrong groups meet up around the world. If I were interested, where would I be able to get involved in one of those meetups, and what goes on at these meetups?

Luke: Because of how sensitive humans are to context, surrounding yourself with other people who are learning rationality and trying to improve themselves is one of the most powerful ways to improve yourself.

The easiest way to find a Less Wrong meetup near you is probably to check for the most recent front-page post on Less Wrong with a title that begins “Weekly LW Meetups…” That post will list all the Less Wrong meetups happening that week.

Each Less Wrong meetup has different people and different activities. You can contact the meetup organizer for the meetup nearest you for more information.

Michael: The recently released Strategic Plan mentions intentions to “spin off rationality training to another organization so that the Machine Intelligence Research Institute can focus on Friendly AI research.” Can you tell us something about that?

Luke: We believe that building a large community of rationality enthusiasts is crucial to the success of our mission. The Less Wrong rationality community has been an indispensable source of human and financial capital for the Machine Intelligence Research Institute. However, we understand that it’s confusing to be an organization devoted to two such apparently different fields: advanced artificial intelligence and human rationality. That’s why we are working toward launching a new organization devoted to rationality training. The Machine Intelligence Research Institute, then, will be more solely devoted to the safety of advanced artificial intelligence.

### Section Four: Machine Intelligence Research Institute Operations

Michael: You and Louie Helm were hired on September 1st. How were the two of you hired?

Luke: The Machine Intelligence Research Institute doesn’t hire someone unless they do quite a bit of volunteer work first. I first came to the Machine Intelligence Research Institute as a visiting fellow. During the next few months I co-organized and taught at the Rationality Minicamp, taught classes for the longer Rationality Boot Camp, wrote dozens of articles on metaethics and rationality for Less Wrong, wrote the Intelligence Explosion FAQ and IntelligenceExplosion.com, led the writing of a strategic plan for the organization, and did many smaller tasks.

Louie Helm arrived in Berkeley not long after I did. As a past visiting fellow, Louie was the one who had suggested I apply to the visiting fellows program. Louie did some teaching for Rationality Boot Camp, helped me write the strategic plan, developed a donor database so that our contact with donors is more consistent, optimized the Machine Intelligence Research Institute’s finances, did lots of fundraising, and much more.

We both produced lots of value for the organization over those months as volunteers, so the Board hired us – me as a research fellow and Louie as Director of Development.

Michael: What does Louie Helm do as Director of Development?

Luke: We’re a small team, so we all do more than our title says, and Louie is no exception. Louie raises funds, communicates with donors, applies for grants, and so on. But he also launched the Research Associates program, coordinates the Volunteer Network, helps organize the Singularity Summit, seeks out potential new researchers, and more.

Michael: The Machine Intelligence Research Institute just raised $250,000 in our Summer Challenge Grant. What will those funds be spent on?

Luke: We were very pleased by the results of the summer challenge grant. No single person gave more than $25,000, so the grant succeeded because so many different people gave. More than 40 people gave $1,000 or more, which shows a high degree of trust from our core supporters. It costs $368,000 annually to support our lean family of eight full-time staff members, four of whom are research fellows: Eliezer Yudkowsky, Anna Salamon, Carl Shulman, and myself. The money will also be used to run the 2011 Singularity Summit, though we expect that event to be cash-positive this year. We plan to redesign the singinst.org website so that it is easier to navigate and provides greater organizational transparency. And with enough funds after the Summit, we hope to hire additional researchers.

Michael: Carl Shulman was hired not long before you and Louie. What is his role in the organization?

Luke: Carl also did quite a lot of work for the Machine Intelligence Research Institute before being hired. He has written several papers and given a few talks, many of which you can read from our publications page. He continues to work on a variety of research projects, and collaborates closely with researchers at Oxford’s Future of Humanity Institute.

Michael: What sort of new researchers is the Machine Intelligence Research Institute looking for?

Luke: Mathematicians, mostly. If you’re a brilliant math student and want to live and work in the Bay Area where you’ll be surrounded by smart, influential, altruistic people, please apply here.
