# Bill Hibbard on Ethical Artificial Intelligence


Bill Hibbard is an Emeritus Senior Scientist at the University of Wisconsin-Madison Space Science and Engineering Center, currently working on issues of AI safety and unintended behaviors. He has a BA in Mathematics and MS and PhD in Computer Sciences, all from the University of Wisconsin-Madison. He is the author of Super-Intelligent Machines, “Avoiding Unintended AI Behaviors,” “Decision Support for Safe AI Design,” and “Ethical Artificial Intelligence.” He is also principal author of the Vis5D, Cave5D, and VisAD open source visualization systems.

Luke Muehlhauser: You recently released a self-published book, Ethical Artificial Intelligence, which “combines several peer reviewed papers and new material to analyze the issues of ethical artificial intelligence.” Most of the book is devoted to the kind of exploratory engineering in AI that you and I described in a recent CACM article, such that you mathematically analyze the behavioral properties of classes of future AI agents, e.g. utility-maximizing agents.

Many AI scientists have the intuition that such early, exploratory work is very unlikely to pay off when we are so far from building an AGI, and don’t know what an AGI will look like. For example, Michael Littman wrote:

…proposing specific mechanisms for combatting this amorphous threat [of AGI] is a bit like trying to engineer airbags before we’ve thought of the idea of cars. Safety has to be addressed in context and the context we’re talking about is still absurdly speculative.

How would you defend the value of the kind of work you do in Ethical Artificial Intelligence to Littman and others who share his skepticism?

Bill Hibbard: This is a good question, Luke. The analogy with cars is useful. Unlike engineering airbags before cars are even thought of, we are already working hard to develop AI and can anticipate various types of dangers.

When cars were first imagined, engineers probably knew that they would propel human bodies at speed and that they would need to carry some concentrated energy source. They knew from accidents with horse carriages that human bodies travelling at speed are liable to injury, and they knew that concentrated energy sources are liable to fire and explosion which may injure humans. This is analogous with what we know about future AI: that to serve humans well AI will have to know a lot about individual humans and that humans will not be able to monitor every individual action by AI. These properties of future AI pose dangers just as the basic properties of cars (propelling humans and carrying energy) pose dangers.

Early car designers could have anticipated that no individual car would carry all of humanity and thus car accidents would not pose existential threats to humanity. To the extent that cars threaten human safety and health via pollution, we have time to notice these threats and address them. With AI we can anticipate possible scenarios that do threaten humanity and that may be difficult to address once the AI system is operational. For example, as described in the first chapter of my book, the Omniscience AI, with a detailed model of human society and a goal of maximizing profits, threatens to control human society. However, AI poses much greater potential benefits than cars but also much greater dangers. This justifies greater effort to anticipate the dangers of AI.

It’s also worth noting that the abstract frameworks for exploratory engineering apply to any reasonable future AI design. As the second chapter of my book describes, any set of complete and transitive preferences among outcomes can be expressed by a utility function. If preferences are incomplete then there are outcomes A and B with no preference between them, so the AI agent cannot decide. If preferences are not transitive then there are outcomes A, B, and C such that A is preferred to B, B is preferred to C, and C is preferred to A. Again, the AI agent cannot decide. Thus our exploratory engineering can assume utility maximizing agents and cover all cases in which the AI agent can decide among outcomes.
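As a rough illustration of the claim above, the following sketch builds a utility function from a complete and transitive preference relation over a small finite set of outcomes, and rejects a cyclic relation. The function names, the `weakly_prefers` callback, and the "count the outcomes you weakly beat" construction are assumptions chosen for this example; they are not taken from the book, which treats the general case.

```python
from itertools import product


def is_complete(outcomes, weakly_prefers):
    """True if every pair of outcomes is comparable in at least one direction."""
    return all(weakly_prefers(a, b) or weakly_prefers(b, a)
               for a, b in product(outcomes, repeat=2))


def is_transitive(outcomes, weakly_prefers):
    """True if a >= b and b >= c always implies a >= c."""
    return all(weakly_prefers(a, c)
               for a, b, c in product(outcomes, repeat=3)
               if weakly_prefers(a, b) and weakly_prefers(b, c))


def utility_from_preferences(outcomes, weakly_prefers):
    """Return a dict u with u[a] >= u[b] exactly when a is weakly preferred to b.

    Hypothetical construction for finite sets: u[a] counts the outcomes
    that a is weakly preferred to.
    """
    if not is_complete(outcomes, weakly_prefers):
        raise ValueError("preferences are incomplete")
    if not is_transitive(outcomes, weakly_prefers):
        raise ValueError("preferences are not transitive")
    return {a: sum(1 for b in outcomes if weakly_prefers(a, b))
            for a in outcomes}


outcomes = ["A", "B", "C"]

# A is preferred to B, B to C, and A to C: complete and transitive.
consistent = {("A", "B"), ("B", "C"), ("A", "C")}
utility = utility_from_preferences(
    outcomes, lambda a, b: a == b or (a, b) in consistent)
print(utility)  # {'A': 3, 'B': 2, 'C': 1}

# A over B, B over C, C over A: the cycle described in the text.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
try:
    utility_from_preferences(
        outcomes, lambda a, b: a == b or (a, b) in cyclic)
except ValueError as err:
    print("no utility function:", err)
```

Maximizing `utility` picks out A, the most preferred outcome. For the cyclic relation no assignment of numbers can respect all three preferences, which is why the sketch refuses to build one.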

Similarly, the dangers discussed in the book are generally applicable. Any design for powerful AI should explain how it will avoid the self-delusion problem described by Ring and Orseau, the problem of corrupting the reward generator as described by Hutter, and the problem of unintended instrumental actions as described by Omohundro (he called them basic AI drives).

The threat level from AI justifies addressing AI dangers now and with significant resources. And we are developing tools that enable us to analyze dangers of AI systems before we know the specifics of their designs.

Luke: Your book mostly discusses AGIs rather than contemporary narrow AI systems. Roughly when do you expect humanity will develop something resembling the kind of AGIs you have in mind? Or, what does your probability distribution over “Years to AGI” look like?

Bill: In my 2002 book, Super-Intelligent Machines, I wrote that “machines as intelligent as humans are possible and will exist within the next century or so.” (The publisher owns the copyright for my 2002 book, preventing me from giving electronic copies to people, and charges more than $100 per print copy. This largely explains my decision to put my current book on arxiv.org.) I like to say that we will get to human-level AI during the lives of children already born and in fact I can’t help looking at children with amazement, contemplating the events they will see.

In his 2005 book, The Singularity is Near, Ray Kurzweil predicted human-level AI by 2029. He has a good track record at technology prediction and I hope he is right: I was born in 1948 so have a good chance of living until 2029. He also predicted the singularity by 2045, which must include the kind of very powerful AI systems discussed in my recent book.

Although it has nowhere near human-level intelligence, the DeepMind Atari player is a general AI system in the sense that it has no foreknowledge of Atari games other than knowing that the goal is to get a high score. The remarkable success of this system increases my confidence that we will create true AGI systems.

DeepMind was purchased by Google, and all the big IT companies are energetically developing AI. It is the combination of AGI techniques and access to hundreds of millions of human users that can create the scenario of the Omniscience AI described in Chapter 1 of my book. Similarly for government surveillance agencies, which have hundreds of millions of unwitting users.

In 1983 I made a wager that a computer would beat the world Go champion by 2013, and lost. In fact, most predictions about AI have been wrong. Thus we must bring some humility to our predictions about the dates of AI milestones.
Because Ray Kurzweil’s predictions are based on quantitative extrapolation from historical trends and because of his good track record, I generally defer to his predictions. If human-level AI will exist by 2029 and very capable and dangerous AGI systems will exist by 2045, it is urgent that we understand the social effects and dangers of AI as soon as possible.

Luke: Which section(s) of your book do you think are most likely to be intriguing to computer scientists, because they’ll learn something that seems novel (to them) and plausibly significant?

Bill: Thanks Luke. There are several sections of the book that may be interesting or useful.

At the Workshop on AI and Ethics at AAAI-15 there was some confusion about the generality of utilitarian ethics, based on the connotation that a utility function is defined as a linear sum of features or similar simple expression. However, as explained in Chapter 2 and in my first answer in this interview, more complex utility functions can express any set of complete and transitive preferences among outcomes. That is, if an agent always has a most preferred outcome among any finite set of outcomes, then that agent can be expressed as a utility-maximizing agent.

Chapter 4 goes into detail on the issues of agents whose environment models are finite stochastic programs. Most of the papers in the AGI community assume that environments are modeled by programs for universal Turing machines, with no limit on their memory use. I think that much can be added to what I wrote in Chapter 4, and hope that someone will do that.

The self-modeling agents of Chapter 8 are the formal framework analog of value learners such as the DeepMind Atari player, and their use as a formal framework is novel. Self-modeling agents have useful properties, such as the capability to value agent resource increases and a way to avoid the problem of the agent utility function being inconsistent with the agent’s definition.
An example of this problem is what Armstrong refers to as “motivated value selection.” More generally, it is the problem of adding any “special” actions to a utility maximizing agent, where those special actions do not maximize the utility function. In motivated value selection, the special action is the agent evolving its utility function. A utility maximizing agent may choose an action of removing the special actions from its definition, as counter-productive to maximizing its utility function. Self-modeling agents include such evolutionary special actions in the definition of their value functions, and they learn a model of their value function which they use to choose their next action. Thus there is no inconsistency. I think these ideas should be interesting to other computer scientists.

At the FLI conference in San Juan in January 2015 there was concern about the kind of technical AI risks described in Chapters 5 – 9 of my book, and concern about technological unemployment. However, there was not much concern about the dangers associated with:

1. Large AI servers connected to the electronic companions that will be carried by large numbers of people and the ability of the human owners of those AI servers to manipulate society, and
2. A future world in which great wealth can buy increased intelligence and superior intelligence can generate increased wealth. This positive feedback loop will result in a power law distribution of intelligence as opposed to the current normal distribution of IQs with mean = 100 and standard deviation = 15.

These issues are discussed in Chapters 1 and 10 of my book. The Global Brain researchers study the way intelligence is exhibited by the network of humans; the change in distribution of intelligence of humans and machines who are nodes of the network will have profound effects on the nature of the Global Brain. Beyond computer scientists, I think the public needs to be aware of these issues.
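To give a feel for the contrast between a normal distribution and a power law distribution mentioned above, here is a small sketch that compares their upper tails. The normal parameters (mean 100, standard deviation 15) come from the interview; the Pareto form and its parameters `x_min=100` and `alpha=2` are purely hypothetical choices for illustration, not estimates from the book or from any data.

```python
import math


def normal_tail(x, mean=100.0, sd=15.0):
    """P(X > x) for a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def pareto_tail(x, x_min=100.0, alpha=2.0):
    """P(X > x) for a Pareto (power law) distribution.

    x_min and alpha are hypothetical illustration values.
    """
    if x <= x_min:
        return 1.0
    return (x_min / x) ** alpha


for score in (130, 160, 200, 400):
    print(f"{score}: normal {normal_tail(score):.3e}, "
          f"power law {pareto_tail(score):.3e}")
```

Under these assumed parameters the normal tail collapses quickly (a score of 160 is four standard deviations out), while the power law tail shrinks only polynomially, so extreme values remain comparatively common. That heavy tail is the qualitative feature the answer is pointing at.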
Finally, I’d like to expand on my previous answer, specifically that the DeepMind Atari player is an example of general AI. In Chapter 1 of my book I describe how current AI systems have environment models that are designed by human engineers, whereas future AI systems will need to learn environment models that are too complex to be designed by human engineers. The DeepMind system does not use an environment model designed by engineers. It is “model-free” but the value function that it learns is just as complex as an environment model and in fact encodes an implicit environment model. Thus the DeepMind system is the first example of a future AI system with significant functionality.

Luke: Can you elaborate what you mean by saying that “the self-modeling agents of Chapter 8 are the formal framework analog of value learners such as the DeepMind Atari player”? Are you saying that the formal work you do in chapter 8 has implications even for an extant system like the DeepMind Atari player, because they are sufficiently analogous?

Bill: To elaborate on what I mean by “the self-modeling agents of Chapter 8 are the formal framework analog of value learners such as the DeepMind Atari player,” self-modeling agents and value learners both learn a function v(ha) that produces the expected value of proposed action a after interaction history h (that is, h is a sequence of observations and actions; see my book for details). For the DeepMind Atari player, v(ha) is the expected game score after action a and h is restricted to the most recent observation (i.e., a game screen snapshot).

Whereas the DeepMind system must be practically computable, the self-modeling agent framework is a purely mathematical definition. This framework is finitely computable but any practical implementation would have to use approximations. The book offers a few suggestions about computing techniques, but the discussion is not very deep.
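The idea of learning a function v(ha) can be sketched with a toy tabular learner. This is only an illustration under strong assumptions: the class name, the running-average update, and the `history_window` parameter are invented for this example. It is neither the self-modeling agent definition from the book nor DeepMind's actual algorithm, which uses a deep neural network rather than a table.

```python
from collections import defaultdict


class TabularValueLearner:
    """Toy estimate of v(ha): expected value of action a after history h."""

    def __init__(self, actions, history_window=None):
        self.actions = list(actions)
        # history_window=1 keeps only the most recent observation,
        # loosely mirroring the restriction described for the Atari player.
        self.history_window = history_window
        self.value = defaultdict(float)
        self.count = defaultdict(int)

    def _key(self, history, action):
        h = tuple(history)
        if self.history_window is not None:
            h = h[max(0, len(h) - self.history_window):]
        return (h, action)

    def update(self, history, action, observed_value):
        """Fold one observed value into the running mean for (h, a)."""
        key = self._key(history, action)
        self.count[key] += 1
        self.value[key] += (observed_value - self.value[key]) / self.count[key]

    def v(self, history, action):
        """Current estimate of v(ha); 0.0 if (h, a) was never observed."""
        return self.value.get(self._key(history, action), 0.0)

    def best_action(self, history):
        """Greedy choice: the action with the highest estimated value."""
        return max(self.actions, key=lambda a: self.v(history, a))


learner = TabularValueLearner(["left", "right"], history_window=1)
learner.update(["screen0", "screen1"], "left", 1.0)
learner.update(["screenX", "screen1"], "left", 3.0)
learner.update(["screen1"], "right", 5.0)
print(learner.v(["screen1"], "left"))   # 2.0
print(learner.best_action(["screen1"]))  # right
```

With `history_window=1` the two "left" updates share a key because only the latest observation is kept, so their values are averaged. Setting `history_window=None` would key on the full interaction history instead, which is closer in spirit to the unrestricted h of the formal framework but quickly becomes impractical for a table.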
Because extant systems such as the DeepMind Atari player are not yet close to human-level intelligence, there is no implication that this system should be subject to safety constraints. It is encouraging that the folks at DeepMind and at Vicarious are concerned about AI ethics, for two reasons: 1) They are likely to apply ethical requirements to their systems as they approach human-level, and 2) They are very smart and can probably add a lot to AI safety research.

Generally, research on safe and ethical AI complicates the task of creating AI by adding requirements. My book develops a three-argument utility function expressing human values which will be very complex to compute. Similarly for other components of the definition of self-modeling agents in the book.

I think there are implications the other way around. The self-modeling framework is based on statistical learning and the success of the DeepMind Atari player, the Vicarious captcha solver, IBM’s Watson, and other practical systems that use statistical learning techniques increases our confidence that these techniques can actually work for AI capability and safety.

Some researchers suggest that safe AI should rely on logical deduction rather than statistical learning. This idea offers greater possibility of proving safety properties of AI, but so far there are no compelling demonstrations of AI systems based on logical deduction (at least, none that I am aware of). Such demonstrations would add a lot of confidence in our ability to prove safety properties of AI systems.

Luke: Your 10th chapter considers the political aspects of advanced AI. What do you think can be done now to improve our chances of solving the political challenges of AI in the future? Sam Altman of YC has proposed various kinds of regulation; do you agree with his general thinking? What other ideas do you have?

Bill: The central point of my 2002 book was the need for public education about and control over above-human-level AI.
The current public discussion by Stephen Hawking, Bill Gates, Elon Musk, Ray Kurzweil, and others about the dangers of AI is very healthy, as it educates the public. Similarly for the Singularity Summits organized by the Singularity Institute (MIRI’s predecessor), which I thought were the best thing the Singularity Institute did.

In the US people cannot own automatic weapons, guns of greater than .50 caliber, or explosives without a license. It would be absurd to license such things but to allow unregulated development of above-human-level AI. As the public is educated about AI, I think some form of regulation will be inevitable.

However, as they say, the devil will be in the details and humans will be unable to compete with future AI on details. Complex details will be AI’s forte. So formulating effective regulation will be a political challenge.

The Glass-Steagall Act of 1933, regulating banking, was 37 pages long. The Dodd-Frank bill of 2010, also to regulate banking 77 years later, was 848 pages long. An army of lawyers drafted the bill, many employed to protect the interests of groups affected by the bill. The increasing complexity of laws reflects efforts by regulated entities to lighten the burden of regulation. The stakes in regulating AI will be huge and we can expect armies of lawyers, with the aid of the AI systems being regulated, to create very complex laws.

In the second chapter of my book, I conclude that ethical rules are inevitably ambiguous and base my proposed safe AI design on human values expressed in a utility function rather than rules. Consider the current case before the US Supreme Court to interpret the meaning of the words “established by the state” in the context of the 363,086 words of the Affordable Care Act. This is a good example of the ambiguity of rules. Once AI regulations become law, armies of lawyers, aided by AI, will be engaged in debates over their interpretation and application.
The best counterbalance to armies of lawyers creating complexity on any legal issue is a public educated about the issue and engaged in protecting their own interests. Automobile safety is a good example. This will also be the case with AI regulation. And, as discussed in the introductory section of Chapter 10, there is precedent for the compassionate intentions of some wealthy and powerful people and this may serve to counterbalance their interest in creating complexity.

Privacy regulations, which affect existing large IT systems employing AI, already exist in the US and even more so in Europe. However, many IT services depend on accurate models of users’ preferences. At the recent FLI conference in San Juan, I tried to make the point that a danger from AI will be that people will want the kind of close, personal relationship with AI systems that will enable intrusion and manipulation by AI. The Omniscience AI described in Chapter 1 of my book is an example. As an astute IT lawyer said at the FLI conference, the question of whether an IT innovation will be legal depends on whether it will be popular.

This brings us back to the need for public education about AI. For people to resist being seduced by the short term benefits of close relationships with AI, they need to understand the long term consequences. I think it is not realistic to prohibit close relationships between people and AI, but perhaps the public, if it understands the issues, can demand some regulation over the goals for which those relationships are exploited.

The final section of my Chapter 10 says that AI developers and testers should recognize that they are acting as agents for the future of humanity and that their designs and test results should be transparent to the public. The FLI open letter and Google’s panel on AI ethics are encouraging signs that AI developers do recognize their role as agents for future humanity.
Also, DeepMind has been transparent about the technology of their Atari player, even making source code available for non-commercial purposes.

AI developers deserve to be rewarded for their success. On the other hand, people have a right to avoid losing control over their own lives to an all-powerful AI and its wealthy human owners. The problem is to find a way to achieve both of these goals.

Among current humans, with naturally evolved brains, IQ has a normal distribution. When brains are artifacts, their intelligence is likely to have a power law distribution. This is the pattern of distributions of sizes of other artifacts such as trucks, ships, buildings, and computers.

The average human will not be able to understand or ever learn the languages used by the most intelligent minds. This may mean the end of any direct voice in public policy decisions for average humans – effectively the end of democracy. But if large AI systems are maximizing utility functions that account for the values of individual humans, that may take the place of direct democracy. Chapters 6 – 8 of my book propose mathematical definitions for an AI design that does balance the values of individual humans. Chapter 10 suggests that this design may be modified to provide different weights to the values of different people, for example to reward those who develop AI systems.

I must admit that the connection between the technical chapters of my book and Chapter 10, on politics, is weak. Political issues are just difficult. For example, the future will probably have multiple AI systems with conflicting utility functions and a power law distribution of intelligence. It is difficult to predict how such a society would function and how it would affect humans, and this unpredictability is a risk. Creating a world with a single powerful AI system also poses risks, and may be difficult to achieve.
Since my first paper about future AI in 2001, I have thought that the largest risks from AI are political rather than technical. We have an ethical obligation to educate the public about the future of AI, and an educated public is an essential element of finding a good outcome from AI.

Luke: Thanks, Bill!

• http://pragmatarianism.blogspot.com/ Xerographica

You’re trying to tackle the issue of how to prevent robots from inefficiently allocating humans while I’m trying to tackle the issue of how to prevent humans from inefficiently allocating humans!

Would you prefer being allocated by congress or by taxpayers? Based on my considerable research on the topic… I’m pretty sure that you’d be much safer in the hands of taxpayers. And by “safer” I mean less likely to be wasted (inefficiently allocated).

Given that congress spends other people’s money… it didn’t cost them anything to send me to Afghanistan. But it would have cost taxpayers their hard-earned money to do so. And with greater personal cost comes greater scrutiny. Taxpayers would have wanted more information and, as a group, they would have been able to process far more information than congress, as a group, was able to.

If I ask you for $1000, then chances are really good that you’re going to want a really good reason why you should give me the money. Feel free to prove me wrong! You’d be the first to do so!

This strong aversion to loss is the reason why the market functions as a fail safe device. It ensures that too many resources don’t end up in wasteful hands. So when many resources do end up in some hands… “taxpayers”… we have to recognize/respect that this allocation is the product of an extremely robust vetting/vouching/validating process.

My point is, if you can understand how and why transferring the power of the purse from 500 congresspeople to millions of earners/taxpayers would decrease the chances of humans being inefficiently allocated… then this understanding could potentially help increase your chances of figuring out how to prevent robots from inefficiently allocating humans. Because, if you can’t ensure that humans won’t destroy each other with bombs…. then it really doesn’t seem likely that you’ll be able to ensure that humans won’t destroy each other with AIs.

• https://www.tumblr.com/blog/sojournshepard Indianna Jones

Man, reading your comment, I feel like I just ran into a kindred spirit. Thank you for your insight! As much as I legitimately enjoy reading insights different than mine, (more understanding = less fighting = world peace, maaaaaan 😛 ), it’s nice to see someone who sees the same issues within, well, an issue!

• http://pragmatarianism.blogspot.com/ Xerographica

Are you sure we lack the means to control them effectively? AI Safety vs Human Safety

• Kevin Scales

Because we all know AI psychology is identical to human psychology. What a moron.

• http://pragmatarianism.blogspot.com/ Xerographica

Ok, I’m a moron. This means that the rules of gravity don’t apply to me like they apply to geniuses like you. You need something like a plane in order to fly. Not me, if I want to fly then I just flap my arms at a moderate pace.

The same thing is true when it comes to the rules of economics. Because I’m a moron… my irrational choices don’t have rational consequences. Last year I spent all my time and energy attacking windmills because I thought that they were giants. And now I’m filthy rich.

Because I’m a moron… I’m sure that the rules of gravity and economics won’t apply to robots like they apply to humans.

• Kevin Scales

Economics will prevent AI harming humans in the same way that gravity prevents planes from flying.