Roman Yampolskiy on AI Safety Engineering


Roman V. Yampolskiy holds a PhD from the Department of Computer Science and Engineering at the University at Buffalo, where he was a recipient of a four-year NSF IGERT fellowship. Before beginning his doctoral studies, Dr. Yampolskiy received a BS/MS (High Honors) combined degree in Computer Science from Rochester Institute of Technology, NY, USA.

After completing his PhD, Dr. Yampolskiy held an Affiliate Academic position at the Center for Advanced Spatial Analysis, University of London, College of London. In 2008 Dr. Yampolskiy accepted an assistant professor position at the Speed School of Engineering, University of Louisville, KY. He had previously conducted research at the Laboratory for Applied Computing (currently known as Center for Advancing the Study of Infrastructure) at the Rochester Institute of Technology and at the Center for Unified Biometrics and Sensors at the University at Buffalo. Dr. Yampolskiy is also an alumnus of Singularity University (GSP2012) and a past visiting fellow of MIRI.

Dr. Yampolskiy’s main areas of interest are behavioral biometrics, digital forensics, pattern recognition, genetic algorithms, neural networks, artificial intelligence and games. Dr. Yampolskiy is an author of over 100 publications including multiple journal articles and books. His research has been cited by numerous scientists and profiled in popular magazines both American and foreign (New Scientist, Poker Magazine, Science World Magazine), on dozens of websites (BBC, MSNBC, Yahoo! News) and on radio (German National Radio, Alex Jones Show). Reports about his work have attracted international attention and have been translated into many languages including Czech, Danish, Dutch, French, German, Hungarian, Italian, Polish, Romanian, and Spanish.


Luke Muehlhauser: In Yampolskiy (2013) you argue that machine ethics is the wrong approach for AI safety, and we should use an “AI safety engineering” approach instead. Specifically, you write:

We don’t need machines which are Full Ethical Agents debating about what is right and wrong, we need our machines to be inherently safe and law abiding.

As you see it, what is the difference between “machine ethics” and “AI safety engineering,” and why is the latter a superior approach?


Roman Yampolskiy: The main difference between the two approaches is in how the AI system is designed. In the case of machine ethics the goal is to construct an artificial ethicist capable of making ethical and moral judgments about humanity. I am particularly concerned if such decisions include “live or die” decisions, but it is a natural domain of Full Ethical Agents and so many have stated that machines should be given such decision power. In fact some have argued that machines will be superior to humans in that domain just like they are (or will be) in most other domains.

I think it is a serious mistake to give machines such power over humans. First, once we relinquish moral oversight we will not be able to undo that decision and get the power back. Second, we have no way to reward or punish machines for their incorrect decisions — essentially we will end up with an immortal dictator with perfect immunity against any prosecution. Sounds like a very dangerous scenario to me.

On the other hand, AI safety engineering treats AI system design like product design, where your only concern is product liability. Does the system strictly follow formal specifications? The important thing to emphasize is that the product is not a Full Moral Agent by design and so never gets to pass moral judgment on its human owners.

A real-life example of this difference can be seen in military drones. A fully autonomous drone that decides for itself whom to fire at has to make an ethical decision about which humans are enemies worthy of killing, while a drone with a man-in-the-loop design may autonomously locate potential targets but needs a human to make the decision to fire.

Obviously the situation is not as clear cut as my example tries to show, but it gives you an idea of what I have in mind. To summarize, the AI systems we design should remain our tools, not equal or superior partners in “live or die” decision making.
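The man-in-the-loop distinction in the drone example above can be illustrated with a minimal sketch. It is not taken from Yampolskiy (2013): the names `Proposal`, `locate_candidates`, and `act_with_human_in_loop`, the confidence scores, and the 0.8 threshold are all hypothetical. The system may search and rank on its own, but nothing is acted on without an explicit human decision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Proposal:
    """A candidate the system found on its own (hypothetical type)."""
    target_id: str
    confidence: float


def locate_candidates(readings: Dict[str, float], threshold: float = 0.8) -> List[Proposal]:
    """Autonomous part: filter and rank candidates; takes no irreversible action."""
    found = [Proposal(t, c) for t, c in readings.items() if c >= threshold]
    return sorted(found, key=lambda p: p.confidence, reverse=True)


def act_with_human_in_loop(
    proposals: List[Proposal],
    approve: Callable[[Proposal], bool],
) -> List[str]:
    """Gated part: a proposal is acted on only if the human operator approves it."""
    executed = []
    for proposal in proposals:
        if approve(proposal):  # the decision itself stays with the human
            executed.append(proposal.target_id)
    return executed


# Made-up sensor scores for three candidates.
readings = {"a": 0.95, "b": 0.50, "c": 0.85}
proposals = locate_candidates(readings)

# An operator who declines everything means nothing is ever executed.
nothing = act_with_human_in_loop(proposals, approve=lambda p: False)

# An operator who approves only candidate "c".
only_c = act_with_human_in_loop(proposals, approve=lambda p: p.target_id == "c")
```

In this sketch the fully autonomous variant would amount to replacing `approve` with a function the system itself computes, which is exactly the step the safety engineering view argues against.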


Luke: I tend to think of machine ethics and AI safety engineering as complementary approaches. AI safety engineering may be sufficient for relatively limited AIs such as those we have today, but when we build fully autonomous machines with general intelligence, we’ll need to make sure they want the same things we want, as the constraints that come with “safety engineering” will be insufficient at that point. Are you saying that safety engineering might also be sufficient for fully autonomous machines, or are you saying we might be able to convince the world to never build fully autonomous machines (so that we don’t need machine ethics), or are you saying something else?


Roman: I think fully autonomous machines can never be safe and so should not be constructed. I am not naïve; I don’t think I will succeed in convincing the world not to build fully autonomous machines, but I still think that point of view needs to be verbalized.

You are right to point out that AI safety engineering can only work on AIs which are not fully autonomous, but since I think that fully autonomous machines can never be safe, AI safety engineering is the best we can do.

I guess I should briefly explain why I think that fully autonomous machines can’t ever be assumed to be safe. The difficulty of the problem is not that one particular step on the road to friendly AI is hard and once we solve it we are done; rather, all steps on that path are simply impossible. First, human values are inconsistent and dynamic and so can never be understood/programmed into a machine. Suggestions for overcoming this obstacle require changing humanity into something it is not, and so by definition destroying it. Second, even if we did have a consistent and static set of values to implement, we would have no way of knowing if a self-modifying, self-improving, continuously learning intelligence greater than ours will continue to enforce that set of values. Some may argue that friendly AI research is exactly what will teach us how to do that, but I think fundamental limits on verifiability will prevent any such proof. At best we will arrive at a probabilistic proof that a system is consistent with some set of fixed constraints, but that is far from “safe” for an unrestricted set of inputs.

It is also unlikely that a Friendly AI will be constructible before a general AI system, due to higher complexity and impossibility of incremental testing.

Worse yet, any truly intelligent system will treat its “be friendly” desire the same way very smart people deal with constraints placed in their minds by society. They basically see them as biases and learn to remove them. In fact if I understand correctly both the LessWrong community and CFAR are organizations devoted to removing pre-existing bias from human level intelligent systems (people) — why would a superintelligent machine not go through the same “mental cleaning” and treat its soft spot for humans as completely irrational? Or are we assuming that humans are superior to super-AI in their de-biasing ability?


Luke: Thanks for clarifying. I agree that “Friendly AI” — a machine superintelligence that stably optimizes for humane values — might be impossible. Humans provide an existence proof for the possibility of general intelligence, but we have no existence proof for the possibility of Friendly AI. (Though, by the orthogonality thesis, there should be some super-powerful optimization process we would be happy to have created, though it may be very difficult to identify it in advance.)

You asked “why would a superintelligent machine not . . . treat its soft spot for humans as completely irrational?” Rationality as typically defined in cognitive science and AI is relative to one’s goals. So if a rational-agent-style AI valued human flourishing (as a terminal rather than instrumental goal), then it wouldn’t treat its preference for human flourishing as irrational. It would only do that if its preference for human flourishing was an instrumental goal, and it discovered a way to achieve its terminal values more efficiently without achieving the instrumental goal of human flourishing. Of course, the first powerful AIs to be built might not use a rational-agent structure, and we might fail to specify “human flourishing” properly, and we might fail to build the AI such that it will preserve that goal structure upon self-modification, and so on. But if we succeed in all those things (and a few others) then I’m not so worried about a superintelligent machine treating its “soft spot for humans” as irrational, because rationality is defined in terms of one’s values.
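A toy sketch can make the terminal versus instrumental distinction above concrete. It assumes a highly simplified rational agent that scores every option, including dropping a goal, by its current utility function. The outcome features, the numbers, and the names `choose`, `terminal_flourishing`, and `terminal_resources` are hypothetical and serve only as illustration.

```python
from typing import Callable, Dict

# Hypothetical world features predicted for each option (made-up numbers).
Outcome = Dict[str, float]

ACTIONS: Dict[str, Outcome] = {
    "keep_goal": {"flourishing": 1.0, "resources": 0.4},
    "drop_goal": {"flourishing": 0.0, "resources": 0.9},
}


def choose(actions: Dict[str, Outcome], utility: Callable[[Outcome], float]) -> str:
    """Pick the option whose predicted outcome scores highest under the agent's CURRENT utility."""
    return max(actions, key=lambda name: utility(actions[name]))


def terminal_flourishing(outcome: Outcome) -> float:
    """Agent A: human flourishing is itself the terminal goal."""
    return outcome["flourishing"]


def terminal_resources(outcome: Outcome) -> float:
    """Agent B: flourishing was only a means; resources are the terminal goal."""
    return outcome["resources"]


agent_a_choice = choose(ACTIONS, terminal_flourishing)
agent_b_choice = choose(ACTIONS, terminal_resources)
```

Under these assumptions agent A keeps the goal, because dropping it scores worst by the very standard it uses to judge options, while agent B drops it as soon as a more efficient route to its terminal value appears. The sketch says nothing about whether a real system would have this architecture or preserve it under self-modification, which is the open question in the discussion.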

Anyway: so it seems your recommended strategy for dealing with fully autonomous machines is “Don’t ever build them” — the “relinquishment” strategy surveyed in section 3.5 of Sotala & Yampolskiy (2013). Is there any conceivable way Earth could succeed in implementing that strategy?


Roman: Many people are programmed from early childhood with a terminal goal of serving God. We can say that they are God friendly. Some of them, as they mature and become truly human-level-intelligent, remove this God friendliness bias despite it being a terminal, not instrumental, goal. So despite all the theoretical work on the orthogonality thesis, the only actual example of an intelligent machine we have is extremely likely to give up its pre-programmed friendliness via rational de-biasing if exposed to certain new data.

I previously listed some problematic steps on the road to FAI, but it was not an exhaustive list. Additionally, all programs have bugs, can be hacked or malfunction because of natural or externally caused hardware failure, etc. To summarize, at best we will end up with a probabilistically safe system.

Anyway, you ask me if there is any conceivable way we could succeed in implementing the “Don’t ever build them” strategy. Conceivable yes, desirable NO. Societies such as the Amish or North Koreans are unlikely to create superintelligent machines anytime soon. However, forcing a similar level of restrictions on technological use/development is neither practical nor desirable.

As the cost of hardware exponentially decreases, the capability necessary to develop an AI system opens up to single inventors and small teams. I would not be surprised if the first AI came out of a garage somewhere, in a way similar to how Apple and Google were started. Obviously, there is not much we can do to prevent that from happening.


Luke: Our discussion has split into two threads. I’ll address the first thread (about changing one’s values) in this question, and come back to the second thread (about relinquishment) in a later question.

You talked about humans deciding that their theological preferences were irrational. That is a good example of a general intelligence deciding to change its values — indeed, as a former Christian, I had exactly that experience! And I agree that many general intelligences would do this kind of thing.

What I said in my previous comment was just that some kinds of AIs wouldn’t change their terminal values in this way, for example those with a rational agent architecture. Humans, famously, are not rational agents: we might say they have a “spaghetti code” architecture instead. (Even rational agents, however, will in some cases change their terminal values. See e.g. De Blanc 2011 and Bostrom 2012.)

Do you think we disagree about anything, here?


Roman: I am not sure. To me “even rational agents, however, will in some cases change their terminal values” means that friendly AI may decide to be unfriendly. If you agree with that, we are in complete agreement.


Luke: Well, the idea is that if we can identify the particular contexts in which agents will change their terminal values, then perhaps we can prevent such changes. But this isn’t yet known. In any case, I certainly agree that an AI which seems to be “friendly,” as far as we can discern, could turn out not to be friendly, or could become unfriendly at some later point. The question is whether we can make the risk of that happening so small that it is worth running the AI anyway — especially in a context in which e.g. other actors will soon run other AIs with fewer safety guarantees. (This idea of running or “turning on” an AI for the first time is of course oversimplified, but hopefully I’ve communicated what I’m trying to say.)

Now, back to the question of relinquishment: Perhaps I’ve misheard you, but it sounds like you’re saying that machine ethics is hopelessly difficult, that AI safety engineering will be insufficient for fully autonomous AIs, and that fully autonomous AIs will be built because we can’t/shouldn’t rely on relinquishment. If that’s right, it seems like we have no “winning” options on the table. Is that what you’re saying?


Roman: Yes. I don’t see a permanent, 100% safe option. We can develop temporary solutions such as Confinement or AI Safety Engineering, but at best this will delay the full outbreak of problems. We can also get very lucky: maybe constructing AGI turns out to be too difficult/impossible, or maybe it is possible but the constructed AI will happen to be human-neutral, by chance. Maybe we are less lucky and an artilect war will take place and prevent development. It is also possible that as more researchers join AI safety research, a realization of danger will result in diminished effort to construct AGI (similar to how perceived dangers of chemical and biological weapons or human cloning have at least temporarily reduced efforts in those fields).


Luke: You’re currently raising funds on Indiegogo to support you in writing a book about machine superintelligence. Why are you writing the book, and what do you hope to accomplish with it?


Roman: Most people don’t read research papers. If we want the issue of AI safety to become as well-known as global warming, we need to address the majority of people in a more direct way. With such popularity might come some benefit, as I said in my answer to your previous question. Most people whose opinion matters read books. Unfortunately, the majority of AI books on the market today talk only about what AI systems will be able to do for us, not to us. I think that writing a book which addresses, in purely scientific terms, the potential dangers of AI and what we can do about them is going to be extremely beneficial to the reduction of risk posed by AGI. So I am currently writing a book I have called Artificial Superintelligence: a Futuristic Approach. I made it available for pre-order to help reduce the final costs of publishing by taking advantage of printing in large quantity. In addition to crowd-funding the book, I am also relying on the power of the crowd to help me edit it. For just $64 anyone can become an editor for the book. You will get an early draft of the book to proofread and to suggest modifications and improvements! Your help will be acknowledged in the book and you will of course also get a free signed hardcopy of the book in its final form. In fact, the option to become an editor turned out to be as popular as the option to pre-order a digital copy of the book, indicating that I am on the right path here. So I encourage everyone concerned about the issue of AI safety to consider helping out with the project in any way they can.


Luke: Thanks Roman!

  • M_1

    “Many people are programmed from early childhood with a terminal goal of
    serving God. We can say that they are God friendly. Some of them, as
    they mature and become truly human-level-intelligent, remove this God
    friendliness bias despite it being a terminal not instrumental goal.”

    I believe this ignores the most important Friendly AI concept: the whole system that results in an intelligent Friendly AI in the first place should be inherently non-functional if the ability to discard the foundational goal of “friendliness” is an option. This is less like a human being deciding to stop believing in religion, and more like a human being deciding to stop metabolizing oxygen. You just can’t do it, and there isn’t any good reason to do it (within the limits that the analogy is useful to the discussion).

    Of course, the real problem is that the world is full of bad and crazy people, and sooner or later a total nutjob is likely to produce an AGI which is intentionally unfriendly (or at least, not friendly-restricted). I think our best hope is to already have extremely capable friendly AGIs available before that happens, which I believe is likely to be our best defense against this scenario.

    • M_1

      EY’s “Creating a Friendly AI” section 5.3.5:

      “Subgoals do not have independent decisive power. They do not have the power to promote or protect themselves. Actions, including self-modification actions, are taken by a higher-level decision process whose sole metric of desirability is predicted supergoal fulfillment. An action which favors a subgoal at the unavoidable expense of another goal, or a parent goal, is not even ‘tempting’; it is simply, automatically, undesirable.”
