Armando Tacchella on Safety in Future AI Systems


Armando Tacchella is Associate Professor of Information Systems at the Faculty of Engineering of the University of Genoa. He obtained his Ph.D. in Electrical and Computer Engineering from the University of Genoa in 2001 and his “Laurea” (M.Sc. equivalent) in Computer Engineering in 1997. His teaching activities include graduate courses in AI, formal languages, compilers, and machine learning, as well as undergraduate courses in design and analysis of algorithms. His research interests are mainly in the field of AI, with a focus on systems and techniques for automated reasoning and machine learning, and applications to modeling, verification, and monitoring of cyber-physical systems. His recent publications focus on improving the dependability of complex control architectures using formal methods, from the design stage to the operational stage of the system. He has published more than forty papers in international conferences and journals including AAAI, IJCAI, CAV, IJCAR, JAI, JAIR, and IEEE-TCAD. In 2007 the Italian Association for Artificial Intelligence (AI*IA) awarded him the “Marco Somalvico” prize for the best young Italian researcher in AI.

Luke Muehlhauser: My summary of Transparency in Safety-Critical Systems was:

Black box testing can provide some confidence that a system will behave as intended, but if a system is built such that it is transparent to human inspection, then additional methods of reliability verification are available. Unfortunately, many of AI’s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.

The last sentence applies to a 2009 paper you co-authored with Luca Pulina, in which you establish formal guarantees about the behavior of a trained multi-layer perceptron (MLP). Could you explain roughly how that works, and what kind of guarantees you were able to prove?


Armando Tacchella: A trained MLP is just a mathematical function and, in principle, its properties could be verified in a logic that deals with real numbers and interpreted linear functions. However, useful MLPs, even the single-hidden-layer ones that we consider in our paper, use non-linear transcendental functions in their “activation” units (e.g., the logistic or hyperbolic tangent function). Any logic language that deals natively with such functions is bound to have an undecidable decision problem — for a discussion, see the introduction of this paper.

Our approach to establishing the safety of MLPs is thus based on an abstraction-refinement paradigm, whereby an “abstract” MLP is obtained from the “concrete” one using some over-approximation whose properties are decidable. This is a technique introduced by P. Cousot and R. Cousot some time ago and more recently exploited in the model-checking community to attempt the verification of systems whose properties are not decidable.

If the abstract MLP is found to satisfy the properties, then so does the concrete one. Otherwise, we obtain an “abstract” counterexample, i.e., a set of inputs which violates a property in the abstract MLP.

If that set of inputs is found to violate the property in the concrete MLP as well, then the concrete MLP is faulty and must be fixed. On the other hand, if the concrete MLP works fine when supplied with the abstract counterexample, the counterexample is “spurious”, meaning that it is an artifact of an abstraction which was too coarse to be useful.

The abstraction is thus “refined”, i.e., made less coarse (and perhaps more expensive to check), and the process iterates until either the MLP is proven safe or the process exhausts the available resources.

Clearly, abstraction-refinement does not work around the initial undecidability issue, because an infinite chain of refinements due to spurious counterexamples might always exist. However, in practice, we are often able to establish the validity of a property with a reasonable number of refinements.

In our paper, we instantiate the approach above to the specific case of single-hidden-layer MLPs, which are nevertheless universal approximators under some reasonable conditions that we ensure. In particular:

    1. We abstract MLPs to corresponding “interval functions”, i.e., if the MLP is y = f(x) then the abstract one is [y] = F([x]), where “F” is the interval function corresponding to “f”.
    2. We consider a “safety property”, i.e., if [x] is contained in the input range of f, then [y] will be contained in some interval [l,h] with l < h (the “safety bound”).
    3. We prove the property on F using a decision procedure for linear arithmetic over the reals, e.g., a Satisfiability Modulo Theories (SMT) solver. If the property holds for F, i.e., for all [x] in the domain of f we have that F([x]) is contained in [l,h], then this is the case also for f.

If a counterexample is found, then there is some interval [x] such that F([x]) is not contained in [l,h]. We then consider concrete inputs x contained in [x] and check whether the corresponding output f(x) falls outside [l,h]. If such an x is found, then f is faulty and must be fixed. Otherwise, the counterexample was spurious, so we restart from step (1) with a finer-grained abstraction.

In the end, if the MLP f passes our test, we are guaranteed that whatever input it receives, it will never exceed its “specification” in terms of minimum and maximum expected output values.
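To make the loop above concrete, here is a minimal sketch of the abstraction-refinement idea for a single-hidden-layer MLP with tanh activations. It is not the encoding used in the paper (which relies on a decision procedure for linear arithmetic); instead it over-approximates the network with plain interval arithmetic and refines by splitting the input box, so the function names and the bisection strategy below are illustrative assumptions.

```python
# Illustrative sketch of abstraction-refinement for a single-hidden-layer
# MLP. NOT the paper's SMT-based encoding: the abstraction here is plain
# interval arithmetic, and refinement splits the input box.
import numpy as np

def affine_bounds(lo, hi, W, b):
    """Interval image of the affine map W x + b over the box [lo, hi]."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def mlp(x, W1, b1, W2, b2):
    """Concrete single-hidden-layer MLP f(x) with tanh activations."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def abstract_bounds(lo, hi, W1, b1, W2, b2):
    """Over-approximation F([x]); tanh is monotone, so its interval
    image over [a, b] is exactly [tanh(a), tanh(b)]."""
    h_lo, h_hi = affine_bounds(lo, hi, W1, b1)
    return affine_bounds(np.tanh(h_lo), np.tanh(h_hi), W2, b2)

def verify(lo, hi, W1, b1, W2, b2, l, h, max_splits=10_000):
    """Return ("safe", None), ("unsafe", x) with a concrete counterexample,
    or ("unknown", None) if the refinement budget is exhausted."""
    boxes, splits = [(lo, hi)], 0
    while boxes:
        b_lo, b_hi = boxes.pop()
        y_lo, y_hi = abstract_bounds(b_lo, b_hi, W1, b1, W2, b2)
        if np.all(y_lo >= l) and np.all(y_hi <= h):
            continue                      # this box is proven safe
        x = (b_lo + b_hi) / 2.0           # candidate concrete counterexample
        y = mlp(x, W1, b1, W2, b2)
        if np.any(y < l) or np.any(y > h):
            return "unsafe", x            # genuine counterexample: f is faulty
        if splits >= max_splits:
            return "unknown", None        # resources exhausted
        splits += 1
        d = int(np.argmax(b_hi - b_lo))   # spurious: refine by splitting
        mid = (b_lo[d] + b_hi[d]) / 2.0   # the widest input dimension
        left_hi, right_lo = b_hi.copy(), b_lo.copy()
        left_hi[d], right_lo[d] = mid, mid
        boxes += [(b_lo, left_hi), (right_lo, b_hi)]
    return "safe", None
```

If the concrete network respects the bound at the sampled point, the counterexample is treated as spurious and the box is split, mirroring the refinement step described above.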


Luke: In what ways do you think that formal methods helped to “explain” the MLP, and make it more transparent to human programmers?


Armando: The answer to this question largely depends on what we mean by “explanation”.

If by “explanation” we mean that, using our verification methodology, the neural net user is now able to understand clearly which “reasoning rule” is implemented by which neuron (or set of neurons), then our method does not provide any further insight in that direction.

However, if we look for a “behavioral explanation”, then the techniques proposed in our paper help in at least two ways:

  • the formal analysis of the original trained neural network enables the network user to establish whether its behaviors are in line with the specifications. When this is not the case, the user is given a (set of) counterexample(s) that will aid them in identifying the causes of misbehaviors. Indeed, by perusing the verification outputs, the user learns about the “points of failure” in the input/output space of the network, which, in turn, will enable them to improve the original design (e.g., by collecting more experimental evidence around those points).
  • since a neural network is not “classical” software — albeit from a formal point of view we treat it exactly as such — it might be the case that the user cannot readily “fix” the network using counterexamples. Our “repair” procedure can help to fix the network in an automated way by performing controlled training using the results of “spurious” counterexamples. If the system to be learned by the network is available for experimentation, our repair procedure can also leverage the results of actual counterexamples (a form of “active learning”).

Technically, repair exposes the network to the danger of “overfitting” the input data, i.e., the variance in the network's performance becomes very high when considering several different datasets. However, we controlled for that possibility, and it never occurred in our experiments.
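A purely illustrative sketch of such a counterexample-driven repair loop is given below. It is not the repair procedure from the paper: `verify`, `oracle`, and `train` are hypothetical helpers standing in for the verification step, the labeling of counterexample inputs (by the real system or by the specification), and a training routine that uses a held-out validation set to guard against the overfitting mentioned above.

```python
# Illustrative sketch (not the paper's procedure) of counterexample-driven
# repair: inputs flagged by verification are labeled, added to the training
# data, and the network is retrained while monitoring a validation set to
# control overfitting. `verify`, `oracle`, and `train` are hypothetical.
import numpy as np

def repair(net, X_train, y_train, X_val, y_val,
           verify, oracle, train, max_rounds=10):
    for _ in range(max_rounds):
        status, x_ce = verify(net)        # formal check of the current net
        if status == "safe":
            return net                    # repair succeeded
        if x_ce is None:
            break                         # verification inconclusive
        y_ce = oracle(x_ce)               # label from the real system, or an
                                          # output clipped to the safety bound
        X_train = np.vstack([X_train, x_ce])
        y_train = np.vstack([y_train, y_ce])
        # Retrain with validation-based early stopping to limit the risk
        # of overfitting the newly added counterexample points.
        net = train(net, X_train, y_train, X_val, y_val)
    return net
```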


Luke: Before our interview, you wrote to me that:

I am quite skeptical that we will see any strongly autonomous robot around unless we are able to show (in a convincing way) that the robot is harmless for the environment…

Since I do not expect to be able to achieve safe strong autonomy with “correct-by-construction” and “proven in use” philosophies alone (the ones we use to build airplanes), I expect formal analysis to be even more important in robots than in any other implement.

Were you saying that, despite the tremendous economic and military incentives to build ever-more-capable machines without waiting for safety-related capabilities to catch up, companies and governments will in fact not release strongly autonomous programs and robots “into the real world” without first making sure that we have the kind of confidence about their safety that we currently require for (e.g.) autopilot software?


Armando: Engineers know very well the kind of pressure you are talking about and, in fact, if we consider any catalog of “horror stories” – see, e.g., Software Horror Stories for more than 100 cases related to software engineering alone – we cannot rule out the possibility of something bad happening when autonomous robots hit the market.

For robots endowed with a relatively weak autonomy, or with a relatively low probability of causing serious harm to the surrounding environment, incidents may also be part of a trade-off wherein robot makers are not willing to invest huge amounts of effort to eliminate them completely. In these cases, I expect traditional risk analysis and engineering best practices to prevail over a major shift to precise verification methods.

For robots endowed with strong autonomy, and a non-negligible potential to cause serious harm to the environment, I believe there will be a strong deterrent against simply souping up traditional engineering practices and adopting them in domains that they cannot really handle.

To make the discussion concrete, let us assume that the autonomous car made by Google is going to be sold as a product like, say, a mobile phone. A shipping company could then buy a fleet of such cars and send them driverless to deliver parcels to the doors of its customers. Technologically this is feasible today, but we will not see it happening any time soon. The reasons are several, but let me mention a few that I consider important:

  • Ethical. Do we really want to empower a machine to such a level that it can autonomously decide, e.g., to run down a pedestrian? An autopilot can also fail and kill people, but it is clearly not the “will” of the autopilot to do so.
  • Technical. Can we ensure that, without an operator ready to press “emergency stop”, the car will never behave in a hazardous way? Planes are operated by very well trained officials, who can always “go manual” if anything goes wrong with the autopilot.
  • Economic. Even if we had the technology to ensure safety, would it be cost-effective? Right now, airplane makers can ensure, through standard engineering practices (e.g., redundancy, stress testing, preventive maintenance), that autopilots are cost-effective within their operational requirements.
  • Legal. Who is liable if an autonomous car causes an accident? The issue becomes especially bothersome if several legal entities are involved in the product, e.g., the car maker, the system integrator, the “AI” supplier, and the operator (assuming on-field configuration is available).

Going back to my initial statement about the importance of formal methods in robots, I expect that a crisp understanding of the phenomena governing the “brains” of an autonomous robot, and its interactions with its “body” and the surrounding environment, is of paramount importance if we wish to address any of the above issues in a satisfactory way.


Luke: Levitt (1999) argued that as intelligent systems become increasingly autonomous, and operate in increasingly non-deterministic, unknown environments, it will not be possible to model their environment and possible behaviors entirely with tools like control theory:

If robots are to be put to more general uses, they will need to operate without human intervention, outdoors, on roads and in normal industrial and residential environments where unpredictable… events routinely occur. It will not be practical, or even safe, to halt robotic actions whenever the robot encounters an unexpected event or ambiguous [percept].

Currently, commercial robots determine their actions mostly by control-theoretic feedback. Control-theoretic algorithms require the possibilities of what can happen in the world be represented in models embodied in software programs that allow the robot to pre-determine an appropriate action response to any task-relevant occurrence of visual events. When robots are used in open, uncontrolled environments, it will not be possible to provide the robot with a priori models of all the objects and dynamical events that might occur.

In order to decide what actions to take in response to un-modeled, unexpected or ambiguously interpreted events in the world, robots will need to augment their processing beyond controlled feedback response, and engage in decision processes.

Thus, he argues, autonomous programs will eventually need to be decision-theoretic.

What do you think of that prediction? In general, what tools and methods do you think will be most relevant for achieving high safety assurances for the highly autonomous systems of the future?


Armando: I agree with Levitt when he says that “control theory” alone cannot deal with strongly autonomous robots. This is also what motivates the stream of research in Formal Methods dealing with Robotics and Automation (see, e.g., Workshop on Formal Methods for Robotics and Automation for a recent event on the subject).

However, it might be the case that “engaging in decision-theoretic processes” while the robot is performing its tasks raises challenging — to say the least — computational issues. Indeed, it is well known that even simple factual, i.e., Boolean, reasoning may require exponential time to be carried out. Since we expect reality to be much more complex than simple 0/1 evidence, reasoning in any “logic” that the robot is endowed with is going to be computationally hard.

Even assuming the availability of powerful heuristics and/or approximations, the problem is still made tough by real-time requirements in the decision process. Indeed, the robot cannot take forever to plan its course of action, and in many cases high variance in response time is not acceptable either.

At the same time, making sure that a strongly autonomous robot stays safe using off-line verification only is also probably not going to happen. This is because of computational issues (combinatorial state-space explosion, undecidability of expressive logics dealing with time/space/probabilities), but also because, as in control theory, most of the situations that will arise while the robot is running are hard to frame in advance.

In my opinion, what we might see is a combination of several approaches, including:

  • “Correct by construction” design and implementation
  • Formal Verification and testing at the design and implementation stage
  • Lightweight reasoning while the robot is executing

Reasoning during execution is probably going to be “self-reflective”, adopting, e.g., efficient monitors that can be automatically synthesized during the design process and then deployed on the robot to ensure that certain safety conditions are met at all times, without the burden of “deep” logical reasoning.
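As a rough illustration of such a lightweight monitor (with a made-up state and safety condition, and not tied to any particular synthesis tool), the deployed check can be as simple as a predicate evaluated at every control cycle:

```python
# Rough illustration of a lightweight runtime safety monitor: a predicate
# over the robot's state, checked at every control cycle, that triggers a
# pre-defined fallback when violated. State fields and the numeric
# condition are made up for illustration.
from dataclasses import dataclass

@dataclass
class State:
    speed: float              # current speed, m/s
    obstacle_distance: float  # distance to nearest obstacle, m

def safety_condition(s: State) -> bool:
    """Example invariant: keep a speed-dependent safety margin."""
    return s.obstacle_distance > 0.5 + 0.8 * s.speed

def control_step(s: State, planned_command):
    """Apply the planner's command only if the monitor deems the state
    safe; otherwise fall back to a conservative emergency behavior."""
    if safety_condition(s):
        return planned_command
    return "EMERGENCY_STOP"
```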

Finally, actual robots will probably need to be “compliant” with humans, not only in their cognitive/reasoning abilities but also in their structure (see, e.g., recent developments in “Soft Robotics”). A “compliant” structure will improve safety and/or make reasoning about the safety of the robot easier, since some “intrinsic safety” can be achieved by the use of innovative materials and electronics. In other words, we should not worry just about a “safe mind”; we must retain the principle that a safer “mind” also requires a safer “body”.


Luke: Sometimes I like to distinguish “bottom up” vs. “top down” approaches to the challenge of long-term AI safety — that is, to the challenge of making sure that we can continue to achieve confident safety guarantees for AI systems as they grow in autonomy and capability.

“Bottom up” approaches to long-term AI safety build incrementally upon current methods for achieving confident safety guarantees about today’s limited systems. “Top down” approaches begin by trying to imagine the capabilities of the AI systems we’ll have several decades from now, and use those speculations to guide research today.

I would describe your response to my previous question as a “bottom up” perspective on long-term AI safety, and I think it’s the kind of perspective most people working on AI safety would give. To illustrate what I mean by “top down” approaches to long-term AI safety, I’ll provide three examples: (1) Lampson (1979)’s theoretical description of the “confinement problem,” which motivated the creation of the field of covert channel communication. (2) Orseau & Ring (2012)’s work on the agent-environment boundary typically used as a simplifying assumption in reinforcement learning and other AI models, which I hope will inspire future research into more “naturalistic” AI models that will avoid problematic “Cartesian” mistakes in agent reasoning. (3) Weaver (2013)’s discussion of paradoxes in rational agency, which I hope will motivate later research into how future self-modifying agent-based AIs can reason coherently about the consequences of modifying their own decision-making code.

Even if you aren’t familiar with these examples already, do you have an opinion on “top down” approaches to long-term AI safety in general, and how they should (or shouldn’t) inform today’s research efforts?


Armando: Indeed, I was not familiar with the references you cited, and I can see that they are definitely much more “visionary” than what I have suggested previously. Since my background is in Engineering, I must acknowledge a bias towards “bottom-up” views as you define them, or simply “reductionist” and “pragmatic” as I would define them.

Your citations reminded me of a paper by Weld and Etzioni. I often quote and cite this paper in my work because, in my opinion, it had the merit of raising the level of concern in the AI community about “unleashing” strongly autonomous agents in the physical world. The contents of the paper, albeit slightly outdated, can still provide some useful guidelines and vocabulary for current research on safe AI (at least they did for me). While it is not the kind of paper I would conceive, I believe this kind of speculation is useful to prepare the ground for more tangible achievements.

On the other hand, if we are to keep up with the current push for stronger autonomy in robots acting in the physical world, then we should probably be very pragmatic about what we can do with current technology in order to ensure safety at all times. While it is possible that future advances in technology will make a “decision-theoretic” agent possible within the resource constraints of a mobile robot, it is current technology which is enabling, e.g., autonomous cars to drive around. Therefore, if we want these kinds of robots to become products in the next few years, I believe we should focus research on making them trustworthy by leveraging current scientific and technological developments.


Luke: Are there any other readings you’d recommend, besides Weld & Etzioni (1994), on the subject of long-term AI safety?


Armando: Weld and Etzioni returned to the subject of safe AI in their paper “Intelligent agents on the internet: Fact, fiction, and forecast”.

A similar theme is touched upon by Eichmann in his “Ethical web agents”.

I found out about the “Call to Arms” by Etzioni and Weld through a series of papers by Gordon, of which I would deem “Asimovian Adaptive Agents” the most relevant.

I have just stumbled upon (but have not yet had a chance to read thoroughly) a relatively recent book by Wilks, “Close Engagements with Artificial Companions: Key Social, Psychological, Ethical and Design Issues” (see a review here). The book seems controversial at times, but it does contain many different views about artificial companions, and it does raise some issues that we should probably keep well in mind when designing physically situated agents.

Finally, to get an idea of concrete projects going on in the field of robotics where safety is considered both a prominent value and an essential challenge, this site lists many interesting recent developments and ideas for further research.


Luke: Thanks, Armando!