Emil Vassev on Formal Verification


Dr. Emil Vassev received his M.Sc. in Computer Science (2005) and his Ph.D. in Computer Science (2008) from Concordia University, Montreal, Canada. Currently, he is a research fellow at Lero (the Irish Software Engineering Research Centre) at the University of Limerick, Ireland, where he leads Lero’s participation in the ASCENS FP7 project and Lero’s joint project with ESA on Autonomous Software Systems Development Approaches. His research focuses on knowledge representation and awareness for self-adaptive systems. Apart from this main research, Dr. Vassev’s interests include engineering autonomic systems, distributed computing, formal methods, cyber-physical systems, and software engineering. He has published two books and over 100 internationally peer-reviewed papers. As part of his collaboration with NASA, Vassev has been awarded one patent, with another pending.

Luke Muehlhauser: In “Swarm Technology at NASA: Building Resilient Systems,” you and your co-authors write that:

To increase the survivability of [remote exploration] missions, NASA [uses] principles and techniques that help such systems become more resilient…

…Practice has shown that traditional development methods can’t guarantee software reliability and prevent software failures. Moreover, software developed using formal methods tends to be more reliable.

When talking to AI scientists, I notice that there seem to be at least two “cultures” with regard to system safety. One culture emphasizes the limitations of systems that are amenable to (e.g.) formal methods, and advises that developers use traditional AI software development methods to build a functional system, and try to make it safe near the end of the process. The other culture tends to think that getting strong safety guarantees is generally only possible when a system is designed “from the ground up” with safety in mind. Most machine learning people I speak to seem to belong to the former culture, whereas e.g. Kathleen Fisher and other people working on safety-critical systems seem to belong to the latter culture.

Do you perceive these two cultures within AI? If so, does the second sentence I quoted from your paper above imply that you generally belong to the second culture?


Emil Vassev: Before answering your question, I need to clarify some of the basics of AI as I perceive it. Basically, AI depends on our ability to efficiently transfer knowledge to software-intensive systems. A computerized machine can be considered as exhibiting AI when it has the basic capabilities to turn data into context-relevant information and then turn that information into conclusions exhibiting knowledge. Going further, I can say that AI is only possible in the presence of artificial awareness, by which we can transfer knowledge to machines. Artificial awareness entails much more than computerized knowledge, however. It must also incorporate means by which a computerized machine can perceive events and gather data about its external and internal worlds. Therefore, to exhibit awareness, intelligent systems must sense and analyze their components as well as the environment in which they operate. Determining the state of each component and its status relative to performance standards, or service-level objectives, is therefore vital for an aware system. Such systems should be able to notice changes, understand their implications, and apply both pattern analysis and pattern recognition to determine normal and abnormal states. In other words, awareness is conceptually a product of representing, processing, and monitoring knowledge. Therefore, AI requires knowledge representation, which can be considered a formal specification of the “brain” of an AI system. Moreover, to allow for learning, we must consider an open-world model of this “machine brain”.

Now, let’s go back to your question. I think both research “cultures” have their niche within AI. Both lean towards open-world modeling of the AI by using formal methods. The difference lies mainly in the importance of the safety requirements, which justifies both approaches. Note that AI is a sort of superior control mechanism that relies exclusively on the functionality of the system to both detect safety hazards and pursue safety procedures. Therefore, in all cases AI is limited by system functionality, and systems designed “from the ground up with safety in mind” are presumably designed with explicit safety-related functionality; thus, their AI is less limited when it comes to safety.

To answer your second question: yes, I consider myself as leaning towards the second “research culture”. For many NASA/ESA systems, safety is an especially important source of requirements. Requirements engineers can express safety requirements as a set of features and procedures that ensure predictable system performance under normal and abnormal conditions. Furthermore, AI engineers might rely on safety requirements to derive special self-* objectives controlling the consequences of unplanned events or accidents. Think of the self-* objectives as AI objectives that drive the system in critical situations through self-adaptive behavior. Safety standards can be a good source of safety requirements and, consequently, of safety-related self-* objectives. Such self-* objectives may provide for fault-tolerant behavior, bound the probability of failure, and ensure adherence to proven practices and standards. Explicit safety requirements provide a key way to maintain, within the machine brain, knowledge of what is important for safety. In typical practice, safety-related AI requirements can be derived by a four-stage process (a small illustrative sketch follows the list):

  1. Hazard identification – all the hazards exhibited by the system are identified. A hazard can be regarded as a condition – a situation, event, etc. – that may lead to an accident.
  2. Hazard analysis – possible causes of the system’s hazards are explored and recorded. Essentially, this step identifies all processes, combinations of events, and sequences that can lead from a ‘normal’ or ‘safe’ state to an accident. Success in this step means that we now understand how the system can arrive at an accident.
  3. Identifying safety capabilities – a key step is to identify the capabilities (functionality) the system needs in order to achieve its goals and remain safe. It is very likely that some of these capabilities have already been identified for the purpose of other self-* objectives.
  4. Requirements derivation – once the set of hazards is known and their causation is understood, engineers can derive safety requirements that either prevent the hazards from occurring or mitigate the resulting accidents via self-* objectives.
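
A minimal sketch of how this four-stage process might be organized, with purely hypothetical hazard, cause, and capability names chosen for illustration:

```python
# Sketch of the four-stage derivation of safety-related AI requirements.
# All class and field names here are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class Hazard:                        # Stage 1: hazard identification
    name: str
    causes: list = field(default_factory=list)   # Stage 2: hazard analysis

@dataclass
class SafetyCapability:              # Stage 3: capabilities the system needs
    name: str
    mitigates: list                  # names of hazards this capability addresses

def derive_requirements(hazards, capabilities):
    """Stage 4: derive one safety requirement (a candidate self-* objective)
    per (hazard, cause) pair that a known capability can mitigate."""
    requirements = []
    for hazard in hazards:
        for cause in hazard.causes:
            for cap in capabilities:
                if hazard.name in cap.mitigates:
                    requirements.append(
                        f"When '{cause}' may lead to hazard '{hazard.name}', "
                        f"activate capability '{cap.name}'.")
    return requirements

# Example: a single hazard with two identified causes
hazards = [Hazard("collision with pedestrian",
                  causes=["brake failure", "low visibility"])]
capabilities = [SafetyCapability("autonomous emergency stop",
                                 mitigates=["collision with pedestrian"])]
for requirement in derive_requirements(hazards, capabilities):
    print(requirement)
```

In practice, of course, each stage involves extensive domain analysis; the point of the sketch is only the flow from hazards and causes to derived safety requirements.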

Luke: As explained here, no system can be 100% safe, in part because its formal safety requirements might fail to capture a hazard that was missed, or inadequately modeled, by the system’s designers. What’s your intuitive sense of how much “extra” confidence in a system’s safety is typically gained by the application of formal methods? For example, you might say that a space probe launched with lots of testing but no formal methods would have a 70% chance of completing its mission successfully, whereas a space probe that had been tested and formally verified (and hence designed in a different way, so as to be amenable to formal verification) would have a 90% chance of completing its mission successfully.

Naturally, such estimates depend greatly on the details of the specific system, and which properties were proved by the formal verification process, but I’m hoping you can say something about your intuitive sense as to how much safety is gained in exchange for the extra work of designing a system so precisely that it can be formally verified.


Emil: Generally speaking, formal methods strive to build software right (and thus reliably) by eliminating flaws, e.g., requirements flaws. Formal-methods tools allow comprehensive analysis of requirements and design and, eventually, near-complete exploration of system behavior, including fault conditions. However, good requirements formalization depends mainly on the analytical skills of the requirements engineers, along with proper use of the formal methods at hand. Hence, errors can be introduced when capturing or implementing safety requirements. This is perhaps the main reason why, although efficient in terms of the capacity of dedicated analysis tools such as theorem provers and model checkers, formal methods do not eliminate the need for testing.

With regard to safety requirements, the application of formal methods can only add to safety. Even if we assume that proper testing can capture all the safety flaws that we might capture with formal verification, with proper use of formal methods we can always improve the quality of requirements and eventually derive more efficient test cases. Moreover, formal methods can be used to create formal specifications, which subsequently can be used for automatic test-case generation. Hence, in exchange for the extra work put into formally specifying the safety requirements of a system, you get not only the possibility to formally verify and validate these requirements, but also the ability to test their implementation more efficiently.

So, to go back to your question: as you said, 100% safety cannot be guaranteed, but when properly used, formal methods can significantly contribute to safety, not by replacing testing but by complementing it. The quantitative measure of how much safety can be gained with formal methods may be regarded in three aspects:

  1. Formal verification and validation allow for early detection of safety flaws, i.e., before implementation.
  2. High-quality safety requirements improve the design and implementation of those requirements.
  3. Formally specified safety requirements assist in the derivation and generation of efficient test cases.

To be more specific, although it really depends on the complexity of the system in question, my intuition is that these three aspects complement each other and together may help us build a system with up to a 99% safety guarantee.


Luke: You wrote about the difficulty of “good requirements formalization” — that is, the challenge of translating our intuitive ideas about how we want the system to behave into a logical calculus.

What types of requirements are researchers currently able to formalize for use in formally verified systems? What types of intuitively desirable requirements are we as-yet unable to figure out how to formalize?


Emil: Contemporary formal verification techniques (e.g., model checking) rely on state-transition models, in which objects or entities are specified with the states they can be in and are associated with functions that are performed to change states or object characteristics. Therefore, basically every system property that can be measured, quantified, or qualified as a function can be formalized for the needs of formal verification. Usually, the traditional types of requirements – functional and non-functional (e.g., data requirements, quality requirements, time constraints, etc.) – are used to provide a specific description of the functions and characteristics that address the general purpose of the system. Formal verification techniques use the formal specification of such requirements to check desired safety and liveness properties. For example, to specify the safety properties of a system, we need to formalize “nothing bad will happen to the system”, which can be done via the formalization of non-desirable system states along with the formalization of behavior that will never lead the system to those states.
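
A minimal sketch of that idea, using a hypothetical state-transition model: the safety property “nothing bad will happen” holds exactly when no non-desirable state is reachable from the initial state.

```python
# Explicit-state safety check by exhaustive exploration of reachable states.
# The states, transitions, and "bad" state below are hypothetical examples.
from collections import deque

transitions = {
    "idle":     ["sensing"],
    "sensing":  ["planning", "idle"],
    "planning": ["acting"],
    "acting":   ["idle"],
}
bad_states = {"uncontrolled"}   # the formalized non-desirable states

def safety_holds(initial, transitions, bad_states):
    """Breadth-first exploration: the safety property holds iff no
    reachable state belongs to the set of non-desirable states."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if state in bad_states:
            return False
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

print(safety_holds("idle", transitions, bad_states))   # True for this model
```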

Obviously, the formalization of well-defined properties (e.g., with proper states expressed via boundaries, data ranges, outputs, etc.) is a straightforward task. However, it is not that easy to formalize uncertainty, e.g., liveness properties (something good will eventually happen). Although theories such as fuzzy logic help us formalize “degrees of truth” and deal with approximate conclusions rather than exact ones, the verification tools for fuzzy control systems are not efficient due to the huge state-explosion problem. Moreover, testing such systems is not efficient either, simply because statistical evidence for their correct behavior may not be enough. Hence, any property that requires a progressive evaluation (or partial satisfaction, e.g., soft goals) is difficult and often impossible to formalize “for use in formally verified systems”.
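
For illustration, a minimal sketch of a “degree of truth”: a hypothetical fuzzy membership function returns a value in [0, 1] rather than a crisp true/false, which is precisely what makes exhaustive state-based verification of such properties hard.

```python
# Hypothetical fuzzy membership function for "close to a pedestrian":
# 1.0 within `near` metres, 0.0 beyond `far`, linear in between.
def closeness(distance_m, near=2.0, far=10.0):
    if distance_m <= near:
        return 1.0
    if distance_m >= far:
        return 0.0
    return (far - distance_m) / (far - near)

print(closeness(1.0), closeness(5.0), closeness(12.0))   # 1.0 0.625 0.0
```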

Other properties that are “intuitively desirable” (especially for AI) but still cannot be formalized today include human behavior and principles related to cultural differences, ethics, feelings, etc. The problem is that with today’s formal approaches we cannot express, for example, emotional bias as a meaningful system state.


Luke: Let’s work with a concrete, if hypothetical, example. Suppose a self-driving car’s navigation software was designed with formal verification in mind, at least for many core components of the software. I presume we are very far from being able to define a formal requirement that matches the intuitive meaning of “Don’t ever be in a state such that the car has injured a pedestrian.”

So, what requirements could we formalize such that, if they were proven to be met by the car’s navigation software, would help us to feel increasingly confident that the car would never injure a pedestrian (while still allowing the car to engage in normal driving behavior)?


Emil: Again, this example should be regarded with the insight that “100% safety is not possible”, especially when the system in question (e.g., a self-driving car) interacts with a non-deterministic, open-world environment. What we should do, though, to maximize the guarantee that “the car would never injure a pedestrian”, is to determine all the critical situations involving the car in close proximity to pedestrians. Then we should formalize these situations as system and environment states and formalize self-adaptive behavior (e.g., as self-* objectives) driving the car in such situations. For example, a situation could be defined as “all the car’s systems are in operational condition and the car is passing by a school”. To increase safety in this situation, we may formalize a self-adaptive behavior such as “automatically decrease the speed to 20 mph when getting in close proximity to children or a school”.

Further, we need to specify situations involving close proximity to pedestrians (e.g., crossing pedestrians) and car states reflecting damage or malfunction of the driving system, e.g., flat tires, a malfunctioning steering wheel, malfunctioning brakes, etc. For example, we may specify a self-adaptive behavior such as “automatically turn off the engine when the brake system is malfunctioning and the car is getting in close proximity to pedestrians.”

Other important situations should involve severe weather conditions introducing hazards on the road, e.g., snowstorms, ice, low visibility (formalized as environment states), with the car getting in close proximity to pedestrians. In such situations, formalized self-adaptive behavior should automatically enforce low speed, turn the lights on, turn the wipers on, etc.
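
A minimal sketch of how such situations and self-adaptive behaviors might be expressed, with hypothetical state names standing in for a real formalization:

```python
# Map a situation (system state + environment state) to self-* behaviors.
# All keys, values, and thresholds are hypothetical, for illustration only.
def select_adaptive_behavior(system, environment):
    """Return the self-* behaviors triggered by the current situation."""
    behaviors = []
    if environment.get("near_school") or environment.get("children_nearby"):
        behaviors.append("limit speed to 20 mph")
    if system.get("brakes") == "malfunctioning" and environment.get("pedestrians_close"):
        behaviors.append("turn off engine")
    if environment.get("visibility") == "low" or environment.get("road") == "icy":
        behaviors += ["enforce low speed", "turn lights on", "turn wipers on"]
    return behaviors

# Example situation: brake failure while pedestrians are crossing
print(select_adaptive_behavior(
    {"brakes": "malfunctioning"},
    {"pedestrians_close": True, "visibility": "normal"}))
```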


Luke: To what degree, if any, can we demonstrate both deductive guarantees and probabilistic guarantees with today’s formal verification methods?

In a sense, many of the deductive proofs for safety properties in today’s formally verified systems are already “probabilistic” in the sense that the designers have some subjective uncertainty as to whether the formal specification accurately captures the intuitively desirable safety properties, and (less likely) whether there was an error in the proof somewhere.

But the question I’m trying to ask is about explicitly probabilistic guarantees within the proof itself. Though perhaps to do this, we’d first need a solution to the problem of how a formal system or logical calculus can have internally consistent probabilistic beliefs about logical sentences (a la Demski 2012 or Hutter et al. 2012).

If we could generalize formal specification and verification procedures to allow both for deductive guarantees and probabilistic guarantees, perhaps we could verify larger, more complex, and more diverse programs.


Emil: With deductive guarantees, a formal verification actually provides true statements demonstrating that the desired safety properties hold. Such a verification process is deterministic, and a complete proof is required to guarantee the correctness of safety properties. For example, such a proof can be equipped with deterministic rules and expressed in classical first-order logic (or in higher-order logic if we use Isabelle to run a deductive verification). On the other hand, with probabilistic guarantees you accept that a complete proof is not necessary and that safety properties can be verified with some degree of uncertainty. Basically, probabilistic guarantees can be regarded as the result of quantifying uncertainty in both the verification parameters and the subsequent predictions. With Bayesian methods, for example, we quantify our uncertainty as a prior distribution over the beliefs we have in the values of certain properties. Moreover, we also embed likelihood in the formalization of properties, i.e., how likely it is that we would observe a certain value under particular conditions. You may think of this as the likelihood that certain safety properties hold in specific situations. The probabilistic guarantees then assert, in a natural way, “likely” properties over the possibilities that we envision.
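
A minimal sketch of this kind of Bayesian quantification, with hypothetical numbers: a Beta prior over “the safety property holds in this situation” is updated with observations from simulated runs, yielding a probabilistic rather than a deductive guarantee.

```python
# Beta-Bernoulli update over "the safety property holds in this situation".
# The prior and the observation counts below are hypothetical.
alpha, beta = 1.0, 1.0             # uniform prior belief

# Hypothetical evidence: in 980 of 1000 simulated encounters the property held
held, violated = 980, 20
alpha += held
beta += violated

posterior_mean = alpha / (alpha + beta)
print(f"Estimated probability the safety property holds: {posterior_mean:.3f}")
```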

So, to answer your question: unfortunately, deductive guarantees can be provided only for simple safety properties, because their complete proof often, unavoidably, does not terminate. Although deductive verification can deal with infinite-state systems, its automation is limited, mainly because of the decidability of the logical reasoning (first-order logic and its extensions such as higher-order logic are not decidable, or rather are semi-decidable). For example, if we go back to our example with the self-driving car, we may supply all the needed deterministic rules expressing our safety requirements (e.g., a speed limit of 20 mph when passing by a school), but the complete proof eventually cannot be achieved, because although the desired conclusion follows from some of the premises, other premises may eventually lead to resolution refutation.

Probabilistic guarantees are not as complete as deductive ones, but they can deal with more complex properties, e.g., where a larger number of states may be required. Of course, this tradeoff should be considered when evaluating the results of any probabilistic formal verification. So, if we go back to your question about how much confidence in a system’s safety is gained with formal methods: probabilistic guarantees bring less confidence than deductive ones, but they may bring some extra confidence to safety properties that cannot be handled otherwise.

It is important to mention that abstraction is the most efficient solution to the state-explosion problem (and, respectively, to the decidability problem of deductive guarantees). With abstraction, the size of the state space is reduced by aggregating state transitions into coarser-grained state transitions. The technique effectively reduces the total number of states to be considered, but it is likely to reduce the granularity of the system model to a point where it no longer adequately represents that system. The problem is that although the abstract model (e.g., the formalization of safety properties) is relatively small, it must also be precise enough to adequately represent the original system.
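
A minimal sketch of the abstraction idea, with a hypothetical example: concrete speed values are aggregated into a few coarser abstract states, shrinking the state space while, one hopes, preserving the safety-relevant distinctions.

```python
# Aggregate concrete speed values into coarse abstract states.
# The state names and thresholds are hypothetical.
def abstract_state(speed_mph):
    """Map a concrete speed to a coarse abstract state."""
    if speed_mph == 0:
        return "stopped"
    if speed_mph <= 20:
        return "slow"
    return "fast"

concrete_states = range(0, 121)                     # 121 concrete states
abstract_states = {abstract_state(s) for s in concrete_states}
print(len(abstract_states))                         # 3 abstract states
```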

Therefore, as you said, in order to obtain better results we should consider both verification approaches and eventually apply them together. For example, we may formalize with the presumption that both deductive and probabilistic guarantees can be obtained in a sort of compositional verification, where we apply the two approaches to different safety properties and eventually combine the results under the characteristics of global safety invariants. Such invariants can be classified as goal invariants, behavior invariants, interaction invariants, and resource invariants (see my co-authored paper “Verification of Adaptive Systems”).


Luke: Besides mixing probabilistic and deductive verification methods, what other paths forward do you see for improving on our current verification toolset? In particular, what important work in this area seems to be most neglected by funding and researcher hours?


Emil: Well, maybe the most popular technique for formal verification is model checking, where the properties are expressed in a temporal logic and the system formalization is turned into a state machine. Model checking methods verify whether the desired properties hold in all the reachable states of a system, which is basically a proof that the properties will hold during the execution of that system. State explosion is the main issue model checking faces today. This problem gets even bigger when it comes to concurrent systems, where the number of states is exponential in the number of concurrent processes. So, basically, model checking is an efficient and powerful verification method, but only when applied to finite, and relatively small, state spaces.
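
A quick illustration of the state-explosion remark, with hypothetical numbers: for n concurrent processes of k local states each, the global state space grows as k^n before any reduction techniques are applied.

```python
# Global state count for n concurrent processes with k local states each.
local_states = 10
for n_processes in (2, 4, 8, 16):
    print(n_processes, "processes ->", local_states ** n_processes, "global states")
```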

Here, to answer your question about what can be done to improve the current verification toolset: on one side, we need to work on fully automated deductive verification based on decidable logics with both temporal and probabilistic features; on the other side, we need to improve the ability of model checking to handle large state spaces (e.g., symbolic model checking, probabilistic model checking, etc.).

Important work that seems neglected by the scientific community is the so-called stabilization science, which provides a common approach to studying system stability. In this approach, a system is linearized around its operating point to determine a small-signal linearized model at that operating point. The stability of the system is then determined using linear system stability analysis methods such as Routh-Hurwitz, root locus, Bode plot, and the Nyquist criterion. We may use stabilization science to build small-signal linearized models for the different system components, anticipating that the linearized models of system components will yield a relatively small state space, enabling their efficient verification (see again my co-authored paper “Verification of Adaptive Systems”). Then we may apply compositional verification techniques to produce an overall, system-wide verification.
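
A minimal sketch of the kind of linear stability check described here, with a hypothetical characteristic polynomial: the linearized system is stable if all roots lie in the left half-plane, which is the property that criteria such as Routh-Hurwitz establish without computing the roots explicitly.

```python
# Check stability of a small-signal linearized model by inspecting the roots
# of its characteristic polynomial. The coefficients below are hypothetical:
# s^3 + 6s^2 + 11s + 6 = (s+1)(s+2)(s+3), a stable example.
import numpy as np

coefficients = [1, 6, 11, 6]
roots = np.roots(coefficients)
stable = all(root.real < 0 for root in roots)
print(roots, "stable" if stable else "unstable")
```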

Other, less well-developed verification techniques are those related to automatic test-case generation and simulation, which may reduce testing costs and improve the quality of testing. For example, test cases can be generated from a formal specification of a system built with a domain-specific formal language. If combined with code generation and analysis techniques for efficient test-case generation (e.g., change-impact analysis), automatic test-case generation might be used to efficiently test system behavior under simulated conditions (see my co-authored paper “Automated Test Case Generation of Self-Managing Policies for NASA Prototype Missions Developed with ASSL”).
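
A minimal sketch of automatic test-case generation from a formal specification, here a hypothetical state-machine spec (not ASSL): every specified transition yields one test case of the form (start state, action, expected next state).

```python
# Generate one test case per specified transition of a state-machine spec.
# The specification below is hypothetical, for illustration only.
spec = {
    "standby":    {"activate": "monitoring"},
    "monitoring": {"fault_detected": "safe_mode", "deactivate": "standby"},
    "safe_mode":  {"recover": "monitoring"},
}

def generate_test_cases(spec):
    """Return (state, action, expected_target) tuples, one per transition."""
    return [(state, action, target)
            for state, actions in spec.items()
            for action, target in actions.items()]

for case in generate_test_cases(spec):
    print("given state %r, apply %r, expect %r" % case)
```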

Moreover, high-performance computing can be used to parallelize simulations, allowing multiple state-space explorations to occur simultaneously, etc.


Luke: Thanks, Emil!