Aaron Tomb on crowd-sourced formal verification


Aaron Tomb is a Principal Investigator at Galois, where his work includes research, development, and project leadership in the area of automated and semi-automated techniques for analysis of software, including type systems, defect detection tools, formal verification, and more general software development tools based on deep analysis of program semantics. He joined Galois in 2007, and received a Ph.D. in Computer Science from the University of California, Santa Cruz, for a dissertation on a technique for identifying inconsistent assumptions in software source code.

Luke Muehlhauser: DARPA’s Crowd Sourced Formal Verification (CSFV) program “aims to investigate whether large numbers of non-experts can perform formal verification faster and more cost-effectively than conventional processes.” To that end, the program created Verigames.com, the home of several free online puzzle games.

Your contribution to Verigames is StormBound. In a blog post, your colleague Jef Bell explains:

Here’s a high level view of how StormBound works, from the [formal verification] perspective. First we figure out the properties of the system we want to prove…

Based on our knowledge about the common sources of errors we are trying to prevent, we select certain code points in the program source code where we need to check that a property we care about is in fact true…

Unfortunately, in a typical program there are a lot of these code points, and it is not always easy to find the exact property that needs to hold at each point. It turns out this is where StormBound players can help, so bear with me as I explain how…

We run the system a whole bunch of times in a variety of scenarios that are intended to exercise the system as it would be exercised in real-world use. As we run the system all these times, we collect snapshots of the data that is in the computer’s memory, every time the program gets to one of the code points we identified.

This provides us with a large collection of examples of the data in the computer’s memory. Now, we want to identify some things in common about the data at each code point, every time that same code point is reached while running the program. These things in common are in fact the kinds of properties we are looking for. And here’s an important bit: it turns out that computers aren’t always good at finding these “things in common,” but human beings are very adept at finding patterns, especially visual patterns.

This is where the StormBound game comes in. Each game level actually consists of the data that was in memory all the times a particular code point was reached. When you play a level, you are looking for common patterns in that data. The game shows you the data (in what we think is a fun and visually appealing form), and allows you to “experiment” with the data in various ways, to help you to find those common patterns. The spells you cast in StormBound describe the patterns. In the [formal verification] world we call these spells assertions. We combine together the assertions discovered by many players of the same game level, which enables the [formal verification] tools we use to prove the truthfulness of the desired properties at the corresponding code point.
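To make the pattern-finding step concrete, here is a minimal sketch in C (not StormBound's actual instrumentation, and with made-up variable names) of what it means to record snapshots at a code point and then test a candidate pattern against all of them:

    #include <stdio.h>

    /* Minimal sketch (not StormBound's actual instrumentation): record the
       values of a few variables every time a chosen code point is reached,
       then check whether a candidate pattern holds in every recorded
       snapshot. Patterns that survive all snapshots become candidate
       assertions for the verification tools. */

    typedef struct {
        int index;   /* hypothetical variables observed at the code point */
        int length;
    } snapshot_t;

    #define MAX_SNAPSHOTS 1024
    static snapshot_t snapshots[MAX_SNAPSHOTS];
    static size_t snapshot_count = 0;

    /* Called at the instrumented code point on every run. */
    static void record_snapshot(int index, int length) {
        if (snapshot_count < MAX_SNAPSHOTS) {
            snapshots[snapshot_count].index = index;
            snapshots[snapshot_count].length = length;
            snapshot_count++;
        }
    }

    /* Candidate pattern: does "index < length" hold in every snapshot? */
    static int pattern_holds_everywhere(void) {
        for (size_t i = 0; i < snapshot_count; i++) {
            if (!(snapshots[i].index < snapshots[i].length))
                return 0;   /* a counterexample: discard this pattern */
        }
        return 1;
    }

    int main(void) {
        /* Simulate several executions reaching the code point. */
        record_snapshot(0, 8);
        record_snapshot(3, 8);
        record_snapshot(7, 8);
        printf("candidate assertion 'index < length' holds: %s\n",
               pattern_holds_everywhere() ? "yes" : "no");
        return 0;
    }

In StormBound, the search over candidate patterns is driven by players' visual intuition rather than a fixed checklist like the one hard-coded above.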

Roughly how many players have played StormBound so far? Have they yet found anything that you expect would have been very difficult for an automated search algorithm to find?


Aaron Tomb: At this point we’ve had thousands of players, who have collectively described over 100,000 patterns (which ultimately become logical assertions about program states). The game that is available for play at the moment is the first of two phases, and allows players to describe a relatively restricted class of assertions. And, in general, the assertions we have gathered during this first phase have been relatively simple, and likely discoverable using automated techniques. However, having the current iteration of the game available to a wide range of players has taught us a lot about what keeps players engaged (or causes them to lose interest), and has also allowed us to develop an understanding of the sorts of results that we need to get from players but that aren’t encouraged (or, in a few cases, possible) with the current interface. We are using this information to design a second version that we expect will yield richer results beyond what automated tools would be likely to produce.


Luke: Makes sense. That’s standard Silicon Valley wisdom: launch a minimum viable product, learn, iterate quickly, etc.

How would you know whether StormBound, or even CSFV altogether (if you were running it), had succeeded or failed? Presumably there’s a substantial chance that crowd-sourced formal verification assistance can’t really work with anything like current verification techniques and problems?


Aaron: I can’t speak for DARPA about how they plan to measure the success of the program, but I can say something about how we can know when we’ve successfully verified a particular piece of software. One advantage of typical verification techniques is that there’s a fairly clear measure of completion. Verification tools often accept program source code plus some statement of desired properties of the code. The tools will then attempt to prove, usually automatically, that the code does indeed have those properties. If they succeed, there’s a high probability (not 100%, because the tools may themselves have bugs) that the code has the desired properties.

StormBound is structured to generate annotations for C code that are sent to a sophisticated tool called Frama-C. Those annotations are combined with some built-in definitions that describe the absence of common security vulnerabilities. By using Frama-C, StormBound is already working with state-of-the-art verification techniques. Our target programs are large pieces of open source software critical to internet infrastructure, so we’re also working with real-world problems.
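For readers unfamiliar with Frama-C: its annotations are written in the ACSL specification language inside special comments beginning with /*@. Here is a small, hypothetical illustration (not taken from the actual target programs) of an ACSL-annotated C function that Frama-C can check for the absence of out-of-bounds reads:

    #include <stddef.h>

    /* Hypothetical illustration, not drawn from the CSFV target programs.
       The ACSL annotations state a contract and loop facts that Frama-C
       tries to prove; together they rule out any out-of-bounds read of buf. */

    /*@ requires \valid_read(buf + (0 .. len - 1));
        assigns \nothing;
        ensures 0 <= \result < 251;
    */
    int checksum(const unsigned char *buf, size_t len) {
        int sum = 0;
        size_t i;
        /*@ loop invariant 0 <= i <= len;
            loop invariant 0 <= sum < 251;
            loop assigns i, sum;
            loop variant len - i;
        */
        for (i = 0; i < len; i++)
            sum = (sum + buf[i]) % 251;   /* stays in [0, 250], no overflow */
        return sum;
    }

The loop invariants in the middle are the sort of facts that are easy to state but tedious to discover, which is where player-supplied patterns are meant to help.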

A key question is then: can StormBound generate the necessary annotations to allow verification to succeed? In the first phase, we’ve found that players have come up with some useful annotations, but have missed others that would be necessary for a complete proof. In the second phase, we’ll be updating the game to include significantly more context about the ultimate verification goal, in game-oriented terms, and we expect that this will lead more players toward constructing solutions that give significant verification progress.


Luke: You write that “Our target programs are large pieces of open source software critical to internet infrastructure…”

Do you have any opinion as to whether formal verification would have been likely to catch the Heartbleed bug in OpenSSL?


Aaron: Yes, any sound verification tool capable of proving the absence of undefined behavior in C, such as Frama-C, would have failed to prove that OpenSSL was safe. Depending on the tool being used, that proof failure may or may not help you identify the nature of the bug, but it would at least indicate that a bug of some sort exists, and something about its rough location. John Regehr, a Computer Science professor at the University of Utah, wrote a nice blog post on the topic, which specifically mentions using Frama-C to do the verification.
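For context, Heartbleed let a request declare a payload length that was never checked against the amount of data actually received. The following greatly simplified sketch (hypothetical names, not OpenSSL's actual code) captures the pattern that a sound tool would refuse to verify:

    #include <string.h>

    /* Greatly simplified sketch of the Heartbleed pattern; hypothetical
       names, not OpenSSL's actual code. payload_len comes straight from
       the attacker-controlled request. */
    void heartbeat_reply(const unsigned char *record, size_t record_len,
                         size_t payload_len, unsigned char *reply) {
        /* The missing bounds check that made the bug possible:
           if (payload_len > record_len) return; */
        memcpy(reply, record, payload_len);  /* may read past the end of record */
    }

The unprovable obligation lands on the memcpy, which gives the rough location Aaron mentions even if it does not, by itself, explain the bug.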

The key complication is that, in order to prove that such bugs don’t exist, tools like Frama-C require a lot of extra information from the programmer (or from game players, in the case of StormBound), encoded as annotations in the source code. In general, the effort required to create those annotations can exceed the time taken to write the program in the first place, and the process requires familiarity with verification tools, which is not widespread. We hope that StormBound will be able to significantly reduce the necessary effort and expertise, to the point where it becomes practical.


Luke: What’s your current guess as to how general the CSFV games of 5-10 years from now will be? How broad a range of verification challenges might they conceivably be applied to, once they are substantially more fully developed than they are in these early days?


Aaron: A lot can happen in 5-10 years, so my answer to this is very speculative, and just reflects my own opinions, not those of DARPA or even the rest of the StormBound team.

In the last decade, there has been significant progress on tools for interactive construction of mathematical proofs that can be reliably checked by computers for correctness. Some very general and robust tools such as ACL2, Coq, Isabelle and PVS have arisen from this research. (Most of them were initially created even earlier than the last decade, but the approach has become significantly more widespread recently.) These tools have been used for significant undertakings in software verification, as well as general mathematics, resulting in successful proofs that would simply be infeasible to do (correctly) by hand. However, these tools are time-consuming to use for proofs about realistic software systems, and require rare expertise.
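To give a flavor of what these systems do, here is a tiny proof in the Lean proof assistant (a close relative of the tools named above, used here purely for illustration). The point is that the machine checks every step, so a flawed argument is rejected rather than silently accepted:

    -- A deliberately trivial, machine-checked proof: commutativity of
    -- addition on natural numbers, established by rewriting with a
    -- library lemma. Realistic software verifications chain together
    -- many thousands of steps like this one.
    theorem add_comm_example (a b : Nat) : a + b = b + a := by
      rw [Nat.add_comm]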

The design approaches we have been considering for the next revision of StormBound borrow significantly from the technical underpinnings of these interactive theorem provers, so the resulting game should be quite general. While the tentative design builds on these ideas, it presents them through a very different interface than that used by existing tools, and we hope that will allow people without the same level of mathematical background to prove at least some theorems.

The key unknown, at the moment, is how well players without mathematical background will be able to develop the intuition necessary to make progress. The current design sketches focus heavily on using visual pattern recognition in place of standard mathematical reasoning techniques in many places, but proofs of typical mathematical theorems still tend to rely on deep insight based on years of experience. At the same time, the proofs necessary to verify some basic security properties of software are often much less intricate than those that mathematicians spend their time on.

So, to finally get to the answer, I could see several possible results in the next decade. We could have a significantly different interface to interactive theorem provers that may make expert users more efficient. An even more exciting possibility, though, is that we could have a system that allows non-experts to take a stab at proving essentially any mathematical theorem simply by recognizing patterns and applying a few relatively simple rules. In either case, I expect that the results will be more general than simply proving security properties about software.


Luke: Thanks, Aaron!