Michael Carbin is a Ph.D. Candidate in Electrical Engineering and Computer Science at MIT. His interests include the design of programming systems that deliver improved performance and resilience by incorporating approximate computing and self-healing.
His work on program analysis at Stanford University as an undergraduate received an award for Best Computer Science Undergraduate Honors Thesis. As a graduate student, he has received the MIT Lemelson Presidential and Microsoft Research Graduate Fellowships. His recent research on verifying the reliability of programs that execute on unreliable hardware received a best paper award at OOPSLA 2013.
Luke Muehlhauser: In Carbin et al. (2013), you and your co-authors present Rely, a new programming language that “enables developers to reason about the… probability that [a program] produces the correct result when executed on unreliable hardware.” How is Rely different from earlier methods for achieving reliable approximate computing?
Michael Carbin: This is a great question. Building applications that work with unreliable components has been a long-standing goal of the distributed systems community and of other communities that have investigated how to build fault-tolerant systems. A key goal of a fault-tolerant system is to deliver a correct result even in the presence of errors in the system’s constituent components.
This goal stands in contrast to the goal of the unreliable hardware that I have targeted in my work. Specifically, hardware designers are considering new designs that purposely expose components that may silently produce incorrect results with some non-negligible probability. These hardware designers are working in a subfield that is broadly called approximate computing.
The key idea of the approximate computing community is that many large-scale computations (e.g., machine learning, big data analytics, financial analysis, and media processing) have a natural trade-off between the quality of their results and the time and resources required to produce a result. Exploiting this fact, researchers have devised a number of techniques that take an existing application and modify it to trade the quality of its results for increased performance or decreased power consumption.
One example that my group has worked on is simply skipping parts of a computation that we have demonstrated — through testing — can be elided without substantially affecting the overall quality of the application’s result. Another approach is executing portions of an application that are naturally tolerant of errors on these new unreliable hardware systems.
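To make the first technique concrete, here is a rough, hypothetical C sketch of that kind of skipping (often called loop perforation); the function, the data, and the choice to sample every other element are purely illustrative rather than taken from any particular benchmark:

```c
/* Illustrative loop perforation: skip a fraction of the iterations of a loop
 * whose result tolerates approximation, trading accuracy for time and energy.
 * Here we estimate an average pixel intensity from every other element. */
double mean_intensity(const double *pixels, int n) {
    double sum = 0.0;
    int sampled = 0;
    for (int i = 0; i < n; i += 2) {  /* perforated stride: 2 instead of 1 */
        sum += pixels[i];
        sampled++;
    }
    return sampled > 0 ? sum / (double)sampled : 0.0;
}
```

Whether a perforation like this is acceptable is exactly the kind of question we answer empirically, by testing how much it perturbs the application’s end-to-end result quality.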
A natural follow-on question to this is, how have developers previously dealt with approximation?
These large-scale applications are naturally approximate because exact solutions are often intractable or perhaps do not even exist (e.g., in machine learning). The developers of these applications therefore often start from an exact model of how to compute an accurate result and then use that model as a guide to design a tractable algorithm and a corresponding implementation that returns an approximate solution. These developers have thus been manually applying approximations to their algorithms (and implementations) and reasoning about the resulting accuracy for some time. A prime example of this is the field of numerical analysis and its contributions to scientific computing.
The emerging approximate computing community represents the realization that programming languages, runtime systems, operating systems, and hardware architectures can not only help developers navigate the approximations they need to make when building these applications, but can also incorporate approximations themselves. So, for example, the hardware architecture may itself export unreliable components that an application’s developers can then use as one of their many tools for performing approximation.
Luke: Why did you choose to develop Rely as an imperative rather than functional programming language? My understanding is that functional programming languages are often preferred for applications related to reliability, safety, and security, because they can often be machine-checked for correctness.
Michael: There has historically been a passionate rivalry between the imperative and functional language communities. Languages that have reached widespread usage (e.g., C, C++, Java, PHP, and Python) have traditionally been imperative, whereas functional languages have traditionally appealed to a smaller and more academically inclined group of programmers.
The divide between the imperative and functional mindsets also holds true for researchers within the programming languages and compilers research community. Our decision to use an imperative language is largely motivated by the fact that simple imperative languages are accessible to a broad audience of both practitioners and researchers.
However, the results we have for imperative languages can be adapted to functional languages as well. This is important because, with the popularity of languages like Erlang (WhatsApp) and Scala (Twitter), there has been more public interest in functional languages. As a result, the divide between the imperative and functional camps has started to blur as standard functional language features have been adopted into mainstream imperative languages (e.g., lambdas in C++ and Java). Our research is therefore in a position to adapt to changes in the landscape of programming paradigms.
One important thing to note is that, in principle, reasoning about a program written in any Turing-complete language, imperative or functional, is equally difficult (i.e., undecidable). However, writing a program in a functional language typically better exposes the structure of the computation.
For example, mapping a list of elements to another list of elements in a functional language makes explicit the recursive nature of the computation and the fact that the current head element of the list is disjoint from its tail.
However, mapping a list of elements with a straightforward C implementation (for example) would immediately use pointers and therefore complicate the reasoning required to perform verification. As imperative languages begin to expose better structured programming constructs, the reasoning gap between imperative and functional languages will narrow.
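As a rough illustration of the imperative side of that gap, here is a hypothetical C sketch of mapping over a singly linked list; the node type and function are invented for this example, not drawn from any verified codebase:

```c
/* Mapping a function over a singly linked list in C. A verifier must reason
 * about the pointer traversal itself (aliasing, termination, absence of
 * cycles), whereas a functional `map` exposes the recursion and the
 * disjointness of the head and tail directly in the program's structure. */
struct node {
    int value;
    struct node *next;
};

void map_in_place(struct node *head, int (*f)(int)) {
    for (struct node *p = head; p != NULL; p = p->next) {
        p->value = f(p->value);
    }
}
```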
Luke: What are some promising “next steps” for research into methods and tools like Rely that could improve program reliability for e.g. approximate computing and embedded system applications?
Michael: The approximate computing community is just starting to pick up steam, so there are many opportunities going forward.
On the computer hardware side, there are still open questions about what performance/energy gains are possible if we intentionally build hardware that breaks the traditional digital abstraction by silently returning incorrect or approximate results. For example, researchers in the community are still asking, will we see 2x gains or can we hope to see 100x gains?
Another main question about the hardware is, what are the error models for each approximate component — do they fail frequently with arbitrary error or fail infrequently with small bounded error? Or, is there some happy balance in between those two extremes?
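As a purely hypothetical illustration of those two extremes, one could model an approximate adder in either of the following ways in C (neither model is drawn from a real hardware proposal; the parameters are made up):

```c
#include <stdlib.h>

/* Model A: fail infrequently (with probability p), but with arbitrary error;
 * the corrupted result is modeled here as an arbitrary value, silently
 * returned with no bound on the magnitude of the error. */
int add_arbitrary_fault(int a, int b, double p) {
    int r = a + b;
    if ((double)rand() / (double)RAND_MAX < p) {
        r = rand();
    }
    return r;
}

/* Model B: potentially err on every operation, but with small bounded error:
 * the exact sum plus noise drawn uniformly from [-max_err, +max_err]. */
int add_bounded_noise(int a, int b, int max_err) {
    return a + b + (rand() % (2 * max_err + 1)) - max_err;
}
```

Which of these models (or what point in between) the hardware actually exposes determines what kind of reasoning the software layers above it need to perform.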
On top of the hardware then comes a variety of software concerns. Most software is designed and built around the assumption that the hardware is reliable. The research we are doing with Rely is among the first to propose a programming model and workflow that provide reliability and accuracy guarantees in the presence of these new approximate hardware designs.
However, there are still many challenges. For example, compilers have traditionally relied on the assumption that all instructions and storage regions are equally reliable. Approximate hardware, in contrast, may expose designs in which some operations and storage regions are more reliable than others. Because of this distinction, standard compiler transformations that optimize a program by exchanging one sequence of operations for another may now change the program’s reliability. This new reality will require the community to rethink how we design, build, and reason about compilers so that they balance both optimization and reliability.
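To illustrate why this matters with a deliberately simplified model (in the spirit of, though much cruder than, Rely’s analysis): if each operation in a straight-line sequence independently produces a correct result with some probability, then the whole sequence is correct with probability equal to the product of those probabilities, so a transformation that trades one operation for several can lower end-to-end reliability. The reliabilities below are invented for the example:

```c
#include <stdio.h>

/* Reliability of a straight-line sequence of independent unreliable
 * operations, modeled as the product of the per-operation reliabilities. */
double sequence_reliability(const double *op_reliability, int n) {
    double r = 1.0;
    for (int i = 0; i < n; i++) {
        r *= op_reliability[i];
    }
    return r;
}

int main(void) {
    double original[]    = { 0.9999 };                   /* e.g., a single multiply */
    double transformed[] = { 0.9999, 0.9999, 0.9999 };   /* e.g., a shift plus two adds */
    printf("original sequence:    %.6f\n", sequence_reliability(original, 1));
    printf("transformed sequence: %.6f\n", sequence_reliability(transformed, 3));
    return 0;
}
```

Under this toy model, an “optimization” that replaces one unreliable operation with three slightly lowers the probability that the result is correct, even though the two sequences are equivalent on reliable hardware.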
One additional opportunity that the approximate computing community has yet to explore is the fact that an existing piece of software implements some algorithm that may itself be flexible, or that may be only one of a number of candidate algorithms. Going forward, the approximate computing community will need to consider an application’s algorithmic flexibility to realize the broad impact it hopes to achieve.
Specifically, by bringing the algorithm into the picture, the approximate computing community will be able to incorporate the experience and results of the numerical analysis, scientific computing, and theory communities to provide strong guarantees about the accuracy, stability, and convergence of the algorithms that these approximate hardware and software systems will be used to implement.
Luke: Thanks, Michael!