Hadi Esmaeilzadeh on Dark Silicon


Hadi Esmaeilzadeh recently joined the School of Computer Science at the Georgia Institute of Technology as an assistant professor. He is the first holder of the Catherine M. and James E. Allchin Early Career Professorship. Hadi directs the Alternative Computing Technologies (ACT) Lab, where he and his students develop new technologies and cross-stack solutions to improve the performance and energy efficiency of computer systems for emerging applications. Hadi received his Ph.D. from the Department of Computer Science and Engineering at the University of Washington. He has a Master’s degree in Computer Science from The University of Texas at Austin, and a Master’s degree in Electrical and Computer Engineering from the University of Tehran. Hadi’s research has been recognized by three Communications of the ACM Research Highlights and three IEEE Micro Top Picks. Hadi’s work on dark silicon has also been profiled in The New York Times.

Luke Muehlhauser: Could you please explain for our readers what “dark silicon” is, and why it poses a threat to the historical exponential trend in computing performance growth?

Hadi Esmaeilzadeh: I would like to answer your question with a question. What is the difference between the computing industry and the commodity industries like the paper towel industry?

The main difference is that the computing industry is an industry of new possibilities, while the paper towel industry is an industry of replacement. You buy paper towels because you run out of them, but you buy new computing products because they get better.

And, it is not just the computers that are improving; it is the offered services and experiences that consistently improve. Can you even imagine running out of Microsoft Windows?

One of the primary drivers of this economic model is the exponential reduction in the cost of performing general-purpose computing. In 1971, at the dawn of microprocessors, the price of 1 MIPS (Million Instructions Per Second) was roughly $5,000; today it is about 4¢. This is an exponential reduction in the cost of the raw material for computing. This continuous and exponential reduction in cost has formed the basis of the computing industry’s economy for the past four decades.
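To make the size of that cost reduction concrete, here is a minimal arithmetic sketch. It uses the two prices quoted above and assumes a constant exponential decline; the end year (2013) is an assumption standing in for "today", and the function name is purely illustrative.

```python
import math

def halving_time_years(start_cost, end_cost, years):
    """Years needed for the cost to halve, assuming a constant exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings

# Figures from the text: $5,000 per MIPS in 1971, about $0.04 "today".
# ASSUMPTION: "today" is taken to be 2013, giving a 42-year span.
reduction_factor = 5000 / 0.04
period = halving_time_years(5000, 0.04, 2013 - 1971)

print(f"total reduction: {reduction_factor:,.0f}x")
print(f"cost halves roughly every {period:.1f} years")
```

Under these assumptions the price per MIPS fell by a factor of about 125,000, which corresponds to a halving roughly every two and a half years.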

Two primary enabling factors of this economic model are:

  1. Moore’s Law: The consistent and exponential improvement in transistor fabrication technology that happens every 18 months.
  2. The continuous improvements to the general-purpose processors’ architecture that leverage the transistor-level improvements.

Moore’s Law has been a fundamental driver of computing for more than four decades. Over the past 40 years, every 18 months, transistor manufacturing facilities have been able to develop a new technology generation that doubles the number of transistors on a single monolithic chip. However, doubling the number of transistors does not provide any benefits by itself. The computer architecture industry harvests these transistors and designs general-purpose processors that make these tiny switches available to the rest of the computing community. By building general-purpose processors, the computer architecture community provides a link, with mechanisms and abstractions, that makes these devices accessible to compilers, programming languages, system designers, and application developers. In this way, general-purpose processors enable the computing industry to commodify computing and make it pervasive.

The computer architecture community has also harvested the exponentially increasing number of transistors to deliver almost the same rate of improvement in the performance of general-purpose processors. This consistent improvement in performance has proportionally reduced the cost of computing, which in turn has enabled application and system developers to consistently offer new possibilities.

The ability to consistently provide new possibilities has historically paid off the huge cost of developing new process technologies for transistor fabrication. This self-sustaining loop has preserved the economic model of our industry over the course of the past four decades. Nonetheless, there are fundamental challenges associated with developing new process technologies and integrating an exponentially increasing number of transistors on a single chip.

One of the main challenges of doubling the number of transistors on the chip is powering them without melting the chip or incurring excessive cooling costs. Even though the number of transistors on the chip has increased exponentially since 1971 (when the first microprocessors were introduced), chip power has increased only modestly and has plateaued in recent years.

Robert Dennard formulated how new transistor fabrication process technologies can provide these physical properties. In fact, Dennard’s theory of scaling is the main force behind Moore’s Law. Dennard’s scaling theory showed how to reduce the dimensions and the electrical characteristics of a transistor proportionally to enable successive shrinks that simultaneously improved density, speed, and energy efficiency. According to Dennard’s theory, with a scaling ratio of 1/√2, the transistor count doubles (Moore’s Law), frequency increases by 40%, and the total chip power stays the same from one generation of process technology to the next on a fixed chip area. That is, the power per transistor decreases at the same rate that transistor area shrinks from one technology generation to the next.
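The numbers in the paragraph above can be checked with a small sketch of classical Dennard scaling. It assumes the textbook model in which dimensions, capacitance, and supply voltage all scale by the same ratio and dynamic power per transistor is C·V²·f; the function and key names are illustrative, not taken from any paper.

```python
import math

def dennard_generation(s=1 / math.sqrt(2)):
    """Per-generation scaling factors under classical Dennard scaling.

    s is the linear scaling ratio applied to dimensions and supply voltage.
    Dynamic power per transistor is modeled as C * V^2 * f (an assumption
    of the textbook model; leakage is ignored).
    """
    capacitance = s            # C shrinks with the linear dimension
    voltage = s                # V is scaled by the same ratio
    frequency = 1 / s          # shorter gate delay allows a faster clock
    area = s ** 2              # area of one transistor
    count = 1 / area           # transistors that fit on a fixed chip area
    power_per_transistor = capacitance * voltage ** 2 * frequency
    return {
        "transistor_count": count,
        "frequency": frequency,
        "power_per_transistor": power_per_transistor,
        "chip_power": count * power_per_transistor,
    }

for name, factor in dennard_generation().items():
    print(f"{name}: x{factor:.2f}")
```

With s = 1/√2, the transistor count doubles, frequency rises by about 41%, power per transistor halves, and total chip power stays constant, matching the description above.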

With the end of Dennard scaling in the mid-2000s, process technology scaling can still sustain a doubling of the transistor count every generation, but with significantly less improvement in transistor switching speed and power efficiency. This disparity translates to an increase in chip power unless the fraction of active transistors is reduced from one technology generation to the next.

One option to avoid increases in chip power consumption was to hold the clock frequency constant or even lower it. The shift to multicore architectures was partly a response to the end of Dennard scaling. When developing a new process technology, if the power per transistor scales down more slowly than the transistor area shrinks, it might not be possible to turn on and utilize all the transistors that scaling provides. Thus, we define dark silicon as follows:

Dark silicon is the fraction of the chip that needs to be powered off at all times due to power constraints.
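As a rough illustration of this definition, the sketch below computes the dark fraction of a fixed-area chip under a fixed power budget. It is a deliberately simplified toy model, not the model used in the ISCA paper: it assumes the transistor count doubles every generation and that power per transistor shrinks by a constant factor. The value 0.7 is a hypothetical post-Dennard factor chosen only for illustration.

```python
def dark_fraction(power_scale, generations):
    """Fraction of a fixed-area chip that must stay off under a fixed power budget.

    power_scale: factor by which power per transistor changes each generation
                 (0.5 corresponds to ideal Dennard scaling).
    Transistor count is assumed to double every generation.
    """
    # Power needed to run every transistor, relative to the generation-0 budget.
    full_power = (2 * power_scale) ** generations
    active = min(1.0, 1.0 / full_power)
    return 1.0 - active

# ASSUMPTION: 0.7 is a hypothetical post-Dennard value, not a figure from
# the interview or the paper.
for gen in range(6):
    ideal = dark_fraction(0.5, gen)
    post = dark_fraction(0.7, gen)
    print(f"generation {gen}: Dennard {ideal:.0%} dark, post-Dennard {post:.0%} dark")
```

Under ideal Dennard scaling nothing goes dark. Once power per transistor falls more slowly than area, the dark fraction grows with every generation.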

The low utility of this dark silicon poses a great challenge for the entire computing community. If we cannot sufficiently utilize the transistors that costly new process technologies provide, how can we justify their development cost? If we cannot utilize the transistors to improve the performance of general-purpose processors and reduce the cost of computing, how can we avoid becoming an industry of simple replacement?

When the computing industry was under the reign of Dennard scaling, computer architects harvested new transistors to build higher-frequency single-core microprocessors and equip them with more capabilities. For example, as technology scaled, processors packed in better branch predictors, wider pipelines, larger caches, and so on. These techniques applied superlinear complexity-power tradeoffs to harvest instruction-level parallelism (ILP) and improve single-core performance. However, the failure of Dennard scaling created a power density problem, which in turn broke many of the techniques that were used to improve the performance of single-core processors. The industry raced down the path of building multicore processors.

The multicore era started in 2004, when the major consumer processor vendor (Intel) cancelled its next-generation single-core microarchitecture, Prescott, gave up its exclusive focus on single-thread performance, and switched to multicore as its performance scaling strategy.

We mark the start of the multicore era not with the date of the first multicore part, but with the time multicore processors became the default and main strategy for continued performance improvement.

The basic idea behind designing multicore processors was to replace ever more complex and capable single-core processors with multicore processors that consist of simpler and/or lower-frequency cores. It was anticipated that by exploiting parallelism in applications, we could overcome the trends at the transistor level. The general consensus was that a long-term multicore era had begun, and the general expectation was that increasing the number of cores would provide benefits that would enable the development of many more process fabrication technologies. Many believed that there would be thousands of cores on a single chip.

However, in our dark silicon ISCA paper, we performed a comprehensive quantitative study that showed how the severity of the problem at the transistor level and the post-Dennard scaling trends will affect the prospective benefits of multicore processors.

In our paper, we quantitatively question the consensus about multicore scaling. The results show that even with optimistic assumptions, multicore scaling — increasing the number of cores every technology generation — is not a long-term solution and cannot sustain the historical rates of performance growth in the coming years.

The gap between the projected performance of multicore processors and what the microprocessor industry has historically provided is large: 24x. Due to the lack of a high degree of parallelism and the severe energy inefficiency at the transistor level, adding more cores will not even enable using all the transistors that new process technologies provide.
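The effect of limited parallelism can be illustrated with the classic form of Amdahl's Law. This is a minimal sketch, not the full model from the paper, which also accounts for power and area constraints. The 90% parallel fraction is a hypothetical value chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Classic Amdahl's Law: speedup over one core when only
    `parallel_fraction` of the work can be spread across `cores`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# ASSUMPTION: a workload that is 90% parallel (hypothetical value).
for cores in (4, 16, 64, 1024):
    print(f"{cores:5d} cores -> speedup {amdahl_speedup(0.9, cores):.2f}x")
```

With a 10% serial portion, the speedup can never reach 10x no matter how many cores are added, which is why simply multiplying cores yields diminishing returns.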

In less than a decade from now, more than 50% of the chip may be dark. The lack of performance benefits and the lack of ability to utilize all the transistors that new process technologies provide may undermine the economic viability of developing new technologies. We may stop scaling not because of the physical limitations, but because of the economics.

Moore’s Law has effectively worked as a clock, enabling the computing industry to consistently and periodically provide new possibilities. That clock may stop or slow down significantly. The entire computing industry may be at risk of becoming an industry of replacement if new avenues for computing are not discovered.

Luke: How has the computing industry reacted to your analysis? Do some people reject it? Do you think it will be taken into account in the next ITRS reports?

Hadi: I think the best way for me to answer this question is to point to the number of citations of our ISCA paper. Even though we published the paper in the summer of 2011, it has already been cited more than 200 times. The paper was profiled in The New York Times and was selected for IEEE Micro Top Picks and Communications of the ACM Research Highlights.

I think there are people in industry and academia who thought the conclusions were too pessimistic. However, I talked to quite a few device physicists and they confirmed the problem at the transistor level is very dire. We have also done some preliminary measurements that show our projections were more optimistic than the reality. I think the results in our paper show the urgency of the issue, and the opportunity for disruptive innovation. I think time will tell us how optimistic or pessimistic our study was.

ITRS is an industry consortium that sets targets and goals for semiconductor manufacturing. We used ITRS’s projections in our study; however, I am not sure if ITRS can actually use our results.

Luke: You point out in your CACM paper that if your calculations are correct, multicore scaling won’t be able to maintain the historical exponential trend in computations-per-dollar long enough for us to make the switch to radical alternative solutions such as “neuromorphic computing, quantum computing, or bio-integration.” What do you think are the most promising paths by which the semiconductor industry might be able to maintain the historical trend in computations per dollar?

Hadi: I think significant departures from conventional techniques are needed to provide considerable energy efficiency in general-purpose computing. I believe approximate computing and specialization have a lot of potential. There may be other paths forward though.

We have focused on general-purpose approximate computing. What I mean by general-purpose approximate computing is general-purpose computing that relaxes the robust digital abstraction of full accuracy, allowing a degree of error in execution. This might sound a bit odd, but there are many applications for which tolerance to error is inherent to the application. In fact, there is a one-billion-dollar company that makes a profit by making your pictures worse. There are many cyber-physical and embedded systems that take in noisy sensory inputs and perform computations that do not have a unique output. Or, when you are searching the web, there are multiple acceptable outputs. We are embracing error in the computation.

Conventional techniques in energy-efficient processor design, such as voltage and frequency scaling, navigate a design space defined by the two dimensions of performance and energy, and traditionally trade one for the other. In this proposal, we explore the dimension of error, a third dimension, and trade accuracy of computation for gains in both performance and energy.
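The idea of trading accuracy for less work can be sketched in software with loop perforation, a well-known approximation technique that skips a fraction of loop iterations. It is used here only as an illustration and is not the architectural framework described in this interview. The synthetic sensor trace and the function name are assumptions.

```python
import math

def mean_brightness(samples, stride=1):
    """Average the samples, visiting only every `stride`-th one.

    stride=1 is the exact computation; stride>1 skips iterations
    (loop perforation) and returns an approximate result.
    Returns (mean, number_of_samples_visited).
    """
    visited = samples[::stride]
    return sum(visited) / len(visited), len(visited)

# ASSUMPTION: a synthetic, deterministic stand-in for a sensor trace.
samples = [100 + 50 * math.sin(i * 0.01) for i in range(10_000)]

exact, exact_work = mean_brightness(samples)
approx, approx_work = mean_brightness(samples, stride=4)

print(f"exact  = {exact:.3f} using {exact_work} samples")
print(f"approx = {approx:.3f} using {approx_work} samples")
print(f"absolute error = {abs(exact - approx):.4f}")
```

Here the approximate version does a quarter of the work and returns a result very close to the exact one. Whether such an error is acceptable depends entirely on the application.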

In this realm, we have designed an architectural framework, from the ISA (Instruction Set Architecture) to the microarchitecture, that conventional processors can use to trade accuracy for efficiency. We have also introduced a new class of accelerators that map a hot code region from a von Neumann model to a neural model and provide significant performance and efficiency gains. We call this new class of accelerators Neural Processing Units (NPUs). These NPUs can potentially allow us to use analog circuits for general-purpose computing. I am excited about this work because it bridges von Neumann and neural models of computing, which are thought to be alternatives to one another. Our paper on NPUs was selected for IEEE Micro Top Picks and has recently been nominated for CACM Research Highlights.

As for specialization: we try to redefine the abstraction between hardware and software. Currently, the abstraction and the contract between hardware and software is the instruction set architecture (ISA) of general-purpose processors. However, even though these ISAs provide a high level of programmability, they are not the most efficient way of realizing an application. There is a well-known tension between programmability and efficiency. There is a difference of orders of magnitude in efficiency between running an application on general-purpose processors and implementing the application with ASICs (application-specific integrated circuits).

Since designing ASICs for the massive base of quickly changing, general-purpose applications is currently infeasible, providing programmable and specialized accelerators is a very important and interesting research direction. Programmable accelerators provide an intermediate point between the efficiency of ASICs and the generality of conventional processors, gaining significant efficiency for restricted domains of applications. GPUs and FPGAs are examples of these specialized accelerators.

Luke: What do your studies of dark silicon suggest, quantitatively?

Hadi: Our results show that without a breakthrough in process technology or microarchitecture design, core count scaling provides much less of a performance gain than conventional wisdom suggests. Under (highly) optimistic scaling assumptions, and for parallel workloads, multicore scaling provides a 7.9× speedup (23% per year) over ten years. Under more conservative (realistic) assumptions, multicore scaling provides a total performance gain of 3.7× (14% per year) over ten years, and obviously less when sufficiently parallel workloads are unavailable. Without a breakthrough in process technology or microarchitecture, other directions are needed to continue the historical rate of performance improvement.
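The per-year figures quoted above follow from the ten-year totals by compound growth. A minimal sketch of that conversion, with an illustrative function name:

```python
def annual_rate(total_gain, years):
    """Compound annual growth rate implied by a total gain over `years`."""
    return total_gain ** (1 / years) - 1

# Ten-year totals quoted in the interview.
for label, gain in (("optimistic", 7.9), ("conservative", 3.7)):
    print(f"{label}: {gain}x over 10 years = {annual_rate(gain, 10):.0%} per year")
```

This reproduces the 23% and 14% annual rates from the 7.9× and 3.7× totals.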

Luke: You’ve been careful to say that your predictions will hold unless there is “a breakthrough in process technology or microarchitecture design.” How likely are such breakthroughs, do you think? Labs pretty regularly report preliminary results that “could” lead to breakthroughs in process technology or microarchitecture design in a few years (e.g. see this story), but do you have any sense of how many of these preliminary results are actually promising?

Hadi: I have been careful because I firmly believe in creativity! All these reports are of extreme value and they may very well be the prelude to a breakthrough. However, I have not seen any technology that is ready to replace the large-scale silicon manufacturing that is driving the whole computing industry.

Remember that we are racing against the clock, and technology transfer from a lab that builds a prototype device to a large-scale industry takes considerable time. The results of our study show an urgent need for a shift in focus.

I like to look at our study as a motivation for exploring new avenues and unconventional ways of performing computation. And I feel that we have had that impact in the community. However, timing is crucial!

Luke: Finally, what do you think is the main technology for the post-silicon era?

Hadi: I would personally like to see the use of biological nervous tissue for general-purpose computing. Furthermore, using the physical properties of devices and building hybrid analog-digital general-purpose computers is extremely enticing.

Finally, I would like to acknowledge my collaborators on the dark silicon project, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger, as well as my collaborators on the NPU project, Adrian Sampson, Luis Ceze, and Doug Burger.

Luke: Thanks, Hadi!