We have often discussed the many ramifications of Moore’s Law in these pages. Of course, chips continue to get exponentially cheaper, faster, more capable, and more efficient. Also of course, the fixed costs of making a new chip continue to get exponentially higher. Combine these two trends and it becomes clear that we must be increasingly careful about which chips we choose to make. Any company setting out to design a new chunk of leading-node silicon these days must be quite certain that they are building something that either works across a wide range of diverse applications or solves a critical problem in a single application with enormous production volume. Otherwise, the amortized fixed costs make the project infeasible.
We chatted at length with Altera CTO Misha Burich recently about the past and future of FPGA technology and the increasing trend toward silicon convergence. As one of the few companies that meet the criteria above and are designing new products on the newest process nodes (28nm and beyond), Altera spends a lot of time and energy making sure they’re building silicon that people actually need and will use. As CTO, a big part of Burich’s job is to gaze into the technological crystal ball and extrapolate current trends into future directions.
Burich segmented the world of logic devices into three categories – general-purpose processors, PLDs, and application-specific devices. He expressed the relationships between these as a continuum, with the “flexible” end of the spectrum held down by microprocessors, microcontrollers, and DSPs, and the “efficient” end anchored by ASSPs and ASICs. In the middle, Burich explains, are programmable logic devices like FPGAs. Choosing a spot on this continuum, therefore, amounts to choosing a tradeoff between flexibility and computational efficiency.
Over the past several years, with Moore’s Law driving the price of transistors down to near-free, we’ve seen an increasing trend toward new chips that combine traits from across this spectrum. Just about every device coming off the fab these days is some kind of SoC – with one or more processing engines such as microprocessors, microcontrollers, or DSPs. At tiny geometries like 28nm, a complete processing subsystem including processor, bus, peripherals, and some memory amounts to a trivial fraction of the total area of a chip. That means it’s almost silly not to put a hardened, optimized, multi-core processing system of some type on just about every chip that goes out the door.
However, an increasing number of applications require standardized, optimized hardware of other types as well. Functions like H.264 video coding and encryption can technically be accomplished in software, but they are hundreds to thousands of times faster and more power efficient when executed by optimized, task-specific hardware. That means more design teams are dropping a number of these ASSP-like blocks onto their latest-generation chips as well.
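To make that software-versus-hardware tradeoff a bit more concrete, here is a minimal sketch of what invoking such a block can look like from the processor’s side. The register map, offsets, and function names below are entirely hypothetical – a real accelerator comes with vendor-defined registers and, usually, a driver – but the basic pattern of “write operands, kick off the block, poll for completion” is typical.

```c
/*
 * Hypothetical sketch: driving a memory-mapped encryption accelerator from C.
 * The register offsets, bit fields, and names are illustrative only.
 */
#include <stdint.h>
#include <stddef.h>

/* Imaginary register map of the accelerator (byte offsets from its base). */
enum {
    REG_SRC_ADDR = 0x00,  /* physical address of the input buffer  */
    REG_DST_ADDR = 0x04,  /* physical address of the output buffer */
    REG_LENGTH   = 0x08,  /* payload length in bytes               */
    REG_CONTROL  = 0x0C,  /* bit 0 = start                         */
    REG_STATUS   = 0x10,  /* bit 0 = done                          */
};

static inline void reg_write(volatile uint32_t *base, size_t off, uint32_t val)
{
    base[off / sizeof(uint32_t)] = val;
}

static inline uint32_t reg_read(volatile uint32_t *base, size_t off)
{
    return base[off / sizeof(uint32_t)];
}

/* Hand one buffer to the accelerator and busy-wait until it finishes. */
void encrypt_with_accelerator(volatile uint32_t *base,
                              uint32_t src_phys, uint32_t dst_phys, uint32_t len)
{
    reg_write(base, REG_SRC_ADDR, src_phys);
    reg_write(base, REG_DST_ADDR, dst_phys);
    reg_write(base, REG_LENGTH,   len);
    reg_write(base, REG_CONTROL,  1u);              /* start the block */

    while ((reg_read(base, REG_STATUS) & 1u) == 0)
        ;                                           /* poll for done   */
}
```

The same transform coded as a loop on the processor would burn orders of magnitude more cycles and energy – which is exactly the gap that makes these hardened blocks worth their silicon.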
So – our typical fancy-pants SoC on a leading-edge process node is likely to have both ends of Burich’s spectrum covered – some processing blocks for highest flexibility, and many hardened functions for maximum performance and power efficiency. Many SoC designs will look strikingly similar – a few ARM cores, some sort of AMBA interconnect protocol hardware, a bunch of peripherals, some hardware accelerators and special-purpose blocks, memory, and IO.
The question becomes – why spend $40M-$100M designing a chip that’s almost exactly like the other guy’s? If there’s a chip that does exactly what you want – you should just buy it, of course. But what if there’s a chip that does almost exactly what you want? Historically, that would be called an ASSP, and you’d plop an FPGA next to it to customize your design with your own “particulars”. As Burich points out – that’s what people have been doing for years. More recently, the FPGA companies have announced hybrid FPGA/SoC devices that may do what you need, with the FPGA fabric and an optimized, high-performance processing subsystem already built onto one chip.
Interestingly, this “SoC with FPGA fabric” idea is not new. Several ASIC companies (like IBM for example) have offered FPGA fabric to their customers as a standard cell block. A few customers designed the blocks in. Over time, however, they found that their customers didn’t use the FPGA fabric, and they ended up removing it during subsequent design revisions.
The ASIC-with-FPGA-blocks experience brings up some interesting questions: Is FPGA fabric on a complex SoC useful and practical? What will it be used for? Does the failure of FPGA blocks as standard cells spell doom for the new SoC-FPGA hybrids?
Burich says no. Making practical and efficient use of FPGA fabric is a lot more complex than just throwing some LUTs down on a chip. If your SoC has FPGA fabric on it, you need the ecosystem offered by the FPGA companies to make it sing – tools, IP, and support make all the difference between a bunch of unused LUTs soaking up silicon and coulombs and a practical, flexible fabric that can differentiate and enable your design.
While the obvious use of FPGA fabric in a complex SoC is adding that last bit of differentiation and customization that makes your design different from the pack, Burich points out that there is enormous potential in using programmable logic for compute acceleration in a flexible manner. For years, supercomputer companies have parked big blocks of FPGAs next to conventional processors and offloaded massively parallel compute operations to on-the-fly-designed hardware accelerators in FPGAs. What worked for reconfigurable supercomputers could also work on the chip scale. The challenge is programming. Most software engineers don’t have the time, patience, or skill set to pull out a performance-intensive piece of functionality and custom code a bunch of VHDL or Verilog to create an optimized FPGA-based hardware accelerator.
That’s where languages like OpenCL come in.
OpenCL is a standard – a C-based kernel language plus a host API – designed to let software be targeted at GPUs or, more generally, at “heterogeneous computing platforms.” Languages like OpenCL try to overcome the problem of parallelism across multiple, heterogeneous processing elements in a standardized way. In theory, some of those processing elements could be custom processors and datapaths implemented in FPGA fabric. Altera has reportedly been working on an OpenCL solution for a while now. Such a solution would facilitate the critical software/hardware partitioning and tradeoff in complex systems that could make some truly spectacular things possible on a single chip – without breaking the bank on power budget.
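For a flavor of what that looks like from the software engineer’s chair, here is a minimal OpenCL sketch – a trivial vector-add kernel plus the core of the host code that launches it. On a GPU, the kernel source is typically compiled at runtime as shown; with an FPGA flow such as the one Altera is reportedly building, the kernel would instead be compiled offline into a fabric configuration and loaded as a binary, but the host-side pattern stays essentially the same. Error handling is omitted for brevity, and device selection simply takes whatever the runtime offers first.

```c
/* Minimal OpenCL host program in C: vector addition on whatever device is available. */
#include <CL/cl.h>
#include <stdio.h>

#define N 1024

/* The kernel itself – a trivially parallel loop body, one work-item per element. */
static const char *kernel_src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Grab the first platform and device the runtime offers. */
    cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
    cl_device_id   dev;   clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context       ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q   = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* Build the kernel from source (an FPGA flow would load a precompiled binary instead). */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    /* Move the data to the device, launch N work-items, and read the result back. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(a), a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(b), b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof(c), NULL, NULL);
    clSetKernelArg(k, 0, sizeof(da), &da);
    clSetKernelArg(k, 1, sizeof(db), &db);
    clSetKernelArg(k, 2, sizeof(dc), &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof(c), c, 0, NULL, NULL);

    printf("c[10] = %f\n", c[10]);  /* expect 30.0 */
    return 0;
}
```

The point is that nothing in the host code cares whether “the device” is a GPU, a DSP, or a block of custom datapaths synthesized into FPGA fabric – which is precisely why this style of programming is attractive for the hybrid devices Burich describes.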
Another development that could make spectacular things possible on a single device is the current trend toward heterogeneous 3D architectures, Burich explains. While we’re building our ideal “do everything” device here – with processor, memory, custom blocks, FPGA fabric, and more – we start to run into a problem with monolithic ICs. That problem is that different process technologies are best for different parts of our system. Processors don’t like to be made with the same process as memory or FPGA fabric. IOs and analog blocks have different process requirements as well. If we try to make a huge SoC with all of these parts on a single, monolithic die, we have to choose a process that is probably sub-optimal for everything. However, if we take advantage of the ability to combine heterogeneous die on a single silicon interposer, or to stack die and connect them with through-silicon vias (TSVs), we gain the ability to build an SoC where each part of the system is made with the best possible process technology for that function.
By stacking all these elements together in a single package, we could dramatically increase the bandwidth and number of connections between subsystems and similarly dramatically reduce the amount of power required to move all those signals between blocks. It’s a win-win scenario. A heterogeneous 3D SoC with a high-performance processing subsystem, FPGA fabric, analog, ample memory, and optimized special-purpose blocks could do amazing things in a very small footprint – with a shockingly small power budget.
Getting to that future vision, however, requires a major rework of today’s semiconductor ecosystem – and the assumptions that go with it. With all these die being integrated into a single 3D package, standards will need to be established so die from various suppliers will play nicely together. Then, the big question will be “who is the integrator?” because the integrator will hold an enormous economic advantage in taking the final device to market.
Burich’s vision seems on track with trends we are already seeing in both the market and the technology. Of course, Burich sees FPGAs and the companies who make them playing major roles as developers, integrators, and marketers of these future devices. That’s a natural assumption for the visionary CTO-types at any company – to paint themselves and their own industry segment into the picture. If Burich is right about the need for programmable hardware – both for system customization and for hardware/software co-design and compute acceleration – in tomorrow’s SoCs, then the future should look a lot like what he sketched.
We probably won’t have to wait long to find out.
There seems to be widespread agreement that we are headed into a period of silicon convergence. But, who will produce that converged silicon? Will it contain programmable logic fabric? What do you think?
Wow, great article Kevin. Sorry for the late post. In addition to the punch line, “who is the integrator”, I can’t help but ask, where is EDA in this equation?
Intel’s acquisition of Altera answers the “who is the integrator” question.