We like our pictures.
We like them to move and we like them to be sharp.
We also want them in our pockets.
Delivering high-quality video to mobile devices is truly a system-wide endeavor. Every link of the chain, from the heavy-iron infrastructure to the wireless access points to the mobile devices themselves, requires a major upgrade in bandwidth, compute performance, quality of service, and power efficiency. If you take your favorite market forecast and a spreadsheet, you can start with the number of subscribers and do the math back through the system – and come up with some sobering figures for bandwidth at every level. Today’s 40G/100G challenges become tomorrow’s 400G challenges – and, with specifications and standards guaranteed to be in flux, this spells a big opportunity for programmable logic devices.
With 40nm safely in the pipe, FPGA companies are looking to see what they can do to deliver what the market will want at 28nm. Doing some quick, back-of-the-envelope math, Altera points out that 400G will require about a 4X jump over 100G. (Did everybody follow along on that, or should we publish our formulas in the margin?) Since 100G is the “killer app” for 40nm FPGAs, getting to 400G by 28nm means we need to find about double the usual Moore’s Law bump.
Something has to change.
Altera says that density, power efficiency, and transceiver capacity all have to improve by 4x to make the next generation viable for use in 400G systems. Obviously, doing the usual thing again on the next process node won’t cut it – even if we squint our eyes a bunch. Fundamental changes are needed to the FPGA recipe that will allow us to out-pace Moore’s Law for a generation.
Altera is just announcing several initiatives for their 28nm families (which are not yet announced but which the company claims will “start shipping in 2010”). The goal of these innovations is to bring FPGAs to market that are up to the 400G challenge. Altera’s changes fall into three areas – higher-performance transceivers, more flexible hardened IP, and virtual density improvement via partial reconfiguration.
Starting with the transceivers: as we all know, there is no free lunch here. Moore’s Law doesn’t come waltzing in and make our mostly-analog transceiver technology better for free. In fact, each generation generally sees larger transceivers as we keep wanting to kick up the bandwidth. For 28nm, Altera says they’ll be running their transceivers at up to 28 Gbps. This represents almost a threefold increase in performance from their 40nm line. From generation to generation, as transceivers get larger, they start to dominate the geometry and the economics of the chip. The way to bring that dominance down is to increase the speed, so that fewer transceivers are needed for the same throughput. Altera points out that 400G using 10 Gbps transceivers requires 80 transceivers, whereas the same throughput with 25 Gbps transceivers requires only 32. Even though the faster transceivers are larger, you gain silicon real estate by reducing the number required.
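For anyone who does want the formulas in the margin, here is a minimal Python sketch of the lane arithmetic. Altera's counts of 80 and 32 work out if each figure covers two sides of the device (say, line side and system side). That factor of two is our assumption, not something Altera spelled out, and the function name is made up for illustration.

```python
import math

def transceivers_needed(throughput_gbps, lane_rate_gbps, sides=2):
    """Estimate how many transceivers a given aggregate throughput needs.

    sides=2 is an assumption: it doubles the lane count to match the
    figures Altera quotes (e.g. one set of lanes per side of the device).
    """
    lanes_per_side = math.ceil(throughput_gbps / lane_rate_gbps)
    return lanes_per_side * sides

slow = transceivers_needed(400, 10)  # 40 lanes per side at 10 Gbps
fast = transceivers_needed(400, 25)  # 16 lanes per side at 25 Gbps

print(f"10 Gbps lanes: {slow} transceivers")
print(f"25 Gbps lanes: {fast} transceivers")
print(f"Reduction: {slow / fast:.1f}x fewer transceivers")
```

So a faster transceiver can be up to two and a half times the size of a slower one before the trade stops paying for itself in area.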
The second area of change for Altera at 28nm is an approach to hardened IP blocks that the company is calling “Embedded HardCopy Blocks.” To understand what this means and why it’s important, let’s look at one of the riskiest questions in FPGA architecture over the last few years – what should we harden? Taking a chunk of logic and building it essentially as optimized physical hardware (rather than constructing it from programmable LUT fabric) brings that part of your FPGA on par with the most cutting-edge ASIC in terms of speed, power, and area efficiency. If you harden every block in your design, of course, you have an ASIC. For FPGA companies, the decision about what to harden and what to leave for the programmable fabric is a tricky balancing act. Ideally, you want to harden only what most users of your FPGA will use. Anything else is wasted space, cost, and sometimes power. If hardened multipliers are twice as good as LUT-based ones, but you harden twice as many as you need, you’re right back where you started.
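The break-even in that last sentence can be put in rough numbers. The sketch below is illustrative only: the unit costs, the multiplier count, and the function name are assumptions chosen to match the "twice as good" example, not measured silicon figures.

```python
def silicon_cost(multipliers_built, cost_per_multiplier):
    """Total cost (area, power, or dollars, in arbitrary units) of the multipliers on the die."""
    return multipliers_built * cost_per_multiplier

SOFT_COST = 2.0  # assumed cost of one LUT-based multiplier
HARD_COST = 1.0  # assumed: hardened is "twice as good", so half the cost

needed = 100  # multipliers the design actually uses (hypothetical)

soft = silicon_cost(needed, SOFT_COST)             # build only what you use, in LUTs
hard_exact = silicon_cost(needed, HARD_COST)       # vendor hardened exactly enough
hard_double = silicon_cost(2 * needed, HARD_COST)  # vendor hardened twice as many

print(f"LUT-based, as needed:     {soft}")
print(f"Hardened, exact fit:      {hard_exact}")
print(f"Hardened, 2x overprovide: {hard_double}")
```

With these assumed numbers, the over-provisioned hard blocks cost exactly as much as the soft ones, which is the "right back where you started" case.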
For the past few generations, Xilinx (Altera’s archrival) has attacked this issue by producing a variety of different FPGA families with different mixes of hard IP – aimed at different classes of applications. Some have more DSP/multiplier blocks, some have more RAM, some have more transceivers, and some have hardened processor cores. The penalty for this diversification of the FPGA line, of course, is higher mask costs for the FPGA vendor. Remember how everybody is moving from ASIC to FPGA because ASIC non-recurring engineering (NRE) costs are getting out of control? The same thing is happening to FPGA companies developing each new generation of their products.
Altera’s “Embedded HardCopy Block” solution relies on what we might call “semi-hardened” IP. For the past several years, the company has offered an FPGA-to-ASIC conversion process called HardCopy that replaces the configuration logic in your Altera FPGA design with metal-to-metal connections via a special fabric. By customizing only a couple of layers of metal, you get a lot of the advantages of a full ASIC version of your design, without most of the associated NRE. For their 28nm FPGAs, the company is planning to use their HardCopy technology in a different way – embedded in their FPGAs – to offer various configurations of hardened IP on the same base device. This means they’ll be able to get a lot of the same benefits Xilinx sees with their varying mixes of hardened blocks, but without investing the cost of a full NRE to design a whole new FPGA for each one. In theory, this should allow Altera to produce more variants of their FPGAs at a lower cost, and to add new variants with different mixes of IP much faster. The company also says that it would at least be “theoretically possible” to customize the Embedded HardCopy Blocks on a per-customer basis.
The final change Altera is announcing for 28nm is partial reconfiguration. This is the feature where Marketing gets to play in the game as well. This is also where those of us who try to find scraps of humor in writing about technology say “thank you, thank you, thank you…” For years, we’ve described partial reconfiguration as the Extreme Sport of FPGA design. Like climbing K2, BASE jumping, and heli-skiing, partial reconfiguration has always been justified by hard-core enthusiasts with the phrase “because it’s there.” Well – maybe that’s not completely fair. The poster child for partial reconfiguration in the past was always applications like software-defined radio (SDR), where one could conceivably swap out modems while keeping the rest of the FPGA up and working. One could also conceivably get to work faster from the top of a high-rise condo building by BASE jumping from the balcony. Conceivably, that is.
In practice, partial reconfiguration is hard. Getting the circuit to behave well while part of it is off reconfiguring, isolating the area that’s getting reconfigured from the part that’s still operating, making the IOs behave properly during the transition… all of these little issues can become big and complicated. Altera proposes to make partial reconfiguration possible via their well-established LogicLock strategy – making partial reconfiguration act and feel like incremental design. In some of the target applications at 28nm, partial reconfiguration might help to increase the effective density of the LUT fabric by allowing multiple design fragments to be modally swapped in.
On the marketing side, the same folks that brought us the “zero power” FPGA are now pushing the potential power of partial reconfiguration to offer us… wait for it… “infinite density”. Since one could swap in an unlimited number of configurations on the fly (they argue), there is really no limit to the effective density one could realize in the FPGA. Editorially, this is like finding a 20-dollar bill on the sidewalk. In fact, it’s just too easy, so we won’t make fun of it at all. Please send me one Infinite Density Zero Power FPGA development kit and I’ll be set. Forever.