Twenty-eight nanometers doesn’t give much room for error. When you’re spending the bucks to spin a new chip at that process node, you really, really want to get it right.
Altera has made that challenge even steeper with their new Stratix V FPGA family, attempting a number of new features at the same time as they try to wrestle the bristly next-step-up-the-curve of Moore’s Law to the ground. As we mentioned in our previous article on the company’s 28nm architecture, the keystones of the new offering are super-high-speed 28 Gigabit SerDes transceivers, embedded HardCopy fabric, and partial reconfiguration.
Now, the company has introduced an actual FPGA family based on those principles – Stratix V. The announcement gives us considerably more insight into the company’s 28nm offerings, and it brings a surprise bonus as well – a new DSP block architecture.
Tackling the transceivers first, Altera was bold in reaching for 28-gig transceivers on the new technology. Historically, analog-intensive blocks – and transceivers in particular – have been very problematic at a new process node. Analog doesn’t scale nearly as nicely as digital, and handling signal integrity and other issues at super-high frequency on an unproven process has tripped up FPGA companies in the past, as we mentioned in a feature article in 2006. Altera has always mitigated these issues with a well-conceived test chip program, fabricating a series of early prototype devices to test a number of alternative designs instead of going for the “all the marbles,” get-it-right-the-first-try approach to chip design. A single test chip might carry a number of alternative transceiver designs, allowing the company to converge on a working solution much faster – and more deterministically – than if they just chose one and rolled the dice.
Altera had good motivation for pursuing the faster transceivers. In their stated quest for single-chip 400 gig, it’s a lot more resource-efficient to carry each 100 gig of that bandwidth over 4 transceivers in each direction than over 10 (which would be required with current-generation 10G transceivers). Fewer lanes mean less interconnect, fewer connectors, less board and backplane real estate, and lower overall cost.
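For readers who want to check the lane arithmetic, here is a minimal back-of-the-envelope sketch. It works from raw line rates only and ignores line-coding and protocol overhead, so treat the results as rough estimates rather than Altera’s figures. The function name and the chosen rates are our own illustration. Note that the 4-versus-10 comparison corresponds to a 100G link; a full 400G scales both counts up.

```python
import math


def lanes_needed(total_gbps: float, lane_gbps: float) -> int:
    """Minimum number of serial lanes (per direction) to carry total_gbps.

    Assumption: raw line rate only; encoding overhead is ignored.
    """
    if total_gbps <= 0 or lane_gbps <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(total_gbps / lane_gbps)


if __name__ == "__main__":
    for total in (100, 400):
        fast = lanes_needed(total, 28)  # new 28 Gbps transceivers
        slow = lanes_needed(total, 10)  # current-generation 10G transceivers
        print(f"{total}G: {fast} lanes at 28 Gbps vs {slow} lanes at 10 Gbps")
```

Either way the ratio is what matters: roughly 60 percent fewer lanes to route, connect, and pay for.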
When we reviewed the company’s architecture back in February, we didn’t quite appreciate the scope and motivation for the embedded HardCopy blocks. Now, the picture is clearer and it’s a compelling feature – although the average user will never take advantage of these blocks directly. Basically, Altera can produce a single chip design with almost all the layers of metal fixed, and then add a couple more layers right at the end to customize a big chunk of the fabric with metal-to-metal connections, creating high-performance, low-power, high-density hard IP blocks. Normally, an FPGA company has to guess at the best mix of hard IP to include in an FPGA. While some popular and multi-purpose blocks are easy (memory, DSP/multiplier blocks, etc.), there is a huge variety of more application- and domain-specific IP that would be very beneficial to some market segments and totally useless to others. Hardening this type of IP is a huge compromise. You’ve just raised the cost of the FPGA for all the non-users, while making one or two market segments very happy. As you add more and more hard IP to expand the utility of the design, you end up with a device so bloated and expensive that it’s good for almost nobody.
Producing a wide range of FPGAs to cater individually to all these different markets, however, would be prohibitively expensive and would similarly drive up the price.
By embedding a large chunk of HardCopy fabric in their FPGAs, Altera has made it very cheap and easy for them to customize their devices for specific markets with varying mixes of hard IP. This means they can make an FPGA into a programmable semi-ASSP, with domain-specific collections of hard IP – taking advantage of the higher density, higher performance, and lower power consumption of metal-to-metal customization in those blocks, and leaving the FPGA LUT fabric available for the proprietary part of the customers’ designs.
It also means that Altera can easily add hard IP support for new standards after the fact, without having to spin a whole new FPGA line from the start. Since FPGA companies tend to come out with new FPGA families on a two-year heartbeat, corresponding to new process nodes becoming available, it’s a problem when some new IP standard comes out shortly after a new family is announced. Normally, the FPGA company would be compelled to wait a full two years or so – until their next-generation product – to release a device with a hardened block for the new standard. However, Altera now has the option to harden IP for new standards using their embedded HardCopy fabric and bring a new chip to market in very short order and at very low cost.
There is also the technical possibility that the company could allow large customers to spec-out their own FPGAs with a particular mix of hard IP, but Altera has not announced any plans to offer such a service.
The final installment of the 28nm trilogy, partial reconfiguration, still has us in “wait and see” mode. For years, Xilinx has offered partial reconfiguration as an option, and very, very few designers worked up the courage to even attempt it. We even characterized it as the “extreme sport” of FPGA design. During that time, Altera was all too happy to agree with us, claiming that partial reconfiguration was unnecessary and overly complicated. Now, however, Altera claims that partial reconfiguration in certain application areas will allow designers to get substantially more effective density from their parts, with a minimum of design overhead. Let us know if you try it – we’ll do a whole feature on you! You can even be anonymous if you want.
There was one additional nice surprise in the Altera announcement. The company is announcing a new variable-width DSP block that can be partitioned into various bit widths to accommodate the needs of various applications. The new block also supports floating point – the first FPGA-based DSP block to offer native floating point as far as we know. We’ll have more in-depth coverage of the new DSP block in a future feature.
Now, by the numbers: the largest device in the new family weighs in at around 1.1 million LUT equivalents – the first FPGA to go on record breaking the million-LUT4 barrier. Note that this doesn’t mean there are actually a million LUTs on the chip. Long ago we stopped counting actual logic elements. Now, the LUT4 count (which Altera calls “logic elements” or “LEs”) is based on some multiplier applied to the number of wider (6/7 input) LUTs that are actually on the device. Still – a million LUT4 (equivalents) is a LOT of logic. Altera’s density doesn’t end there, however. The company claims there is enough of the embedded HardCopy fabric to equal an entire previous-generation Stratix device. In addition to the fabric, the family boasts up to 53 megabits of embedded memory, up to 3680 embedded 18×18 multipliers (again, “equivalent” since the new DSP blocks are actually wider and partitionable), and up to 66 12.5 Gbps transceivers in one version, with another version offering the new 28 Gbps transceivers.
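To make the “equivalent” bookkeeping concrete, here is a minimal sketch of how a marketing LE count maps back to physical wide LUTs. The conversion factor below is purely a placeholder of ours; the announcement as reported here does not state what multiplier Altera applies, so the output illustrates the arithmetic and is not a device specification.

```python
def physical_luts(le_equivalents: int, le_per_wide_lut: float) -> int:
    """Estimate the physical wide-LUT count behind an 'equivalent LE' figure.

    le_per_wide_lut is the vendor's multiplier (LEs credited per wide LUT).
    """
    if le_per_wide_lut <= 0:
        raise ValueError("multiplier must be positive")
    return round(le_equivalents / le_per_wide_lut)


# Hypothetical multiplier, chosen only for illustration.
ASSUMED_LE_PER_WIDE_LUT = 2.5

if __name__ == "__main__":
    # About 1.1 million LEs, as quoted for the largest device.
    print(physical_luts(1_100_000, ASSUMED_LE_PER_WIDE_LUT))
```

Whatever the real factor is, the point stands: the headline number is a normalized figure, and the count of physical LUTs on the die is smaller.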
Altera says the new family will also continue to offer their HardCopy ASIC conversion service – allowing customers that want to cost-reduce to transition to a metal-configured HardCopy ASIC directly from their working FPGA design.
Included in the list of hard IP are: PCIe Gen3, Gen2, Gen1, 40G/100G Ethernet, CPRI/OBSAI, Interlaken, Serial RapidIO® (SRIO) 2.0 and 10 Gigabit Ethernet (GbE) 10GBASE-R. The line also boasts memory interfaces with hardened read/write paths including DDR3, RLDRAM II and QDR II+.
Altera expects software support for Stratix V to begin in Q2 of this year, with the first samples to begin shipping in Q1 2011. Consider this advance warning.