Last week we examined the legacy of the LUT – the basic building block that defines the very fabric of FPGAs. Surprisingly, however, the primary driver of attributes such as cost, power consumption, and utility in FPGAs is not the fabric itself, but the choice of I/O for the device. You see, while the internal logic keeps shrinking, some of the I/O structures don’t really scale well – things like bonding pads and higher-current transistors don’t track Moore’s Law, so the cost of an individual I/O relative to a given amount of core logic keeps increasing with every product generation.
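To see why that ratio keeps shifting, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: the function name, the starting areas, and the per-generation shrink factors (core area halving, I/O area shrinking by only 10%) are invented, not measurements from any real process.

```python
def io_area_share(core_area, io_area, generations,
                  core_shrink=0.5, io_shrink=0.9):
    """Return the I/O fraction of total die area for each generation.

    Hypothetical model: the same design is ported to each new node,
    core area scales by `core_shrink` and I/O area by `io_shrink`.
    """
    shares = []
    for _ in range(generations + 1):
        shares.append(io_area / (core_area + io_area))
        core_area *= core_shrink   # core logic tracks the process shrink
        io_area *= io_shrink       # pads and big drivers barely shrink
    return shares


# Start with I/O at 20% of the die and port across three generations.
for gen, share in enumerate(io_area_share(80.0, 20.0, 3)):
    print(f"generation {gen}: I/O is {share:.0%} of die area")
```

Under these made-up factors the I/O share of the die grows with every generation, which is the trend described above.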
In custom IC design, devices have traditionally been either “pad limited” or “core limited.” I/O pads had to be lined up along the periphery of the die to facilitate wire bonding. If lining up all of your I/Os along the edge made the die bigger than your core logic required, your design was “pad limited.” If, on the other hand, the core logic left room for the required number of I/Os around the periphery with room to spare (stop chuckling out there, IC designers), your design was “core limited.” The reason for all that laughter at the back of the room is that the idea of “room to spare” was typically a theoretical condition that seldom arose in real-world practice.
Over time, improved bonding techniques allowed multiple rows of I/O to be placed around the device, giving designers more options for balancing core against I/O. For devices like FPGAs, however, it never makes sense to waste either core or I/O area. FPGAs have typically been designed to strike a balance between core and I/O that covers the largest cross-section of design needs without wasting silicon real estate. Xilinx, for example, has multiple versions of their Spartan-3 series with one ring of I/Os on devices designed for more core-intensive applications and two-ring versions for applications that have greater I/O needs.
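The pad-limited versus core-limited distinction, and the relief that extra pad rows provide, comes down to simple geometry. The Python sketch below is a rough model only. It assumes a square die, pads at a fixed pitch, and ignores corners and the area of the pad ring itself; the function name and all numbers are hypothetical.

```python
import math


def classify_die(core_area_mm2, num_pads, pad_pitch_mm, rows=1):
    """Classify a square die as pad limited or core limited.

    Simplified, illustrative model: pads sit in `rows` rings around
    the periphery at a fixed pitch.
    """
    # Edge length the core logic alone would need.
    core_edge = math.sqrt(core_area_mm2)
    # Edge length needed to fit the pads: four sides, `rows` rings.
    pad_edge = num_pads * pad_pitch_mm / (4 * rows)
    kind = "pad limited" if pad_edge > core_edge else "core limited"
    return kind, max(core_edge, pad_edge)


# 25 mm^2 of core logic with 400 pads at a 0.1 mm pitch (made-up values).
print(classify_die(25.0, 400, 0.1))          # single ring of pads
print(classify_die(25.0, 300, 0.1, rows=2))  # fewer pads, two rings
```

In the first call the pads need a 10 mm edge while the core needs only 5 mm, so the die is pad limited and roughly three quarters of its area is not needed by the core. Adding a second ring halves the edge length the pads demand, which is why multi-row bonding widened the design options.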
In recent years, a major innovation has begun to change the tradeoff space between I/O and core logic. Flip chip technology rewrites the rules of IC design when it comes to I/O versus core tradeoffs. Instead of traditional wire bonding to get signals from I/O pads to package pins, flip chip places solder balls directly on the chip pads during the final stages of wafer fabrication. The very cool part is that the die is inverted into the package, bringing the solder balls into contact with matching contacts on the package substrate (or even sometimes on a circuit board). The solder is melted and the connection is made completely without wire bonding.
The flip chip approach has numerous advantages. First, I/O pads can be placed basically anywhere on the device. This largely removes the periphery constraint behind the core versus I/O tradeoff described above. It also drastically reduces the capacitance, delay, and signal integrity issues associated with bonding wires. Additionally, the I/O pads can be located much closer on the die to the internal circuitry that connects to them. The additional routing area, capacitance, and delay usually associated with getting from the middle of the device (or even the opposite side) to the I/O ring are significantly reduced. Silicon IP can also now be developed in an entirely different way. IP that requires both core logic and dedicated I/O can theoretically be designed as a physically contiguous unit. In the past, the I/O portions were on the periphery and the core portions in the middle, and placement and routing software was charged with connecting the two in an acceptable fashion.
In FPGAs, these benefits translate into smoother tradeoff curves between I/O and core resources, more scalable fabric design (because specific types of fabric can be directly paired with required I/O types), and the ability to mix-and-match (at the FPGA vendor level) re-usable sections of devices – allowing for more application-specific diversification of feature mixes. The technology also brings benefits like better signal integrity for SerDes, more rigid construction for hostile environments, and (theory has it) eventually lower manufacturing costs. Today, however, flip chip still costs more than wire-bond packaging, so low-cost FPGAs remain wire-bond while the high-end devices migrate to flip chip.
Speaking of feature mixes, in the world of FPGAs, figuring out how many I/Os you need per unit of core logic and where to put them is only the first part of the battle. In normal custom IC design, you pick each I/O device based on the requirements of that specific connection. Drive currents, voltages, fanout/fanin capability, tri-stating, clock-recovery, timing performance, and advanced capabilities like LVDS and SerDes can be judiciously added to the design only where and when they’re required. In FPGAs, however, you have to select the mix of features without knowing in advance what the design requires.
Every FPGA architect has to examine large numbers of “typical” customer designs and then strike a balance in the number and types of I/O features included. Initially, it might seem that you would want to construct some kind of super-I/O that is capable of meeting most needs from a single piece of hardware. However, such a “Swiss Army Knife” I/O buffer becomes prohibitively expensive and complex and, like a Swiss Army Knife, does many things, but none very well. Thus designers are faced not only with sprinkling the right mix of I/O features into the base FPGA design, but also with placing them in the locations most likely to give good results relative to other IP such as multipliers, memories, processor cores, and other core logic that might need direct outside connections.
SerDes makes the I/O quandary even more difficult for our friends designing FPGAs. With standards like PCIe and Gigabit Ethernet becoming more and more pervasive, high-speed serial transceivers are likely to become required components in most FPGAs. Already, we have seen those capabilities go mainstream, migrating down from the exotic, highest-of-the-high-end FPGA families like Altera’s Stratix “GX” families and Xilinx’s Virtex “FX” families into low-cost offerings. Lattice Semiconductor actually led the charge of SerDes into low-cost FPGAs about a year ago, and other vendors are now following suit.
When adding SerDes to an FPGA, there are still crucial decisions that can have a dramatic impact on the success or failure of the family. When SerDes first hit the FPGA scene, the trend was to build mega-transceivers that could handle every standard under the sun and, most importantly, support higher maximum data rates than the competitors’ parts.
Ultimately, this strategy backfires. Designing transceivers that can handle every conceivable standard and speed once again makes you fall victim to Swiss Army Knife Syndrome – Jack of all trades, master of none. This particularly becomes a problem if you have, say, a transceiver that works perfectly for all the mainstream standards, but fails to test and operate properly at some more obscure and less-used data rate that it claims to support. In this case, you have a device that fails testing – even though it would work perfectly for most mainstream applications. Your effective yield goes down and takes profits and customer goodwill along with it.
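The yield penalty can be illustrated with a little arithmetic. This Python sketch assumes a device ships only if it passes test at every data rate it claims to support, and that failures at different rates are independent so the pass probabilities multiply. Both are simplifications, and the pass probabilities are invented for the example.

```python
def effective_yield(pass_rates):
    """Fraction of devices that pass test at every claimed data rate.

    Illustrative assumption: per-rate failures are independent, so
    the overall yield is the product of the per-rate pass rates.
    """
    result = 1.0
    for rate in pass_rates.values():
        result *= rate
    return result


# Hypothetical per-rate pass probabilities (not real test data).
claimed = {
    "PCIe": 0.99,
    "Gigabit Ethernet": 0.99,
    "obscure data rate": 0.85,
}
mainstream = {k: v for k, v in claimed.items() if k != "obscure data rate"}

print(f"claiming everything:    {effective_yield(claimed):.1%}")
print(f"mainstream claims only: {effective_yield(mainstream):.1%}")
```

With these made-up numbers, roughly 15% of otherwise sellable devices are lost to a data rate that few customers would ever use.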
Now, vendors are much more judicious about what standards they attempt to support with FPGA-based SerDes transceivers. In some cases, hard-wired I/O units support specific standards like PCI Express. In other cases, vendors keep their claims tame – only setting their sights on standards that hit the majority of design needs without requiring the extreme performance that drives up complexity, reduces yields, and increases testing requirements. By segmenting their transceivers by target application, vendors can build and sell only what customers are demanding – with less focus on the press-release one-upmanship that comes from claims about total bandwidth and maximum data rates from a single transceiver.
In the big picture of FPGA feature set and value, it is surprising how much is determined by I/O capabilities instead of core capacity and performance. As we move to smaller geometries, this effect is likely to only increase – gates will get closer and closer to “free” while the pricing curve will be tightly coupled to a device’s I/O capabilities.