
Virtex-5 is Alive

The High End Gets Higher

Innovation is a hot topic these days. We ourselves have billed innovation as “the fuel that powers technological progress” in our industry. You’d think, with the industry’s first 65nm FPGA family rolling into production, we’d be saluting another salvo of innovation and inspiration from the world’s largest FPGA company. We are, of course, but even more than that, we’re chronicling the execution of an equally important factor in high-tech success – learning. In their introduction of V5, Xilinx has shown us all that they’re a company that learns – learns from their successes, learns from their mistakes, and learns from their competitors. All this learning has manifested itself in the technology of Virtex-5, the marketing of Virtex-5, and the introduction of Virtex-5.

First, the expected. Virtex-5 is Xilinx’s not-too-long-awaited (it’s here sooner than we expected) 65nm sequel to their current 90nm Virtex-4 flagship FPGA family. As Virtex-4 passes that flag to the next generation, we’ll see the things one normally expects in such a transition. It turns out that Mr. Moore’s string of over four decades of precise prognostication is not over yet. Virtex-5 offers more logic (a lot of it) and more speed, uses less power, and will cost less than previous generations of FPGAs. How much better in all these critical areas? By the numbers – 65%, 30%, 35%, and 45%. There. We’re all done now, right? No complaints about burying the headline this time. We’ve spilled the beans in paragraph two, and there’s really no need to read on unless you want some insight beyond the marketing-driven myopia of press-release perfection.

Still with us? That “[more]” link is coming up fast, so you’ll have to make a decision. Do you want to know about the new underlying logic structure, the architectural changes for shorter, more predictable routing delays, the additional family member, and what exactly the word “unveils” means this time? You’ll have to bite the bullet and click to get the answers. Have we learned from the TV cliffhangers? Are we selling out to our own marketing department? Are our editors now being paid by the word?

Let’s get the important stuff out of the way first. We’re really happy that the new family is called Virtex-5. We were worried. We had Virtex, Virtex-II (no surprises so far) then Virtex-II Pro (Ok, strange – were the other families all for amateurs?), then, whoops, Virtex-4. What happened to Virtex-3? Speculation ran rampant. Were we doubling the number each time? Would the new family be Virtex-8? Would we double back and pick up Virtex-III because we forgot it the first time? Would we switch back to Roman numerals from Arabic and get “VV”? Would we drop back to the “Pro” way of thinking and get, perhaps, Virtex-4 “Executive Edition?”

For those who have become jaded and cynical from previous FPGA family announcements, we should cover what “unveils” means in the context of Virtex-5. In the past, the word “unveiled” could represent anything from an early disclosure of a planned architecture for a family that would be shipping in a couple of years to an almost-after-the-fact admission that a new technology had been in early customers’ hands for months already. In the case of Virtex-5, we’re happily closer to the latter. The LX version of the Virtex-5 family is shipping now and has been in early customers’ hands for at least long enough for a couple of them to catch a flight to the announcement party, recounting their early experiences and showing off sample boards with the new devices convincingly soldered into place – LEDs blazing.

Starting at the process level, we’ve now dropped to 1.0V core from 90nm’s 1.2V. We’re up to 12 layers of metal – 11 copper and 1 aluminum. If you’re using a previous generation FPGA and feel that those 10 or 11 layers of metal just weren’t getting the job done, you can rest easy now that layer number 12 has arrived. Also, after claiming low-K “wasn’t necessary” at 90nm, Xilinx has jumped on the low-K dielectric bandwagon. Moore’s Law can be a harsh mistress, it seems. One process node’s luxury item becomes the next one’s standard feature.

Also new is a “nickel-silicide self-aligned gate structure that lowers the gate resistance and minimizes some of the manufacturing margins that might otherwise detract from the transistor performance.” Hey, that sounds like something we would write. Virtex-5 also uses strained silicon to “deliver faster performance without the need to physically reduce device dimensions, thereby avoiding the excessive leakage currents traditionally seen with ultra-small features.” Cool.

Virtex-5 is the second Xilinx generation to take advantage of triple-oxide technology. Since thinner gate oxide delivers faster switching, you want the thinnest gate oxide on transistors in the performance path of the device. However, thin gate oxides are also associated with high leakage current. Since the majority of the transistors in an FPGA are not in performance-critical roles, two thicker oxides are used on those less critical transistors to reduce overall leakage current. The result is a device that delivers performance without taking the leakage-current hit on most of its transistors.
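
To make the tradeoff concrete, here is a minimal back-of-the-envelope sketch in Python. The transistor count, per-transistor leakage figures, and the 10% critical-path fraction are all invented for illustration; only the shape of the argument comes from the article.

```python
# Back-of-the-envelope model of the triple-oxide tradeoff.
# All numbers below (transistor count, per-transistor leakage, the 10% share
# of performance-critical transistors) are invented for illustration only.

def total_leakage_ma(n_transistors, frac_thin, i_thin_na, i_thick_na):
    """Total leakage in mA when a fraction of transistors get thin (leaky) oxide."""
    thin = n_transistors * frac_thin * i_thin_na
    thick = n_transistors * (1.0 - frac_thin) * i_thick_na
    return (thin + thick) * 1e-6  # nA -> mA

N = 500_000_000  # hypothetical transistor count

all_thin = total_leakage_ma(N, 1.0, 1.0, 1.0)   # every transistor fast and leaky
mixed = total_leakage_ma(N, 0.1, 1.0, 0.1)      # only 10% thin, the rest low-leak

print(f"all thin oxide:   {all_thin:,.0f} mA of static leakage")
print(f"triple-oxide mix: {mixed:,.0f} mA of static leakage")
```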

One place where Xilinx takes a distinctively different tack from arch-rival Altera is in their foundry strategy. Xilinx uses multiple foundries – today Toshiba and UMC – claiming that multiple foundries give a hedge against process problems and increase volume production capability. Altera, on the other hand, claims that working with a single fab – TSMC – gives them an edge because they can focus their engineering efforts on the capabilities and characteristics of a single supplier, allowing them to converge faster on a high-yield, high-performance design with each process generation. Both arguments make sense.

Ironically, based on the last two generations’ reality, neither company’s strategy has worked that way. At 90nm Xilinx had delivery problems with some of their Virtex-4 FX family despite their dual-foundry strategy. Now, at 65nm, Xilinx is delivering production devices first, despite Altera’s apparent advantage in process convergence from working with a single foundry. Strategy that sounds great in marketing materials doesn’t always deliver in practice.

A new family would be nothing without a few architectural innovations to throw the design tool engineers into a tailspin. Virtex-5 continues the ASMBL (Advanced Silicon Modular Block) architecture that Xilinx introduced with Virtex-4. This architecture arguably helps Xilinx more than it helps you: the flip-chip-based columnar layout lets them easily manufacture a range of devices with different mixes of hard-IP features (such as DSP blocks, memory, processors, and multi-gigabit transceivers for serial I/O) with minimal re-engineering and without falling victim to I/O-to-core-logic constraints in device layout. The advantage to you is that you can buy an FPGA with just the features your design actually needs, without paying the price, area, and power penalty of unused hard IP sitting idle on your device.

In the “learning from your competitors” category, Xilinx has joined Altera in the position that four LUT (look-up table) inputs are no longer enough. Both vendors have concluded that many logic functions can be implemented more efficiently with wider LUTs and that a flexible structure can allow a wider LUT to function as two smaller LUTs when required. This change has probably come about because of the increased contribution of routing to the overall area-and-delay picture at smaller geometries and because of the trend toward wider functions in typical designs. Xilinx has settled on a 6-input LUT architecture for V5, which they claim gives higher utilization, higher performance, and superior local connections compared with the old architecture. It should also help reduce power by decreasing the number of logic levels and the amount of interconnect required.
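
For readers who want a feel for what “a wider LUT that can act as two smaller LUTs” means, here is a minimal Python model. A 6-input LUT is just a 64-entry truth table; the sketch shows how the same storage can serve either one 6-input function or two 5-input functions of shared inputs. It illustrates the general fracturable-LUT idea, not the exact Virtex-5 slice wiring.

```python
# Illustrative model of a fracturable 6-input LUT (not the exact Virtex-5 slice).
# A 6-LUT is a 64-entry truth table, stored here as a 64-bit integer.

def lut6(truth_table, inputs):
    """Evaluate one 6-input function. inputs = list of 6 bits, LSB first."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (truth_table >> index) & 1

def lut6_as_two_lut5(truth_table, inputs5):
    """Use the same 64-bit table as two 5-input functions of shared inputs."""
    index = sum(bit << i for i, bit in enumerate(inputs5))
    f_low = (truth_table >> index) & 1           # lower 32-bit half
    f_high = (truth_table >> (32 + index)) & 1   # upper 32-bit half
    return f_low, f_high

# Example: pack AND-of-5 in the lower half and OR-of-5 in the upper half.
and5 = 1 << 31                  # only the all-ones input row is 1
or5 = (2**32 - 1) & ~1          # every row except all-zeros is 1
table = (or5 << 32) | and5

print(lut6_as_two_lut5(table, [1, 1, 1, 1, 1]))   # -> (1, 1)
print(lut6_as_two_lut5(table, [1, 0, 0, 0, 0]))   # -> (0, 1)
print(lut6(table, [1, 1, 1, 1, 1, 0]))            # 6-input view; input 6 = 0 selects lower half
```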

Xilinx’s new family will boast up to 330K logic cells (LCs). For the uninitiated, let’s take a quick historical side trip to explain why you won’t find 330,000 of anything on one of these devices. A long time ago, in the really bad old days, FPGA companies like Xilinx quoted the density of their devices in fictitious units called “system gates”. Nobody really knew what a “system gate” was, and the industry liked it that way. However, tempted to assume that a system gate was equivalent to an ASIC gate (you know, a plain old 2-input NAND), many system designers walked blindly and painfully into career-limiting project walls when they discovered that system gates were nothing of the sort and, in fact, translated into only somewhere between one-fifth and one-tenth as many equivalent ASIC gates.

The FPGA industry, feeling a little guilty about all the nasty bumps on their customers’ foreheads, decided to pursue a truth-in-datasheet-numbers policy by dropping “system gates” and simply quoting the number of 4-input LUTs in their fabric. Everyone was happy for about twenty minutes, until somebody released the marketing guys from their temporary incarceration. These folks knew that an apples-to-apples number representing a structure actually on the chip left no creative wiggle room. They went about inventing a new concept: the “equivalent cell,” which multiplied the number of actual 4-input LUTs by some semi-arbitrary (>1, of course) factor that they claimed adjusted for inherent efficiencies in carry chains and other logic in the cell. What it actually adjusted for was the difference between their marketing claims and their competitors’.

With these “equivalent cells,” the average FPGA designer could still put on marketing-filtering goggles and squint their way back to the truth. Until today, that is. Since both of the largest FPGA vendors have gone to wider look-up tables, it just wouldn’t do to base the new numbers on anything directly related to the new structure. LC counts now reflect some new arbitrary adjustment of the old, arbitrarily-adjusted 4-input-LUT-equivalent count. Confused yet? Yes? Good. That’s exactly the way the industry wants it. It doesn’t matter anyway. With all the hard IP in today’s large FPGA devices, you’ll never come up with a meaningful metric for figuring out how big a device is in any real sense, let alone accurately figure out whose device is biggest. Just shut up, buy more chips, and get over it.
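
If you want to see how little information a headline “logic cell” number carries, here is a toy illustration. Every conversion factor below is a hypothetical placeholder (the article quotes only the 330K headline, not the underlying arithmetic); the point is simply that one physical LUT count can be dressed up several different ways.

```python
# Toy illustration of density-metric inflation. Every factor here is a
# hypothetical placeholder; the article quotes only the 330K "logic cell"
# headline, not the underlying conversion arithmetic.

physical_6luts = 100_000        # made-up count of real 6-input LUTs on a die
lc_factor = 1.5                 # hypothetical LUT-to-"logic cell" multiplier
system_gate_factor = 12         # hypothetical "system gates" per logic cell

logic_cells = int(physical_6luts * lc_factor)
system_gates = logic_cells * system_gate_factor

print(f"{physical_6luts:,} real LUTs -> {logic_cells:,} 'logic cells' "
      f"-> {system_gates:,} 'system gates'")
```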

Also addressing the fact that interconnect becomes more important with each process generation, Xilinx has created a new routing architecture that they call a “Diagonally Symmetric Interconnect Pattern.” This routing structure improves the predictability of routing delays and reduces the number of “hops” required for a typical connection. The more symmetric pattern also makes it easier for design tools (particularly placers) to predict routing delay in their scoring metrics. Overall, it should give better quality of results (QoR), which translates into higher performance.
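
As a rough picture of why predictable routing matters to the tools, here is a generic, greatly simplified delay estimator of the sort a placer might use when scoring candidate placements. The model shape (a fixed cost plus a per-hop cost proportional to Manhattan distance) and every constant in it are invented for illustration; this is not Xilinx’s actual timing model.

```python
# A generic, greatly simplified placer-style delay estimator. The model shape
# (fixed cost plus a per-hop cost based on Manhattan distance) and all of the
# constants are invented for illustration; this is not Xilinx's timing model.

def estimated_net_delay_ns(src, dst, hop_delay_ns=0.12, fixed_ns=0.35):
    """Estimate routing delay between two logic tiles from their grid distance."""
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return fixed_ns + hops * hop_delay_ns

# Score a candidate placement by its worst estimated net delay.
nets = [((0, 0), (3, 2)), ((1, 1), (1, 4)), ((2, 0), (5, 5))]
worst = max(estimated_net_delay_ns(a, b) for a, b in nets)
print(f"worst estimated net delay: {worst:.2f} ns")
```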

On the hard IP front, Xilinx has upped the ante on multiple counts. Virtex-5 comes with up to 14.5Mb of block RAM (a 45% increase over previous-generation devices), up to 640 DSP slices (a 25% increase), and up to 1,200 SelectIO pins (a 25% increase). They’ve widened the DSP slice multiplier to 25x18 and added a 550 MHz clock management tile offering both DCM and PLL capabilities. They’ve also dropped in a 36Kbit, cascadable dual-port block RAM / FIFO with integrated ECC. Although they’re giving details only on the currently shipping LX family right now (kudos to Xilinx for learning one of the harder lessons from Virtex-4), Xilinx is announcing that the number of sub-families within Virtex-5 will grow from three to four, with the addition of a new “LXT” family (LX with multi-gigabit transceivers), reflecting customer demand for a wider variety of hard-IP mixes.
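
For the arithmetically curious, the wider 25x18 multiplier implies a 43-bit product for signed operands. A quick sanity check (the operand widths come from the announcement; the rest is plain arithmetic):

```python
# Quick arithmetic check on the wider multiplier: a signed 25x18 multiply
# needs a 43-bit product in the worst case.

a = -(2**24)                 # most negative value representable in signed 25 bits
b = -(2**17)                 # most negative value representable in signed 18 bits
product = a * b              # 2**41, the largest possible product magnitude

bits_needed = product.bit_length() + 1   # +1 for the sign bit
print(bits_needed)                        # 43
```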

Where your device connects to the board, Xilinx is introducing a new “2nd Generation Sparse Chevron” package I/O layout that the company claims offers improved noise resistance and simplified PCB layout. The configuration places a power/ground pair adjacent to every I/O to minimize crosstalk. With pin counts skyrocketing in these latest-generation devices, it’s never too soon to come to the rescue of the board designer. Exploding routing complexity and increasing signal-integrity concerns have made FPGAs a major challenge for PCB folk.

In order for us to take advantage of all this new capability, we need tools, of course. Xilinx continues to bolster their already robust ISE tool suite with additional offerings and continuous improvements in capability and features. Simultaneous with the Virtex-5 announcement, a number of third-party EDA suppliers announced support for the new technology as well.

So, now that we have all this new capability, who really needs it? Are the applications that can take advantage of these improved specs simply the bigger, faster versions of the same old FPGA projects, or does this new technology enable new markets that weren’t using FPGAs before? We’re tackling those questions and a few more in an upcoming article, so stay tuned.
