At the Gigaom Structure 2014 event last week, Intel’s Diane Bryant announced that Intel is “integrating [Intel’s] industry-leading Xeon processor with a coherent FPGA in a single package, socket compatible to [the] standard Xeon E5 processor offerings.” Bryant continues, saying that the FPGA will provide Intel customers “a programmable, high performance coherent acceleration capability to turbo-charge their algorithms” and that industry benchmarks indicate that FPGA-based accelerators can deliver >10x performance gains, with an Intel-claimed 2x additional performance, thanks to a low-latency coherent interface between the FPGA and the processor.
If we did our math right, Intel is implying that an FPGA could boost the speed of a server-based application by somewhere in the range of 20x.
At almost the same time, Microsoft announced a system it calls “Catapult” (which apparently has no connection whatsoever to the very closely related algorithmic synthesis technology from Calypto, Inc. that oddly bears exactly the same name). Microsoft’s Catapult, described in a paper titled “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services,” achieved a reported 95% increase in Bing search engine performance with only a 10% increase in power consumption. Yep, pairing FPGAs (and most definitely Altera FPGAs, in this case) with traditional processors basically doubled the performance and power-efficiency of a traditional heavy-iron server task.
Well, who out there didn’t see this coming? Anyone? Anyone?
With data center power consumption estimated at somewhere between one and ten percent of the entire world’s electricity use – and growing fast – doing more computation with less energy is a problem with enormous economic and ecological stakes. Today, data centers are built based on access to cheap power, and the size and throughput of those data centers are typically limited by how much power can be brought into the building and how much heat can be taken out. Companies like Microsoft, Google, Facebook, and eBay clearly would be highly motivated to both crank up the MIPS and lower the electric bill.
At the same time, Moore’s Law, after a nearly fifty-year run, is most definitely running out of gas. First, we hit the power wall on single-core processors, reaching a point where clocking the chips faster ran up the power more than it improved performance. Then we went to two, four, and more cores, and, finally, we’re looking at wider instructions and data buses to compensate for the lack of continued progress in the underlying semiconductor processes. Simply waiting for better silicon to solve the data center’s power woes is not a viable option.
Just about everyone who reads these pages understands that FPGAs offer the potential for dramatically increased compute performance combined with much lower power consumption. For specialized algorithms, an FPGA-based hardware implementation offers the rewards of fine-grained parallelism – lower latency, higher throughput, and much lower power.
Of course, everyone who already understands the benefits of FPGA-based compute acceleration also knows the single biggest obstacle to widespread adoption: the programming model. Traditional von Neumann processors and their accompanying ecosystem have evolved to the point of being drop-dead easy to program. Slap down a few lines of C, C++, or any other popular language, crank up an open-source compiler, and your computer is computing in no time.
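To make the contrast concrete, here’s roughly what “a few lines of C” means – a toy SAXPY loop of our own invention that any open-source compiler (gcc, clang, take your pick) turns into running software in seconds:

```c
/* saxpy.c - build and run with: gcc saxpy.c -o saxpy && ./saxpy */
#include <stdio.h>

int main(void) {
    float x[4] = {1, 2, 3, 4};
    float y[4] = {4, 3, 2, 1};
    for (int i = 0; i < 4; i++)
        y[i] = 2.0f * x[i] + y[i];   /* y = a*x + y, the classic SAXPY */
    printf("%.0f %.0f %.0f %.0f\n", y[0], y[1], y[2], y[3]);
    return 0;
}
```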
With FPGAs, getting the gates to do your bidding is a substantially greater challenge – one that has even provided a lucrative livelihood for many of us. Converting a complex algorithm to an efficient custom hardware architecture, then describing that architecture in a hardware description language, then simulating, synthesizing, and placing-and-routing the resulting design can get you to the point … where you now have hundreds of annoying timing violations to sort out. An EE degree, a fair amount of experience, and a few months of free time are pretty much minimum requirements for making efficient use of FPGA fabric to accelerate a single high-performance algorithm, and most of the folks writing complex algorithms for cloud and data center applications don’t have a lot of extra time and energy to pick up that kind of expertise.
This programming problem has not escaped the attention of the folks who make FPGAs, of course. They’ve been toiling away for years trying to simplify the process of programming FPGAs. Today, the state-of-the-art is best represented by three primary approaches: model-based design, high-level synthesis, and parallel programming languages like OpenCL. All of these approaches have merit for different types of problems and for different programmer skill sets. None of them has reached anything like the robustness required for the general programming public to be able to efficiently take advantage of FPGA co-processing.
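To give a flavor of the OpenCL approach – and this is an illustrative sketch, not any particular vendor’s flow – the same sort of loop is expressed as a kernel, and the FPGA tool chain is left to compile it into a deeply pipelined datapath instead of threads on cores:

```c
/* OpenCL C (a C99 dialect) kernel. On an FPGA target - Altera's SDK
   for OpenCL, for example - the compiler synthesizes this into
   pipelined hardware rather than scheduling it across processor cores. */
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y)
{
    int i = get_global_id(0);    /* one work-item per array element */
    y[i] = a * x[i] + y[i];
}
```

The catch, of course, is that an FPGA “compile” is a full place-and-route run that can take hours, and the performance you get still depends heavily on how the kernel is written – which is exactly the maturity gap we’re talking about.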
So, why would Intel ever buy Altera?
(Note: We have no indication that Intel has any actual plans to buy Altera, so we’re speculating here.)
Intel has arguably the most advanced semiconductor processes in the world, and those processes have historically been applied to making high-performance processors for PCs and servers. These days, the PC market is waning, and Intel has failed to capture any meaningful share of the exploding mobile and tablet market, which is dominated by lower-power ARM architecture processors. That leaves Intel with the server market, which (thanks to the reliance of the mobile, cloud, and emerging IoT markets on giant server farms) is growing rapidly.
However, as we mentioned above, power is the primary limiting factor in the global data center build-out, and ARM is trying to get into the server game by taking advantage of their comparatively lower-power processor architectures. This poses a significant risk to Intel’s domination of the racks. With PCs in decline and the data center possibly up for grabs, Intel needs to do something.
What Intel needs is a game-changing answer to server power efficiency, and the best place to look for that is in FPGAs.
Of course, Intel could make their own FPGAs or FPGA fabric – integrated in the same package or potentially even on the same die as their processors, but that doesn’t solve the problem. The key to success with FPGA technology is tools, not fabric. And, if Intel put every engineer in the company to the task of developing FPGA tools for the next decade, they would not be able to match what Altera and Xilinx have today. Robust FPGA tools require tens of thousands of user-generated designs to botch their way through the tool flow, and no amount of careful engineering development can replace that experience-based tool evolution.
Furthermore, what Altera and Xilinx have today is (as we mentioned earlier) not yet even remotely up to the task of smoothly compiling high-performance server-based algorithms into a form that will efficiently execute on hybrid processor/FPGA heterogeneous computing servers. They have the bare bones of a few marginally workable solutions. Of course, as this recent announcement shows, Intel could partner with Altera or Xilinx and hope that those companies give enough attention to the server space to pull it off, but with the perpetual lure of the lucrative comms space constantly distracting the FPGA companies from the server world’s problems, that crucial attention is most definitely not guaranteed.
This announcement is certainly not Intel’s first warning shot about FPGAs, heterogeneous compute acceleration with FPGAs, or partnering with companies like Altera. A few years back, Intel launched another device family, the E6x5C, with an Atom processor and an Altera Arria FPGA sharing the same package, connected by PCIe.
This new announcement bumps the processor component of that up to Xeon land, and it bumps the ever-so-critical FPGA-to-processor communication channel up from PCIe to low-latency, coherent QuickPath Interconnect (QPI) – reportedly capable of up to 25 Gbps communication at very low latency. As one can see from the Bing/Microsoft paper (or as many of us know from traumatic personal experience), the architecture for passing and sharing data between processor, FPGA, and memory is the single most important feature (and potential bottleneck) of any heterogeneous computing platform with FPGA fabric.
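Neither Intel nor the FPGA vendors have published a programming model for the new parts, but a conceptual C sketch (with a plain software function standing in for the FPGA – everything here is our own illustration) shows why coherent sharing beats DMA-over-PCIe for fine-grained offload:

```c
#include <stdio.h>
#include <string.h>

#define N 4

/* Software stand-in for the FPGA's computation - illustration only. */
static void fpga_kernel(float *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] *= 2.0f;
}

/* PCIe model: explicit copies to and from a device buffer bracket
   every offload, adding transfer latency on both ends. */
static void offload_pcie(float *data, size_t n) {
    static float dev_buf[N];                  /* device-side staging buffer */
    memcpy(dev_buf, data, n * sizeof(float)); /* host -> device DMA */
    fpga_kernel(dev_buf, n);
    memcpy(data, dev_buf, n * sizeof(float)); /* device -> host DMA */
}

/* Coherent (QPI-style) model: the FPGA operates directly on host
   memory; cache coherence replaces the copies entirely. */
static void offload_coherent(float *data, size_t n) {
    fpga_kernel(data, n);                     /* zero-copy, shared address space */
}

int main(void) {
    float a[N] = {1, 2, 3, 4};
    float b[N] = {1, 2, 3, 4};
    offload_pcie(a, N);
    offload_coherent(b, N);
    printf("%.0f %.0f\n", a[3], b[3]);        /* both print 8 */
    return 0;
}
```

In the real hardware those copies are DMA transfers across a bus, not memcpy, but the latency structure is the same: the PCIe model pays for two full transits per offload, while the coherent model lets the FPGA touch exactly the cache lines it needs.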
Intel is experimenting and learning with other pieces of the FPGA puzzle as well, of course. After dipping their toes in the water by partnering with smaller FPGA suppliers Achronix and Tabula, fabricating devices for those companies on the 22nm Tri-Gate (FinFET) process, the company stepped up to a manufacturing partnership with Altera for the upcoming Stratix 10 FPGA family, based on Intel’s 14nm Tri-Gate process. This is a critical (and vastly under-appreciated) engineering task where both the FPGA fabric and the semiconductor process must be adapted and evolved to work together. You can’t just slap any old FPGA fabric on a cutting-edge semiconductor process and expect it to work, and you conversely can’t take just any semiconductor process and succeed – even with a proven FPGA fabric. Both parts have to meet and meld in the middle.
For the record, Intel isn’t saying which FPGA company they are partnering with for the new heterogeneous Xeon devices. Both Altera and Xilinx are tight-lipped as well, so we’re gonna put our well-considered bet on Altera. Either way, the curtain will be pulled back soon enough, because Intel says that end customers will need to use the FPGA company’s tools and design flow in order to take advantage of the FPGA portion of the processor. So, that conversation might go something like this:
Facebook: Hey Intel, we’d like to use your new heterogeneous Xeon/FPGA processors.
Intel: OK, you’ll need to get FPGA tools and support from the vendor.
Facebook: Which vendor is that?
Intel: We’re not saying…
OK, maybe not exactly, but – that’s one secret that won’t last long.
It’s important to note that Intel is not the first to plan mass production of heterogeneous processors with FPGAs. Xilinx has been attacking that market for a few years now with their Zynq family – which incorporates ARM processors with Xilinx FPGA fabric. Altera is aggressively giving chase with their own ARM-based FPGA SoC families. While Zynq is certainly not a data-center-class processor, the distance from today’s Zynq to one that would be a viable server-class solution isn’t huge, and the expertise and tool flow that Xilinx is accumulating with Zynq would come in very handy in a fight over low-power server dominance.
Even though Intel soft-pedaled the Xeon/FPGA announcement, the potential implications are enormous. If the tools can get good enough (and that’s a big IF), we are looking at the displacement of the von Neumann processor as the dominant computing architecture for the majority of the world’s data centers, and therefore the majority of the world’s computation. And, it could happen during the most rapid expansion of global computing power in history.
Sure, Intel could continue to defend its turf against insurgent ARM-based architectures in the single most important market in the world by simply partnering with a few smaller companies (like Altera) for the most critical enabling technologies. Intel could hope that those partners will spend enough time and energy solving the tool flow problem to make that discontinuous leap in the global computing architecture possible and practical.
Somehow that scenario doesn’t seem like the most likely to me.
Who would be surprised if Intel bought Altera? What would you think of the wisdom of such a merger?
Dear Kevin,
Why are you so sure that Altera would be the FPGA vendor of choice for the FPGA in the server chip? (Got any actual evidence?)
Look at who the main FPGA vendors currently get to manufacture their FPGAs:
1) Xilinx use TSMC only
2) Altera use TSMC mostly, and Intel for some.
3) Achronix use Intel only
On that basis, Intel have a much cosier business relationship with Achronix, and if they were going to buy anyone, it would be Achronix.
Achronix would be much cheaper to buy than Altera or Xilinx, and would be a far more obvious choice for Intel if they wanted to make a purchase to guarantee their ongoing supply of FPGA cores for their server chips.
Regards,
Nicholas Lee
Hi Nicholas,
A good question with some good points! And now, we have a vote for Achronix.
My reasons (some of them, anyway) for guessing Altera vs Achronix are:
– Achronix devices are optimized for a pretty specific set of connectivity applications – particularly with the choice of hard IP. Those are not the same optimizations one would make for data center compute acceleration.
– Altera has pursued the compute acceleration market aggressively with initiatives like their OpenCL support.
– I think the main value to Intel and Intel customers is the tool and support from the FPGA company. It wouldn’t be that difficult for Intel to just make their own FPGA fabric, but tools and support are another matter. Altera is far more advanced and scalable than Achronix on that front, and brings a lot more tool technology to the table.
My reasons for guessing Altera vs Xilinx are:
– Intel went with Altera for their previous processor+FPGA product
– Altera’s fab agreement with Intel apparently has some provision excluding Xilinx, so it would be odd (but not unfathomable) for Intel to make a different exclusive deal with Xilinx
– Altera has been more visibly pursuing the server market, and may be developing devices more tuned to that particular application (we don’t know the specifics of Stratix 10 yet).
Kevin
Very interesting article.
Note also that both Xilinx and Altera offer platforms with embedded ARM cores, and based on the fact that many start-ups are working on ARM-based servers, maybe soon we will see multi-core FPGAs with ARM cores targeting the server market.
Also, the FPGAs could offer commonly-used accelerators for data centers (in order to avoid converting every data center algorithm to hardware), such as an accelerator for MapReduce:
http://issuu.com/xcelljournal/docs/xcell_journal_issue_85/14?e=2232228/5349345
@kachris,
Agreed! I touched on that a bit in the article – talking about Zynq and SoC FPGAs. I don’t think the “processor” part of those offerings (at this point) is really data-center-class, but that would be a reasonable evolution of those products.
For me, the important part about Xilinx and Altera having the processor and FPGA fabric integrated on the same die (or in the same 2.5D interposer setup) is the vast amount of bandwidth/connectivity available between the processor, FPGA fabric, and memory. It seems to me that would have the potential to offer much more compute power, and possibly much lower power consumption, since the FPGA-processor signals don’t have to go through off-chip IO buffers.
Kevin
Well, this article seems to have suddenly taken on more relevance. What do you think?
We have just done an update on this topic with a new article:
“Intel Plus Altera – What Would it Mean?”
https://eejournal.com/archives/articles/20150331-intel/
If I were the Intel product manager, I would be purchasing IP rights to the PolarFire product technology simply because of the low power … both at start-up and running flat out … in the end, how fast you can go comes down to cooling.