The market for accelerating specialized data center workloads is expected to grow dramatically over the next few years. The acceleration market was estimated to be in the two-billion-dollar range in 2018, and it is expected to grow to a staggering $21 billion by 2023 – around a 50% CAGR. This growth is not unexpected. The world is producing data at an exponentially growing rate. Estimates are that something like 90% of all the world’s data has been generated just in the last couple of years. And, in that same “very recent” time frame, AI technology has evolved to the point where we can do some really useful things with all that data.
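A quick sanity check on that arithmetic, assuming a 2018 baseline near the top of “the two-billion-dollar range” – call it roughly $2.8 billion, which is our assumption rather than a published figure:

$$ \mathrm{CAGR} = \left(\frac{V_{2023}}{V_{2018}}\right)^{1/5} - 1 = \left(\frac{21}{2.8}\right)^{1/5} - 1 \approx 50\% $$

(Start from exactly $2.0 billion and the implied CAGR is closer to 60%.)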
Unfortunately, conventional processors have not evolved at anything remotely like that rate. While Moore’s Law has been very, very good to us, the slowdown of Mr. Moore, the collapse of Dennard scaling, and a leveling-off of progress on von Neumann architectures have conspired to slow the growth in performance – and particularly in performance-per-watt – of conventional processors just when we need it most. And, when it comes to AI tasks and similar specialized workloads, the von Neumann model isn’t really all that great to begin with.
Enter, therefore, the wild and woolly world of data center acceleration. The race to build the fastest, most efficient chunk of silicon for boosting the performance of these algorithms in the data center is ferocious these days. Piles of engineering investment – from giant tech corporations to nimble startups – are laser-focused on winning the battle to bridge the gap between what conventional CPUs can deliver and what the world needs in terms of crunching and convoluting this gargantuan glacier of bits.
With the planet drowning in data, and with data centers and computation in general consuming a staggering and still-growing share of the world’s energy supply, there is a very lucrative opportunity for some bright, well-funded engineers to change the world once again. In that domain, there are essentially four categories of competitors – CPUs, GPUs, FPGAs, and custom (ASIC) architectures. Which of these will win, and what roles they will play in the data center by 2023, is still remarkably unclear.
Starting with the 900-pound gorilla (800-pound gorillas are so last year), we have Intel. Intel has dominated the data center for quite a long time now, with approximately an infinity-percent market share. Intel is a huge company, and the data center is clearly their most important business. We expect they’ll do anything within reason to protect and nourish that business. They’ve been hammering hard on the CPU side of that equation, boosting the performance of their industry-standard Xeon processors on AI workloads by something like 30x through optimizations aimed specifically at those tasks.
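Intel doesn’t itemize that 30x in any one place, but gains like these generally come from heavy software tuning plus int8 vector instructions such as AVX-512 VNNI (marketed as “DL Boost” on recent Xeons), which collapse the multiply-accumulate inner loop of quantized inference into a single instruction. Here’s a minimal sketch of that inner loop, assuming a VNNI-capable Xeon; the function name and test values are ours, purely for illustration:

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstdio>

// Hypothetical int8 dot product of the kind DL Boost accelerates:
// a holds unsigned 8-bit activations, b holds signed 8-bit weights,
// and n must be a multiple of 64 for this simplified sketch.
int32_t dot_int8_vnni(const uint8_t* a, const int8_t* b, int n) {
    __m512i acc = _mm512_setzero_si512();
    for (int i = 0; i < n; i += 64) {
        __m512i va = _mm512_loadu_si512(a + i);
        __m512i vb = _mm512_loadu_si512(b + i);
        // One VNNI instruction: 64 u8*s8 products summed into 16 i32 lanes.
        acc = _mm512_dpbusd_epi32(acc, va, vb);
    }
    return _mm512_reduce_add_epi32(acc);  // horizontal sum of the 16 lanes
}

int main() {
    uint8_t a[64];
    int8_t  b[64];
    for (int i = 0; i < 64; ++i) { a[i] = 1; b[i] = 2; }
    printf("%d\n", dot_int8_vnni(a, b, 64));  // expect 128
    return 0;
}
```

One _mm512_dpbusd_epi32 call retires 64 multiplies and the accompanying adds – the kind of instruction-level density that, combined with software tuning, sits behind claims like that 30x figure.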
This boost in baseline AI performance for Xeon processors has two effects on the market. First, it shrinks the market for the remaining acceleration technologies (GPUs, FPGAs, and ASICs). If your servers have to include a Xeon anyway, and that Xeon is fast enough to meet your AI needs, you won’t go shopping for an add-on accelerator in the first place. Second, it puts the programming model largely in Intel’s court. With Intel controlling the system architecture from the CPU side, they have a firm grip on the hoops you might have to jump through to take productive advantage of a competing technology.
After Intel, we have NVidia – who appears to be the dominant player in add-on acceleration at this point. GPUs are not the fastest or most efficient architectures for accelerating AI, but NVidia cleverly got the jump on everyone a few years ago by recognizing that GPUs could turn in some pretty impressive numbers on massively parallel computing tasks, and they set about making that performance accessible with their CUDA programming environment. Because of CUDA, folks developing software in need of acceleration could get impressive results with little specialized expertise required. NVidia pretty quickly turned data center acceleration and HPC into a multi-billion-dollar business while the rest of the industry was caught off guard.
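To see why CUDA lowered the bar so dramatically, consider the canonical SAXPY example: the programmer writes one scalar-looking function, and the GPU runs it across thousands of threads at once. This is a minimal sketch of the programming model using standard CUDA runtime calls – not NVidia’s hand-tuned libraries:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// SAXPY: y = a*x + y. Each GPU thread computes exactly one element,
// so a million-element loop becomes a million near-simultaneous operations.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified memory keeps the sketch short; production code usually
    // manages explicit host/device transfers.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);  // n threads, 256 per block
    cudaDeviceSynchronize();

    printf("y[0] = %f (expect 4.0)\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

A few lines of kernel code, compiled with nvcc, and the parallelism falls out of the thread indexing – that accessibility, as much as the silicon itself, is what built NVidia’s head start.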
As we’ve pointed out exhaustively in these pages, FPGAs are superb contenders for the acceleration crown. With the right expertise behind the development of the FPGA configuration, and the right interface between the FPGA and the CPU (insert a bunch of footnote asterisks here), FPGAs could seriously whip up on GPUs. Unfortunately, the programming and debugging model for FPGAs lags well behind that for GPUs, so FPGA companies have been working late nights to come up with ways to make FPGAs easier to program for data center workload acceleration.
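The centerpiece of that late-night work is high-level synthesis (HLS): compiling ordinary C/C++ into FPGA logic. The hypothetical kernel below is written in the style of Xilinx’s Vivado/Vitis HLS tools – the pragma names follow Xilinx’s conventions, but read this as a sketch of the approach rather than a verified build recipe:

```cpp
// Hypothetical HLS-style kernel: plain C++ in, pipelined hardware out.
// The INTERFACE pragmas request AXI bus connections to external memory;
// the PIPELINE pragma asks the compiler for one loop result per clock.
extern "C" void vadd(const float* x, const float* y, float* out, int n) {
#pragma HLS INTERFACE m_axi port=x   bundle=gmem
#pragma HLS INTERFACE m_axi port=y   bundle=gmem
#pragma HLS INTERFACE m_axi port=out bundle=gmem
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = x[i] + y[i];
    }
}
```

The catch – and the reason the GPU model still feels easier – is that performance hinges on how well the tool pipelines your loops and schedules memory accesses, and debugging that is a very different experience from single-stepping through CUDA code.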
Market experts believe that FPGAs will grow at the highest CAGR of any of the technologies competing for the acceleration crown. And we don’t have to take just the analysts’ word for it. Intel spent over $16B a few years ago to acquire Altera – presumably on the assumption that FPGAs would be a pretty big deal in the data center. Xilinx is even trying to claim that their latest FPGAs are a new category of device: “ACAP.” (They are not, but marketing works in mysterious ways.)
When we talk about FPGAs for the data center, there are three realistic contenders: Intel, Xilinx, and Achronix. In the data center, Intel has the home-field advantage in a big way. When you’re driving the entire system architecture and providing most of the high-value chips in just about every server on the planet, you have a lot of influence on how FPGAs fit into the picture. You also already have relationships with the OEMs who are building the servers in the first place, so it’s a pretty easy sell to get folks to just check the box to boost their performance. “Sure, I’ll have FPGAs with that order – and a side of onion rings, please.” From that perspective, Intel is pretty much guaranteed to grab a giant share of the FPGA-based acceleration market even if their devices don’t turn out to be the most capable.
The company with probably the most on the line is Xilinx. Xilinx completely upended their long-term strategy in order to go after the acceleration market. They adopted a “Data Center First” motto and rebranded their high-end FPGAs as “Adaptive Compute Acceleration Platforms” (ACAPs) to underscore their commitment to data center acceleration. Fortunately, Xilinx is #1 in ACAPs today. Unfortunately, nobody except Xilinx even acknowledges that ACAP is a category. It’s easy to win a race in your own backyard if nobody else is even running.
That nonsense aside, Xilinx has built a highly competitive acceleration FPGA family with their 7nm “Versal ACAP” offering, which is now shipping. While we wait for Versal to be integrated into some data-center-ready acceleration cards, their “Alveo” accelerator card family, based on their previous-generation UltraScale+ architecture, was just updated with the U50 model – a low-profile PCIe card with an FPGA, HBM2 memory, and PCIe Gen4 support. The U50 pretty much mops the floor with GPU-based accelerators, and it can accelerate compute, network, and storage workloads all on one board.
Intel also announced a win with FPGA-based accelerator cards this week, with their FPGA PAC D5005 accelerator card running in Hewlett Packard Enterprise (HPE) ProLiant DL380 servers. Intel had previously announced similar socket wins with accelerator cards based on their Arria 10 GX mid-range FPGAs, but this announcement brings the higher-performance Stratix 10 into the picture. Interestingly, Intel’s announcement doesn’t get into the fray of comparing against GPUs or other acceleration platforms. It appears that Intel’s strategy is simply to get as many servers shipped as possible with FPGAs already in them, making a big cut in the opportunity space for companies like Xilinx trying to win with third-party add-on boards and chips.
Achronix is in an interesting place in the market – chewing briskly at the ankles of Intel, NVidia, and Xilinx. The company has just announced their own 7nm FPGA family – optimized (of course) for workload acceleration. Announcing last has its advantages, and Achronix appears to have bested the specs of Xilinx’s ACAP devices in several categories. While Achronix is a much smaller company without the resources of a Xilinx or an Intel, they have significant leverage with their eFPGA strategy – essentially making their FPGA technology licensable IP for companies wanting to develop their own ASIC-based accelerators. If you don’t want to pay the kind of margins the regular FPGA companies charge on chips, you can spin your own silicon using Achronix IP and get exactly the accelerator you need for your specific workload.
It will be interesting to watch this game unfold over the next couple of years. We are now at the point where the first generation of chips designed specifically for acceleration is shipping. The market strategies of Intel, NVidia, Xilinx, and Achronix are so different that it’s difficult to predict how the fight will go. Because of their position in the data center, it’s likely that, regardless of who actually wins, Intel will win. The presence of accelerators makes their already-dominant architecture even more useful, and there is considerable inertia in their installed base. As the five-year server replacement cycle rolls around again, we’re likely to see a continued prevalence of Intel hardware, with the other competitors making inroads on the acceleration piece – and with Intel greedily defending that turf as well.
And then, Cray and AMD win the $600M El Capitan contract for a 1.5-exaflop supercomputer system – a full order of magnitude faster than Oak Ridge National Laboratory’s (ORNL) 150-petaflop system that currently holds the top spot. Some press reports claim AMD Epyc CPUs and Radeon Instinct GPUs; others say the details are still TBD.
Clearly, logic-based acceleration isn’t for all markets yet.
At least in the short term, LLNL, ORNL, and Argonne represent three $600M exascale wins for Cray using traditional CPU/GPU architectures. And AMD made a huge 7nm bet on this one.
http://www.zdnet.com/article/amds-2nd-gen-epyc-processor-aims-to-set-a-new-data-center-standard/
The cool part is that Cray, with their Shasta architecture, introduced heterogeneous compute nodes, so you can mix and match several different processor architectures in the same cloud compute solution. Intel might still be able to win some blade space for Xeon Scalable or logic-based acceleration if they can make the case for it in the big picture. The ship date on these monster systems is still a ways out.
So that leaves a big exascale TBD for Intel fans. And it’s always difficult to guess what errata will come out of AMD’s and Intel’s fabs – that could be a showstopper. Intel’s recent security flaws are worth noting, too.