At CES 2019 in Las Vegas this week, Navin Shenoy, executive vice president of Intel’s Data Center Group, announced the Intel Nervana Neural Network Processor for Inference, which will go into production this year. Back in 2016, Intel acquired Nervana, a 48-person AI SaaS startup from San Diego, for a reported $408 million. Nervana was a software company at the time, providing a full-stack software-as-a-service platform called Nervana Cloud that enabled the development of custom deep learning applications. The platform was based on an open-source framework called Neon, which rivaled Caffe, TensorFlow, and others.
Nervana was also reportedly working on the development of a custom chip for neural network processing at the time, which they claimed would outperform GPUs as AI accelerators by a factor of at least ten. Of course, developing a custom processor is a tall order for a small software team, but that ambition was made dramatically more realistic with their acquisition by Intel. Now, Intel is announcing the delivery of the first part of that vision – the Intel Nervana Neural Network Processor for Inference, or NNP-I. The company also announced that they will have a Neural Network Processor for Training, codenamed “Spring Crest,” available later this year. Nervana Engine was originally being developed on 28nm technology, with plans to move to 14nm before launch. Intel hasn’t said at this point, but we infer that the devices delivered this year will be on Intel’s 14nm FinFET technology, probably moving to 10nm sometime in the future.
Intel says Nervana is being developed in conjunction with Facebook, which is an interesting note because Facebook is the “super seven” data center company whose acceleration strategy has been most opaque. Google has developed their own processor, and Microsoft, Amazon/AWS and others have invested heavily in FPGA-based acceleration. Having Facebook as a development partner should give Nervana solid end-to-end credentials when it begins shipping broadly later this year.
Neural network training and inference are extremely compute-intensive, involving matrix multiplication of tensors and convolution. For years, graphics processing units (GPUs) have been the go-to solution for AI training acceleration, and FPGAs have worked hard to carve out a competitive niche in the inferencing game. As off-the-shelf chips go, GPUs are well suited to AI tasks, taking advantage of their highly parallel vector and linear algebra capabilities. But, because GPUs aren’t designed specifically for AI tasks, they still leave a lot on the table when it comes to architectural optimization for AI and deep learning.
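To make the workload concrete, here is a minimal pure-Python sketch of the two operations named above: a matrix multiply and a 2D convolution (implemented as cross-correlation, as deep learning frameworks commonly do). The function names and the 224x224 example size are illustrative assumptions, not anything taken from Intel or Nervana material.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(kh)
                for dx in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def conv2d_mac_count(image_h, image_w, kh, kw):
    """Multiply-accumulate operations for one channel of conv2d_valid."""
    return (image_h - kh + 1) * (image_w - kw + 1) * kh * kw


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # One 3x3 filter over one 224x224 channel (hypothetical sizes).
    print(conv2d_mac_count(224, 224, 3, 3))
```

Every output element is an independent sum of products, which is why both operations map so well onto wide arrays of parallel multiply-accumulate units.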
Similarly, FPGAs can deliver incredible parallelism and performance on a miserly power budget for inferencing tasks which (unlike training) can be accomplished with reduced-precision fixed-point computations. Large data center and cloud installations have begun to take advantage of clusters of FPGAs for accelerating inferencing tasks, with remarkable results in terms of throughput, latency, and computational power efficiency. However, similar to GPUs, FPGAs were not designed specifically for AI, and there is a lot of hardware on a typical FPGA that is not involved in AI operations, and a number of architectural assumptions that make FPGAs great as general-purpose devices but suboptimal as AI processors.
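As a rough illustration of what reduced-precision fixed-point inference means in practice, the sketch below quantizes floating-point weights to 8-bit integers with a single scale factor and then computes a dot product using only integer arithmetic. The symmetric int8 scheme, the function names, and the sample values are assumptions chosen for simplicity; they do not describe any particular FPGA or Intel implementation.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


def int_dot(qa, qb):
    """Dot product carried out entirely in integer arithmetic."""
    return sum(x * y for x, y in zip(qa, qb))


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25]      # hypothetical trained weights
    activations = [0.8, 0.1, -0.4]   # hypothetical layer inputs

    qw, w_scale = quantize_int8(weights)
    qa, a_scale = quantize_int8(activations)

    approx = int_dot(qw, qa) * w_scale * a_scale
    exact = sum(w * a for w, a in zip(weights, activations))
    print(approx, exact)
```

The integer result is rescaled once at the end, so the expensive inner loop needs only narrow integer multipliers, which is where the power and area savings come from.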
Nervana came at the problem from their perspective as developers of GPU kernels for deep learning, which gave them tremendous insight into the limitations of GPUs for AI tasks. The company says that the Nervana Engine was designed from a clean slate, discarding the GPU architecture and starting fresh. They analyzed a number of deep neural networks and came up with what they believed to be the best architecture for their key operations. They also came up with a new numerical format, dubbed FlexPoint, that tries to maximize the precision that can be stored within 16 bits.
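The article does not describe how FlexPoint works internally. One common way to squeeze more precision out of 16 bits is to store each value as a 16-bit integer mantissa while sharing a single exponent across a whole block of values. The sketch below illustrates that general idea only; the shared-exponent scheme, the function names, and the sample values are all assumptions, and this is not Intel's actual format.

```python
import math


def encode_shared_exponent(values, mantissa_bits=16):
    """Encode floats as signed integer mantissas plus one shared exponent.

    Assumed scheme for illustration: value ~= mantissa * 2**exponent.
    """
    limit = 2 ** (mantissa_bits - 1) - 1  # 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0] * len(values), 0
    # Smallest exponent that lets the largest value fit in the mantissa.
    exponent = math.ceil(math.log2(max_abs / limit))
    step = 2.0 ** exponent
    mantissas = [max(-limit, min(limit, round(v / step))) for v in values]
    return mantissas, exponent


def decode_shared_exponent(mantissas, exponent):
    """Rebuild approximate floats from mantissas and the shared exponent."""
    step = 2.0 ** exponent
    return [m * step for m in mantissas]


if __name__ == "__main__":
    block = [0.001, -0.5, 3.75]  # hypothetical tensor values
    mantissas, exponent = encode_shared_exponent(block)
    print(mantissas, exponent)
    print(decode_shared_exponent(mantissas, exponent))
```

Because the exponent is stored once per block rather than once per value, all 16 bits of each element go to precision. The trade-off is that small values lose resolution when they share a block with much larger ones.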
Because AI computations can be extremely memory intensive, Nervana needed to be able to move a lot of data quickly. The Nervana device includes 32GB of in-package High Bandwidth Memory (HBM), which combines high capacity with very high speed. The company claims 8 terabits per second of memory access bandwidth. HBM achieves high capacity by die-stacking: a single HBM stack can store 8GB of data using eight individual 1GB memory dies, and the Nervana Engine includes four such stacks. Intel’s multi-die packaging technology connects the HBM to the array of processing cores. Again, Intel hasn’t said, but we assume this to be done with Intel’s 2.5D Embedded Multi-Die Interconnect Bridge (EMIB) technology (rather than the newly announced Foveros 3D packaging).
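The memory figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers in the text (four stacks, eight 1GB dies per stack, 8 terabits per second) and assumes decimal units (1 terabit = 1000 gigabits); the time-to-read-everything figure is a derived back-of-the-envelope number, not an Intel specification.

```python
STACKS = 4                 # HBM stacks in the package (from the text)
DIES_PER_STACK = 8         # memory dies per stack (from the text)
GB_PER_DIE = 1             # capacity of each die in GB (from the text)
BANDWIDTH_TBIT_PER_S = 8   # claimed memory bandwidth (from the text)


def total_capacity_gb():
    """Total in-package HBM capacity in gigabytes."""
    return STACKS * DIES_PER_STACK * GB_PER_DIE


def bandwidth_gbyte_per_s():
    """Convert terabits per second to gigabytes per second (decimal units)."""
    return BANDWIDTH_TBIT_PER_S * 1000 / 8


def full_sweep_seconds():
    """Idealized time to read the entire memory once at peak bandwidth."""
    return total_capacity_gb() / bandwidth_gbyte_per_s()


if __name__ == "__main__":
    print(total_capacity_gb(), "GB")
    print(bandwidth_gbyte_per_s(), "GB/s")
    print(full_sweep_seconds(), "s")
```

In other words, 8 terabits per second works out to roughly 1 terabyte per second, enough in principle to stream the full 32GB in a few hundredths of a second.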
The Nervana Engine is composed of an array of “Tensor Processing Cores” surrounded by HBM chiplets, memory interfaces, and high-speed IOs, which are designed to allow many Nervana devices to be combined to provide very large scale network implementations. Intel hasn’t given specific performance or power consumption figures for the new devices except to say that power consumption will be in the “hundreds of watts” – which puts Nervana clearly in the data center (compared with edge-targeted AI devices such as the company’s Movidius and Mobileye offerings).
The device includes six bi-directional high-bandwidth links, which the company says enables chips to be “interconnected within or between chassis in a seamless fashion.” The company says this “enables users to get linear speedup on their current models by simply assigning more compute to the task, or to expand their models to unprecedented sizes without any decrease in speed.” Multiple devices connected together can act as one large processor.
Nervana seems to be aimed at GPUs’ and FPGAs’ increasing foothold as AI accelerators in the data center. Since Intel has some of the best FPGA technology in the world in their PSG division (formerly Altera), it would appear that the company thinks Nervana brings significant advantages over FPGAs in inferencing, and over GPUs in training. NVIDIA, in particular, has dominated the data center acceleration game for AI training and is obviously directly in Nervana’s crosshairs. It will be interesting to watch what happens as more purpose-built AI devices come on the market to challenge the current crop of general-purpose accelerators filling the gap in AI processing demand.
Davies made some critical reflections on “Deep Learning,” slamming LeCun. An interesting read is https://www.zdnet.com/article/intels-neuro-guru-slams-deep-learning-its-not-actually-learning/
“Backpropagation doesn’t correlate to the brain,” insists Mike Davies, head of Intel’s neuromorphic computing unit, dismissing one of the key tools of the species of A.I. in vogue today, deep learning. “For that reason, it’s really an optimizations procedure, it’s not actually learning.”