Aaaaand… it’s memory time again. I don’t keep up with every release of memory (who could keep up with that without dedicating their lives to nothing but that?), but here and there we have either technology or application angles to new-memory stories. So, in that vein, we address memory in automotive and AI. Yes, two critical keywords in any tech article these days.
Automotive Moves to Graphics
I chatted with Micron Technology about their latest GDDR6 release. The name of the game here is bandwidth, starting at 14 Gbps per pin and moving to 16 Gbps (with 20 Gbps working in the labs now). Memory capacity runs from 8 to 32 Gb.
There are two independent channels to this memory. In fact, you could pretty much think of them as two separate co-packaged dice. Each channel has its own memory array, read/write access, and refresh, and the two can run asynchronously with respect to each other. If you want to use the device as a single memory, you can gang the signals and busses together externally.
One of the things that’s changing with high-performance memory is the size of the payload: it’s shrinking. (How often do you hear about future workloads getting smaller?) So each channel has a 16-bit bus to be more efficient. If you gang the two channels together, then you get the more standard 32-bit bus.
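To put rough numbers on those bus widths, here is a minimal back-of-the-envelope sketch. It assumes peak bandwidth is simply the per-pin rate times the number of data pins, ignoring protocol and refresh overhead; the function name and constants are illustrative, not anything from Micron.

```python
def peak_bandwidth_gbytes(rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s.

    Assumption: one data pin per bus bit, each running at the quoted
    per-pin rate, with no protocol or refresh overhead.
    """
    return rate_gbps_per_pin * bus_width_bits / 8  # 8 bits per byte


CHANNEL_WIDTH_BITS = 16  # per-channel bus width quoted in the article
CHANNELS = 2             # two independent channels per device

for rate in (14, 16):    # Gbps per pin, the shipping speed grades mentioned
    per_channel = peak_bandwidth_gbytes(rate, CHANNEL_WIDTH_BITS)
    ganged = peak_bandwidth_gbytes(rate, CHANNEL_WIDTH_BITS * CHANNELS)
    print(f"{rate} Gbps/pin: {per_channel:.0f} GB/s per channel, "
          f"{ganged:.0f} GB/s with both channels ganged")
```

Under those assumptions, a 14-Gbps part works out to 28 GB/s per 16-bit channel, or 56 GB/s across the ganged 32-bit bus.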
So… OK… memory for graphics… um… where does the promised automotive thing enter the picture? Did we toss the word in here just to come up in more searches? Nope. Turns out that automotive designs, traditionally users of LPDDR or plain-ol’ DDR, now need more bandwidth for ADAS applications and, in particular, for the high-definition displays that go with L4/L5 autonomy (in other words, the high levels). Micron has worked with several other companies to put together the pieces needed to create an entire system: Rambus for the PHY, Northwest Logic for the controller, and Avery Design Systems for verification IP.
But, of course, adapting to the functional needs of cars can be a deal with the devil when it comes to operational requirements – including reliability. Like, 7 to 10 years of reliability. According to Micron, this is tough to achieve with other memories, making their GDDR6 a better fit.
Memory for AI and Crypto
Next we look at two other application areas that share one characteristic with automotive. The applications are AI and cryptography, and what they share is the smaller transaction. But they still need super-fast access.
Cadence raised this topic at DAC in a conversation with Marc Greenberg. We didn’t really focus on new products specifically, but rather on developments in system design and how that’s translating into possible future memory solutions (whenever any resulting products materialize).
With AI, you’re storing the weights for a neural-net engine; with cryptography, you’re storing hashes. According to Cadence, designers are looking for novel memory structures to give them higher bandwidth without necessarily delivering higher capacity. HBM2 and GDDR6 are examples of such newer memories that are up for consideration.
The reason for seeking out something new lies in a gap in capacity with the standard memory options available today. Given that these are working-memory tasks, the options are SRAM and DRAM. AI memories tend to need on the order of 10 GB of capacity (plus or minus), which isn’t nothing, but it’s less than DRAM tends to deliver. That said, it’s way more than cache – which has space for a few MB of data – can handle. So there’s this Goldilocks capacity region that these designers are jonesing for.
One thing that you might anticipate would be that SRAM-based cache would draw more power than DRAM. After all, an SRAM bit cell always burns power; as bit cells go, SRAM cells are considered pretty power-hungry. Of course, you get speed in the bargain, but it would be understandable if you thought (as I would have) that SRAM is the higher-power solution.
Not so, according to Cadence. Yes, the SRAM bit cell does draw more power, but it turns out that that’s not what dominates power usage: data movement does. And with cache, you’re moving data a tiny distance across a die. With DRAM, you’re going out pins and through wires and into other pins, and the power cost of doing so makes DRAM the overall higher-power solution.
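One rough way to write that argument down, splitting each memory's power into a bit-cell term and a data-movement term. The symbols are illustrative labels of my own, not Cadence's, and no measured values are implied.

```latex
P_{\text{total}} = P_{\text{cell}} + P_{\text{move}}
\qquad\Longrightarrow\qquad
P_{\text{total}}^{\text{DRAM}} > P_{\text{total}}^{\text{SRAM}}
\iff
P_{\text{move}}^{\text{DRAM}} - P_{\text{move}}^{\text{SRAM}}
  > P_{\text{cell}}^{\text{SRAM}} - P_{\text{cell}}^{\text{DRAM}}
```

In words: DRAM ends up as the higher-power option whenever its extra cost of moving data off-chip outweighs the SRAM cell's extra cost of holding it.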
Is that solved with HBM2 and GDDR6? Not clear. GDDR6 power is lower than GDDR5 due to a lower VDD. HBM2 power is lower than HBM for the same reason. And, as far as I can tell, HBM2 runs with less power than GDDR6. But are they meeting the power needs of these non-graphics, smaller-payload applications?
I checked back in with Cadence, and Mr. Greenberg clarified that power isn’t the driving feature here: bandwidth is. The catch is that, as noted, capacity needs are modest. These applications require more memory than can economically be included on-chip, so an off-chip solution is required. HBM2 and GDDR6 fit this space; their relative lower power as compared to alternatives or past generations certainly helps to reduce the overall power of the solution, but it’s not the main story.
Sooo… what’ll it be? HBM2 or GDDR6? Or both? From poking around, it looks like HBM2 may have the power advantage, but it appears to have a significant cost disadvantage. Where bandwidth matters (in, say, gaming, where you see most of the HBM2 discussions), HBM2 can win. Its market has certainly been slower to evolve than some expected, but new offerings suggest that it’s still moving forward.
The DDR franchise, with its LP and G variants, contains more familiar names, so you might expect them to experience easier going. And high pricing is never a great thing in the automotive market. But what about AI, or crypto? Well, it depends on where the system is. In the cloud? In a server locally? Or in a gadget?
Acceptable price, performance, footprint, and power points will depend strongly on where the memory finds itself. AI, in particular, is new enough that it has a lot of settling out to do before we know whether it pervades absolutely everything or remains focused in more limited platforms. So we still have plenty of time before we know exactly what’s going to be required where.
What do you think of Micron’s and Cadence’s thoughts on memory for automotive, AI, and crypto applications?
Doomed as an approach; it’s a long-term side effect of splitting the silicon processes for CPUs and memory (DRAM).
The reason they need more bandwidth is that communication is usually a dimension down from storage, i.e. storage is over the area of the chip (2D) but communication is usually just the edge (1D), and if you die-stack you’ll have a volume (3D) vs at best the (bottom) surface (2D) for communication. Every process shrink makes the problem worse.
Also known as the commuting vs computing problem – spending more energy on moving data than actually computing.
Processor-in-memory works a lot better, but most CAD flows don’t support asynchronous design, and RTL CPUs are generally too hot to stack, so my money is on these guys:
http://etacompute.com/products/low-power-ip/