Back in May, I discussed CXL and gave some reasons that I thought would lead to its success. (See “The Adventures of CXL Across the 3rd Dimension.”) In that article, I mentioned that my friend Jim Handy, “The Memory Guy,” was writing a report about CXL and applying his own analysis skills to the topic. Well, the report is finally finished. It’s titled “CXL Looks for the Perfect Home,” and Handy has a lot to say about CXL in the report’s 60 pages. I’m going to summarize some of his top-level findings here, but if you really want to dig into the details, you’re going to want the report. (Click here for more information about Handy’s CXL report.)
First, here’s a brief overview of CXL to refresh your memory. CXL, or the Compute eXpress Link, is a coherent memory interface protocol, initially layered on top of the immensely successful PCIe standard. CXL is a solution to a set of problems, starting with the limited amount of memory you can attach to a CPU chip. Because processor vendors switched from faster clock rates to multiple on-chip processor cores to increase the performance of their CPUs, memory bandwidth requirements have skyrocketed to supply the rivers of instructions and data needed to slake the thirst of all those cores. To meet these rising bandwidth requirements, CPU designers started to put more and more DDR memory controllers on the CPU chip, alongside the growing number of CPU cores.
However, there’s a problem with this approach: it’s a solution with diminishing returns. Every DDR controller needs more than 200 pins to control one channel of DDR SDRAM. DDR5 SDRAM DIMMs have 288 pins, for example. Eliminate the power and ground pins, and you still need more than 150 signal pins per DDR channel. These days, the fastest CPUs can have 8 or 12 DDR5 memory channels, so these ICs dedicate nearly 2000 pins just to controlling the attached SDRAM.
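To make that pin arithmetic concrete, here’s a quick back-of-the-envelope sketch in Python. The 150-pins-per-channel figure is the approximate signal-pin count cited above, not an exact specification value.

```python
# Rough estimate of CPU package pins consumed by directly attached DDR5,
# using the approximate per-channel signal-pin count cited above.

SIGNAL_PINS_PER_DDR5_CHANNEL = 150  # approximate; excludes power and ground

for channels in (8, 12):
    total_pins = channels * SIGNAL_PINS_PER_DDR5_CHANNEL
    print(f"{channels} DDR5 channels -> roughly {total_pins} CPU pins for memory")

# Output:
# 8 DDR5 channels -> roughly 1200 CPU pins for memory
# 12 DDR5 channels -> roughly 1800 CPU pins for memory, i.e., "nearly 2000"
```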
CXL solves this problem by creating one or more high-bandwidth memory channels that are not dedicated to a particular type of memory. The attached memory can be DDR4 or DDR5 SDRAM or even NAND Flash memory. By abstracting the memory interface, CXL uncouples the CPU from the DRAM protocol. However, there’s a cost for this abstraction, as there always is. That cost is latency. The latest version of CXL, 3.0, defines a switch fabric that connects all of a system’s CPUs to CXL memory. Depending on its construction, this switch fabric can increase the memory bandwidth available to each CPU, but at the cost of added latency through the fabric switches.
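To see how that latency cost plays out, here’s a minimal toy model in Python of average memory access time when some fraction of a CPU’s accesses land in CXL-attached memory behind a fabric switch. The latency figures are illustrative assumptions made for this sketch, not measurements or numbers from Handy’s report.

```python
# Toy model: blended memory latency when part of the traffic goes to
# CXL-attached memory through a switch fabric. All latencies below are
# illustrative assumptions, not measured or published values.

DDR_LATENCY_NS = 100          # assumed latency of directly attached DDR SDRAM
CXL_MEMORY_LATENCY_NS = 200   # assumed latency of a CXL memory module
SWITCH_HOP_NS = 50            # assumed added latency per fabric-switch hop

def average_latency_ns(cxl_fraction: float, switch_hops: int = 1) -> float:
    """Blend local DDR and CXL latencies for a given traffic split."""
    cxl_path = CXL_MEMORY_LATENCY_NS + switch_hops * SWITCH_HOP_NS
    return (1 - cxl_fraction) * DDR_LATENCY_NS + cxl_fraction * cxl_path

for fraction in (0.0, 0.25, 0.5):
    print(f"{fraction:.0%} of accesses via CXL -> "
          f"{average_latency_ns(fraction):.0f} ns average")
```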
CXL solves another problem, found mostly in hyperscaler data centers: stranded memory. This problem stems from how servers are equipped with local memory. Hyperscaler data center architects don’t know in advance which applications each server will run, so they must equip every server with a similar amount of SDRAM. Applications’ memory needs vary widely. Some applications need relatively little memory, while others, such as video transcoding or AI training and inference, need a substantial amount. SDRAM sitting idle on a server running an application with a small memory footprint represents wasted investment. It would be far better if that idle memory could be allocated to another server, and that’s what CXL allows through a mechanism called memory pooling, which shares memory amongst servers.
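Here’s a conceptual sketch, in Python, of what pooling buys you: servers borrow capacity from a shared pool when a memory-hungry job arrives and return it when the job finishes, so that capacity is never stranded on one box. This is a toy illustration of the idea only; it is not how a real CXL fabric manager or its APIs work.

```python
# Toy illustration of memory pooling: servers borrow capacity from a shared
# pool and return it when done, instead of each server being overprovisioned
# with local DRAM. Not a real CXL fabric-manager API.

class MemoryPool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations: dict[str, int] = {}

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, server: str, size_gb: int) -> bool:
        """Grant pooled memory to a server if enough capacity remains."""
        if self.free_gb() < size_gb:
            return False
        self.allocations[server] = self.allocations.get(server, 0) + size_gb
        return True

    def release(self, server: str) -> None:
        """Return all of a server's pooled memory when its job finishes."""
        self.allocations.pop(server, None)

pool = MemoryPool(capacity_gb=1024)
pool.allocate("server-a", 768)   # memory-hungry job (e.g., AI inference)
pool.allocate("server-b", 64)    # lightweight job needs only a little extra
print(f"Free pooled memory: {pool.free_gb()} GB")                 # 192 GB
pool.release("server-a")         # capacity returns to the pool, not stranded
print(f"Free pooled memory after release: {pool.free_gb()} GB")   # 960 GB
```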
Hyperscaler data centers are not currently set up to work with pooled memory, and, in fact, a lot of software development will be needed to create the memory-pooling infrastructure. I think that’s going to be one driver that finally brings IPUs (infrastructure processing units) to the fore. But that’s a different discussion.
CXL memory is offered on CXL modules that plug into special CXL slots. Currently, Micron, Samsung, and SMART Modular Technologies offer CXL modules. Because of their newness, their specialized nature, and their need for a separate CXL-centric SDRAM controller, CXL memory modules cost significantly more per bit than DDR SDRAM DIMMs. The Memory Guy has a very, er, handy graph that shows where CXL fits in the well-established computing memory hierarchy. (Yes, I used the same pun in the last article to introduce the same graph.) Here’s that graph once again, repeated from my previous article:
The computer memory hierarchy, showing the price and performance of CXL memory (the elephant) compared to other, better-established memory and storage media. Image credit: Objective Analysis
CXL memory is both slower and more expensive than SDRAM, but it permits memory pooling and therefore can reduce or eliminate stranded memory in hyperscaler data centers. In my previous article, I wrote this conclusion:
“The hyperscalers are right; CXL will allow them to reduce the DRAM needs of today’s systems by creating a memory pool that’s easily and dynamically distributed amongst multiple CPUs and servers. I emphasize the word “today” because in the entire 78-year history of electronic digital computers, tomorrow’s systems have always needed more memory than today’s systems because we continue to tackle larger and larger problems. Generative AI (GenAI) is the current poster child for this truism because tomorrow’s GenAI algorithms will need more parameters and will require more memory to store those parameters than today’s algorithms. Consequently, the DRAM makers’ position is also correct and, therefore, I submit that there’s no conundrum, merely different perspectives.”
As you might expect from a 60-page report on the subject, Handy’s “CXL Looks for the Perfect Home” report has a deeper and more nuanced set of conclusions:
1. CXL memory will be adopted for use only in hyperscale data centers and supercomputers over the next four years.
2. CXL memory will not significantly impact the present market for server DRAM connected directly to CPU chips.
3. CXL memory sales into hyperscale data centers will grow to represent only 10 percent of the overall DRAM purchases for data centers over the next four years.
4. Near term, the primary use for CXL memory modules will be for expanding server memory in hyperscaler data centers because of the lack of infrastructure and memory management software to exploit CXL’s more advanced features and advantages.
There’s far more detail in Jim Handy’s report and its conclusions, but you’ll need to get the report to see that detail. Although the CXL standard is already six years old, these are still early days for this new memory protocol. Intel and AMD now make server CPUs that support CXL, but it seems to me that the industry’s needs have not yet caught up with CXL’s abilities. I foresee a time when they will.