
Jim Handy, The Memory Guy, steps in with a report on CXL’s near-term future

Back in May, I discussed CXL and gave some reasons why I thought it would succeed. (See “The Adventures of CXL Across the 3rd Dimension.”) In that article, I mentioned that my friend Jim Handy, “The Memory Guy,” was writing a report about CXL and applying his own analysis skills to the topic. Well, the report is finally finished. It’s titled “CXL Looks for the Perfect Home,” and Handy has a lot to say about CXL in the report’s 60 pages. I’ll summarize some of his top-level findings here, but if you really want to dig into the details, you’re going to want the report. (Click here for more information about Handy’s CXL report.)

First, here’s a brief overview of CXL to refresh your memory. CXL, or Compute eXpress Link, is a coherent memory interface protocol, initially layered on top of the immensely successful PCIe standard. CXL is a solution to a set of problems, starting with the limited amount of memory you can attach to a CPU chip. Because processor vendors switched from faster clock rates to multiple on-chip processor cores to increase CPU performance, memory bandwidth requirements have skyrocketed to provide the rivers of instructions and data needed to slake these CPUs’ thirst. To meet those increasing bandwidth requirements, CPU designers started to put more and more DDR memory controllers on the CPU chip, alongside the growing number of CPU cores.

However, there’s a problem with this approach: it’s a solution with diminishing returns. Every DDR controller needs more than 200 pins to control one channel of DDR SDRAM. DDR5 SDRAM DIMMs have 288 pins, for example. Eliminate the power and ground pins, and you still need more than 150 signal pins per DDR channel. These days, the fastest CPUs can have 8 or 12 DDR5 memory channels, so these ICs dedicate nearly 2000 pins just to control the attached SDRAM.
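To put rough numbers on that pin budget, here’s a quick back-of-the-envelope sketch in Python. The ~150 signal pins per channel is the rough count from the paragraph above, not an exact specification:

```python
# Back-of-the-envelope CPU pin budget for directly attached DDR5,
# using the article's rough figure of ~150 signal pins per channel
# (a 288-pin DDR5 DIMM, minus power and ground pins).
SIGNAL_PINS_PER_DDR5_CHANNEL = 150

for channels in (8, 12):
    total = channels * SIGNAL_PINS_PER_DDR5_CHANNEL
    print(f"{channels:>2} DDR5 channels -> ~{total} CPU pins for memory alone")

# Output:
#  8 DDR5 channels -> ~1200 CPU pins for memory alone
# 12 DDR5 channels -> ~1800 CPU pins for memory alone
```

At 12 channels, that’s the “nearly 2000 pins” figure, and every additional channel costs another 150-plus pins, which is why the direct-attach approach runs out of steam.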

CXL solves this problem by creating one or more high-bandwidth memory channels that are not dedicated to a particular type of memory. The attached memory can be DDR4 or DDR5 SDRAM or even NAND Flash memory. By abstracting the memory interface, CXL uncouples the CPU from the DRAM protocol. However, there’s a cost for this abstraction, as there always is. That cost is latency. The latest version of CXL, 3.0, defines a switch fabric that connects all of a system’s CPUs to CXL memory. Depending on its construction, this switch fabric can increase memory bandwidth to the CPU, but at the expense of the added latency through the fabric’s switches.
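To make that latency trade-off concrete, here’s a toy Python model. All of the nanosecond figures below are illustrative assumptions of mine, not measurements or numbers from Handy’s report:

```python
# Toy model of the latency cost of CXL's abstraction layer.
# All figures are illustrative assumptions, not measured values.
LOCAL_DDR_NS = 100      # assumed load latency to direct-attached DDR SDRAM
CXL_CONTROLLER_NS = 70  # assumed overhead of the CXL device's DRAM controller
SWITCH_HOP_NS = 80      # assumed added latency per fabric-switch hop

def cxl_latency_ns(switch_hops: int) -> int:
    """Estimated load latency to CXL memory through a switch fabric."""
    return LOCAL_DDR_NS + CXL_CONTROLLER_NS + switch_hops * SWITCH_HOP_NS

for hops in (0, 1, 2):
    print(f"{hops} switch hop(s): ~{cxl_latency_ns(hops)} ns "
          f"(vs ~{LOCAL_DDR_NS} ns direct-attached)")
```

The model’s point is structural rather than numerical: the controller overhead and each switch hop add fixed latency that direct-attached DRAM never pays, even when the fabric delivers more aggregate bandwidth.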

CXL solves another problem, found mostly in hyperscaler data centers: stranded memory. This problem stems from how servers are equipped with local memory. Hyperscaler data center architects don’t know in advance which applications the servers in the data center will run, so they must equip all servers with similar amounts of SDRAM. Applications’ memory needs vary widely. Some applications need relatively little memory, while applications such as video transcoding or AI training and inference need a substantial amount. SDRAM that sits idle on a server running an application with a small memory requirement represents wasted investment. It would be far better if that idle memory could be allocated to another server, and that’s what CXL allows through a mechanism called memory pooling, which is a way of sharing memory amongst servers.
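A small simulation illustrates the stranded-memory argument. The server count and per-application memory demands below are invented for illustration only:

```python
# Toy illustration of stranded memory vs. CXL-style memory pooling.
# Server configuration and per-app demands are invented for illustration.
per_server_dram_gb = 512                   # identical DRAM on every server
demands_gb = [64, 96, 480, 128, 512, 200]  # hypothetical per-app memory demand

# Fixed provisioning: memory an app doesn't use is stranded on its server.
fixed_total = per_server_dram_gb * len(demands_gb)
stranded = sum(per_server_dram_gb - d for d in demands_gb)

# Pooling: servers draw only what they need from a shared CXL pool, so the
# pool can be sized near the sum of demands rather than peak-per-server
# capacity times the server count.
pooled_total = sum(demands_gb)

print(f"Fixed provisioning: {fixed_total} GB installed, {stranded} GB stranded")
print(f"Pooled provisioning: {pooled_total} GB would suffice "
      f"({fixed_total - pooled_total} GB saved)")
```

In this made-up example, fixed provisioning installs 3072 GB and strands 1592 GB of it, while a pool sized to actual demand needs only 1480 GB. Real workloads shift over time, so a real pool needs headroom, but the direction of the savings is the same.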

Hyperscaler data centers are not currently set up to work with pooled memory, and, in fact, a lot of software development will be needed to create the required memory-pooling infrastructure. I think that’s going to be one driver that finally brings IPUs (infrastructure processing units) to the fore. But that’s a different discussion.

CXL memory is offered on CXL modules that plug into special CXL slots. Currently, Micron, Samsung, and SMART Modular Technologies offer CXL modules. Because of their newness, their specialized nature, and their need for a separate CXL-centric SDRAM controller, CXL memory modules cost significantly more per bit than DDR SDRAM DIMMs. The Memory Guy has a very, er, handy graph that shows where CXL fits in the well-established computing memory hierarchy. (Yes, I used the same pun in the last article to introduce the same graph.) Here’s that graph once again, repeated from my previous article:

The computer memory hierarchy showing the price and performance of CXL memory (the elephant) compared to other, more well-established memory and storage media. Image credit: Objective Analysis

CXL memory is both slower and more expensive than SDRAM, but it permits memory pooling and therefore can reduce or eliminate stranded memory in hyperscaler data centers. In my previous article, I wrote this conclusion:

“The hyperscalers are right; CXL will allow them to reduce the DRAM needs of today’s systems by creating a memory pool that’s easily and dynamically distributed amongst multiple CPUs and servers. I emphasize the word “today” because in the entire 78-year history of electronic digital computers, tomorrow’s systems have always needed more memory than today’s systems because we continue to tackle larger and larger problems. Generative AI (GenAI) is the current poster child for this truism because tomorrow’s GenAI algorithms will need more parameters and will require more memory to store those parameters than today’s algorithms. Consequently, the DRAM makers’ position is also correct and, therefore, I submit that there’s no conundrum, merely different perspectives.”

As you might expect from a 60-page report on the subject, Handy’s “CXL Looks for the Perfect Home” report has a deeper and more nuanced set of conclusions:

1. CXL memory will be adopted for use only in hyperscale data centers and supercomputers over the next four years.

2. CXL memory will not significantly impact the present market for server DRAM connected directly to CPU chips.

3. CXL memory sales into hyperscale data centers will grow to represent only 10 percent of the overall DRAM purchases for data centers over the next four years.

4. Near term, the primary use for CXL memory modules will be for expanding server memory in hyperscaler data centers because of the lack of infrastructure and memory management software to exploit CXL’s more advanced features and advantages.

There’s far more detail in Jim Handy’s report and its conclusions, but you’ll need to get the report to see that detail. The CXL standard is already six years old, yet these are still early days for this new memory protocol. Although Intel and AMD now make server CPUs that support CXL, it seems to me that the industry’s needs have not yet caught up with CXL’s abilities, but I foresee a time when they will.

