The CXL consortium has been a regular presence at FMS (which this year rebranded itself from the Flash Memory Summit to the Future of Memory and Storage). At FMS 2022, the consortium announced v3.0 of the CXL specification, and CXL 3.1 followed at Supercomputing 2023. CXL started as a standard for host-device interaction, but it has gradually absorbed competing standards such as OpenCAPI and Gen-Z. As a result, the specifications now cover a wide range of use cases, all built as a protocol on top of the ubiquitous PCIe expansion bus. The CXL consortium counts heavyweights like AMD and Intel among its members, along with a large number of startups trying to play in various segments on the device side. At FMS 2024, CXL was prominently featured in many vendors' demo booths.
The migration of server platforms from DDR4 to DDR5, along with the growth of workloads that need large amounts of RAM (but are not particularly sensitive to memory bandwidth or latency), has made memory expansion modules one of the first widely available classes of CXL devices. Over the past couple of years, we have seen product announcements in this area from Samsung and Micron.
SK hynix CMM-DDR5 CXL and HMSDK memory module
At FMS 2024, SK hynix demonstrated its CMM-DDR5 CXL memory module based on DDR5 with a capacity of 128GB. The company also detailed its associated Heterogeneous Memory Software Development Kit (HMSDK), a set of kernel-level and user-level libraries and tools aimed at improving the usability of CXL memory. It does this partly by taking the memory hierarchy into account and moving data between the server's main memory (DRAM) and the CXL device based on how frequently that data is accessed.
The CMM-DDR5 CXL memory module comes in the SDFF (E3.S 2T) form factor with a PCIe 5.0 x8 host interface. The internal memory is based on 1α-node DRAM, and the device promises DDR5-class bandwidth and latency within a single NUMA hop. Since these memory modules are designed for data center and enterprise use, the firmware includes RAS (reliability, availability, and serviceability) features along with secure boot and other management capabilities.
SK hynix also demonstrated Niagara 2.0, a hardware solution (currently FPGA-based) for memory pooling and sharing, i.e., connecting multiple CXL memory modules so that capacity can be shared efficiently among different hosts (CPUs and GPUs). The previous version only allowed capacity sharing, but the latest version also allows data sharing. SK hynix presented these solutions at CXL DevCon 2024 earlier this year, but FMS 2024 suggests some progress has been made toward finalizing the CMM-DDR5 specifications.
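As a rough illustration of frequency-based tiering, the Python sketch below ranks pages by access count, keeps the hottest ones in local DRAM and demotes the rest to the CXL tier. The function names, the per-page counter input and the simple top-N policy are hypothetical assumptions for illustration; they are not HMSDK's actual interfaces, which are not described here.

```python
from typing import Dict, Set, Tuple


def plan_placement(access_counts: Dict[str, int], dram_slots: int) -> Tuple[Set[str], Set[str]]:
    """Split pages into a DRAM tier and a CXL tier by access frequency.

    Hypothetical top-N policy: the `dram_slots` most frequently accessed
    pages stay in local DRAM; every other page goes to CXL memory.
    """
    # Hottest pages first; ties are broken by page ID so the result is deterministic.
    ranked = sorted(access_counts, key=lambda page: (-access_counts[page], page))
    return set(ranked[:dram_slots]), set(ranked[dram_slots:])


def migrations(current_dram: Set[str], target_dram: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (pages to promote into DRAM, pages to demote to CXL)."""
    promote = target_dram - current_dram  # hot pages not yet in DRAM
    demote = current_dram - target_dram   # pages in DRAM that are no longer hot enough
    return promote, demote


if __name__ == "__main__":
    counts = {"a": 120, "b": 3, "c": 45, "d": 0, "e": 45}
    dram, cxl = plan_placement(counts, dram_slots=2)
    print(sorted(dram), sorted(cxl))  # ['a', 'c'] ['b', 'd', 'e']
    print(migrations({"a", "b"}, dram))  # ({'c'}, {'b'})
```

A production tiering engine would also have to age its counters and rate-limit migrations so that pages do not bounce between tiers; the sketch leaves those details out.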
Microchip and Micron Showcase CZ120 CXL Memory Expansion Module
Last year, Micron disclosed that its CZ120 CXL memory expansion module is based on the Microchip SMC 2000 series CXL memory controller. At FMS 2024, Micron and Microchip demonstrated the module on a Granite Rapids server.
Additional information about the SMC 2000 controller was also provided.
The CXL memory controller handles DRAM die failures, and Microchip provides diagnostic and debugging tools for analyzing faulty modules. The controller also supports ECC as part of the SMC 2000 series' enterprise-class RAS feature set. Its flexibility allows SMC 2000-based CXL memory modules built with DDR4 to complement mainstream DDR5 DRAM in servers that only support the latter.
Marvell Announces Structera CXL Product Line
A few days before the start of FMS 2024, Marvell announced a new CXL product line under the Structera brand. At FMS 2024, we had the opportunity to discuss this new line with Marvell and gather some additional information.
Unlike other CXL device solutions that focus on memory consolidation and expansion, the Structera product line also includes a compute accelerator portion in addition to a memory expansion controller. All are built on TSMC's 5nm process technology.
The Structera A 2504 (A for Accelerator), the accelerator member of the line, is a PCIe 5.0 x16 CXL 2.0 device with 16 integrated Arm Neoverse V2 (Demeter) cores running at 3.2 GHz. It includes four channels of DDR5-6400 with support for up to two DIMMs per channel, along with in-line compression and decompression. Because it integrates powerful server-class Arm cores, expanding memory through this CXL device scales both the memory bandwidth available per core and the compute capabilities.
Applications such as Deep Learning Recommendation Models (DLRM) can benefit from the compute capabilities available in the CXL device. Scaling the available bandwidth also reduces the workload's energy consumption. The approach also facilitates disaggregation within the server, which helps the overall thermal design.
The Structera X 2404 (X for eXpander) will be available as a PCIe 5.0 device (one x16 or two x8 links) with four channels of DDR4-3200 (up to three DIMMs per channel). Features such as built-in (de)compression, encryption/decryption, and hardware-assisted secure boot are also present on the Structera X 2404. Compared to the 100W TDP of the Structera A 2504, Marvell expects this part to consume around 30W. The main purpose of this part is to let hyperscalers reuse DDR4 DIMMs (up to 6TB per expander) when increasing server memory capacity.
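A back-of-the-envelope calculation shows why on-device compute can scale per-core bandwidth: the A 2504's four DDR5-6400 channels offer more theoretical peak bandwidth than its PCIe 5.0 x16 link can carry to the host in one direction. The sketch below uses raw interface rates only (a 64-bit data path per DDR5 channel, 32 GT/s per PCIe 5.0 lane with 128b/130b encoding) and, as a simplifying assumption, ignores CXL protocol overhead, DIMM population effects and real-world efficiency.

```python
def ddr_peak_gbps(mt_per_s: int, channels: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit data path per channel)."""
    return mt_per_s * bytes_per_transfer * channels / 1000


def pcie5_peak_gbps(lanes: int) -> float:
    """Theoretical PCIe 5.0 bandwidth per direction in GB/s.

    32 GT/s per lane with 128b/130b line encoding; packet and CXL
    protocol overheads are ignored.
    """
    return 32 * lanes * (128 / 130) / 8


if __name__ == "__main__":
    local = ddr_peak_gbps(6400, channels=4)  # DDR5-6400 x 4 channels on the device
    link = pcie5_peak_gbps(16)               # PCIe 5.0 x16 host link
    print(f"local DRAM: {local:.1f} GB/s, host link: {link:.1f} GB/s")
    print(f"ratio: {local / link:.2f}x")
```

By this rough measure, the on-board DRAM can supply a bit over three times the bandwidth of the host link, and that headroom is what the integrated Neoverse V2 cores can use without the data ever crossing CXL.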
Marvell also has a Structera X 2504 part that supports four channels of DDR5-6400 (with two DIMMs per channel for up to 4TB per expander). Other aspects remain the same as the DDR4-recycling part.
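The capacity figures quoted for the two expanders follow from simple slot arithmetic. The sketch below multiplies channels by DIMMs per channel, then divides Marvell's quoted maximum by the slot count to get the per-DIMM capacity that maximum implies. The helper names are hypothetical, 1TB is taken as 1024GB, and the result says nothing about which DIMMs Marvell has actually validated.

```python
def dimm_slots(channels: int, dimms_per_channel: int) -> int:
    """Number of DIMM slots behind one expander."""
    return channels * dimms_per_channel


def implied_dimm_gb(max_capacity_tb: float, slots: int) -> float:
    """Per-DIMM capacity needed to reach the quoted maximum (1 TB = 1024 GB)."""
    return max_capacity_tb * 1024 / slots


if __name__ == "__main__":
    parts = {
        # name: (channels, DIMMs per channel, quoted maximum in TB)
        "Structera X 2404 (DDR4-3200)": (4, 3, 6),
        "Structera X 2504 (DDR5-6400)": (4, 2, 4),
    }
    for name, (channels, dpc, max_tb) in parts.items():
        slots = dimm_slots(channels, dpc)
        print(f"{name}: {slots} slots, {implied_dimm_gb(max_tb, slots):.0f}GB per DIMM")
```

Both quoted maximums work out to 512GB per slot, so the DDR4 part's extra capacity comes from its third DIMM per channel rather than from denser modules.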
The company highlighted some unique aspects of the Structera product line: built-in compression optimizes the available DRAM capacity, and support for three DIMMs per channel on the DDR4 expander maximizes DRAM capacity per expander (compared to competing solutions). The 5nm process reduces power consumption, and the parts support access from multiple hosts. The integration of Arm Neoverse V2 cores is apparently a first for a CXL accelerator, allowing compute tasks to be offloaded to improve overall system performance.
While Marvell has announced specifications for the Structera parts, sampling appears to be at least a few quarters away. One of the interesting aspects of Marvell's roadmaps and announcements in recent years has been its focus on products customized to the needs of large customers. The Structera product line is no exception: hyperscalers are eager to recycle their DDR4 memory modules and apparently can't wait to get their hands on the expander parts.
CXL is only beginning its slow climb, and the hockey-stick portion of the growth curve is definitely not imminent. However, as more CXL-enabled host systems are deployed, products like the Structera accelerator line are starting to make sense from a server efficiency perspective.