
Intel Finally Launches Sapphire Rapids: It’s All About The Accelerators, Baby

After many delays, Intel has finally launched the long-awaited Sapphire Rapids family of server processors, now named the 4th Generation Intel Xeon Scalable processors and the Intel Xeon CPU Max Series. Both names are mouthfuls, which has become typical of Intel product naming. Also typical is Intel’s ability to change the playing field to its advantage. Here, the changed playing field emphasizes the greatly boosted capabilities of the numerous hardwired accelerators and new instruction set architecture (ISA) extensions that Intel has added to these new server CPUs. The accelerators deliver significant performance gains over previous Xeon generations and AMD’s CPUs for specific, common tasks that run almost universally in data center applications, including artificial intelligence (AI), networking, 5G radio access networks (RANs), data encryption and security, and high-performance computing (HPC). In all, Intel is launching 52 new Xeon product SKUs.

These built-in accelerators include:

  • Advanced Matrix Extensions (AMX): Improves the performance of deep learning training and inference. It accelerates workloads including natural language processing, recommendation systems, and image recognition.
  • QuickAssist Technology (QAT): Offloads data encryption, decryption and compression.
  • Data Streaming Accelerator (DSA): Improves the performance of storage, networking, and data-intensive workloads by speeding up streaming data movement and transformation operations across the CPU, memory caches, and main memory, as well as all attached memory, storage, and network devices.
  • Dynamic Load Balancer (DLB): Improves overall system performance by facilitating the efficient distribution of network processing across multiple CPU cores and threads and dynamically balancing the associated workloads across multiple CPU cores as the system load varies. Intel DLB also restores the order of networking data packets processed simultaneously on CPU cores.
  • In-Memory Analytics Accelerator (IAA): Increases query throughput and decreases the memory footprint for in-memory databases and big data analytics workloads.
  • Advanced Vector Extensions 512 (AVX-512): This accelerator is the latest in the company’s long line of evolved vector instruction sets. It incorporates one or two fused multiply-add (FMA) units and other optimizations to accelerate the performance of intensive computational tasks such as complex scientific simulations, financial analytics, and 3D modeling.
  • Advanced Vector Extensions 512 for virtualized radio access network (AVX-512 for vRAN): The Intel AVX-512 extensions, specifically tuned for the needs of vRAN, deliver greater computing capacity within the same power envelope for cellular radio workloads. This accelerator helps communications service providers increase the performance-per-watt figure of merit for their vRAN designs, which helps to meet critical performance, scaling and energy efficiency requirements.
  • Crypto Acceleration: Moves data encryption into hardware, which increases the performance of pervasive, encryption-sensitive workloads such as the secure sockets layer (SSL) used in web servers, 5G infrastructure, and VPNs/firewalls.
  • Speed Select Technology (SST): Improves server utilization and reduces qualification costs by allowing public, private, and hybrid cloud customers to configure a single server to match fluctuating workloads using multiple configurations, which improves total cost of ownership (TCO).
  • Data Direct I/O Technology (DDIO): Reduces data-movement inefficiencies by facilitating direct communication between Ethernet controllers and adapters and the host CPU’s memory cache, thus reducing the number of visits to main memory, which cuts power consumption while increasing I/O bandwidth scalability and reducing latency.
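Most of these accelerators advertise themselves through CPUID feature flags, which Linux exposes on the `flags` line of `/proc/cpuinfo`. A minimal detection sketch follows; the flag names (`amx_tile`, `avx512f`, `avx512_fp16`, `vaes`) are the commonly documented Linux names and should be treated as assumptions, since exact names vary by kernel version and SKU:

```python
# Sketch: map Linux cpuinfo feature flags to some of the Sapphire Rapids
# accelerators described above. Flag names are assumptions based on
# commonly documented Linux conventions, not an authoritative list.
ACCELERATOR_FLAGS = {
    "amx_tile": "Advanced Matrix Extensions (AMX)",
    "avx512f": "Advanced Vector Extensions 512 (AVX-512)",
    "avx512_fp16": "AVX-512 FP16 (used by vRAN workloads)",
    "vaes": "Vectorized AES (crypto acceleration)",
}

def detect_accelerators(cpuinfo_text: str) -> list[str]:
    """Return accelerator names whose flags appear in a /proc/cpuinfo dump."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The flags line looks like "flags\t\t: fpu vme de ..."
            flags.update(line.split(":", 1)[1].split())
    return [name for flag, name in ACCELERATOR_FLAGS.items() if flag in flags]

# Example with a synthetic cpuinfo excerpt:
sample = "flags\t\t: fpu vaes avx512f amx_tile amx_bf16"
print(detect_accelerators(sample))
```

On a real system you would pass in `open("/proc/cpuinfo").read()`; note that a flag being present only means the ISA is exposed, not that the corresponding accelerator is licensed or enabled on a given SKU.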

Extensions to the 4th Generation Intel Xeon CPUs include:

  • Software Guard Extensions (SGX): This previously existing set of security-related extensions to the x86 ISA allows user-level and operating system (OS) code to improve the security of workloads running in virtualized systems by defining protected private regions of memory, called enclaves. Intel claims that SGX is the most researched, updated, and deployed confidential computing technology in data centers on the market today, and these extensions are used by a wide range of cloud service providers (CSPs).
  • Trust Domain Extensions (TDX): These new ISA extensions, available through select cloud providers in 2023, further increase confidentiality at the virtual machine (VM) level beyond SGX. Within a TDX-protected VM, the guest OS and VM applications are further isolated from access by the cloud host, hypervisor, and other VMs on the platform.
  • Control-Flow Enforcement Technology (CET): These hardware-based extensions help to shut down an entire class of system memory attacks by protecting against return-oriented and jump/call-oriented programming attacks, which are two of the most common software-based attack techniques.
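The shadow-stack half of CET’s defense can be illustrated with a toy model: the hardware keeps a second, protected copy of each return address and traps when a return address on the normal stack no longer matches it, which is exactly what a return-oriented-programming overwrite causes. The following Python sketch is a hypothetical simulation of that idea, not Intel’s implementation:

```python
# Toy model of a CET-style shadow stack: every call pushes the return
# address onto both the normal stack and a protected shadow stack; every
# return compares the two and traps on mismatch, as a ROP overwrite would.
class ControlFlowViolation(Exception):
    pass

class ShadowStackCPU:
    def __init__(self) -> None:
        self.stack = []   # attacker-writable program stack
        self.shadow = []  # shadow stack (hardware-protected in real CET)

    def call(self, return_addr: int) -> None:
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self) -> int:
        addr, shadow_addr = self.stack.pop(), self.shadow.pop()
        if addr != shadow_addr:
            raise ControlFlowViolation(f"return to {addr:#x} blocked")
        return addr

cpu = ShadowStackCPU()
cpu.call(0x401000)
cpu.stack[-1] = 0xDEADBEEF  # simulated buffer-overflow overwrite
try:
    cpu.ret()
except ControlFlowViolation as exc:
    print("blocked:", exc)
```

Real CET stores the shadow stack in pages the application cannot ordinarily write, and pairs it with indirect branch tracking to cover the jump/call-oriented attacks mentioned above.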

It’s critical to note that these new CPUs make important and strategic use of Intel’s heterogeneous, chiplet-based packaging technology to assemble as many as four processor tiles into one package. In addition, the Intel Xeon CPU Max Series uses these same packaging technologies to add two high-bandwidth memory (HBM) stacks to each CPU tile. HBM is a high-capacity stack of DRAM dies that acts as a large, high-speed memory cache. Intel claims that the CPU Max Series is the first x86 CPU to incorporate HBM.

Intel rolled out a long list of customers and testimonials for these new CPUs. This list included testimonials from CSPs, server vendors, partners, and end users including some surprise company names. At launch, the companies providing testimonials included Amazon Web Services (AWS), Cisco, Cloudera, Dell Technologies, Ericsson, Fujitsu, Google Cloud, Hewlett Packard Enterprise, IBM Cloud, Inspur Information, Lenovo, Los Alamos National Laboratory (LANL), Microsoft Azure, Nvidia, Numenta, Oracle, Red Hat, SAP, Supermicro, Telefonica, and VMware.

Of particular note from all of these testimonials:

  • Ericsson plans to deploy these new CPUs in its Cloud RAN.
  • LANL reports seeing as much as an 8.57x improvement in some HPC workloads using pre-release CPU silicon.
  • NVIDIA is pairing Intel’s 4th Gen Xeon CPUs with NVIDIA H100 Tensor Core GPUs and NVIDIA ConnectX-7 networking for its latest generation of NVIDIA DGX systems.
  • Supermicro is incorporating the 4th Generation Intel Xeon processors and the Intel Xeon CPU Max Series into more than 50 new server models.
  • VMware will support the new CPU features in vSphere.

Intel has often changed the playing field to gain the upper hand. In the late 1970s, when Intel’s 8086 microprocessor delivered far less performance and much less capability than competing microprocessors from Motorola and Zilog, Intel mounted a superior support and software program that transformed a self-admitted dog of a processor into a world beater. Although there’s nothing dog-like about these new Xeon CPUs, Intel has once more altered the playing field in an attempt to confound AMD’s attempts to gain more market share in the server CPU space. However, AMD has proven that it is game to engage Intel on any playing field. We will need to wait and see how AMD returns this latest volley.
