
Ampere Ups the ARM Ante

Company Tees Up Next Generation of ARM-Based Server Chips

They say there’s no such thing as “the cloud.” It’s just somebody else’s computer. That’s true, but it doesn’t mean that their computer is the same as your computer. Today, most cloud datacenter servers are x86 machines just like your desktop PC except bigger and farther away. But that doesn’t have to be the case. 

Silicon Valley company Ampere Computing thinks that cloud datacenters really should be different from remote PCs, starting with the processor and its instruction set. And today, the company started to lift the veil on its plans to make that happen. 

Ampere’s first-generation Altra processor is already in the market and has been “shipping for revenue since last year,” according to Chief Product Officer Jeff Wittich. It’s about to be joined by the upgraded Altra Max chip, which should enter production in Q3 of this year. Both chips are based on ARM’s Neoverse N1 design running at 3 GHz in TSMC’s 7nm N7 process. 

But Altra and Max are just the warm-up act before Ampere’s second generation of processors debuts, possibly by next year. The as-yet-unnamed devices will be based on an entirely new ARM core that Ampere is designing in-house instead of borrowing from Neoverse. Like Apple and a small handful of other companies, Ampere has been quietly designing its own custom ARM implementations. 

Details are few regarding the new generation, except that it’ll be fabricated in TSMC’s 5nm N5 process, have more than 128 cores, and offer faster memory and I/O than Altra, all while remaining fully ARM-compatible. The company isn’t saying if it will base the chip on the recently announced ARMv9 architecture specification. “It’s more nuanced than that,” hints Wittich. 

Ampere is able to design its own ARM-compatible CPU cores thanks to a rare (and expensive) ARM architectural license that it acquired indirectly from AppliedMicro and that company’s X-Gene project. “This is what we’ve been working on for the past three and a half years,” says Wittich, pegging the start of CPU development to the founding of the company. In other words, this was their plan all along. 

Having an in-house processor gives Ampere “a more rapid annual cadence” of product introductions than it could have by waiting for ARM’s official rollouts. Ampere says it will add security features to its new core, along with new elements for manageability, telemetry, and resiliency – all things server operators want to see. 

In the meantime, existing Altra customers can look forward to Altra Max later this year. Altra Max ups the core count to 128 (from Altra’s 80). That’s 60% more processor goodness in the same pin-compatible package. Both run at a solid 3.0 GHz, with no “turbo mode” or variable clock scaling like you’d see on a server-class x86 chip such as Intel’s Xeon or AMD’s Epyc processors. That’s deliberate, and part of what makes Altra different. 
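For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the core counts and clock speed quoted above. The "core-GHz" figure is an assumed, crude proxy for aggregate capacity, not a benchmark result.

```python
# Core counts and clock speed as quoted in the article.
ALTRA_CORES = 80
ALTRA_MAX_CORES = 128
CLOCK_GHZ = 3.0


def percent_increase(old: int, new: int) -> float:
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100.0


extra_cores = ALTRA_MAX_CORES - ALTRA_CORES
increase = percent_increase(ALTRA_CORES, ALTRA_MAX_CORES)

# Cores times clock is only a rough capacity proxy (an assumption here);
# real throughput depends on the workload, memory, and I/O.
altra_core_ghz = ALTRA_CORES * CLOCK_GHZ
altra_max_core_ghz = ALTRA_MAX_CORES * CLOCK_GHZ

print(f"{extra_cores} extra cores, {increase:.0f}% more than Altra")
print(f"{altra_core_ghz:.0f} vs. {altra_max_core_ghz:.0f} core-GHz")
```

The jump from 80 to 128 cores works out to 48 extra cores, or 60% more.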

Ampere believes that cloud server workloads are fundamentally different from client PC workloads, starting with the clocking. Servers are shared, and one processor core’s clock frequency shouldn’t affect that of its neighbors. Conventional x86 chips throttle clock speed to remain within a defined thermal envelope, which means a high-demand task running on one CPU core of, say, a 32-core chip might force a slowdown of the other 31 cores. Intel and AMD euphemistically refer to this as turbo mode because it sounds better than don’t-melt-the-chip mode. 

Altra and Altra Max, in contrast, run at a consistent clock rate all the time. In a sense, they’re always in turbo mode and the company says there’s no combination of workloads that will overheat the chips or force a slowdown. Predictability is preserved. 
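The difference between the two clocking philosophies can be sketched with a toy model. Everything in it is an assumption made for illustration: the 160 W budget, the 5 W per core, the 2.5 and 3.5 GHz limits, and the rule of thumb that per-core power grows with the cube of frequency. None of these figures come from Intel, AMD, or Ampere.

```python
def shared_clock_ghz(busy_cores: int,
                     budget_w: float = 160.0,       # hypothetical chip power budget
                     watts_at_nominal: float = 5.0,  # hypothetical per-core draw at nominal clock
                     nominal_ghz: float = 2.5,       # hypothetical all-core clock
                     ceiling_ghz: float = 3.5) -> float:  # hypothetical turbo ceiling
    """Highest common clock that keeps the busy cores inside the power budget.

    Toy model: per-core power is assumed to scale with frequency cubed,
    and idle cores are assumed to draw nothing.
    """
    if busy_cores < 1:
        return ceiling_ghz
    per_core_budget = budget_w / busy_cores
    scale = (per_core_budget / watts_at_nominal) ** (1.0 / 3.0)
    return min(ceiling_ghz, nominal_ghz * scale)


def fixed_clock_ghz(busy_cores: int, clock_ghz: float = 3.0) -> float:
    """A fixed-clock design returns the same frequency regardless of load."""
    return clock_ghz


for busy in (1, 4, 16, 32):
    print(f"{busy:2d} busy cores: throttled {shared_clock_ghz(busy):.2f} GHz, "
          f"fixed {fixed_clock_ghz(busy):.2f} GHz")
```

In this model, each additional busy core shrinks the power share available to its neighbors, so the shared clock slides from the ceiling down to the nominal rate. The fixed-clock function ignores load entirely, which is the predictability Ampere is selling.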

Ampere’s chips also don’t implement hyperthreading. They’re all single-threaded CPU cores, so the number of cores equals the number of execution threads. That, too, is a nod toward independence and determinism. Server tasks are often broken down into microservices, where multithreading isn’t helpful. It’s more important, says Ampere, that tasks don’t compete for hardware resources or interfere with each other. 
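On a Linux server you can check the cores-equals-threads claim yourself. The sketch below assumes the standard sysfs topology file `thread_siblings_list` is present; the helper names `parse_cpu_list` and `threads_per_core` are made up for this example, and the parser handles only plain lists and ranges such as "0,64" or "0-1".

```python
from pathlib import Path
from typing import List, Optional


def parse_cpu_list(text: str) -> List[int]:
    """Expand a Linux cpulist string such as "0,64" or "0-1" into CPU ids."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def threads_per_core(cpu: int = 0) -> Optional[int]:
    """Count hardware threads sharing a core with `cpu`; None if unavailable."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    try:
        return len(parse_cpu_list(path.read_text()))
    except OSError:
        # Not Linux, or the topology file is not exposed on this system.
        return None


count = threads_per_core()
if count is None:
    print("CPU topology not available on this system")
else:
    print(f"{count} hardware thread(s) per core")
```

A result of 1 means every logical CPU is a whole core, as the article describes for Altra; a hyperthreaded x86 part would typically report 2.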

That strategy plays out in the chips’ cache organization, too. Altra and Max both have large L1 and L2 caches, with a comparatively small L3. A last-level cache is shared among CPU cores (and thus among tasks), which doesn’t suit the multi-tenancy model of servers. 

The bottom line is that performance scales almost linearly with core count – assuming, of course, that you’re running single-threaded microservices that don’t interact with one another. Ampere hasn’t suddenly found a magical solution to multiprocessor load balancing problems; the company simply focuses its efforts on a subset of tasks that suit its target market. And its chip architecture. 
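The caveat about independent tasks is essentially Amdahl’s law. The following is a minimal sketch of that textbook formula, not anything Ampere published; the 5% serial fraction is an arbitrary value picked for illustration.

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup on `cores` cores when `serial_fraction`
    of the work cannot be spread across them."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


for cores in (80, 128):
    independent = amdahl_speedup(cores, 0.0)   # fully independent tasks
    contended = amdahl_speedup(cores, 0.05)    # hypothetical 5% serialized work
    print(f"{cores} cores: {independent:.0f}x if independent, "
          f"{contended:.1f}x with 5% serial work")
```

With no shared work, the speedup equals the core count, which is the linear scaling described above. Even a small serialized portion flattens the curve sharply, so the benefit depends on workloads that really are independent.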

Wittich points out that users can reduce the processor’s clock frequency if they want to save power, but they never have to. Altra Max operates within the same physical, electrical, and thermal envelopes as Altra, despite having 48 additional CPU cores. At full speed, Altra Max delivers more performance than an x86 processor, or, with the voltage and frequency turned down, it can deliver the same performance for less energy. 
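The "same performance for less energy" trade can be illustrated with a back-of-the-envelope model. It rests on loud assumptions: throughput scales with cores times clock, per-core dynamic power scales with roughly the cube of frequency (voltage tracking frequency), and static and uncore power are ignored. It compares only the two Ampere core counts, since those are the only figures the article provides, and it is not a measurement of any real chip.

```python
def iso_performance_power_ratio(base_cores: int, wide_cores: int,
                                exponent: float = 3.0) -> float:
    """Relative dynamic power of a wider chip slowed down to match the
    throughput of a narrower chip running at full clock.

    Assumes throughput ~ cores * frequency and per-core power ~ frequency ** exponent.
    """
    if base_cores < 1 or wide_cores < base_cores:
        raise ValueError("expected 1 <= base_cores <= wide_cores")
    freq_ratio = base_cores / wide_cores          # clock needed to match throughput
    return (wide_cores / base_cores) * freq_ratio ** exponent


ratio = iso_performance_power_ratio(80, 128)
print(f"Clock scaled to {80 / 128:.3f} of full speed")
print(f"Modeled dynamic power: {ratio:.2f} of the 80-core chip at full clock")
```

Under these idealized assumptions, 128 slower cores match 80 fast ones for well under half the dynamic power. Real silicon has a voltage floor and fixed overheads, so the actual saving would be smaller.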

That performance-per-watt ratio has driven a lot of ARM-based server projects… right into the ground. It’s a compelling technical challenge and an attractive market. Who wouldn’t want 1% or 5% of Intel’s lucrative server-processor business? And yet, the failures outnumber the successes by a large irrational number. Ampere may be shipping Altra chips for revenue, but it’s not shipping a whole lot of them for revenue. Ampere’s big-name partners – Microsoft, Oracle, Cloudflare – seem to be kicking the tires, not backing up forklifts loaded with Altra chips. Only one customer, Equinix, has Altra-based servers online and ready for the average Joe to use. But hey, you gotta start somewhere.

The market for PC processors started out with one or two dominant vendors, and then it had a brief period with a lot of startup competitors, then went back to one or two dominant vendors. Maybe Ampere is right. Maybe the cloud server market really will be different.
