Ampere Altra Addresses ARM Aspirations

“My Zen garden’s more serene than yours.” — Dan Piraro, “Bizarro”

The most powerful force in the computer world is inertia. We stick with the software, tools, operating systems, and hardware architectures that we know, and it takes enormous effort to dislodge us from those foundations. Newton’s First Law states, “a computer ecosystem in motion will remain in motion, pretty much forever.”

How else to explain the continued prevalence of QWERTY keyboards? Or Windows PCs running on x86 processors? Or of RS-485, TCP/IP, POP3, or ASCII? We like our standards old and dusty, and we’re loath to jettison them for something new. You can’t go reinventing the wheel every single day or you’d never get anything done.

Case in point: cloud servers. “The cloud” was supposed to change everything, and yet we’re still running our global infrastructure on x86 machines, just like we did 30-odd years ago. Why aren’t cloud datacenters CPU-agnostic? Why don’t we have newer, more efficient hardware running more modern software?

Granted, Intel’s (and AMD’s) server chips have gotten way better than before, but it’s still an x86 monoculture. Cloud computing was supposed to free us from that. It was supposed to open the doors for upstart CPU architectures and radical new operating systems and different software environments. Instead, we’ve got glorified PC motherboards mounted in racks. Where are the jetpacks we were promised?

All of this is by way of introducing the newest pretender to the server-CPU throne, Ampere Computing’s Altra. Altra is a massive, ambitious behemoth of a server chip that’s based on the ARM architecture, not x86. Ampere calls it “the world’s first cloud-native processor,” a somewhat grand claim that underlines the company’s ambitions. Ampere wants nothing less than to finally displace the wheezing old x86 in cloud datacenters.

Ampere is hardly a startup. The company has about 600 employees including, famously, several ex-Intel executives with decades of experience. Investors include Oracle and SoftBank (owners of ARM). These people aren’t screwing around.

The Altra processor isn’t intended for PCs or even HPC (high-performance computing) applications. It’s for cloud servers only, and that focus affected every design decision along the way. It’s here to beat Intel or AMD processors at their own game, producing better performance (on the right benchmarks) with less power, less heat, and lower cost of ownership. It’s what cloud processors were supposed to be all along.

The hardware specs are impressive. It’s got 80 64-bit ARMv8 cores running at 3 GHz, humongous caches, eight DDR4 memory channels, two 128-bit SIMD units per core for vector and floating-point work, 128 lanes of PCIe Gen4, SBSA (Server Base System Architecture) compliance, and CCIX in case you want to hook up external accelerators. The thing is not small: it comes packaged in a 4926-pin FCLGA and requires multiple power supplies.

But, in return, you get SPECrate_int performance that almost exactly matches AMD’s high-end Epyc 7742 and more than doubles the performance of Intel’s Xeon Platinum 8280. All three results represent maxed-out two-socket systems. As always, these scores come with footnotes, including “scaling factors” in order to “normalize” the different compilers and options. And Altra’s numbers are Ampere’s own projections, which assume 10% overclocking, rather than results from actual hardware runs. Caveat emptor.

In performance-per-watt (that is, the SPECrate_int numbers divided by TDP), Altra comes in about 14% ahead of AMD and about 2× better than Intel. In this case, Ampere chose to compare itself to the slightly down-market AMD Epyc 7702 and Intel Xeon 8276 on the basis that they’re the most power-efficient examples of their respective families.
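Spelled out, that efficiency claim is a ratio of ratios. Writing $S$ for a system’s SPECrate_int score and $T$ for its TDP (symbols introduced here for illustration, not taken from Ampere’s materials), the metric and the head-to-head comparison look like this:

$$
E = \frac{S}{T}, \qquad
\frac{E_{\text{Altra}}}{E_{\text{rival}}} = \frac{S_{\text{Altra}}}{S_{\text{rival}}} \cdot \frac{T_{\text{rival}}}{T_{\text{Altra}}}
$$

So “about 14% ahead of AMD” means that ratio is roughly 1.14, and “about 2× better than Intel” means it’s roughly 2. Keep in mind that $T$ is a design rating, not measured power draw, and that $S_{\text{Altra}}$ is itself a projection, so the result inherits both caveats.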

Ampere also offers some cost-per-rack metrics intended to show that Altra-based servers save money over x86-based equivalents. That may be true, but since Ampere doesn’t publish pricing information, and since Intel and AMD both heavily discount their wares, it’s tough to make any fair comparisons. Top-end Xeon Platinum processors such as the 8280 notionally sell for $10,000 and up, so there’s a big pricing umbrella for Ampere to fit under.

Altra wins points for semiconductor technology. Altra is built on TSMC’s 7nm silicon, just like AMD’s Epyc 7742. That’s a full generation or two ahead of the 14nm lithography used for Intel’s 8280. Ironically, Intel – once the poster child for leading-edge fabrication – is the laggard here in process technology.

That high-end silicon keeps Altra’s power and heat to sub-Chernobyl levels, but the device is no refrigerator. Ampere suggests a 210W TDP (thermal design power), which is right on par with Intel’s 205W or AMD’s 225W. You can (just barely) cool them all with forced air, avoiding the costly and bulky complication of liquid cooling, which would be a deal-breaker for most server makers.

Why 80 CPU cores? Two reasons. One is specsmanship. Intel and AMD both have 64-core processors, and “We wanted to have leadership,” says Ampere’s SVP of Products, Jeff Wittich. The other reason is memory. Like some other high-end processors, Altra has eight DRAM channels, and it’s more efficient to have the processors aligned with their memory interface, rather than treating the two as one big pool. Any multiple of eight would’ve worked (56 CPUs, 64 CPUs, 72 CPUs, etc.) but the company chose 80 as its target.
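To make the alignment argument concrete, here’s a minimal sketch under a deliberately simplified model: each core is assigned one “home” DRAM channel in round-robin order, and the question is whether every channel ends up serving the same number of cores. The function names and the round-robin mapping are hypothetical illustrations, not Ampere’s documented interconnect or memory-interleaving scheme.

```python
# Hypothetical model: each core gets one "home" DRAM channel, round-robin.
# This illustrates the divisibility argument only; it is not Ampere's topology.
from collections import Counter

DRAM_CHANNELS = 8  # Altra's eight DDR4 channels


def cores_per_channel(num_cores: int, channels: int = DRAM_CHANNELS) -> list[int]:
    """Return how many cores land on each channel under round-robin assignment."""
    counts = Counter(core % channels for core in range(num_cores))
    return [counts[ch] for ch in range(channels)]


def is_balanced(num_cores: int, channels: int = DRAM_CHANNELS) -> bool:
    """True if every channel serves exactly the same number of cores."""
    return len(set(cores_per_channel(num_cores, channels))) == 1


if __name__ == "__main__":
    # 56, 64, 72, and 80 are multiples of eight; 84 is not.
    for n in (56, 64, 72, 80, 84):
        status = "balanced" if is_balanced(n) else "unbalanced"
        print(n, cores_per_channel(n), status)
```

Under this model, 80 cores split evenly into ten per channel, while a count like 84 leaves four channels each serving one extra core.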

Inside those 80 CPU cores is a hybrid of ARM’s 64-bit architectures. Ampere describes it as ARMv8.2+, meaning it’s got all the features of the official v8.2 specification, plus some features that Ampere cherry-picked from v8.3 and v8.5. Each CPU core is superscalar, with 4-wide out-of-order execution. What they don’t have is multithreading.

The lack of multithreading seems odd in such a high-end processor, especially since the obvious competitors from Intel and AMD have both offered that feature for years as a matter of course. It’s expected, and ARM fully supports it. AMD’s 64-core Epyc 7742 supports 128 threads, versus 80 threads for Altra. So why no multithreading in Altra?

Ampere seems almost defensive in justifying its single-threaded design decision, like a mother protecting her newborn, but the company may have a valid point. Server systems really do differ from their desktop ancestors in at least one respect: they run workloads for several unrelated users. Most x86 systems, even big ones, tend to run large applications for a single user, and they benefit from chopping up that workload into multiple threads. The job is done when all threads complete. But cloud servers, by their very nature, tend to run smaller, unrelated tasks from isolated users. There’s less commonality among tasks (none at all, really) and therefore less reason to share computing resources, caches, and program state. It might even be counterproductive for two (or more) threads to share a processor core and its caches; they’d interfere with each other.

Whatever the reasoning, Altra executes just one thread per core, giving it less performance potential than its x86 opponents. Whether that translates into less actual performance on the relevant server workloads remains to be seen. Ampere’s projected benchmarks suggest it’s not a problem.

Altra also doesn’t have any special hardware accelerators, apart from the SIMD units in each core. There are no ML training engines on the chip, nor any custom coprocessors, hardware assists, or instruction-set extensions. Ampere wanted “a general-purpose processor that runs well across a majority of common [cloud server] workloads,” said Wittich.

Ampere is producing Altra chips now, so get your checkbook ready if you’re in the market for ARM-based server silicon. The company also has upgrades on the horizon, with chips codenamed Mystique and Siryn due in 2021 and 2022, respectively. Mystique will be a drop-in upgrade for Altra, probably with more CPU cores. Siryn is still in the definition stage.

Which brings us back around to the initial question. What’s the point of ARM-based server chips? Or to be more optimistic, what’s different this time?

We’ve been expecting non-x86 servers for some time now, but they never seem to quite materialize. Amazon, AMD(!), AppliedMicro, Avago, Broadcom, Calxeda, Cavium, Fujitsu, HP, Huawei, Marvell, Qualcomm, Samsung, and probably a dozen others that I’ve forgotten about all took a mighty (and mighty expensive) whack at the problem, with little apparent effect. ARM-based servers are approximately zero percent of today’s market.

No less an authority than Linus Torvalds wrote just a year ago that ARM servers are a dumb idea. “I can pretty much guarantee that… the platform won’t be all that stable. Or successful. Guys, do you really not understand why x86 took over the server market?” He goes on to identify the usual hurdles, including that old foe, inertia. We’ve all invested too much time and energy in x86 development to abandon it for the sake of a few watts or a few dollars in reduced TCO, assuming those benefits are even real; past evidence suggests they’re not.

Not every ARM server chip took the big dirt nap. Marvell’s ThunderX2 is still alive, and AppliedMicro’s X-Gene project – including its valuable architectural license from ARM – was bought and sold twice, eventually winding up in Ampere’s hands two years ago. So, you could argue that Altra is X-Gene, with some updates.

There’s no question that Altra was a massive engineering undertaking. It’s a wonderful chip, impressive in its complexity, its performance, and its apparent competitiveness. But I can’t help feeling that it’s an elaborately engineered answer to a question nobody’s asking.