You knew it had to be about ARM. Everything today is about ARM. So it’s no big surprise that ARM is elbowing its way into the formerly sacrosanct halls of server farms. You know, those big echoing hallways filled with racks upon racks of server blades, all humming along as they power the Matrix—er, the Internet.
Server racks have traditionally been the domain of big, burly, he-man microprocessors from the likes of Intel, Sun, or IBM. Only the beefiest, most power-mad processors need apply. “This is man’s work, sonny, and no little sprouts are welcome here. See what we’re doing here? We’re building the future! Why, we eat Google searches for lunch! Step aside as I process another million Amazon transactions.”
Yeah, well, good luck with that. Better check your rear-view mirrors, guys, because there’s an upstart about to run right up your tailpipe. There’s a 90-pound weakling in your midst and he’s fixin’ to kick sand in everybody’s face.
It’s finely refined sand, mind you. Silicon, in fact. And it’s about to get in the eyes of the big-iron server-chip vendors.
It’s no secret that ARM-based chips have pretty much taken over the world of cell phones and, by extension, tablets and other not-really-a-computer gadgets. ARM chips have a reputation for being stingy with power (partially true) and inexpensive (also partially true). But they’ve never been viewed as… you know… especially powerful. They’re fine for toys and games, but you wouldn’t want your server using one.
Servers require ultimate performance. Servers require massive data caches. Servers require big power cables, blinking LEDs, and loud fans. Running a server on an ARM-based chip would be like towing a Boeing 747 with hummingbirds. Lots and lots of hummingbirds.
Server, meet hummingbird. Two sane and reputable chip companies (plus a number of lesser-known startups) are readying production of ARM-based processor chips aimed specifically at servers. And not weenie little home servers, either. We’re talking big iron: server blades for Amazon, Google, Yahoo!, Netflix, and the other titans of the Internet. Bandwidth from the bandy-legged. Who woulda thunk it?
First among these upstarts is Calxeda, a Texas-based company that has designed its “EnergyCore” processor family around the familiar ARM Cortex-A9. Or, more accurately, multiple Cortex-A9s. Calxeda’s initial chip will have four A9s on it, all running a little north of 1 GHz apiece. That’s nice, but is it really server-level performance? After all, the cute little iPad 2 runs on a dual-core A9, and nobody’s wiring those up to optical backbones.
Hang on; it gets better. EnergyCore chips also have five (count ’em) 10-Gbit XAUI interfaces on them (that’s network-speak for fast pipe). And SGMII. And PCI Express. And SATA. The list goes on, but you get the idea: it’s a big network-I/O chip with some processors inside. The bandwidth of those network interfaces easily exceeds the processing power of the four A9s, but that’s okay, because not all network traffic will pass through the processors. And that brings us to the entire point of Calxeda’s strategy.
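If you want to see that mismatch in numbers, here's a quick back-of-the-envelope sketch in Python. The five 10-Gbit links and the roughly-1-GHz quad-core figures come straight from the description above; the bytes-per-cycle number is my own generous guess, not a Calxeda spec.

```python
# Back-of-envelope: aggregate XAUI bandwidth vs. what four ~1.1 GHz Cortex-A9s
# might plausibly touch. The bytes-per-cycle figure is a rough assumption,
# not a published Calxeda number.
XAUI_LINKS = 5
GBIT_PER_LINK = 10
aggregate_gbit = XAUI_LINKS * GBIT_PER_LINK        # 50 Gbit/s of network I/O
aggregate_gbyte = aggregate_gbit / 8               # 6.25 GB/s

CORES = 4
CLOCK_GHZ = 1.1                                    # "a little north of 1 GHz"
BYTES_PER_CYCLE = 1.0                              # generous per-core assumption
cpu_gbyte = CORES * CLOCK_GHZ * BYTES_PER_CYCLE    # ~4.4 GB/s touched by the CPUs

print(f"Network I/O: {aggregate_gbyte:.2f} GB/s; CPU side: ~{cpu_gbyte:.1f} GB/s")
```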
EnergyCore chips are intended to be installed in clusters, like a mesh. Clustered together, you get an awesome amount of network bandwidth and a semi-awesome amount of computing power, all through relatively low-cost chips that burn only a few watts apiece. You can add or subtract chips to get the price/performance level you want, all without gutting and replacing your expensive server hardware. It’s very scalable, which server buyers really like.
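To get a feel for that scalability, here's a minimal sizing sketch. The per-node numbers are placeholders pulled from the rough figures above ("a few watts apiece," five 10-Gbit links, four cores), not vendor specs.

```python
# Minimal cluster-sizing sketch. Per-node figures are placeholders based on the
# article's rough numbers, not Calxeda datasheet values.
def cluster_totals(nodes, watts_per_node=5, gbit_per_node=50, cores_per_node=4):
    """Aggregate power, network bandwidth, and core count for a mesh of nodes."""
    return {
        "nodes": nodes,
        "watts": nodes * watts_per_node,
        "gbit_per_s": nodes * gbit_per_node,
        "cores": nodes * cores_per_node,
    }

# Add or subtract nodes until the price/performance point looks right.
for n in (8, 16, 32):
    print(cluster_totals(n))
```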
Over on the West Coast, Applied Micro (APM) is working on a similar strategy. APM hasn’t revealed the details of its chips yet, except to say that they’ll be based on the newly announced 64-bit ARM v8 architecture (see “When I’m Sixty-Four” at https://www.eejournal.com/archives/articles/20111102-64bit/). They’ll also run at faster clock rates, around 3 GHz, and pack up to 32 cores per chip. That should certainly position APM’s devices higher up the performance ladder than Calxeda’s.
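On raw aggregate clock alone (and ignoring the very real differences between a 32-bit A9 and a 64-bit v8 core), the gap looks something like this crude comparison:

```python
# Crude aggregate-clock comparison using the figures quoted above. It ignores
# per-core architectural differences (32-bit Cortex-A9 vs. 64-bit ARM v8), so
# treat it as rough positioning, not a benchmark.
calxeda_core_ghz = 4 * 1.1    # four Cortex-A9s, a little north of 1 GHz
apm_core_ghz = 32 * 3.0       # up to 32 cores at about 3 GHz

print(f"Calxeda EnergyCore: ~{calxeda_core_ghz:.1f} core-GHz per chip")
print(f"APM (ARM v8):       ~{apm_core_ghz:.0f} core-GHz per chip")
```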
How do ARM vendors get off thinking they can design server chips? Aren’t those little cell phone processors woefully underpowered? Well, yes and no. Turns out, most big server traffic is just that: traffic. It’s more important to move the data from Point A to Point B than it is to massage it along the way. Server processors don’t do all that much processing in the usual sense; that’s why we have dozens of companies making specialized communications chips. What processing they do perform is typically done in short bursts, on transient data, across a lot of unrelated packets at once. In that kind of environment, a cluster of independent processors can work just as well as (or even better than) one big processor. Big processors like Intel’s Xeon and Itanium were designed (initially, at least) for big computing problems. Network servers aren’t like that. They’re the canonical example of parallel multitasking, so parallel multicore chips do a pretty good job of it. And if you factor in the energy consumed per unit of work performed, they’re actually far more efficient.
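That “energy per unit of work” argument is easy to put into a formula. The request rates and wattages below are purely hypothetical, just to show how the comparison works:

```python
# Performance-per-watt illustration with purely hypothetical numbers.
def requests_per_joule(requests_per_sec, watts):
    # A watt is a joule per second, so this ratio is work done per joule.
    return requests_per_sec / watts

big_iron    = requests_per_joule(requests_per_sec=100_000, watts=130)  # hypothetical
arm_cluster = requests_per_joule(requests_per_sec=100_000, watts=40)   # hypothetical

print(f"Big-iron chip: {big_iron:.0f} requests per joule")
print(f"ARM cluster:   {arm_cluster:.0f} requests per joule")
```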
That’s got Intel, Sun (now Oracle), and the PowerPC folks a bit worried. Servers were traditionally the preserve of “big iron” processors that pushed the limits of performance. Server processors were the company flagship and also a nice profit center. Intel’s server chips, for example, are astonishingly lucrative, considering they’re based on the same basic CPU architecture as the company’s embedded Atom processors.
So if ARM’s advantage is mainly power efficiency, why can’t Intel simply dial down the power of its Xeon chips and undermine the business strategy of Calxeda and APM? They’re trying. They’re trying very hard, in fact.
Intel’s problem—if you can call it that—is that it’s too popular. Its x86 chips have developed a huge installed base and an enormous third-party software ecosystem. That popularity has cemented the x86 architecture in place; Intel can’t change it. And the architecture needs to change if it is to compete effectively against ARM-based (or any RISC-based) intruders.
It’s not as though Intel’s chip designers aren’t smart enough; they are. I know some of them personally and they’re the smartest CPU architects and circuit designers in the room, if not the entire galaxy. It’s just that Intel’s whole value proposition is based on its microprocessors being x86 compatible, and you can’t be “partially compatible.” The designers are duty-bound to implement the entire x86 instruction set, warts and all, including oddball features accumulated over 40 years of development. Granted, some little-used instructions can be implemented in firmware or through emulation if the designers want to, but they absolutely, positively have to be in there in some form. You can’t jettison all that compatibility and RISC-ify the x86 and still call it an x86 chip. For good or ill, these guys are saddled with the oldest, strangest, and least orthogonal CPU architecture in the world. That they make it work at all is a testament to how smart they really are.
Intel has brand recognition and customer momentum in its favor. It also has the world’s best semiconductor manufacturing technology under its own roof. The ARM vendors have the advantage of a newer architecture but are handicapped slightly by less-advanced silicon production. They’re also starting from a position of weakness, having to build up relationships with new customers and displace the incumbent Intel (or IBM, et al.).
Server users (meaning you and me) don’t really care what processor is inside a remote server blade. Server buyers (e.g., Amazon) do care; they’re paying for the hardware and they’re paying for the power and air conditioning to keep the hardware happy. They’re the ones who will decide which server design style wins. Tried and true, or new and improved? I think the old-school server chip vendors need to keep an eye on the rear-view mirror. They’re about to be overtaken.