Pop quiz! What’s the second-most-popular CPU core in the world? First place goes to ARM, of course, but who’s the runner-up?
If you guessed MIPS, PowerPC, x86, Tensilica, 8051, or XMOS, you’re wrong. (In good company, but still wrong.) The correct answer is: ARC.
According to Synopsys, 1.3 billion ARC processors were embedded into chips last year, and that number is growing by about 300 million per year. That puts ARC second only to the mighty ARM. Must be something about the name. Maybe all those designers thought they were getting ARM but licensed ARC by accident.
Not likely. ARC and ARM are vastly different beasts, even though both occupy the same phylum (or is that genus?) of the microprocessor taxonomic tree. They’re both 32-bit RISC processors; both are offered as licensed IP; both are used in SoC development; and both have a number of variations and configuration options. One runs practically every cellphone and tablet in the world, while the other one appears in… uh… where do all those billions of ARC processors go?
In just about anything that’s not a cellphone or a tablet, really. ARC-based chips are in cameras, utility meters, televisions, flash drives, cars, and on and on. Think “embedded system” or “system on chip” and you run a good chance of identifying a product harboring at least one ARC processor. (Extra credit for knowing that ARC has more licensees than ARM does, too.)
This week, Synopsys (the owner of the ARC architecture since its acquisition in 2010) announced its latest and greatest ARC processor, the HS. The ARC-HS is more than a tweak or an upgrade from the existing ARC-EM; it’s a huge leap. In fact, the performance gap between the two suggests there may be a midrange ARC processor in the offing. Whereas the EM putters along in the MHz range, the HS is rated for at least 1.6 GHz (in 28nm high-k silicon), with 2.2 GHz totally doable. The EM’s puny three-stage pipeline is tossed overboard in favor of an all-new 10-stage design with lots of sexy performance-enhancing features. (Disclosure: your humble scribe was once employed by ARC.)
The new pipeline has big-boy features like dynamic branch prediction, out-of-order instruction retirement (albeit with in-order dispatch), and the ability to keep up to eight pending instructions in flight. A unique new feature of the HS is its second, or late, ALU. Arithmetic and logic operations normally execute in stage 6 of the 10-stage pipeline, which is pretty typical. But if the ALU operation depends on data just loaded from memory, that data is unlikely to be available in time. Rather than stall the operation, the HS postpones its resolution to stage 9, in the late ALU. This sidesteps the usual load/use penalty of long pipelines. If the stars align just right (i.e., by accident), the HS can occasionally execute instructions in both the early and the late ALU simultaneously.
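To put rough numbers on that load/use penalty, here is a toy timing model. It is a sketch, not a description of the real HS pipeline: the stage numbers for the two ALUs come from the paragraph above, but the stage at which load data becomes available (LOAD_DATA_STAGE) is an assumption picked for illustration, and the function name is made up.

```python
# Toy timing model of a load followed by a dependent ALU operation.
EARLY_ALU_STAGE = 6   # from the article
LATE_ALU_STAGE = 9    # from the article
LOAD_DATA_STAGE = 8   # ASSUMPTION: load data is ready at the end of this stage


def stall_cycles(distance: int, late_alu: bool) -> int:
    """Bubbles inserted before a dependent ALU op can execute.

    distance: how many cycles after the load the ALU op enters the
    pipeline (1 means back to back).
    """
    if distance < 1:
        raise ValueError("the consumer must issue after the load")
    # The load enters stage 1 in cycle 1, so it sits in stage s during
    # cycle s. Its data can first be consumed in the following cycle.
    data_usable = LOAD_DATA_STAGE + 1
    # The consumer enters stage 1 in cycle 1 + distance, so without
    # stalls it reaches stage s in cycle distance + s.
    early_cycle = distance + EARLY_ALU_STAGE
    early_stall = max(0, data_usable - early_cycle)
    if early_stall == 0 or not late_alu:
        return early_stall
    # Otherwise defer the operation to the late ALU instead of stalling.
    late_cycle = distance + LATE_ALU_STAGE
    return max(0, data_usable - late_cycle)


for d in (1, 2, 3):
    print(d, stall_cycles(d, late_alu=False), stall_cycles(d, late_alu=True))
```

Under these assumptions, an ALU operation issued right behind its load costs two bubbles with only the early ALU and none when it can fall back to the late one.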
As quick as it is, the overriding goal of the HS is to remain small, simple, and power-miserly. ARC isn’t trying to give MIPS, ARM, or PowerPC any serious competition. It’s intended as a deeply embedded CPU core for deeply embedded software. The HS has neither superscalar nor out-of-order execution, two tricks that could have improved performance at the cost of die area and power. Instead, its designers embraced RISC simplicity. In 28nm silicon, a minimally configured HS core measures just 0.12 mm², which is about one-fifth the size of ARM’s Cortex-R7. An HS processor will likely be smaller than the SRAMs or caches it’s attached to.
The new features are swell, but performance isn’t the secret to ARC’s volume success. That would be its configurability. ARC built its reputation as a DIY processor, a CPU core that designers can tweak, twist, pull, and reshape to suit their own desires. It’s the Silly Putty of CPUs. Developers can add and remove registers, invent their own instructions, change the caches, swap byte ordering, include an FPU, configure a hardware multiplier to improve performance or to save space, and more. It’s not so much a prepackaged processor as a smorgasbord of processor features that designers can browse and select from. The end result may be radically different from your neighbor’s ARC core. Or it might be the same; it’s your call. (For the record, Synopsys also offers preconfigured ARC cores for the less adventurous.)
It’s this configurability – plus ARC’s low cost of ownership – that has led designers to include it seemingly everywhere. If you don’t need a “brand name” processor with a big third-party software base, ARC fits the bill. Its small size takes up less silicon than its better-known competitors, and its licensing terms are less onerous. Like Tensilica (now part of Cadence), ARC’s configurability means that you get the features that you want, with none of the baggage that you don’t.
On the downside, you’re on your own for software. ARC HS is supported by a few real-time operating systems, including ThreadX and MQX, but that’s about it. The processor doesn’t have an MMU, so there’s no Linux or Android port. The compiler and debugger are clever about tracking ARC’s configurability – remove a hardware feature and they automatically remove software support for it – but that’s useful only if you’re compiling your own code. Third-party applications are pretty much nonexistent.
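For a feel of what "tracking configurability" means in practice, here is a minimal sketch of a toolchain choosing between a hardware feature and a software fallback. Everything in it is hypothetical: CoreConfig, its field names, and lower_multiply are invented for illustration and are not Synopsys's actual configuration options or tool interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    # Hypothetical option names, not real ARC configuration keys.
    has_multiplier: bool = True
    has_fpu: bool = False


def soft_multiply(a: int, b: int) -> int:
    """Shift-and-add multiply for non-negative integers, the sort of
    helper a toolchain might link in when no hardware multiplier exists."""
    result = 0
    while b:
        if b & 1:       # lowest bit of b set: add the shifted a
            result += a
        a <<= 1
        b >>= 1
    return result


def lower_multiply(cfg: CoreConfig) -> str:
    """Say which strategy a configuration-aware compiler would pick."""
    if cfg.has_multiplier:
        return "hardware multiply instruction"
    return "call to soft_multiply"


print(lower_multiply(CoreConfig()))
print(lower_multiply(CoreConfig(has_multiplier=False)))
```

The point is only that the same source code compiles differently depending on the core you built, which is why prebuilt third-party binaries are hard to come by.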
Having said that, the HS is binary compatible with its EM sibling, and it is “source code compatible” with the earlier ARC 600 and 700-series CPUs. All ARC processors implement a core set of instructions that can’t be changed, so it’s not as though the ISA is entirely random.
So maybe the ARC HS isn’t going to power the next Windows Phone or Galaxy tablet. But it might wind up in ten times more devices that have lower profiles. If what you want is a small, unassuming little 32-bit CPU that spins away in some corner of your device, the HS may stand for “hidden secret.”
Hm. I really want to see a problem that a GHz 32-bit CPU *without MMU* is a solution for.