
ARM Floats Helium for Cortex-M

New Instruction-Set Extensions Aid DSP, ML Performance

“I’m sorry, but neon just doesn’t look good on anybody!” — Tiffani Thiessen

First there was Neon, now there’s Helium. ARM has pulled the wraps off a package of DSP and machine-learning extensions for its low-end Cortex-M processors, sort of like Neon but not. Whereas Neon added DSP features to the Cortex-A family, Helium adds similar, but different, features to Cortex-M. So, Helium is lighter than Neon.

Atomic weights aside, Helium will add a substantial boost to the performance and capabilities of future Cortex-M processors. We won’t see any of those chips for quite a while, but compiler writers and RTOS vendors can get started now, if they like.

ARM won’t say when the first Helium-enabled core will be released or what it will be called. What ARM would say is that the first actual silicon containing Helium is about two years away. Allowing for the usual lag between the release of a core and the first chips built around it, that puts the first Helium-enabled core about 6–9 months out.

With chips so far away, and no new cores announced, why release the technical specifications for Helium now? Software. The changes are substantial enough that every RTOS, compiler, and middleware vendor will need to update their support before Helium hits the street. A two-year head start should be enough.

Helium will almost certainly be part of every future Cortex-M core from here on out, although it will be an option that each licensee can choose to implement or leave out. That is, ARM will include it in your next Cortex-M license, but you don’t have to enable it. Neon works the same way: it’s included, but implementation is optional.

Significantly, Helium is not compatible with Neon, at either the binary or the source-code level. Even though they implement similar DSP and vector features, they do so in different ways. You won’t be able to simply port code from a Cortex-A with Neon to a Cortex-M with Helium. You’ll need to rewrite, not just recompile.

This is a real departure for ARM, a company that has historically been a stickler for unquestioned family-wide compatibility. With just a few exceptions (mostly from the company’s younger days), all ARM processors have been software compatible with their siblings, ancestors, and offspring. Helium is the first truly new and incompatible extension in a long time.

The company is acutely aware of this and promises to provide ready-made DSP and vector libraries to help customers jump the gap from Neon to Helium. Programmers who are already familiar with Neon will have a leg up on those who’ve never used ARM’s DSP extensions before, but it won’t be a cakewalk.

Helium, like Neon, is a huge package of features intended to bring signal-processing and vector-processing capability to an otherwise generic ARM processor. As such, it includes more than 150 new instructions, along with new internal registers and new data types. The idea is to displace separate DSP cores from Ceva, DSP Group, and others, and to bring that technology on-chip, with a single processor core and a single instruction set. Not incidentally, it also keeps the customer’s licensing revenue within ARM.

The DSP features include some new low-overhead loop instructions that eliminate the first and last instructions in a typical control loop, so that the CPU spends more time looping and less time deciding whether it’s supposed to keep looping. These are routine for a “real” DSP but are new to the Cortex-M family. There’s even a “loop tail” instruction to handle those awkward cases where the number of data elements isn’t evenly divisible by two or four.
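To make the loop-tail idea concrete, here is a plain-C model of what a tail-predicated vector loop does. It is a sketch, not Helium code: the four-lane width, the `LANES` constant, and the `vec_add_tail_predicated` name are assumptions for illustration. Each pass handles up to four elements, and the final pass simply switches off the lanes that would run past the end of the array, so no separate scalar cleanup loop is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count: four 32-bit lanes in a 128-bit vector. */
#define LANES 4

/* Plain-C model of a tail-predicated loop. Each pass handles up to
 * LANES elements; on the final pass, lanes that would run past n are
 * masked off instead of being handed to a separate scalar tail loop.
 * Returns the number of passes so the loop count can be checked. */
size_t vec_add_tail_predicated(const int32_t *a, const int32_t *b,
                               int32_t *out, size_t n)
{
    size_t passes = 0;
    for (size_t i = 0; i < n; i += LANES) {
        size_t remaining = n - i;
        for (size_t k = 0; k < LANES; k++) {
            /* Lane k is active only while i + k < n. */
            if (k < remaining) {
                out[i + k] = a[i + k] + b[i + k];
            }
        }
        passes++;
    }
    return passes;
}
```

Called with seven elements, this loop makes two passes; the second has only three active lanes, and nothing past `out[6]` is written. The intent of Helium’s loop-tail support, as described above, is to make that masking part of the loop machinery rather than something the programmer spells out by hand.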

Scatter/gather addressing modes are also included, and these help the processor walk through memory with programmable strides, another common DSP feature. Branch hints (i.e., branch-probably and branch-probably-not) are another ISA tweak for accelerating DSP loops.
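Scatter/gather is easiest to picture in scalar terms: a gather load fetches each lane from a base address plus that lane’s own offset, and a scatter store does the reverse. The sketch below models a four-lane gather and scatter in plain C. The function names, the four-lane width, and the use of a small C array to stand in for a vector register of offsets are all assumptions for illustration; a fixed stride is just one way of filling in the offsets.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical: four 32-bit lanes */

/* Gather: lane k loads base[offsets[k]]. */
void gather_load(const int32_t *base, const uint32_t offsets[LANES],
                 int32_t dst[LANES])
{
    for (size_t k = 0; k < LANES; k++) {
        dst[k] = base[offsets[k]];
    }
}

/* Scatter: lane k stores src[k] to base[offsets[k]]. */
void scatter_store(int32_t *base, const uint32_t offsets[LANES],
                   const int32_t src[LANES])
{
    for (size_t k = 0; k < LANES; k++) {
        base[offsets[k]] = src[k];
    }
}

/* Build offsets for a fixed stride, e.g. one channel of interleaved data. */
void strided_offsets(uint32_t start, uint32_t stride, uint32_t offsets[LANES])
{
    for (uint32_t k = 0; k < LANES; k++) {
        offsets[k] = start + k * stride;
    }
}
```

With a start of 1 and a stride of 3, the offsets are 1, 4, 7, and 10, so a single gather pulls every third element out of an interleaved buffer, which is the kind of memory walk a DSP loop over multichannel samples needs.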

Helium also adds saturating arithmetic as well as signed and unsigned rounding modes. The number and variety of data types has also changed, with Helium supporting 8, 16, 32, and (with a little work) 128-bit fixed-point data. On the floating-point side, there are the usual single-precision (32-bit) and double-precision (64-bit) data types, with a new half-precision (16-bit) data type debuting with Helium. This last type is expected to be useful for voice-activation features, where audio fidelity isn’t important, but lag and throughput are. A Cortex-M with Helium can process twice as many samples of half-precision data compared to single-precision data, an important feature for a relatively slow and cheap microcontroller.
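Saturating arithmetic clamps a result at the largest or smallest representable value instead of letting it wrap around, which keeps an overdriven audio sample from flipping sign. Rounding adds half of the least significant bit that is about to be discarded before shifting it away. The plain-C sketch below shows the generic semantics only; the function names are hypothetical, and it makes no claim about the exact rounding behavior of any particular Helium instruction.

```c
#include <stdint.h>

/* Signed saturating 16-bit add: clamp instead of wrapping. */
int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;  /* widen so the sum can't overflow */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Unsigned saturating 8-bit subtract: clamp at zero. */
uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return (a > b) ? (uint8_t)(a - b) : 0;
}

/* Signed rounding right shift (1 <= shift <= 31): add half an LSB,
 * then shift. Assumes >> on negative values is arithmetic, as it is
 * with GCC and Clang on Arm targets. */
int32_t round_shift_right_s32(int32_t x, unsigned shift)
{
    int64_t rounded = (int64_t)x + ((int64_t)1 << (shift - 1));
    return (int32_t)(rounded >> shift);
}
```

For example, 30000 + 10000 saturates to 32767 rather than wrapping to a large negative number, and a rounding shift of 5 by one bit gives 3 (2.5 rounded up) where a plain shift would give 2.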

Vector operations can broadside two or four elements through the ALU at once, giving future Cortex-M devices some lightweight SIMD credentials. New lane-predication features allow the CPU to conditionally process some elements in a SIMD package while ignoring others, a time- and code-saving trick that even Neon doesn’t offer. To save space and energy, Helium reuses the Cortex-M’s normal FPU registers as its vector registers – just one of the reasons Neon code isn’t transportable to Helium.
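In scalar C, lane predication looks like an `if` statement inside a loop; the point of doing it in hardware is that the vector unit gets the same effect without branching. The sketch below models it in plain C under stated assumptions: four 32-bit lanes, an ordinary bit mask standing in for the predicate, and a hypothetical function name. A comparison builds the per-lane predicate, and the add is then applied only to lanes whose predicate bit is set, while the other lanes keep their old contents.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical four-lane vector */

/* Model of lane predication: compare each lane to build a per-lane
 * predicate mask, then add x into acc only where the mask bit is set.
 * Inactive lanes keep their previous contents. Returns the mask. */
unsigned predicated_add_below(int32_t acc[LANES], const int32_t x[LANES],
                              int32_t limit)
{
    unsigned mask = 0;
    for (size_t k = 0; k < LANES; k++) {
        if (x[k] < limit) {          /* per-lane predicate */
            mask |= 1u << k;
        }
    }
    for (size_t k = 0; k < LANES; k++) {
        if (mask & (1u << k)) {      /* only active lanes are written */
            acc[k] += x[k];
        }
    }
    return mask;
}
```

With inputs of 5, 50, 7, and 70 and a limit of 10, only lanes 0 and 2 are active (mask 0x5), so only those two accumulators change.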

Although Helium is the showpiece component of the new ARMv8.1-M architecture specification, there’s more to the spec than Helium. Also included are some debug tweaks, updates to the MPU (memory protection unit), changes to TrustZone, and RAS (reliability, availability, serviceability) extensions. Interestingly, ARM also says the ARMv8.1-M specification has been “cleaned up, with regard to unpredictable cases.” Any new Helium-compatible processor core will come bundled with the other new v8.1-M enhancements, although not all v8.1-M cores will have Helium.

On one hand, Helium is a useful and obvious addition to the Cortex-M family. ARM saw that many of its licensees were strapping a DSP alongside their microcontroller and took the logical step of offering an in-house equivalent. The company had already done most of the groundwork with Neon; it just needed a lighter-weight version for Cortex-M.

On the other hand, the fact that Helium is so different from Neon means there’s little software compatibility between the two, making the upgrade/downgrade story a tougher sell. Previously, anyone using a low-end Cortex-M could easily upgrade to the bigger, faster Cortex-R or Cortex-A families without too much stress. Integer code was upgradeable. Even floating-point code was upgradeable. But now, the all-important DSP code is not upgradeable. And DSP code is notoriously difficult to rewrite, as it’s often time-sensitive and performance-critical. Any decent toolchain can recompile integer control code, but porting motion-control loops or DSP filters to a new processor is another matter. Helium may have the ARM brand name written on the side, but it’s unfamiliar hardware underneath.

On the third hand, programmers will learn Helium’s tricks and quirks soon enough, helped along by ARM’s promised software libraries and a bevy of third-party development tools. This is ARM we’re talking about, after all. Only the most popular CPU architecture in the world. DSP/ML/vector operations are important for “node” devices at the edges of our everyday networks, so adding Helium (or something very much like it) was inevitable. Now that it’s here – in specification form, anyway – we know what we have to work with.

3 thoughts on “ARM Floats Helium for Cortex-M”

  1. It’s interesting that the headline features of lane predication and using the vectorized loop to deal with odd-length loop tails are shared with SVE and the RISC-V Vector extension, making MVE more similar to those than it is to NEON.

    It’s also interesting that the fixed 128-byte vector register file of MVE (the FP register file repurposed) is identical to the minimum configuration of the RISC-V Vector extension. But RVV also covers all the ground covered by SVE (and more … SVE maxes out at 2048-bit vector registers), using a single instruction set and programming model at all processor sizes and for both 32-bit and 64-bit processors.

  2. Complex processors are hard to take advantage of without good programming models; many excellent efforts have died due to lack of usability, and I suspect this will be one of those. There’s also no indication that ARM know anything more about this market than anyone else, and it seems likely the open-source guys will come up with something as good.

    Xilinx are probably in a similar boat trying to use SystemC for their DSP/AI effort.

    Can’t teach old dogs new tricks, and ARM is a very old dog.
