The Steady March of Progress

ARM’s New Cortex-A75 is More of the Same, and That’s a Good Thing

“Those who cannot change their minds cannot change anything.” – George Bernard Shaw

To no one’s great surprise, ARM has released a new set of microprocessor cores.

You could almost set your watch by ARM’s upgrade announcements, so regular and predictable have they become. What’s this – about the umpty-fifth new processor to come out of the British-based, Japanese-owned company in about the last ten years? Do these guys ever take a day off?

ARM has more flavors of CPU than Crest has of toothpaste. Between the Cortex A-series, the low-end M-series, and the little-known R-series, ARM has, like General Motors, a processor “for every purse and purpose.” Alfred P. Sloan would be proud.

New up this month are the Cortex-A75 and Cortex-A55. The A75 is the more interesting of the two because it’s the bigger, faster, better-looking sibling. The A75 more or less replaces the Cortex-A72 and/or -A73 as ARM’s high-end mobile processor. It is not, however, the much-rumored server processor that’s expected later this year. That CPU, code-named Ares (Wonder Woman’s nemesis), will be even faster but won’t be very mobile-friendly.

The new A55 and A75 are the first two cores in the DynamIQ generation (see March 29, 2017), and the first to implement the ARMv8.2-A architecture specification. Well, most of it, anyway. Even these cores don’t execute quite all the new instructions that appear in the spec, although they are an upgrade from current ARM ISAs.

The A75 looks a whole lot like its predecessor, the A73. They have the same 11-stage pipeline, the same seven execution units, and the same three levels of caching. There’s only so much you can change in one year. But it’s the little tweaks that count.

Although the A73 and A75 share the same mix of execution resources, the A75 now has seven instruction queues, one for each unit, up from four on the A73. That should result in less stalling. The A75 also has an additional instruction decoder – three, instead of two – some tweaked branch-prediction logic, and it fetches four instructions per cycle (up from three). Overall, the A75 is less congested than its predecessor, even though they both run similar instructions on similar hardware. It’s not so much that the A75 is faster than the A73. It just slows down less often.

ARM says the A75 is about 20% faster on integer code, and 30% faster on FP, compared to the A73, all things being equal. That’s a nice speed bump for what is essentially a refreshed, rather than a wholly redesigned, CPU core. The A75’s clock rates should be the same as the A73’s, since the pipeline didn’t get any longer or appreciably more (or less) complex. The A75 obviously contains more logic than the A73, yet ARM says the power consumption is the same between the two. Credit more tweaking. On the other hand, if the A75 delivers 20% to 30% better performance at the same clock speed and power consumption, it should be able to finish a given task in roughly 17% to 23% less time, permitting an earlier shutdown. Thus, battery-powered applications may actually see a decrease in the energy consumed per task. Or better performance for the same power – your choice.
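
The race-to-idle arithmetic can be checked with a minimal sketch. It assumes that ARM’s 20% and 30% figures are throughput ratios, that active power is identical on both cores (as ARM claims), and that the core draws nothing once it shuts down. The function name and the zero-idle-power simplification are illustrative assumptions, not anything ARM has published.

```python
def relative_task_energy(speedup: float, relative_power: float = 1.0) -> float:
    """Energy per task relative to the baseline core.

    Energy = power x time, and a core that is `speedup` times faster
    needs 1/speedup of the baseline run time for the same task.
    Assumes zero power draw after the task completes (hypothetical).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return relative_power / speedup


if __name__ == "__main__":
    # ARM's claimed A75-over-A73 gains, taken at face value.
    for label, speedup in (("integer", 1.20), ("floating point", 1.30)):
        energy = relative_task_energy(speedup)
        print(f"{label}: {1 - energy:.1%} less time and energy per task")
```

Under those assumptions the saving works out to about 17% for integer code and 23% for FP, somewhat less than the headline speedup figures. Real savings would also depend on idle power and on how quickly the core can actually power down.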

Because the A75 and A55 are compatible with DynamIQ, instead of (or in addition to) big.LITTLE, they can theoretically be clustered with any other DynamIQ-compatible ARM processors, not just themselves. Right now, however, that set includes no processors other than the A75 and A55. All new ARM cores from here onwards will presumably be DynamIQ-aware, but in the meantime, these two are it.

DynamIQ’s flexibility comes with a cost. On the plus side, it enables heterogeneous mixes of processors – up to 256 of them, in fact. Those CPUs can run at different clock speeds and have very different processing capabilities. Once the selection of DynamIQ-aware CPUs expands beyond just these two, it should be possible to mix and match ARM cores in almost infinite varieties. Furthermore, DynamIQ-compatible CPUs like the A75 and A55 have private, rather than shared, L2 caches, which improve on-core performance a bit.

The downside is that performance across clusters may suffer by a small amount, as the L2 caches are now private. And, since DynamIQ permits mixing CPU clusters running at different speeds, there are necessarily asynchronous interfaces between those clusters. That allows breaking up the clock tree, and it permits the faster cores to run at full speed, but it also requires time-consuming resynchronization any time data travels between clusters. You don’t get something for nothing.

The A75 has acquired some high-end features that its predecessor didn’t have, ones likely borrowed from the still-in-design Ares project. It now supports ECC for its caches, more hypervisor hooks, finer-grained performance monitoring, and an interesting feature known as data poisoning. Normally, when a CPU fetches bad data (i.e., with a parity or ECC error), it throws a hardware fault and everything grinds to a stop while the system figures out what to do with the bad data. But relatively high-performance processors like the A75 frequently fetch data they don’t actually use. They might fetch instructions on the far side of a branch that won’t be executed, or they’ll fetch a long cache line but use only one byte of it. Why pull the fire alarm when the bad data isn’t causing a problem?

With data poisoning, the CPU marks the newly fetched data as bad (“poisoned”), but takes no further action until or unless that data is about to be used. Only then does it throw a fault, at which point the system can go through its usual panic phase. When implemented correctly, data poisoning can avoid unnecessary alarms.
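
The deferred-fault idea can be illustrated with a small software analogy. This is only a sketch of the concept, not ARM’s hardware mechanism: the names CacheLine, fetch, consume, and PoisonFault are all hypothetical, and real hardware tracks poison in its own way and at its own granularity.

```python
from dataclasses import dataclass


class PoisonFault(Exception):
    """Raised only when poisoned data is actually consumed (hypothetical)."""


@dataclass
class CacheLine:
    data: bytes
    poisoned: bool = False  # set when the fetch saw a parity/ECC error


def fetch(raw: bytes, ecc_ok: bool) -> CacheLine:
    """Bring a line in. A bad line is tagged as poisoned, not faulted."""
    return CacheLine(data=raw, poisoned=not ecc_ok)


def consume(line: CacheLine, offset: int) -> int:
    """Use one byte of a line. Only here does poison become a fault."""
    if line.poisoned:
        raise PoisonFault("poisoned data reached a consumer")
    return line.data[offset]


if __name__ == "__main__":
    good = fetch(bytes(range(64)), ecc_ok=True)
    speculative = fetch(bytes(64), ecc_ok=False)  # bad, but never used

    print(consume(good, 5))  # normal use of a clean line
    del speculative          # wrong-path fetch discarded: no alarm raised

    needed = fetch(bytes(64), ecc_ok=False)
    try:
        consume(needed, 0)   # the bad data is really needed: now it faults
    except PoisonFault as err:
        print("fault:", err)
```

The point of the split between fetch and consume is that the error check moves from the moment data arrives to the moment it matters, so a bad line on a mispredicted path costs nothing.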

For chip designers on the ARM upgrade treadmill, it’s hard not to like the Cortex-A75. All of the same, but more of it. More, better, faster. For those not using ARM’s processors, it’s getting harder to avoid them. And if you’re planning to buy a new phone in 2018, it’ll be pretty much impossible.
