Storm the Barricades and Pro Test

Those darn processors. They keep getting faster and we keep running out of ways to keep track of them.

In the beginning, there was clock speed. Faster clock meant faster processor, right? Not so fast. Some processors – like some people – did more work than others in a given amount of time. So your 5-kHz UNIVAC might or might not be faster than my 3-kHz ENIAC.

So we created benchmarks. Let’s run LINPACK or Whetstone on both of ’em and see who finishes first. That was better, but it still told you only how well your computer ran that test. Not how well it would run any other test. Maybe your machine is designed for integer arithmetic while mine is geared toward floating-point math.

So the benchmarks got more complicated, clever, and creative. And componentized. Instead of just one test, we got test suites: bundles of benchmarks that ostensibly tested everything before divulging the details. Much better.

But, but… But your benchmark doesn’t measure memory latency. And my benchmark doesn’t exercise graphics performance. And the industry-standard benchmarks don’t record power consumption, HTML rendering time, or Java performance (an oxymoron).

And the vendors never quote any of those benchmarks in their literature, anyway. They’re still publishing Dhrystone MIPS ratings, as if we’re still in the 1980s. That’s like scoring BMX half-pipe with stone tablets or timing the 400-meter dash with a sundial.

Since the time of EEMBC (the Embedded Microprocessor Benchmark Consortium – the first E is silent – or maybe the second), things have gotten a lot better. The nonprofit group has done a bang-up job of developing all sorts of free benchmarks for all different kinds of technology, market niches, and applications. Chief among these was CoreMark – the main, or core, benchmark in most of its CPU measurements. And the microprocessor world looked upon CoreMark and saw that it was good.

Good, but not great. Trouble is, CoreMark is a fixed benchmark (duh), but the chips it’s testing get ever more complex. So it was just a matter of time before CoreMark was outmatched by the chips it was intended to quantify. For instance, CoreMark doesn’t really do multicore processors. Oh, sure, it’ll run on multicore chips. But it doesn’t really exercise the multi-core-ness. It’s not multithreaded, and it wasn’t really intended to measure parallel processing.

It is also – how shall we say it – small. Like Dhrystone before it (way before it), CoreMark now fits inside the cache of some larger processors. That eliminates the effects of memory latency and bandwidth, which partially defeats the purpose. Sometimes even the L2 or L3 caches go untouched. So CoreMark needed to beef up and bulk up in order to keep pace with today’s demanding processors.
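
To make the cache problem concrete, here is a rough sketch of the reasoning: compare a benchmark's working set against each cache level and see where it lands. The cache capacities and the function name are hypothetical placeholders, not figures for any real chip or for CoreMark itself, and the model ignores associativity, sharing, and everything else that makes real caches interesting.

```python
# Hypothetical cache capacities in bytes; real values vary widely by chip.
CACHE_LEVELS = [
    ("L1", 32 * 1024),
    ("L2", 512 * 1024),
    ("L3", 8 * 1024 * 1024),
]


def smallest_level_that_fits(working_set_bytes, levels=CACHE_LEVELS):
    """Return the first cache level large enough for the whole working set.

    Falls back to "DRAM" when no level can hold it, which is the only case
    in which main-memory latency and bandwidth are really being exercised.
    """
    for name, capacity in levels:
        if working_set_bytes <= capacity:
            return name
    return "DRAM"


if __name__ == "__main__":
    # Illustrative working-set sizes only, not measured benchmark footprints.
    for size in (2 * 1024, 256 * 1024, 64 * 1024 * 1024):
        print(f"{size:>10} bytes -> {smallest_level_that_fits(size)}")
```

A benchmark that lands in "L1" on the chip under test says nothing about the memory system behind it, which is the complaint above.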

Voilà! Right on cue, CoreMark 2.0 arrives. Or, as EEMBC officially calls it, CoreMark-Pro. Despite the name, the Pro version is still free and still available to anyone who wants to download and run it. But it’s much more intensive than before.

Where the original CoreMark had a single integer workload, CoreMark-Pro has five, plus four floating-point workloads, for a total of nine. Each of the nine individual tests is bigger than the original CoreMark, so you’re expected to run them all separately, not all at once. That yields nine technology-specific scores, which Pro will obligingly combine into a single CoreMark-Pro score. (Everyone likes simple, single scores.)
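
How might nine scores collapse into one? This column doesn't spell out EEMBC's formula, so the sketch below assumes a common approach for benchmark suites: normalize each workload against a reference machine, then take the geometric mean of the ratios. The workload names, the reference values, and the function name are all made up for illustration.

```python
import math


def combined_score(workload_scores, reference_scores):
    """Geometric mean of per-workload ratios against a reference machine.

    Assumption: this is a generic suite-scoring technique, not necessarily
    the exact calculation that CoreMark-Pro performs.
    """
    ratios = [
        workload_scores[name] / reference_scores[name]
        for name in reference_scores
    ]
    # Averaging the logs keeps one outlier workload from dominating the result.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(log_mean)


if __name__ == "__main__":
    # Hypothetical iterations-per-second figures for nine placeholder workloads.
    reference = {f"workload_{i}": 100.0 for i in range(1, 10)}
    measured = {name: 2.0 * value for name, value in reference.items()}
    print(combined_score(measured, reference))
```

The geometric mean has the handy property that doubling every individual score doubles the combined score, no matter how different the raw numbers are from one workload to the next.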

Does it run on multicore chips? Yes. How? That’s up to you. EEMBC believes that task distribution is the proper responsibility of the compiler, developer, operating system, or magical tool – whatever you prefer. Hard-coding CoreMark-Pro for vectorization would have unintentionally favored chips that implemented the same sort of parallelism. Instead, the code is generic, in the sense that it’s easily compiled just like anybody else’s code. However you like to distribute code threads in real life is how you should do it with CoreMark-Pro. Just as with compiler optimizations, thread distribution is likely to affect performance quite a bit. Expect a lot of tweaking in this area, but that’s okay. What’s convenient for EEMBC is also fairer for everyone else. More scalable, too.
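
As a rough illustration of "distribute it however you like," the sketch below runs several copies of a stand-in workload across a configurable number of workers and reports throughput. None of this is CoreMark-Pro code: the workload, the function names, and the parameters are invented for the example. Note that standard CPython threads share an interpreter lock, so don't expect CPU-bound work like this to speed up; the point is the shape of the experiment, and a process pool or native threads would be the thing to reach for on real measurements.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def workload(n):
    """Stand-in integer workload: sum of squares of 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_copies(copies, workers, n=20000):
    """Run `copies` instances of the workload on `workers` threads.

    Returns the list of results and the throughput in copies per second.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(workload, [n] * copies))
    elapsed = time.perf_counter() - start
    return results, copies / elapsed


if __name__ == "__main__":
    # Sweep the worker count the way you might sweep compiler flags.
    for workers in (1, 2, 4):
        _, throughput = run_copies(copies=8, workers=workers)
        print(f"{workers} workers: {throughput:.1f} copies/s")
```

The answers never change with the worker count; only the throughput does. That is the knob this paragraph expects you to spend your time tweaking.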

CoreMark-Pro does not, in fact, owe much to the original CoreMark. None of the code was reused from its older sibling, although one of the five integer tests is pretty similar to CoreMark in the type of work that it does. As before, users are free to download the source code and free to post their results anywhere they wish. No certification or approval from EEMBC is necessary; cheating is handled strictly on the honor system. If nobody else can verify your CoreMark-Pro scores, the bad Yelp reviews will shame you into compliance.

However, if you desire the imprimatur of third-party certification, EEMBC can perform the task for a nominal fee. But you’ll have to take a number; the waiting list is long and the testing lab has finite resources.

And what becomes of the old CoreMark? It lives on, happily providing a baseline for testing smaller processors (by today’s standards). Changing it now would instantly throw the 500+ published scores out the window. Nobody wants that, so CoreMark 1.0 is now the tool for low-end and midrange chips, while CoreMark-Pro tackles the big iron. For now. It can’t be long before CoreMark 3.0 is in the works.