feature article
Subscribe Now

A Brave New World of Emulation and Software Prototyping

Like so many of the technologies we take for granted today, I managed to find myself embroiled in the very early days of hardware emulation. This refers to the process of imitating the behavior of one piece of hardware (typically a silicon chip you are in the process of designing) with another piece of hardware (typically a special-purpose emulation system).

For the purposes of these discussions, I’m going to throw the term application-specific integrated circuit (ASIC) around with gusto and abandon. However, everything I say in this column is equally applicable to application-specific standard parts (ASSPs) and system-on-chip (SoC) devices. (If you are a bit “fluffy” as to the nuances of ASIC, ASSP, and SoC nomenclature, may I be so bold as to make mention of my book Bebop to the Boolean Boogie, which explains everything in excruciating exhilarating detail).

There are two main “reasons for being” for emulation. The first is that when we are developing a great big hairy ASIC, it’s more than embarrassing if we build it and it fails to perform as planned (trust me, it’s frowny faces all round on such an inglorious day). If we have a real chip in our hands, we can apply stimulus to its inputs and monitor its outputs and say “Yay” or “Nay.” What we need is a way to do this before we have built the chip, only to discover that the answer is “Nay.”

The first solution we started out with circa the 1970s was software simulation. This is where we use a register transfer level (RTL) representation of the device to build a virtual model of the chip in a computer’s memory, and then we apply virtual stimulus to the virtual inputs and observe the responses on the virtual outputs. The problem back in the day was that the computers available to perform the simulations were pathetically poor performance-wise compared to today’s offerings. The fact that computers grew steadily more powerful didn’t help because—at the same time—the designs we were simulating grew steadily larger.

We still use software simulation today because it provides unmatched visibility into the design. However, we predominantly use it to intensively evaluate relatively small portions of the entire device in relative isolation.

One alternative to software simulation, which started to appear circa the mid-1980s, was to create hardware emulators, all based on arrays of chips. What chips? Well, some used off-the-shelf CPUs, some used off-the-shelf FPGAs, and some used custom creations (CPUs, ASICs, FPGAs, and even… but let’s not go there).

These emulators accept the same RTL representation of the design as the software simulator, but this representation is mapped onto the emulator’s processing elements. The initial role of these hardware emulators was to provide simulation acceleration.

The second main reason for emulation is to start developing and verifying software as early as possible in the development cycle. Today’s markets are moving quickly, and you can’t wait until you’ve built your ASIC before you start creating the software to run on it. The answer to this conundrum is to use an emulator to prototype your software on the virtual representation of the hardware—from low-level firmware to embedded software to high-level application software—far in advance or the real-world hardware in the form of your ASIC becoming physically available.

Software simulators are cheap (relatively speaking). Emulators aren’t, or they are, all depending on your point of view. If your emulator allows you to detect and fix problems in your design and get your ASIC out of the door the first time and on time, then it’s cheap when compared to the alternative.

There are a lot of factors in play. For example, we are now looking at chips containing billions of equivalent logic gates. We don’t just simulate/emulate to verify functionality and performance; we also need to evaluate our designs in the context of power consumption. New design starts for purpose-built SoCs and artificial intelligence (AI) accelerators are ramping up at an extraordinary rate. And, on top of this, software now largely defines the product, which means you need to get the software as soon as possible, and you need to get it right.

So, who offers the best emulators? That’s not for me to say (not if I want to keep all my friends). What I can say is that I was just chatting with Jean-Marie Brunet, who is more than confident that he and his colleagues are “kings of the emulation castle,” as it were. Jean-Marie is VP and GM of HAV at Siemens (I had to look at that twice. VP = Vice President, GM = General Manager, and HAV = Hardware Assisted Verification).

The chaps and chapesses at Siemens have identified a 3-tier solution space to address the verification and validation of complex ASICs and systems:

  • Emulation: Fast and deterministic compilation for design bring-up and iteration. Full visibility for fast debug and system-level power-and-performance (PnP) analysis.
  • Enterprise Prototyping: Early firmware and embedded software validation. Fast, congruent transition from emulation. 10x higher throughput per $ than emulation.
  • Software Prototyping: Early system software validation. At-speed interface and IP verification. Extreme flexibility, enabling the highest performance at the lowest cost.

To address this tiered solution space, the guys and gals at Siemens have just announced not one, not two, but three new families of congruent emulation solutions that bring tears of joy to my eyes.

Meet the three Veloce CS solutions (Source: Siemens)

The first thing to note is that I have no idea what the “CS” portion of these monikers stands for. I can’t believe I didn’t ask. Maybe it’s “Chip Speed” (maybe not), but we digress…

The next point to note is that the Veloce Strato CS platform is based on Siemen’s new purpose-built CrystalX chip, while both the Veloce Primo CS and Veloce proFPGA CS platforms are based on AMD’s latest and greatest VP1902 Adaptive SoC FPGA.

As compared to the previous Veloce Strato+ (2021), the Veloce Strato CS (2024) provides 4x the gate capacity, 5x the performance, and 5x the debug throughput. Similarly, as compared to the previous Veloce Primo (2021), the Veloce Primo CS (2024) provides 4x the gate capacity, 5x the performance, and 50x the debug throughput (yes, the 50x in this case is not a finger-slip on my part).

Another big point is that, as opposed to the custom cabinets of the 2021 models, both the Strato CS and Primo CS are presented as server blades, thereby supporting super scaling. In the case of the Strato CS, a single blade can be used to emulate ~170M gates.

Scaling with Veloce Strato CS (Source: Siemens)

The next step up is four of these blades, plus an interconnect blade, forming a 5-blade module. Next, we have a tower containing four modules capable of emulating ~3B gates. Four of these towers can emulate ~12B gates, while 16 towers can emulate ~40+B gates.

By comparison, In the case of the Primo CS, a single blade can be used to emulate ~500M gates, while a 5-blade module (4 Primo CS blades plus 1 interconnect blade) can be used to emulate ~2B gates.

Scaling with Veloce Primo CS (Source: Siemens)

A 4-module Primo CS tower is capable of emulating ~8B gates, while six of these towers can emulate ~40+B gates (the same number of gates as 16 Strato CS towers).

Earlier, I was throwing the term “congruent” around. What does this mean? Well, the dictionary definition suggests corresponding, consistent, matching, compatible, and harmonious. Oh, you don’t mean “What does the word ‘congruent’ mean?” You mean “What does congruent mean in this context?” You need to learn to articulate your questions better.

How about we try this in a sentence, like: “Veloce Strato CS and Veloce Primo CS provide a fully congruent HW/SW offering.” Hmmm. Perhaps a better way to explain this is by means of another illustration as shown below.

Veloce Strato CS and Veloce Primo CS provide a fully congruent HW/SW offering (Source: Siemens)

As we see, the two solutions employ common RTL compiler, synthesis, run time, and debug engines. The only differences in the flow are the place-and-route (PnR) tools used to map the design onto the diverse devices employed by the two platforms.

Both Strato CS and Primo CS allow users to save the current state of the emulation and restore that state later. This can be extremely efficacious if you are performing multi-hour or multi-day emulations. One thing on the roadmap is to provide the ability to save the state of an emulation on Primo CS and restore that state on Strato CS.

Why? Well, suppose you are running a multi-day emulation on a Primo CS and a bug is found on Day 3. The Strato CS has much higher visibility into the design, but do you really want to start the emulation from scratch on the Strato CS and then wait anywhere from 9 to 15 days to reach the problem point in the emulation? “No!” I cry, “One thousand times no!” But suppose you had instructed the Primo CS emulator to save its state once every two hours, for example. In this case, once the folks at Siemens make this feature available, you will be able to take the saved state from the Primo CS prior to the problem and restore that state on the Strato CS. Brilliant!

And let’s not forget the Veloce proFPGA CS (2024), which offers 2x the gate capacity, 2x the performance, and 50x the debug throughput of its Veloce proFPGA (2021) predecessor. Jean-Marie informs me that this system offers the lowest cost of entry on the market, all the way from a single-FPGA desktop board to a multi-blade rack system.

Transforming hardware and software for software prototyping (Source: Siemens)

This is where the Veloce operating system for prototyping (VPS) software comes into play to accelerate bring up. VPS features efficient compilation without requiring any modifications to your RTL. It also boasts automated multi-FPGA partitioning, timing-driven performance optimization, and sophisticated at-speed debug.

And one more thing lest I forget, all these new Veloce platforms—Strato CS, Primo CS, and proFOGA CS—support multiple users working on heterogeneous designs simultaneously. For example, if you have a 40B gate Strato CS system, then two groups can be working on different 20B gate designs (or a 30B gate and a 10B gate design, etc.), four groups can be working on different 10B designs, and… you see what I mean.

I’m thinking of the simple desktop hardware emulator the company I worked for designed back in the mid-1980s. If I could get my time machine working, travel back, and tell my friends what the emulation future held, I know exactly what they would say: “What happened to you? Where did all your hair go? How did you get to be so old? You only popped out to get a sandwich!”

Hmmm. Let’s leave my erstwhile friends having all the fun that was to be found in the 1980s and return to the present. What do you think of everything you’ve read here?

Leave a Reply

featured blogs
Dec 19, 2024
Explore Concurrent Multiprotocol and examine the distinctions between CMP single channel, CMP with concurrent listening, and CMP with BLE Dynamic Multiprotocol....
Jan 10, 2025
Most of us think we know something about quantum computing, right until someone else asks us to explain it to them'¦...

featured chalk talk

Shift Left Block/Chip Design with Calibre
In this episode of Chalk Talk, Amelia Dalton and David Abercrombie from Siemens EDA explore the multitude of benefits that shifting left with Calibre can bring to chip and block design. They investigate how Calibre can impact DRC verification, early design error debug, and optimize the configuration and management of multiple jobs for run time improvement.
Jun 18, 2024
46,571 views