A Brave New World of Emulation and Software Prototyping

As with so many of the technologies we take for granted today, I managed to find myself embroiled in the very early days of hardware emulation. This refers to the process of imitating the behavior of one piece of hardware (typically a silicon chip you are in the process of designing) with another piece of hardware (typically a special-purpose emulation system).

For the purposes of these discussions, I’m going to throw the term application-specific integrated circuit (ASIC) around with gusto and abandon. However, everything I say in this column is equally applicable to application-specific standard parts (ASSPs) and system-on-chip (SoC) devices. (If you are a bit “fluffy” as to the nuances of ASIC, ASSP, and SoC nomenclature, may I be so bold as to make mention of my book Bebop to the Boolean Boogie, which explains everything in excruciating exhilarating detail.)

There are two main “reasons for being” for emulation. The first is that when we are developing a great big hairy ASIC, it’s more than embarrassing if we build it and it fails to perform as planned (trust me, it’s frowny faces all round on such an inglorious day). If we have a real chip in our hands, we can apply stimulus to its inputs and monitor its outputs and say “Yay” or “Nay.” What we need is a way to do this before we build the chip, rather than building it only to discover that the answer is “Nay.”

The first solution we started out with circa the 1970s was software simulation. This is where we use a register transfer level (RTL) representation of the device to build a virtual model of the chip in a computer’s memory, and then we apply virtual stimulus to the virtual inputs and observe the responses on the virtual outputs. The problem back in the day was that the computers available to perform the simulations were pathetically poor performance-wise compared to today’s offerings. The fact that computers grew steadily more powerful didn’t help because—at the same time—the designs we were simulating grew steadily larger.
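To make the "virtual model, virtual stimulus, virtual outputs" idea a little more concrete, here is a toy sketch in Python. It is nothing like a real RTL simulator; the `Counter2` class, its ports, and the stimulus sequence are all invented for illustration. It simply models a 2-bit counter with enable and synchronous reset, clocks a list of input values into it, and records what appears on the output.

```python
class Counter2:
    """Toy in-memory model of a 2-bit counter (hypothetical design, for illustration only)."""

    def __init__(self) -> None:
        self.q = 0  # register state held in the computer's memory

    def clock(self, enable: int, reset: int) -> int:
        """Advance the model by one clock cycle and return the output value."""
        if reset:
            self.q = 0
        elif enable:
            self.q = (self.q + 1) % 4  # 2-bit counter wraps from 3 back to 0
        return self.q


# Virtual stimulus: one (enable, reset) pair per clock cycle.
stimulus = [(0, 1), (1, 0), (1, 0), (0, 0), (1, 0), (1, 0), (1, 0)]

dut = Counter2()
trace = [dut.clock(enable, reset) for enable, reset in stimulus]
print(trace)  # the observed virtual outputs, one per cycle
```

A real simulator evaluates millions of such registers and the logic between them on every cycle, which is why raw compute performance matters so much.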

We still use software simulation today because it provides unmatched visibility into the design. However, we predominantly use it to intensively evaluate relatively small portions of the entire device in relative isolation.

One alternative to software simulation, which started to appear circa the mid-1980s, was to create hardware emulators, all based on arrays of chips. What chips? Well, some used off-the-shelf CPUs, some used off-the-shelf FPGAs, and some used custom creations (CPUs, ASICs, FPGAs, and even… but let’s not go there).

These emulators accept the same RTL representation of the design as the software simulator, but this representation is mapped onto the emulator’s processing elements. The initial role of these hardware emulators was to provide simulation acceleration.

The second main reason for emulation is to start developing and verifying software as early as possible in the development cycle. Today’s markets are moving quickly, and you can’t wait until you’ve built your ASIC before you start creating the software to run on it. The answer to this conundrum is to use an emulator to prototype your software (from low-level firmware to embedded software to high-level application software) on the virtual representation of the hardware, far in advance of the real-world hardware in the form of your ASIC becoming physically available.

Software simulators are cheap (relatively speaking). Emulators aren’t, or they are, all depending on your point of view. If your emulator allows you to detect and fix problems in your design and get your ASIC out of the door the first time and on time, then it’s cheap when compared to the alternative.

There are a lot of factors in play. For example, we are now looking at chips containing billions of equivalent logic gates. We don’t just simulate/emulate to verify functionality and performance; we also need to evaluate our designs in the context of power consumption. New design starts for purpose-built SoCs and artificial intelligence (AI) accelerators are ramping up at an extraordinary rate. And, on top of this, software now largely defines the product, which means you need to get the software as soon as possible, and you need to get it right.

So, who offers the best emulators? That’s not for me to say (not if I want to keep all my friends). What I can say is that I was just chatting with Jean-Marie Brunet, who is more than confident that he and his colleagues are “kings of the emulation castle,” as it were. Jean-Marie is VP and GM of HAV at Siemens (I had to look at that twice. VP = Vice President, GM = General Manager, and HAV = Hardware Assisted Verification).

The chaps and chapesses at Siemens have identified a 3-tier solution space to address the verification and validation of complex ASICs and systems:

  • Emulation: Fast and deterministic compilation for design bring-up and iteration. Full visibility for fast debug and system-level power-and-performance (PnP) analysis.
  • Enterprise Prototyping: Early firmware and embedded software validation. Fast, congruent transition from emulation. 10x higher throughput per $ than emulation.
  • Software Prototyping: Early system software validation. At-speed interface and IP verification. Extreme flexibility, enabling the highest performance at the lowest cost.

To address this tiered solution space, the guys and gals at Siemens have just announced not one, not two, but three new families of congruent emulation solutions that bring tears of joy to my eyes.

Meet the three Veloce CS solutions (Source: Siemens)

The first thing to note is that I have no idea what the “CS” portion of these monikers stands for. I can’t believe I didn’t ask. Maybe it’s “Chip Speed” (maybe not), but we digress…

The next point to note is that the Veloce Strato CS platform is based on Siemens’ new purpose-built CrystalX chip, while both the Veloce Primo CS and Veloce proFPGA CS platforms are based on AMD’s latest and greatest VP1902 Adaptive SoC FPGA.

As compared to the previous Veloce Strato+ (2021), the Veloce Strato CS (2024) provides 4x the gate capacity, 5x the performance, and 5x the debug throughput. Similarly, as compared to the previous Veloce Primo (2021), the Veloce Primo CS (2024) provides 4x the gate capacity, 5x the performance, and 50x the debug throughput (yes, the 50x in this case is not a finger-slip on my part).

Another big point is that, as opposed to the custom cabinets of the 2021 models, both the Strato CS and Primo CS are presented as server blades, thereby supporting super scaling. In the case of the Strato CS, a single blade can be used to emulate ~170M gates.

Scaling with Veloce Strato CS (Source: Siemens)

The next step up is four of these blades, plus an interconnect blade, forming a 5-blade module. Next, we have a tower containing four modules capable of emulating ~3B gates. Four of these towers can emulate ~12B gates, while 16 towers can emulate ~40+B gates.

By comparison, in the case of the Primo CS, a single blade can be used to emulate ~500M gates, while a 5-blade module (4 Primo CS blades plus 1 interconnect blade) can be used to emulate ~2B gates.

Scaling with Veloce Primo CS (Source: Siemens)

A 4-module Primo CS tower is capable of emulating ~8B gates, while six of these towers can emulate ~40+B gates (the same number of gates as 16 Strato CS towers).
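As a quick sanity check on these numbers, the scaling hierarchy can be sketched in a few lines of Python. The per-blade figures and the 4-blades-per-module, 4-modules-per-tower structure come from the text above; the assumption that capacity scales linearly with blade count is mine, and the function and constant names are purely illustrative.

```python
# Approximate per-blade capacities quoted in the article (in gates).
BLADE_GATES = {"Strato CS": 170_000_000, "Primo CS": 500_000_000}

BLADES_PER_MODULE = 4   # emulation blades only; the fifth blade is interconnect
MODULES_PER_TOWER = 4


def capacity(platform: str, towers: int) -> int:
    """Estimated gate capacity, assuming capacity scales linearly with blade count."""
    return BLADE_GATES[platform] * BLADES_PER_MODULE * MODULES_PER_TOWER * towers


for platform, towers in [
    ("Strato CS", 1), ("Strato CS", 4), ("Strato CS", 16),
    ("Primo CS", 1), ("Primo CS", 6),
]:
    billions = capacity(platform, towers) / 1e9
    print(f"{platform}: {towers} tower(s) ≈ {billions:.1f}B gates")
```

The results land close to the rounded figures quoted above: roughly 2.7B, 10.9B, and 43.5B gates for 1, 4, and 16 Strato CS towers, and 8B and 48B gates for 1 and 6 Primo CS towers.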

Earlier, I was throwing the term “congruent” around. What does this mean? Well, the dictionary definition suggests corresponding, consistent, matching, compatible, and harmonious. Oh, you don’t mean “What does the word ‘congruent’ mean?” You mean “What does congruent mean in this context?” You need to learn to articulate your questions better.

How about we try this in a sentence, like: “Veloce Strato CS and Veloce Primo CS provide a fully congruent HW/SW offering.” Hmmm. Perhaps a better way to explain this is by means of another illustration as shown below.

Veloce Strato CS and Veloce Primo CS provide a fully congruent HW/SW offering (Source: Siemens)

As we see, the two solutions employ common RTL compiler, synthesis, run time, and debug engines. The only differences in the flow are the place-and-route (PnR) tools used to map the design onto the diverse devices employed by the two platforms.

Both Strato CS and Primo CS allow users to save the current state of the emulation and restore that state later. This can be extremely efficacious if you are performing multi-hour or multi-day emulations. One thing on the roadmap is to provide the ability to save the state of an emulation on Primo CS and restore that state on Strato CS.

Why? Well, suppose you are running a multi-day emulation on a Primo CS and a bug is found on Day 3. The Strato CS has much higher visibility into the design, but do you really want to start the emulation from scratch on the Strato CS and then wait anywhere from 9 to 15 days to reach the problem point in the emulation? “No!” I cry, “One thousand times no!” But suppose you had instructed the Primo CS emulator to save its state once every two hours, for example. In this case, once the folks at Siemens make this feature available, you will be able to take the saved state from the Primo CS prior to the problem and restore that state on the Strato CS. Brilliant!
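To put some rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes saves are taken at a fixed interval starting from hour zero, and it uses a Strato-to-Primo slowdown of 3x to 5x, which is my inference from the "Day 3" versus "9 to 15 days" figures above rather than anything Siemens has published. The function names and the hour-61 bug time are hypothetical.

```python
def latest_checkpoint_before(bug_hour: float, interval_hours: float) -> float:
    """Hour of the last periodic save taken at or before the bug appeared."""
    return (bug_hour // interval_hours) * interval_hours


def replay_hours(bug_hour: float, interval_hours: float, slowdown: float) -> float:
    """Time on the slower platform to get from the restored state to the bug."""
    gap = bug_hour - latest_checkpoint_before(bug_hour, interval_hours)
    return gap * slowdown


bug_hour = 61        # hypothetical: the bug shows up 61 hours in, during Day 3
interval_hours = 2   # state saved once every two hours, as in the example above

for slowdown in (3, 5):  # assumed slowdown range, inferred from "9 to 15 days"
    from_scratch = bug_hour * slowdown
    from_saved = replay_hours(bug_hour, interval_hours, slowdown)
    print(f"{slowdown}x slowdown: {from_scratch} h from scratch, "
          f"{from_saved} h from the saved state")
```

In practice you might restore an earlier save if the root cause predates the point where the bug becomes visible, but the savings remain of the same order.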

And let’s not forget the Veloce proFPGA CS (2024), which offers 2x the gate capacity, 2x the performance, and 50x the debug throughput of its Veloce proFPGA (2021) predecessor. Jean-Marie informs me that this system offers the lowest cost of entry on the market, all the way from a single-FPGA desktop board to a multi-blade rack system.

Transforming hardware and software for software prototyping (Source: Siemens)

This is where the Veloce operating system for prototyping (VPS) software comes into play to accelerate bring-up. VPS features efficient compilation without requiring any modifications to your RTL. It also boasts automated multi-FPGA partitioning, timing-driven performance optimization, and sophisticated at-speed debug.

And one more thing, lest I forget: all these new Veloce platforms (Strato CS, Primo CS, and proFPGA CS) support multiple users working on heterogeneous designs simultaneously. For example, if you have a 40B gate Strato CS system, then two groups can be working on different 20B gate designs (or a 30B gate and a 10B gate design, etc.), four groups can be working on different 10B designs, and… you see what I mean.
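The sharing arithmetic is simple enough to sketch in Python. This is only an illustration of the capacity bookkeeping; I have no idea how the Veloce software actually allocates resources (presumably at some blade or module granularity), and the `admit` function is entirely hypothetical.

```python
def admit(total_gates_b: float, requests_b: list[float]) -> tuple[list[float], float]:
    """Admit design requests (in billions of gates) in order while capacity remains.

    Returns the list of admitted design sizes and the capacity left over.
    """
    admitted = []
    free = total_gates_b
    for size in requests_b:
        if size <= free:
            admitted.append(size)
            free -= size
    return admitted, free


print(admit(40, [20, 20]))              # two groups with 20B gate designs
print(admit(40, [30, 10]))              # one 30B and one 10B gate design
print(admit(40, [10, 10, 10, 10, 10]))  # the fifth 10B design has to wait
```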

I’m thinking of the simple desktop hardware emulator the company I worked for designed back in the mid-1980s. If I could get my time machine working, travel back, and tell my friends what the emulation future held, I know exactly what they would say: “What happened to you? Where did all your hair go? How did you get to be so old? You only popped out to get a sandwich!”

Hmmm. Let’s leave my erstwhile friends having all the fun that was to be found in the 1980s and return to the present. What do you think of everything you’ve read here?
