
There’s Exciting News on the Multi-Modal AI SoC Front

As is often the case, I’m amazed by how so many things seem to be interrelated and interconnected. I’m sorry… I feel an aside coming on… I cannot help myself… just saying “interconnected” reminds me of the book “Dirk Gently’s Holistic Detective Agency,” which was written by the late great Douglas Adams.

As you may recall, Dirk is an unconventional detective who believes in the “fundamental interconnectedness of all things.” The idea is that everything in the universe is interrelated, meaning that even seemingly random events or trivial details can have a meaningful connection. Dirk employs this approach to solve cases by embracing bizarre coincidences, odd insights, and intuition, which leads him to surprising and often cosmic truths that would otherwise seem unrelated.

On the off chance you were wondering, there have been a couple of TV interpretations that are loosely based (or not) on the original books, which were Dirk Gently’s Holistic Detective Agency and The Long Dark Tea-Time of the Soul. The 2010-2012 TV series originated in Britain and starred Stephen Mangan as holistic detective Dirk Gently and Darren Boyd as his sidekick Richard MacDuff. By comparison, the 2016-2017 TV series (well, two series, really) originated in the United States and starred Samuel Barnett as Dirk and Elijah Wood as his reluctant sidekick Todd.

But we digress…

The first thing that triggered my meandering musings on the interconnectedness of things is that, in a recent column, Arrggghhh! Now I Want an NI mioDAQ! (Ignore the ‘!’), I made mention of the fact that oscilloscopes back in the day were big, clunky, and horrendously expensive.

Well, I just read Steve Leibson’s column: The Rise and Fall of Heathkit – Part 1: Early Days. All I can say is that it’s fascinating to hear how the Heath company evolved into the form we used to know and love when I was coming of age. Steve’s column is based on his interview with Chas Gilmore, who joined the Heath Company in 1966 as a design engineer. Chas explained how it was that the first kit from Heath was an oscilloscope called the O-1 that sold for only around $39.50 circa 1947. As Chas says, “… an oscilloscope at that stage of the game was one expensive instrument, and you know, $39.50? You’ve got to be kidding me. I mean, that must have been a tenth to a hundredth the cost of most oscilloscopes at that stage of the game.”

The second thing that caused my cogitations and ruminations on the interconnectedness of things involved a trio, triad, or troika, if you will, in the form of a column, a case study, and a press release. Let’s take these one at a time: 

The Column: I recently realized that, although anyone involved in the design of large digital silicon chips is familiar with the term Network-on-Chip (NoC), relatively few people are cognizant of the underlying concepts, which caused me to write a column for the Ojo-Yoshida Report titled Welcome to the Wonderful World of NoCs.

I ended that column by introducing a new NoC-based soft tiling capability that was recently launched by the folks at Arteris IP. This is of particular interest for people designing system-on-chip (SoC) devices targeted at artificial intelligence (AI) and machine learning (ML) applications.

The idea is that these SoCs often include 2D arrays of processor clusters (where each cluster contains multiple processor cores) as part of the main design, and these processor clusters are connected using a coherent NoC. Meanwhile, any AI or ML blocks, such as neural processing units (NPUs), may involve 2D arrays of processing elements (PEs), and these PEs are connected by a non-coherent NoC.

Let’s use the term processing units (PUs) to embrace both processor clusters and PEs. The traditional way of implementing an array of PUs is to create the initial PU by hand, then to replicate (think “cut-and-paste”) this PU into an array of PUs, then to generate the NoC, then to hand-configure the network interface units (NIUs) associated with the PUs (each PU has an NIU, and each NIU requires a unique ID/address so that the packets of data flying around the NoC know where they are coming from and where they are going to).
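To make that hand-configuration step a little more concrete, here's a minimal Python sketch of my own devising (this is purely illustrative and is emphatically not the Arteris tool flow) showing what hand-assigning unique NIU IDs looks like for even a modest 4x4 array:

# Purely illustrative sketch (not the Arteris tool flow): hand-assigning a
# unique ID/address to the network interface unit (NIU) of each processing
# unit (PU) in a 4x4 array. Every packet carries these IDs as its source and
# destination addresses, so any duplicate or missing entry breaks routing.
hand_assigned_niu_ids = {
    "pu_0_0": 0x00, "pu_0_1": 0x01, "pu_0_2": 0x02, "pu_0_3": 0x03,
    "pu_1_0": 0x10, "pu_1_1": 0x11, "pu_1_2": 0x12, "pu_1_3": 0x13,
    "pu_2_0": 0x20, "pu_2_1": 0x21, "pu_2_2": 0x22, "pu_2_3": 0x23,
    "pu_3_0": 0x30, "pu_3_1": 0x31, "pu_3_2": 0x32, "pu_3_3": 0x33,
}

# One cut-and-paste slip (a repeated or missing ID) silently breaks routing,
# which is why hand-configuring large arrays is so error-prone.
assert len(set(hand_assigned_niu_ids.values())) == len(hand_assigned_niu_ids)

And remember that this toy table covers only 16 PUs; now imagine maintaining it by hand for hundreds of PUs, and then being asked to resize the array.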

All this hand configuring is resource-intensive, prone to error, and frustrating, especially if—just when you’ve finished—the boss says something like, “we’ve decided to make a small modification to the original PU” (to which one might be forgiven for responding “Arrggghhh!”).

The idea behind NoC-based soft tiling is that, after creating the original PU, you simply tell the NoC tools the required X-Y dimensions for your array, at which point it auto-replicates the PUs, auto-generates the NoC (either coherent or non-coherent, as required), and auto-configures the NIUs, all in a matter of seconds or minutes.
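Here's a correspondingly minimal sketch of the soft-tiling idea, written under my own assumptions rather than reflecting the actual Arteris API: you specify the X-Y dimensions, and a function replicates the PUs, generates the mesh links, and assigns every NIU a unique ID automatically.

# Minimal sketch of the soft-tiling concept under my own assumptions (this is
# not the Arteris API): given the X-Y dimensions of the array, replicate the
# original PU, generate the mesh links, and assign unique NIU IDs, so that
# resizing the array becomes a one-line change.
def tile_pu_array(rows: int, cols: int, coherent: bool = False):
    pus = {(r, c): {"name": f"pu_{r}_{c}", "niu_id": r * cols + c}
           for r in range(rows) for c in range(cols)}
    links = []
    for (r, c) in pus:
        if c + 1 < cols:                      # east-west mesh links
            links.append(((r, c), (r, c + 1)))
        if r + 1 < rows:                      # north-south mesh links
            links.append(((r, c), (r + 1, c)))
    return {"coherent": coherent, "pus": pus, "links": links}

# For example, an 8x8 non-coherent mesh for an NPU's processing elements:
npu_mesh = tile_pu_array(8, 8, coherent=False)
print(len(npu_mesh["pus"]), "PUs,", len(npu_mesh["links"]), "mesh links")

The point of the sketch is simply that, once the array is described parametrically, the boss's "small modification to the original PU" no longer means days of re-plumbing by hand.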

The Case Study: There’s a very interesting SiMa.ai Case Study on the Arteris website. This describes how—way back in the mists of time we used to call 2022—the folks at SiMa.ai developed and released the world’s first software-centric, purpose-built machine learning system-on-chip (MLSoC) platform that delivered an astounding 10X better performance per watt than its nearest competitive solution.

To be honest, I was so enthused by the contents of this case study that (and I know you are going to be surprised when you hear this) I wrote my How to Build a Multi-Billion-Transistor SoC column about it.

The point here is that, in order to create their MLSoC, the guys and gals at SiMa.ai used NoC technology provided by the chaps and chapesses at Arteris. In particular, the case study ended with a quote that caught my eye: “We’ve already started work on our next-generation device, and—with respect to the NoC—we didn’t even think of looking elsewhere because FlexNoC from Arteris was an automatic and obvious choice!” — Srivi Dhruvanarayan, VP of Hardware Engineering, SiMa.ai

The Press Release: All the above leads us to a recent press release: SiMa.ai Expands ONE Platform for Edge AI with MLSoC Modalix, a New Product Family for Generative AI. 

This press release informs us that the industry’s first multi-modal edge AI product family, SiMa.ai’s MLSoC Modalix, supports CNNs, Transformers, LLMs, LMMs, and Generative AI (GenAI) at the edge and delivers industry-leading performance—more than 10X the performance per watt of alternatives.

Also, we are informed that: “SiMa.ai MLSoC Modalix is the second generation of the successful, commercially deployed first generation MLSoC. MLSoC Modalix is offered in 25 (Modalix 25 or “M25”), 50 (Modalix 50 or “M50”), 100 (Modalix 100 or “M100”) and 200 (Modalix 200 or “M200”) TOPS configurations, in multiple form factors, and is purpose-built to provide effortless deployment of Generative AI for the embedded edge ML market. Fully software compatible with first generation MLSoC, the MLSoC Modalix product family was designed to enable the capability to run DNNs, as well as advanced Transformer models, including LLMs, LMMs and Generative AI. Samples of MLSoC Modalix will be available to customers in Q4 of 2024.”

Meet the MLSoC Modalix family (Source: SiMa.ai)

When we visit the MLSoC Modalix page on the SiMa.ai website, we discover that this truly is, as they say, “A Complete System-on-Chip.” In addition to a “super-secret sauce” machine learning accelerator, this device boasts (nay, flaunts) a cornucopia of high- and low-speed I/O subsystems to interface with external devices and sensors; multimedia processing with video encode, decode, and a programmable DSP; boot security, system management, and debugging; huge amounts of on-chip memory along with access to humongous amounts of off-chip memory; an application processor comprising eight Arm Cortex-A65 cores plus an image signal processor; and a network-on-chip and TrustZone security extensions.

The MLSoC Modalix is a complete system-on-chip (Source: SiMa.ai)

Now I’m wondering if the ML accelerator in this device is implemented as an array of processing elements connected by a mesh NoC. If so, I bet its creators are looking at the new Arteris soft tiling technology with awe and desire (perhaps accompanied by some gnashing of teeth and rending of garments), wishing it had been available when they were working on their Modalix devices. Oh well, perhaps they will avail themselves of this technology on their next-generation designs.

In Conclusion

The aforementioned press release made note of the fact that the rise of generative AI is changing the way humans and machines work together. Also, that “The next wave of the AI technology revolution will advance multi-modal machines with the ability to understand and process multiple forms of inputs across text, image, audio and visual. This shift will ripple across every industry, from agriculture and logistics, to medicine, defense, transportation and more.”

I totally agree. I’m also blown away by how fast the folks at SiMa.ai are moving. And, as usual, I’m left wanting to know more. On what technology node are these devices implemented? How many transistors are in an M200? What will the world look like in 10, 20, 50, and 100 years’ time? And, most importantly, how much will a bacon sandwich cost me in 2050? How about you? Do you have any thoughts you’d care to share on any of this?
