Have you ever heard people talking about a new star of comedy, music, stage, or screen, describing them as “an overnight success”? If you investigate, you often discover that this “overnight success” followed a decade or more of hard work in the metaphorical trenches.
This reminds me of a cartoon I saw once about “the ‘peasant look’ that only the aristocracy can afford,” but I’m going to try not to wander off into the weeds…
Having said that, it often seems to me that I hear nothing of a new technological innovation until—suddenly—said innovation starts to pop up all over the place. Just a couple of weeks ago, for example, I was waffling on about what was, to me, a revolutionary new analog technique to perform the matrix multiplications found at the core of today’s artificial intelligence (AI), machine learning (ML), and deep learning (DL) algorithms (see Meet Mythic AI’s Soon-to-be-Legendary Analog AI).
Well, I was just chatting with Mark Reiten, who is vice president of the license division at Silicon Storage Technology (SST), which is itself a subsidiary of Microchip Technology. You can only imagine my surprise and delight to discover that SST’s SuperFlash memBrain technology, which only recently leapt onto center stage with a concordance of contrabass saxophones, has been in development since 2015 (practically the Dark Ages when you think about how fast things are moving these days).
SuperFlash memBrain is a neuromorphic multi-level non-volatile memory (NVM) solution that offers an in-memory-computing architecture for AI, ML, and DL applications using SST’s standard SuperFlash memory cell, which is already in production in many semiconductor foundries.
What do we mean when we use the term “neuromorphic”? Well, according to Wikipedia: “Neuromorphic engineering, also known as neuromorphic computing, is the use of very-large-scale integration (VLSI) systems containing electronic analog circuits to mimic neuro-biological architectures present in the nervous system. A neuromorphic computer/chip is any device that uses physical artificial neurons (made from silicon) to perform computations.”
In order to set the scene, let’s remind ourselves that our brains are formed from cells called neurons that communicate with each other by means of synapses. Let’s also remind ourselves that a typical Flash memory cell is based on a floating gate transistor, which can be used to represent digital 0 or 1 values. By comparison, a SuperFlash memBrain memory cell can be used to represent 256 different values, thereby allowing each cell to represent the “weight” (coefficient) associated with a “synapse” in an artificial neural network (ANN).
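Just to make this a little more tangible, below is a minimal Python sketch of how a floating-point weight might be quantized into one of those 256 cell levels. To be clear, the 256-level figure is the only number taken from the real technology; the linear mapping and the -1.0 to +1.0 weight range are assumptions I’ve made purely for illustration.

```python
# Hypothetical illustration: mapping a neural-network weight to one of the
# 256 levels a SuperFlash memBrain cell can hold. The 256-level figure comes
# from the article; the linear [-1.0, +1.0] weight range is an assumption
# made purely for this sketch.

LEVELS = 256

def weight_to_level(weight: float, w_min: float = -1.0, w_max: float = 1.0) -> int:
    """Quantize a floating-point weight to an integer cell level (0..255)."""
    weight = max(w_min, min(w_max, weight))        # clamp to the assumed range
    fraction = (weight - w_min) / (w_max - w_min)  # normalize to 0..1
    return round(fraction * (LEVELS - 1))

def level_to_weight(level: int, w_min: float = -1.0, w_max: float = 1.0) -> float:
    """Recover the approximate weight that a stored cell level represents."""
    return w_min + (level / (LEVELS - 1)) * (w_max - w_min)

print(weight_to_level(0.4217))  # 181
print(level_to_weight(181))     # ~0.4196 (note the small quantization error)
```

The round-trip isn’t exact, of course, which hints at why 256 levels per cell is enough for inference weights but not for arbitrary-precision arithmetic.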
The important thing to remember here is that AI / ML / DL applications employ a humongous number of matrix multiplication operations, which themselves employ multiply-accumulate (MAC) operations. Consider a really simple example comprising just four of these cells presented as a 2×2 array.
Simple MAC operation with memBrain (Image source: SST)
We can think of a SuperFlash memBrain memory cell as being a programmable resistance. From Ohm’s law, we know V = IR, so I (the current flowing through the cell) = V/R. The reciprocal of resistance (R) is conductance (G); that is, G = 1/R. This means I = V * G (our multiplication). Furthermore, the currents from each cell in the same column are additive (our accumulation). Thus, in the same way that each of these cells may be considered to perform the function of a synapse, each column can be considered to perform the function of a neuron.
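For those who, like me, sometimes think better in code than in physics, here’s a little Python sketch of the 2×2 example we just discussed. The conductance and voltage values are made-up numbers chosen only for illustration; the point is simply that each cell performs a multiplication (I = V * G) and each column wire performs the accumulation.

```python
# A minimal sketch of the 2x2 analog MAC described above, using made-up
# conductance and voltage values; the physics (I = V * G per cell, with the
# currents summing down each column wire) is as described in the article.

# Each cell's programmed conductance G = 1/R acts as a stored weight.
G = [[0.5, 0.2],   # row 0: cells feeding columns 0 and 1
     [0.3, 0.7]]   # row 1

V = [1.0, 0.6]     # input voltages applied along the rows

# Each cell multiplies (I = V * G); each column wire accumulates the currents.
I_col = [sum(V[row] * G[row][col] for row in range(2)) for col in range(2)]

print(I_col)  # [1.0*0.5 + 0.6*0.3, 1.0*0.2 + 0.6*0.7] = [0.68, 0.62]
```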
The next step up the hierarchy is a memBrain tile, which is a large array of the aforementioned cells accompanied by appropriate signal conditioning on the inputs and outputs.
Meet a memBrain tile (Image source: SST)
Observe that the inputs and outputs to this array are very wide compared to those of standard digital embedded Flash memories. The weights are first stored in the memBrain array as conductance values on the floating gates. The SuperFlash memBrain cells then both store the weights and perform the multiply operations when stimulated by input voltages; the resulting currents are additive; and millions of MAC operations can be performed simultaneously. The key thing to note here is that the performance and power consumption per unit of silicon area are orders of magnitude better than those of optimized digital solutions.
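To put that “millions of MACs” claim in perspective, the sketch below uses Python’s NumPy to show what a single tile read computes: one vector-matrix multiplication in which every cell contributes a multiply and every column an accumulate. Note that the 1024 × 512 tile dimensions are my own arbitrary assumption for illustration purposes, not SST’s actual geometry.

```python
# A sketch of what a full memBrain tile computes in one step: a vector-matrix
# multiplication in which every cell performs a multiply and every column wire
# performs an accumulate. The 1024 x 512 tile dimensions are assumptions made
# for this illustration only.

import numpy as np

rows, cols = 1024, 512
G = np.random.rand(rows, cols)   # stand-in conductance (weight) matrix
v = np.random.rand(rows)         # stand-in input voltage vector

# One "read" of the tile performs rows * cols multiply-accumulates at once.
i_out = v @ G                    # the vector of column output currents
print(i_out.shape, "->", rows * cols, "MACs in a single analog operation")
```

A digital processor would grind through those half-million multiply-accumulates one (or a handful) at a time, which is where the orders-of-magnitude performance-per-area advantage comes from.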
The reason this SST implementation is so exciting is that the creators of System-on-Chip (SoC) devices can use SuperFlash memBrain intellectual property (IP) in their designs. For example, according to a press release issued by Microchip earlier this year, a company called WITINMEM has incorporated its own custom incarnation of SuperFlash memBrain technology into an ultra-low-power SoC that takes full advantage of this computing-in-memory technology for ANN processing, including speech recognition, voice-print recognition, deep speech noise reduction, scene detection, and health status monitoring.
UPDATE: Mark Reiten, vice president of the license division at Silicon Storage Technology (SST), provided some clarification as it relates to customer WITINMEM as follows: “WITINMEM purchased a design license from SST and designed their own compute-in-memory block based on their architecture, which is similar to our memBrain design ware, but which has many of their own innovations. SST provided the embedded Flash cell and characterization data, but the design ware, system architecture, and related machine learning software tools were created by WITINMEM. WITINMEM is the first company worldwide to deploy a compute-in-memory product using this type of highly reliable and stable technology for audio processing, and we are excited to have such a sophisticated customer and team developing ground-breaking products based on this new and exciting compute paradigm.”
I date from the days of gate array ASICs. At that time, we dreamed of having anything more sophisticated than an AND gate or a D-type flip-flop to work with. All I can say is that the ability to incorporate this sort of analog matrix multiplication IP for use in AI / ML / DL applications would have left us shaking our heads in disbelief.
I’ve included the press release I mentioned earlier below. All that remains is for me to remind you that I welcome your comments and questions, and I look forward to hearing your thoughts on all of this.
Computing-in-Memory Innovator Solves Speech Processing Challenges at the Edge Using Microchip’s Analog Embedded SuperFlash® Technology
SuperFlash memBrain™ memory solution enables WITINMEM’s System on Chip (SoC) to meet the most demanding neural processing cost, power, and performance requirements.
CHANDLER, Ariz., Feb. 28, 2022 — Computing-in-memory technology is poised to eliminate the massive data communications bottlenecks otherwise associated with performing artificial intelligence (AI) speech processing at the network’s edge but requires an embedded memory solution that simultaneously performs neural network computation and stores weights. Microchip Technology Inc. (Nasdaq: MCHP), via its Silicon Storage Technology (SST) subsidiary, today announced that its SuperFlash memBrain neuromorphic memory solution has solved this problem for the WITINMEM neural processing SoC, the first in volume production that enables sub-mA systems to reduce speech noise and recognize hundreds of command words, in real time and immediately after power-up.
Microchip has worked with WITINMEM to incorporate Microchip’s memBrain analog in-memory computing solution, based on SuperFlash technology, into WITINMEM’s ultra-low-power SoC. The SoC features computing-in-memory technology for neural network processing, including speech recognition, voice-print recognition, deep speech noise reduction, scene detection, and health status monitoring. WITINMEM, in turn, is working with multiple customers to bring products to market during 2022 based on this SoC.
“WITINMEM is breaking new ground with Microchip’s memBrain solution for addressing the compute-intensive requirements of real-time AI speech at the network edge based on advanced neural network models,” said Shaodi Wang, CEO of WITINMEM. “We were the first to develop a computing-in-memory chip for audio in 2019, and now we have achieved another milestone with volume production of this technology in our ultra-low-power neural processing SoC that streamlines and improves speech processing performance in intelligent voice and health products.”
“We are excited to have WITINMEM as our lead customer and applaud the company for entering the expanding AI edge processing market with a superior product using our technology,” said Mark Reiten, vice president of the license division at SST. “The WITINMEM SoC showcases the value of using memBrain technology to create a single-chip solution based on a computing-in-memory neural processor that eliminates the problems of traditional processors that use digital DSP and SRAM/DRAM-based approaches for storing and executing machine learning models.”
Microchip’s memBrain neuromorphic memory product is optimized to perform vector matrix multiplication (VMM) for neural networks. It enables processors used in battery-powered and deeply embedded edge devices to deliver the highest possible AI inference performance per watt. This is accomplished by both storing the neural model weights as values in the memory array and using the memory array as the neural compute element. The result is 10 to 20 times lower power consumption than alternative approaches along with lower overall processor Bill of Materials (BOM) costs because external DRAM and NOR are not required.
Permanently storing neural models inside the memBrain solution’s processing element also supports instant-on functionality for real-time neural network processing. WITINMEM has leveraged SuperFlash technology’s floating gate cells’ nonvolatility to power down its computing-in-memory macros during the idle state to further reduce leakage power in demanding IoT use cases.