Neural Networks are Finding a Place at the Adult’s Table

 

The deep learning revolution is the most interesting thing happening in the electronics industry today, said Chris Rowen during his keynote speech at the Electronic Design Process Symposium (EDPS), held last month at the Milpitas headquarters of SEMI, the industry association for the electronics supply chain. “The hype can hardly be understated,” continued Rowen. Search “deep learning” on Google and you’ll already get more than three billion hits. (Well, I got 20M for “deep learning” and 451M for “artificial intelligence,” but still, that’s a lot.) “There are 12,000 startups worldwide listed in Crunchbase,” he added. (I got 1497, again for “deep learning,” but still…) According to Rowen, 16,500 papers on deep learning and AI were published on arxiv.org in the past 12 months.

In other words, AI is hot (in case you’ve been living in a cave or an underground bomb shelter for the past few years).

Rowen is CEO of BabbleLabs, formerly BabbLabs; the missing “e” confused people, who found they couldn’t pronounce the name. BabbleLabs is a deep-learning startup devoted to applying deep learning and DNNs (deep neural networks) to speech processing.

Deep learning is a “mathematical layer cake model for learning,” explained Rowen. (I suspect he was referring to the various layers, hidden and otherwise, in the DNN model.) You take a large number of inputs and put them through a hidden system to get a desired output after a period of training. This model is very general and works for almost any kind of data, but you must have a way of gathering all of the required training data.
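To make the layer cake a little more concrete, here is a minimal sketch of inputs passing through a stack of layers. It is my own illustration, not anything Rowen showed: the function names, the layer sizes, and the random untrained weights are all assumptions, and the training step that would tune those weights is left out entirely.

```python
import random

def dense(inputs, weights, biases):
    """One layer of the cake: a weighted sum of the inputs plus a bias per output."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Simple nonlinearity applied between layers."""
    return [max(0.0, v) for v in values]

def forward(inputs, layers):
    """Push inputs through every layer; ReLU on hidden layers, raw values on the last."""
    activations = inputs
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:
            activations = relu(activations)
    return activations

def random_layer(n_in, n_out, rng):
    """Hypothetical untrained layer; training would adjust these numbers."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
# Assumed shape: 4 inputs -> two hidden layers of 8 -> 2 outputs.
layers = [random_layer(4, 8, rng), random_layer(8, 8, rng), random_layer(8, 2, rng)]
output = forward([0.5, -1.0, 0.25, 2.0], layers)
```

With untrained weights the two output values are meaningless; the point is only that each layer feeds the next, which is the structure that training then shapes.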

Currently, the biggest application for DNNs is, by far, vision systems. Training for these systems is enormously complex and running these systems consumes a lot of compute cycles. DNN-based vision systems gobble up TOPS (tera operations per second) like kids snack on candy corn during Halloween.
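For a rough feel of where the tera-ops go, here is a back-of-the-envelope sketch for a single convolution layer. The layer dimensions and frame rate are hypothetical numbers I picked for illustration, and counting one multiply-accumulate as two operations is a common convention rather than something from the keynote.

```python
def conv_ops(out_h, out_w, in_ch, out_ch, kernel):
    """Operations for one convolution layer, counting each multiply-accumulate as 2 ops."""
    macs = out_h * out_w * out_ch * kernel * kernel * in_ch
    return 2 * macs

# Hypothetical layer: 3x3 convolution, 64 -> 64 channels, 224x224 output map.
ops_per_frame = conv_ops(224, 224, 64, 64, 3)

# Hypothetical camera rate of 30 frames per second.
frames_per_second = 30
tera_ops_per_second = ops_per_frame * frames_per_second / 1e12

print(f"{ops_per_frame / 1e9:.2f} giga-ops per frame")
print(f"{tera_ops_per_second:.3f} TOPS for this one layer at 30 fps")
```

Under these assumptions, one layer alone needs roughly a tenth of a tera-op per second, and a real vision network stacks many such layers.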

The fundamental question, said Rowen, is “Where do the smarts go?” In other words, where’s the best place to execute all of those tera-ops for vision systems? Is the best place close to the camera? That will give you low latency and will not overburden the network with traffic, but will degrade the ability to aggregate data from multiple cameras.

Is the best place to execute all of the tera-ops in some sort of aggregation location? At the cloud edge? In the cloud?

There’s no single answer. (That would be too easy, wouldn’t it?)

There are many critical tradeoffs to consider:

If you want to maximize system responsiveness, you make the processing local. That’s sort of obvious. You don’t want an autonomous car’s collision-avoidance DNN to be located in the cloud where a network dropout could cause a multi-car pileup; you want the processing in the car.

If you need global analysis of data from multiple cameras, such as in a surveillance system, then you want the processing in the cloud.

If you’re concerned about privacy, you don’t want raw video traversing the network. You want the processing to be local.

If you want to minimize cost, you’ll need to constrain the DNN and keep the processing local. Cloud computing is very flexible, but it’s a pay-as-you-go system, so the operating costs keep accumulating for as long as the system runs.
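The four tradeoffs above can be written down as a small rule sketch. This is only my paraphrase of the list in code: the `Requirements` fields and the `placement_votes` function are hypothetical names, and the sketch deliberately reports the reasons pulling each way instead of picking a winner, since there is no single answer.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    """Hypothetical flags, one per tradeoff in the list above."""
    needs_low_latency: bool = False
    needs_multi_camera_analysis: bool = False
    privacy_sensitive: bool = False
    cost_constrained: bool = False

def placement_votes(req):
    """Collect the reasons that pull processing toward the device or the cloud."""
    votes = {"local": [], "cloud": []}
    if req.needs_low_latency:
        votes["local"].append("responsiveness")
    if req.needs_multi_camera_analysis:
        votes["cloud"].append("global analysis")
    if req.privacy_sensitive:
        votes["local"].append("privacy")
    if req.cost_constrained:
        votes["local"].append("cost")
    return votes

# Collision avoidance in a car: latency dominates, so everything points local.
car = placement_votes(Requirements(needs_low_latency=True))

# Surveillance with privacy concerns: the requirements pull in both directions.
surveillance = placement_votes(
    Requirements(needs_multi_camera_analysis=True, privacy_sensitive=True)
)
```

When both lists come back non-empty, as in the surveillance case, the designer has a genuine conflict to resolve, for example by splitting the work between the camera and an aggregation point.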

At this point, Rowen segued to the work of BabbleLabs. “Voice is vision,” he declared. “It’s the most human interface because there are five billion users (including those people listening to radio).”

But there’s another aspect to AI-enhanced voice processing and recognition that indeed makes it a lot like video. “Voice recognition is essentially image recognition performed on spectrograms,” said Rowen.

Now there’s an intriguing idea.

Look at a spectrogram that plots frequency over time. It’s a 2D image, and just like any image, you can train a DNN to recognize traits buried in the spectrogram. Rowen demonstrated a BabbleLabs speech enhancer, which uses AI enhancements to strip road and wind noise from words spoken alongside a busy street in Montevideo, Uruguay. It works surprisingly well.
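To see why a spectrogram counts as a 2D image, here is a minimal sketch that turns a one-dimensional signal into a grid of magnitudes, with time along one axis and frequency along the other. It is not BabbleLabs' method: the test tone, sample rate, frame size, and hop are assumptions of mine, and the naive DFT is there only to keep the example free of external libraries.

```python
import cmath
import math

def spectrogram(samples, frame_size, hop):
    """Return rows of magnitudes: one row per time frame, one column per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window reduces leakage between neighboring frequency bins.
        windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
                    for n, s in enumerate(frame)]
        # Naive DFT, keeping only the non-negative frequencies.
        bins = []
        for k in range(frame_size // 2 + 1):
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(windowed))
            bins.append(abs(total))
        frames.append(bins)
    return frames

# Hypothetical input: a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

image = spectrogram(samples, frame_size=64, hop=32)

# Each bin spans sample_rate / frame_size = 125 Hz, so the tone lands in bin 8.
peak_bins = [row.index(max(row)) for row in image]
```

The result is a plain grid of numbers, just like the pixels of a grayscale picture, which is what lets an image-recognition network be trained on it.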

See for yourself (and watch to the end before making a hasty judgement):

 

The training wheels are coming off.

 
