
Say What? When It Comes to Voice Control, the Future Is Now!

When I was a little lad about six years old growing up in England circa 1963, my mom and dad were both working, so I used to spend the halcyon days of summer up the road at my Auntie Barbara’s house hanging out with my cousin Gillian (who was one year younger than me) and the other kids on the street.

You have to remember that our homes were small by American standards (“three-up, three-down, semi-detached,” which they call a duplex in the USA), with correspondingly sized furniture and appliances. For example, the typical fridge, which fit under a kitchen countertop, could store only around one day’s worth of food, plus a small ice box capable of holding six cubes of ice (reserved for special occasions), a pair of popsicles, and a pack of frozen peas.

Not that we actually had any frozen peas, you understand. I was about to make a joke (I didn’t say it was a good joke) that we dreamed of frozen peas, but — if the truth be told — the idea had simply never struck me. At that time, the concept of frozen peas would have been as Jetsons-esque to me as the idea of televisions presenting programs in color. On those days when peas were on the suppertime menu, Gillian and I would spend the afternoon sitting on the wooden bench in the back garden toiling away to remove the little scamps from their pods. As I recall, an approximately 1,000-pound sack of pods yielded only four small portions of peas, but I’m sure the exercise was good for us.

At some stage in the day, my aunt would call Gillian and yours truly into the house, hoist us up onto the kitchen counter, and scrub our hands, faces, elbows, and knees (all the kids wore shorts). To this day, I proudly sport the cleanest pair of knees in the known universe. Once we were considered presentable, we would set off on a trek to the small group of shops (or group of small shops) that served our local community.

In addition to the newsagent, chemist (pharmacist), and post office, we had a florist, a baker, a greengrocer, a butcher, an ironmonger (hardware store), a fishmonger (where, I presume, they monged fish), and a small general store.

Whilst on our travels, we would invariably meet older people walking in the opposite direction, at which point my aunt would unvaryingly stop to talk. The way I recall these occasions is of droning conversations taking place way above my head, reminiscent of Charlie Brown’s teacher speaking.

There are several things I remember regarding those far-off encounters. The first is that older people seemed to be really, really old in those days of yore, and also sort of washed-out color-wise. The waistbands of the men’s trousers rose with the passing of the years until they rode high under their armpits. Meanwhile, the ladies wore floral dresses whose patterns could easily be mistaken for garish wallpaper, over which were draped heavy woolen coats that almost reached the floor. The really old ladies (50 and above) wore shapeless hats that were held in place by 10-inch steel hatpins. I was tremendously impressed by the fact that my grandmother never seemed to feel any pain — it was years before I realized that she inserted the hatpin only through her hair and not through her head.

The other thing I remember — from when I occasionally tuned into the adults’ conversation with the otiose hope of determining how much longer we were going to hang around — is the topics of their talks. In fact, there were only three main subjects: the weather (how good or bad it was at the time, how this compared to the previous 100 years, and predictions for the forthcoming decade), the health of themselves and everyone they had ever met (including recounting their myriad operations and comparing their numerous scars), and — the perennial favorite — selected aspects of theoretical physics (specifically, how time seemed to go faster the older you got).

As I travelled through time myself, taking the scenic route, with the days turning into weeks and the weeks turning into months and the months turning into years, the thought occasionally crossed my mind, “Can’t you think of something else to talk about, for goodness sake?”

Of course, time has taken its toll on your humble narrator. This year I will be celebrating the occasion of my 100th birthday (assuming we are counting in Base-8; see also this handy-dandy Base-8 to Base-10 Calculator). Thankfully, I don’t feel a day over 64. I do, however, feel that I owe my aunt and her compatriots an apology, because I’ve come to realize that time does indeed seem to pass faster as one grows older.
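If you’d rather not take my word for the arithmetic, a couple of lines of Python will happily confirm the conversion for you:

```python
# Base-8 "100" is 1*8**2 + 0*8 + 0 = 64 in base-10, so I'm really not fibbing.
print(int("100", 8))  # 64
print(0o100)          # also 64
```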

For example, the reason for my meandering musings about time here is that it seems like only a couple of weeks ago that I was waffling on about the activities of the folks at the DSP Group (see DSP Group Dives into the Hearables Market), but I just realized that was more than six months ago at the time of this writing. Give me strength! I’m too young for all this excitement!

Returning to the present with a sickening thud (I’ll un-muss my hair later), I just heard that the clever chaps and chapesses at the DSP Group have announced a new device targeting the smart voice market. The device in question is the DBM10, which is a teeny-tiny artificial intelligence (AI) / machine learning (ML) System-on-Chip (SoC) that boasts the tightly coupled combination of a digital signal processor (DSP) and the nNetLite neural network (NN) inference processor, both of which are optimized for low-power voice and sensor processing in battery-operated devices.

Meet the teeny-tiny DBM10 (it’s the small square on the right)
(Image source: The DSP Group)

Of course, almost everyone building silicon chips these days likes to boast about their low power consumption, so what are we actually talking about here? Well, the typical goal for low-power inferencing is 1 mW or less, and the folks at the DSP Group brag that their nNetLite inference processor typically consumes only 400 to 500 µW, so I’m certainly impressed on that score.
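To put those numbers in perspective, here’s a quick back-of-envelope calculation. The coin-cell capacity is my own assumption (a garden-variety CR2032), not a DSP Group specification, and it ignores the DSP, the microphones, and everything else on the board, but it gives a feel for what always-on inferencing at 500 µW would mean:

```python
# Back-of-envelope battery-life estimate (illustrative assumptions, not DSP Group figures).
# Assume a CR2032 coin cell: roughly 225 mAh at a nominal 3 V.
capacity_mah = 225
voltage_v = 3.0
energy_j = (capacity_mah / 1000) * voltage_v * 3600   # ~2,430 joules

inference_power_w = 500e-6   # 500 uW, the upper end of the quoted nNetLite figure

runtime_days = energy_j / inference_power_w / 86400
print(f"~{runtime_days:.0f} days of continuous inferencing")   # roughly 56 days
```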

In addition to its low-power DSP and ultra-low-power nNetLite inference processor, the DBM10 boasts a bevy of on-chip tightly coupled memories (TCMs), a wealth of microphone and audio connectivity (it supports 2x analog microphones, 4x digital microphones, and 3x time-division multiplexed (TDM) inputs), and a cornucopia of control and logic ports (it offers 2x SPI, 2x I2C, 2x UART, timers, JTAG, and 17 general-purpose inputs/outputs (GPIOs)).

Of particular interest to developers is that the DBM10’s AI/ML development flow supports all the industry-standard frameworks, such as Caffe2, Chainer, Keras, Microsoft Cognitive Toolkit, MXNet, PyTorch, and — of course — TensorFlow.

The DBM10’s AI/ML development flow (Image source: The DSP Group)
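Just to give a flavor of the starting point of that flow, the sort of small keyword-spotting network you might build in one of these frameworks looks something like the following Keras sketch. The input shape and layer sizes here are placeholders of my own choosing, not anything the DSP Group specifies:

```python
# A minimal keyword-spotting-style model in Keras (TensorFlow).
# The input shape (49 frames x 40 MFCC features) and layer sizes are
# illustrative placeholders, not values specified by the DSP Group.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 40, 1)),              # spectrogram-style input
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),         # e.g., four wake/command classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```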

Once the developers have created their AI/ML model, they can export it to a standard format, such as Open Neural Network Exchange (ONNX), Neural Network Exchange Format (NNEF), or — you guessed it — TensorFlow Lite.
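Continuing my hypothetical Keras sketch from above, the export-to-a-standard-format step might look like this. To be clear, this is just the generic TensorFlow Lite conversion with post-training quantization; the nNetLite-specific tooling comes afterward:

```python
# Convert the (trained) Keras model to TensorFlow Lite with post-training
# dynamic-range quantization -- one of the standard formats that can then
# be handed to the nNetLite compiler.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("kws_model.tflite", "wb") as f:
    f.write(tflite_model)
```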

The DSP Group’s nNetLite compiler accepts these standard formats as input and allows developers to perform efficient and effective model-size optimization and (patent-pending) compression, thereby facilitating the porting of large (tens of megabytes) models without significant accuracy loss using quantization, post-training model pruning, and lossless entropy-compression algorithms. The end result is that the nNetLite compiler allows the rapid optimization and deployment of any AI/ML model from any AI/ML framework to the DBM10.
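To get a rough feel for why those techniques matter on a device like this, consider a hypothetical 20 MB float32 model. The numbers below are my own illustration rather than DSP Group benchmarks, but the float32-to-int8 factor of four is real, and pruning plus entropy coding squeezes things down further still:

```python
# Illustrative size arithmetic for a hypothetical 20 MB float32 model
# (my own example numbers, not DSP Group benchmarks).
baseline_mb = 20.0
quantized_mb = baseline_mb / 4    # float32 -> int8 weights is a ~4x reduction: ~5 MB
pruned_mb = quantized_mb * 0.5    # if half the weights are pruned and the resulting zeros
                                  # are squeezed out by entropy coding: ~2.5 MB
print(quantized_mb, pruned_mb)    # 5.0, 2.5
```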

The goal of the DBM10 is to allow developers to implement both short-range and far-field voice control in their products. Although it’s optimized for audio, the DBM10 is also applicable to sound event detection and general-purpose analysis of sensor data. A complete reference kit is provided, including multi-microphone voice-call algorithms and firmware with a certified Amazon wake word engine (WWE) and/or other WWEs.

The DBM10 is ideal for battery-operated devices such as smartphones, tablets, and remote controls, along with smart appliances and smart home devices such as thermostats. It’s also ideal for wearables and hearables such as true wireless stereo (TWS) headsets. This tiny device can enable AI/ML, voice, and sensor fusion functions that include voice trigger (VT), voice authentication (VA), voice command (VC), noise reduction (NR), acoustic echo cancellation (AEC), sound event detection (SED), proximity and gesture detection, sensor data processing, and equalization.

The original Amazon Echo was launched in 2014, which is only six years ago at the time of this writing. As I sit here in my office, I’m reflecting on how many times each day I use my office unit to set a reminder or check the status of a delivery. Meanwhile, at home, we have a number of these devices scattered around the house and we use them for a variety of tasks such as controlling the lights. I can no longer imagine walking into a room like our bedroom without being able to say “Alexa, turn the bedroom lights on” as I busily bustle along on my mission of the moment.

The AI powering these devices is getting better and better as time goes on, and it won’t be long before the vast majority of systems support voice control in addition to — or, in many cases, instead of — more traditional techniques. Of course, this relies on the availability of small, low-power, affordable devices that are easy for developers to deploy, and the DBM10 has check marks in all the appropriate boxes. What say you? Are you looking forward to a voice-controlled future with anticipation or trepidation?
