Edge AI On The Cheap and Deep

Startup Deep Vision Emphasizes Programmability, Efficiency, Cost

There’s an old salesman’s adage that “confused customers never buy.” That’s why glossy sales brochures don’t have a lot of technical information, and why car salesmen don’t delve too deeply into features and benefits. Too much information can lead to analysis paralysis, and, while that might be fun for engineers, it’s bad for business. 

There’s a separate but related effect in engineering. A new technology might be interesting and impressive, but if you don’t immediately grasp how to use it, it won’t catch on. Sometimes the biggest hurdle to adoption is the learning curve. 

We saw this with the rise of DSPs (digital signal processors) in the 1990s: They were perfect for a range of applications, but few developers knew how to program one or where it was supposed to fit in their hardware block diagram. Consequently, DSP uptake was slow. The exception was at Texas Instruments, which spent a lot of corporate resources on software tools, training courses, and front-line technical support. DSP newcomers gravitated to TI, and the company converted a lot of early adopters into big customers. 

A similar strategy is playing out with AI and machine learning. We’re told that it’s the Next Big Thing, but few of us understand what it is, how it works, or where it fits in the block diagram. And confused engineers don’t design-in. 

One little company that hopes to knock down the Wall of Confusion is Deep Vision, a California startup that makes (and sells!) low-cost chips for AI-at-the-edge applications. At first blush, it’s a fabless chip company, but its real expertise is in the software tools. Deep Vision emphasizes accessibility and ease of use as much as ML performance and power efficiency. 

“Most customers don’t care what’s inside the chip,” says VP of Business Development Markus Levy. “It’s more about what you can do with it.” Indeed, most customers wouldn’t understand what’s inside since it’s probably their first AI/ML-related project. For the record, Deep Vision describes its ARA-1 chip as having a “polymorphic dataflow architecture” with a “neural-ISA core.” So now you know. 

Apart from its attitude toward tools, the company also takes a different approach to multitasking and context switching. Edge applications, they say, often continually switch between distinct ML models. For every video frame, a face-recognition app may start out searching for faces in a complex scene, then switch to identifying facial landmarks (eyes, noses, etc.), then switch again to determine an individual’s gaze, mood, drowsiness, or other characteristics. These all require different models, and switching models takes a lot of time on typical AI accelerators. 
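To make the per-frame model hopping concrete, here is a minimal Python sketch of such a cascade. Everything in it is hypothetical: `ToyAccelerator`, the three stub "models," and the frame format are illustrative stand-ins, not Deep Vision's API. It only counts how often a device that holds one model at a time would have to swap models.

```python
from dataclasses import dataclass


@dataclass
class ToyAccelerator:
    """Hypothetical device that holds one model at a time and counts swaps."""
    loaded_model: str = ""
    switch_count: int = 0

    def run(self, model_name, model_fn, inputs):
        # Loading a different model counts as one switch.
        if model_name != self.loaded_model:
            self.loaded_model = model_name
            self.switch_count += 1
        return [model_fn(x) for x in inputs]


# Stub "models": placeholders for real neural networks.
def detect_faces(frame):
    return list(frame["faces"])  # pretend detection yields face crops


def find_landmarks(face):
    return {"face": face, "landmarks": ["eyes", "nose"]}


def estimate_gaze(landmarked):
    return {"face": landmarked["face"], "gaze": "forward"}


def process_frame(acc, frame):
    # Three different models run back to back on every frame.
    faces = acc.run("face_detect", detect_faces, [frame])[0]
    landmarked = acc.run("landmarks", find_landmarks, faces)
    return acc.run("gaze", estimate_gaze, landmarked)


acc = ToyAccelerator()
frames = [{"faces": ["a", "b"]}, {"faces": ["c"]}]
results = [process_frame(acc, f) for f in frames]
print(acc.switch_count)  # 3 model switches per frame, 6 for two frames
```

Even this toy pipeline swaps models three times per frame, which is why the cost of each swap matters so much at video rates.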

Big datacenter applications don’t have this problem because they dedicate entire CPUs or GPUs (or instances, in cloud parlance) to each model. But an edge device doesn’t have that luxury. It must switch from one model to another while also keeping an eye on power, memory requirements, and cost. Deep Vision’s ARA-1 chip is designed to excel in those applications with its zero-overhead task-switching capability.
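A rough back-of-the-envelope calculation shows why switch overhead matters on a per-frame budget. The numbers below (30 frames per second, three switches per frame, 5 ms per switch) are made up purely for illustration; they are not measurements of ARA-1 or any other chip.

```python
def compute_budget_ms(fps: float, switches_per_frame: int, switch_cost_ms: float) -> float:
    """Milliseconds per frame left for inference after paying for model switches."""
    frame_ms = 1000.0 / fps
    return frame_ms - switches_per_frame * switch_cost_ms


# Hypothetical accelerator that spends 5 ms on every model switch.
slow_switch = compute_budget_ms(fps=30, switches_per_frame=3, switch_cost_ms=5.0)

# Idealized accelerator with zero-overhead switching.
free_switch = compute_budget_ms(fps=30, switches_per_frame=3, switch_cost_ms=0.0)

print(round(slow_switch, 2), round(free_switch, 2))  # 18.33 33.33
```

Under those assumed numbers, nearly half of each frame's time would go to swapping models rather than running them.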

The chip has eight identical 8/16-bit integer cores, called DLPs (deep learning processors). Each core has its own L1 cache, and they share a large 4MB L2. There’s also a control processor and a task manager, which manage the host communication and resource allocation. A 32-bit LPDDR4 interface handles external DRAM for models too large to fit in the on-chip memory. PCIe and USB interfaces connect the chip to the host processor, which uses them to pass commands and input data (such as video frames) for the models. 
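As a rough illustration of when a model would spill out to external DRAM, the sketch below compares weight storage at 8-bit and 16-bit precision against the 4MB shared L2. It is a simplification under stated assumptions: it treats 4MB as 4 MiB, counts only weights (ignoring activations and anything else the chip keeps on-chip), and the 3-million-parameter model is hypothetical.

```python
L2_BYTES = 4 * 1024 * 1024  # shared L2 size from the article; 4 MiB is an assumption


def weight_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store the weights at the given integer precision."""
    if bits not in (8, 16):
        raise ValueError("this sketch only models 8-bit and 16-bit weights")
    return num_params * bits // 8


def fits_on_chip(num_params: int, bits: int, l2_bytes: int = L2_BYTES) -> bool:
    """True if the weights alone fit in the shared L2 (activations ignored)."""
    return weight_bytes(num_params, bits) <= l2_bytes


params = 3_000_000  # hypothetical model size
print(fits_on_chip(params, 8))   # True: 3,000,000 bytes
print(fits_on_chip(params, 16))  # False: 6,000,000 bytes
```

The same hypothetical model fits on-chip at 8 bits but not at 16, which is the kind of trade-off the external LPDDR4 interface exists to absorb.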

Like most AI/ML processors, ARA-1 is designed to be used in tandem with a conventional host processor (think ARM, x86, or RISC-V) running Linux. It’s a coprocessor or accelerator, designed to offload complex ML tasks from a CPU that’s not really designed for such workloads. (And that probably has enough to do already.) The two processors communicate over USB or PCIe, your choice. 

As an example, Deep Vision’s chip can be partnered with an i.MX 8M Nano host, a $10 part from NXP with multiple ARM Cortex-A53 cores and a gaggle of peripherals. A camera might feed data to the NXP processor for preprocessing, and the host then offloads the heavy lifting to a $15–$25 ARA-1. Once the offload is done, the NXP device can go back to its own business. Deep Vision touts the fact that its processor requires less babysitting than other edge-AI parts, leaving more host CPU cycles free for other tasks. 
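The host-plus-accelerator division of labor can be sketched in a few lines of Python. This is only an illustration of the offload pattern: a worker thread stands in for the accelerator, and `preprocess`, `accelerator_infer`, and `host_loop` are hypothetical names, not part of any Deep Vision or NXP software. A real system would ship the data over USB or PCIe instead.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(raw_frame):
    # Host-side step: scale 8-bit pixel values into the range 0..1.
    return [p / 255.0 for p in raw_frame]


def accelerator_infer(tensor):
    # Stand-in for the offloaded ML work (here, just an average).
    return sum(tensor) / len(tensor)


def host_loop(raw_frames):
    results = []
    other_work_done = 0
    # A single worker thread plays the role of the accelerator.
    with ThreadPoolExecutor(max_workers=1) as accel:
        for raw in raw_frames:
            future = accel.submit(accelerator_infer, preprocess(raw))
            other_work_done += 1  # host is free for other tasks while waiting
            results.append(future.result())  # collect the offloaded result
    return results, other_work_done


results, other_work_done = host_loop([[0, 255], [255, 255]])
print(results, other_work_done)  # [0.5, 1.0] 2
```

The point of the pattern is the gap between `submit` and `result`: the less attention the accelerator needs in that window, the more the host can get done.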

On the software side, Deep Vision’s toolchain accepts ML models in all the usual formats: ONNX, PyTorch, TensorFlow, Caffe2, and MXNet. The compiler’s output can go straight onto the ARA-1 chip or to the company’s bit-accurate simulator, profiler, and power optimizer. 

ARA-1’s internal microarchitecture is completely software programmable, and Deep Vision can extend its compiler to add new operators to support new models and/or satisfy customer requirements. That helps future-proof ARA-1 and its siblings as the family tree grows. 

There is definitely an ARA-2 coming, says Deep Vision’s Levy. It’ll have more on-chip memory, significant enhancements to the DLP cores, additional compute functions, and much higher on- and off-chip bandwidth, while retaining the current chip’s basic architecture and DDR, PCIe, and USB interfaces. Software will transfer from one to the other. 

Basic von Neumann processors were scary and unusual at some point. DSPs and GPUs and FPGAs were weird and unfamiliar, too. Now we’re all riding the ML accelerator wave while trying to maintain our balance. A low-cost, power-miserly chip with a friendly toolchain seems like a good place to start.
