CEVA has introduced a new vision platform, which they’re calling the CEVA-XM4. We’ve looked at their prior platform, the MM3101, before; you could consider this the next stage. Almost literally.
CEVA describes vision processing as resembling a 3-stage pipeline. First come your basic vision processing steps to generate clean 3D data, which creates left and right images and a depth map. The next step is what’s typically called computational photography: using sophisticated algorithms to create higher resolution and other quality improvements than a given camera is capable of generating on its own.
Both of these were covered in the prior vision processor; the XM4 further enables the third stage, what they call “visual perception.” This means object identification and tracking, for instance, as well as algorithms for augmented reality and so-called natural user interfaces (NUI – “natural” being something of a dodgy concept, like “intuitive”). Depending on the application, all three stages can be implemented in a single XM4 core; if more juice is needed, then multiple cores can be instantiated.
(Image courtesy CEVA)
From a camera standpoint, part of the idea here is that higher-level processing tends to be done in the cloud, which involves huge transfers of data from camera to cloud. Part of the intent of the XM4 is to beef up the camera so that much of that heavy lifting is done first in the camera, abstracting all that raw data and moving less up to the cloud.
But the XM4 isn’t just about still cameras; it’s also about automotive vision as well as incorporating vision into the IoT – video cameras and such whose purpose it is to identify specific artifacts to enable some kind of action to be taken. It could be a security camera or simply a home video camera that’s “always watching,” but films only when your kid is in the frame. (Which means it’s actually filming and processing, but then discarding if it doesn’t identify your child.)
(Image courtesy CEVA)
To some extent, this is just a beefy DSP. But there are a couple important steps they’ve taken for targeting vision. First is simply optimizing the instruction set. The second is to optimize how memory is managed. They illustrated a couple of examples.
In one case, they have built in the ability to perform scatter and gather in a single clock cycle. Most vector algorithms require that the memory to be processed be tidily arranged in adjacent cells; if the required cells are spread all over the place, then either you need to copy them to a scratchpad area to work on them, copying them back later, or you can’t vectorize the algorithm.
With a scatter-gather capability, they can handle this quickly, allowing vectorization of algorithms that would likely otherwise remain serial.
The other is what I think of as a windowing capability; they call it “2D processing.” Many vision algorithms involve a sliding window, with significant overlap between what’s contained in the window in one position and what’s contained after the window shifts one notch. They enable efficient reuse of the overlapping areas memory rather than requiring copies to scratch memory.
These capabilities largely come through pre-optimized library components; the designer then doesn’t have to think through the details of how they work; it’s already done for them (similarly to the SmartFrame feature we described in the past).
While these low-level processors can involve low-level programming, their Android Multimedia Platform allows programming at the Android level, with the framework connecting via the CPU to the vision processor.
You can learn more in their announcement.