Last summer, we published an article welcoming the Embedded Vision Alliance to the world. With the incredible processing power available today – particularly when you consider the massive acceleration possibilities with devices like FPGAs and GPUs – real embedded vision becomes a realistic possibility. Making that possibility into reality, however, is an enormous task, requiring collaboration from dozens of companies and academia.
When we talk about embedded vision, we’re not talking about just bolting a camera onto your embedded system. Devices that simply capture, reformat, record, or transmit video don’t count. In order to count for our purposes here, your system needs to actually understand what it’s seeing and do something useful with that information. When it comes to problem complexity, that’s a whole ‘nother kettle of fish.
Machine vision has been a research topic for decades. Hundreds of academic and research papers and projects have led to an enormous body of published work. Researchers have tackled everything from ambitious broad-based vision algorithms to focused topics like facial recognition, gesture recognition, and object identification. The fundamental software algorithms involved in machine vision are highly complex and in a state of constant flux and improvement. The catalog of software doing machine-vision-related tasks is enormous.
As much work as there has been in machine vision, however, practical application of the technology is in its infancy. There have been several obstacles to commercialization of embedded vision technology on a large scale. First, even though there is a huge amount of research and software out there in the world, there is no standard approach that works for the general case. The choice of algorithm depends heavily on the specific problem being solved, and the quality of results required for most real-word commercial applications can be achieved only through exhaustive specialized tuning. You can’t just drop a piece of embedded vision software into your system and have it immediately start recognizing that “the man is giving something to the woman.”
The second major obstacle has been compute power. Most of the research projects so far have relied on post-processing of captured video – at what amounts to a terrible frame rate – sometimes far below one frame per second of processing speed. In order to do real-time vision, academic algorithms would need at least two to three orders of magnitude more of compute acceleration. If you want that kind of compute power in an embedded system, your options dwindle considerably.
This is where technologies like FPGAs and GPUs come to the forefront. Vision algorithms (by their very nature) are the kind of compute problems that are highly parallelizable. Since the algorithms are currently in an extreme state of flux, and since there are dramatic differences in the approach one would take for different sub-areas of embedded vision, it is unlikely that we’ll see application-specific standard parts (ASSPs) addressing embedded vision in an off-the-shelf way any time soon.
In order to do a commercial application that includes embedded vision today, you’re likely to need an accelerator like an FPGA or GPU that would allow you to accelerate the portions of your chosen algorithm that are performance intensive. In general, FPGAs will be able to achieve better performance and throughput at a lower power consumption than GPUs, but GPUs will be generally easier to program. The FPGA industry has been working hard to close that programming gap, however, so FPGAs may well become the go-to technology in vision applications. The advent of new hybrid processor/FPGA devices like Xilinx’s Zynq or Altera’s upcoming SoC FPGAs promises even better things for vision applications as the programmable hardware fabric can be used to accelerate critical algorithms and get data onto and off of the chip efficiently, while the embedded processing system does the less performance-critical housekeeping and control tasks.
The Embedded Vision alliance held an editor and analyst event in parallel with the recent Design West conference in San Jose. As we described previously, the mission of the Embedded Vision Alliance is to bring together companies and organizations with an interest in the development and commercialization of vision technologies. At the moment, the list of member companies is dominated by semiconductor and other hardware companies whose products could be used in an embedded vision system – primarily to help solve the compute acceleration problem. However, a growing number of participants come from the software side as well.
Commercializing embedded vision will take a concerted, collaborative effort from all of these players. The problem being solved is possibly one of the most complex ever tackled in an embedded computing space. Sorting and sifting through the massive amounts of research and finding the relevant, valuable technologies to push forward is a monumental task in itself. Then, turning academic ideas that work well in the lab into things that are manufacturable and perform well enough for commercial applications is an additional challenge.
The launch of Microsoft’s Kinect caused an explosion in awareness and experimentation in embedded vision technology. Kinect demonstrated some impressive capabilities in a mass-produced, low-cost system that set new standards for the industry. Following on the heels of that, we are on the verge of a revolution in the automotive industry centered around embedded vision technology. Within a few years, vision-based (or partially vision-based) systems like lane departure, collision warning and avoidance, adaptive cruise control, road sign recognition, automatic parking, and driver alertness monitoring will find their way into mass-production automobiles.
In other areas, industrial automation has already widely adopted machine vision for localized tasks, but broader application of the technology promises to far exceed the capabilities of today’s limited applications. Quality inspection, robotic guidance, and factory automation are obvious applications.
The security and defense industries are looking to implement some of the most ambitious (and creepy) applications of embedded vision technology. With the proliferation of high-capability, low-cost cameras for observation, the bottleneck in the surveillance and security areas quickly becomes human observers to watch all those screens. Intelligent systems that understand the activities they are monitoring can have a dramatic impact on the capabilities of surveillance systems. Computers are never looking at the wrong monitor, falling asleep, or munching on potato chips when the relevant action starts on one camera.
Despite the huge advances in machine vision technology, we’re just at the beginning. Over the next few years, look for an explosion of commercial applications and technology opportunities in the vision space. We expect devices like FPGAs and GPUs to be at the center – enabling much of this new technology. For those of us in the design community, the opportunities to participate and compete with innovative ideas and products should be rich.