
Augmenting an Existing 2D CMOS Sensor to Provide 3D Depth Perception

The term “machine vision” refers to the technologies and processes by which machines extract, analyze, and interpret visual information from the physical world using sensors and computational algorithms. It enables machines to “see,” make decisions, and perform actions based on their visual inputs.

While 2D machine vision is great—I won’t hear a bad word said about 2D—it must be acknowledged that everything is better in 3D. Being equipped with 3D depth perception provides machine vision systems with more complete data for their AI applications. This enables robots, automobiles, and other systems equipped with machine vision to enjoy richer, more immersive, and more rewarding interactions with people, other devices, and their environment.

And it’s not just big systems like robots and automobiles. I can see a day coming (no pun intended) when machine vision is ubiquitous, with everything from electric teapots to electric toasters being able to observe the world around them and respond to visual cues.

As an aside (you knew one was coming), I remember when the height of machine vision came in the form of simple line-following robots. This is still a lot of fun for beginners to electronics and robotics today. The idea is to create a basic autonomous robot that can follow a pre-defined path in the form of a dark line on a white surface, or vice versa.

All this really requires is a chassis, a couple of geared DC motors and wheels, two light-dependent resistors (LDRs) or two infrared (IR) sensors with accompanying IR sources, a cheap-and-cheerful microcontroller, a motor driver module (like an L298N or L293D) to handle the power requirements of the motors, and a power supply in the form of batteries. I just found a couple of related videos on YouTube.

I don’t know why I do this to myself. Now I want to create my own super-fast line-following robot using a proportional, integral, derivative (PID) controller running on an Arduino, but we digress…
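
Just to scratch that itch a little, here’s a minimal sketch of what the PID version might look like on an Arduino. I’m assuming two IR reflectance sensors on analog pins A0 and A1 and an L298N driven from PWM pins 5 and 6; the pin assignments and gain values are placeholders you’d tune on the real robot, so treat this as a starting point rather than a finished machine.

// Minimal PID line follower (hypothetical wiring: IR sensors on A0/A1,
// L298N PWM enable pins on 5 and 6, direction pins tied for "forward").
const int SENSOR_LEFT = A0;
const int SENSOR_RIGHT = A1;
const int MOTOR_LEFT_PWM = 5;
const int MOTOR_RIGHT_PWM = 6;

float Kp = 0.6, Ki = 0.0, Kd = 0.3;   // placeholder gains; tune on the real robot
float integral = 0, lastError = 0;
const int BASE_SPEED = 150;           // 0-255 PWM

void setup() {
  pinMode(MOTOR_LEFT_PWM, OUTPUT);
  pinMode(MOTOR_RIGHT_PWM, OUTPUT);
}

void loop() {
  // The error is the difference in reflected IR; it sits near zero when the
  // robot is centered over the line.
  int error = analogRead(SENSOR_LEFT) - analogRead(SENSOR_RIGHT);
  integral += error;
  float derivative = error - lastError;
  float correction = Kp * error + Ki * integral + Kd * derivative;
  lastError = error;

  // Steer by speeding up one wheel and slowing the other.
  analogWrite(MOTOR_LEFT_PWM, constrain(BASE_SPEED + correction, 0, 255));
  analogWrite(MOTOR_RIGHT_PWM, constrain(BASE_SPEED - correction, 0, 255));
  delay(10);
}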

There are two things about implementing 3D depth perception that spring to most people’s minds. First, it isn’t easy. Second, it isn’t cheap, both in terms of the hardware (sensors and processor) that’s required and the software (the amount of computation that needs to be performed). 

One option is to use two CMOS sensors (with accompanying lens assemblies) coupled with a humongous amount of computation to provide 3D depth perception via true binocular vision. 

An alternative is to use a single CMOS sensor coupled with a humongous-squared amount of computation to provide 3D depth perception via monocular vision. I’m thinking of one-eyed humans—like professional baseball player Charles William “Whammy” Douglas—who use monocular depth cues to provide depth perception. These cues include:

  • Size (Static): If we know the typical size of an object, like a baseball, then we can couple that with the fact that an object appears smaller when it’s farther away and larger when it’s closer (there’s a small worked example after this list).
  • Size (Dynamic): If something like a baseball appears to be getting bigger and Bigger and BIGGER, then it may be time to DUCK!
  • Perspective: Converging parallel lines indicate distance (e.g., a road stretching into the horizon).
  • Occlusion: Objects blocking others are perceived as closer.
  • Shading and Shadows: Light and shadow patterns give clues about the shape and distance of objects.
  • Texture Gradient: Finer, less detailed textures suggest greater distance.
  • Motion Parallax: When the head moves, closer objects appear to shift more than distant ones.
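
To put a number on that first “Size (Static)” cue: with a simple pinhole-camera model, an object of known real-world size that spans a known number of pixels gives you its distance directly (distance = focal length in pixels × real size / apparent size in pixels). The little program below is purely my own illustration with made-up numbers; it has nothing to do with any particular vendor’s system.

#include <cstdio>

// Pinhole-camera estimate: distance = focal_length_px * real_size / apparent_size_px.
// The numbers in main() are illustrative assumptions, not measured camera parameters.
double distanceFromKnownSize(double focalLengthPx, double realSizeM, double apparentSizePx) {
    return focalLengthPx * realSizeM / apparentSizePx;
}

int main() {
    double focalLengthPx = 800.0;  // assumed focal length in pixels
    double baseballDiaM  = 0.074;  // a baseball is roughly 74 mm across
    double apparentPx    = 20.0;   // how many pixels the ball spans in the image
    std::printf("Estimated distance: %.2f m\n",
                distanceFromKnownSize(focalLengthPx, baseballDiaM, apparentPx));
    return 0;
}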

The monocular approach is tempting because it requires only a single CMOS sensor and lens assembly. On the other hand, as I mentioned earlier, it requires a “humongous-squared” amount of computation (that’s a lot of computation), which will probably require a larger, faster, and more expensive processor.

The thing is that a lot of systems already have CMOS sensor-based cameras in them performing 2D vision tasks. If only there were a way to use this existing sensor to also provide 3D depth perception. Well, by golly, it turns out that there is! 

I was just chatting with Feisal Afzal (Co-Founder and COO) and Skanda Visvanathan (VP of Business Development) at MagikEye. I have to say that I’m impressed by all they told me. The folks at MagikEye have come up with a cunning low-cost way to allow a regular CMOS sensor to provide accurate depth perception anywhere from 5cm to 5m. This will address the needs of myriad applications, from household robots hoovering the floors, to autonomous robots trying not to run over people in factories and warehouses, to automobiles wishing to know who is doing what and where they’re doing it inside the car, to drones, to… we are limited only by our imaginations.

Look at the picture below. On the right (the black round bit), we see a cheap-and-cheerful CMOS image sensor. To its left (the small gray square bit), we see one of MagikEye’s infrared (IR) projectors. The cable is connected to something like a Raspberry Pi running MagikEye’s software.

Projector (left) and CMOS sensor (right) (Source: MagikEye)

Now, I’m not an expert in this area, so bear with me while I try to explain this in words that even I can understand. Let’s start with the fact that a lot of IR vision systems require their projector to “flood” the area with light, which consumes a lot of power, relatively speaking. Rather than illuminate the entire scene, MagikEye’s device projects thousands of tiny dots that are invisible to the human eye but visible to the CMOS sensor; generating these dots consumes far less power than flooding the whole scene.

In the context of 3D sensing and measurement, the term “triangulation” refers to a technique that allows us to calculate the depth (or distance) to an object based on the geometry of the triangle formed by the projector, the sensor, and the object being measured. In this case, MagikEye’s software performs triangulation on each of the dots.
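
To make that concrete, here’s the textbook projector-and-camera triangulation relationship. I should stress that this is the generic form you’ll find in any structured-light tutorial, not MagikEye’s proprietary algorithm: the depth of the surface a dot lands on falls out of the focal length, the baseline, and how far the dot has shifted across the sensor.

#include <cstdio>

// Generic structured-light triangulation (textbook form, not MagikEye's actual code).
// disparityPx is how far the dot has shifted on the sensor relative to where it
// would land for a surface at infinite distance; focalLengthPx and baselineM
// come from calibrating the camera/projector pair.
double depthFromDisparity(double focalLengthPx, double baselineM, double disparityPx) {
    if (disparityPx <= 0.0) return -1.0;              // no measurable shift, no valid depth
    return focalLengthPx * baselineM / disparityPx;   // depth in meters
}

int main() {
    // Illustrative numbers only: 800-pixel focal length, 5cm baseline, 10-pixel shift.
    std::printf("Depth: %.2f m\n", depthFromDisparity(800.0, 0.05, 10.0));
    return 0;
}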

Even if all the dots are projected onto a flat wall, resulting in a regular pattern, this pattern appears distorted when viewed by a camera at an offset angle to the projector (the distance between the projector and sensor is called the “baseline”), allowing the depth (distance) to the wall to be calculated. Things become more interesting when a 3D object like a human head (preferably still attached to its body) enters the scene. In this case, the curvature of the head will cause further distortions in the locations of the dots. All of this allows MagikEye’s software to create a 3D point cloud associated with the scene.
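
Once each dot has a depth, assembling the point cloud is just the pinhole model run in reverse: every dot’s pixel coordinates plus its depth back-project to an (x, y, z) position in front of the camera. Again, this is a generic sketch with placeholder intrinsics (fx, fy, cx, cy), not the vendor’s implementation.

#include <cstdio>
#include <vector>

struct Dot     { double u, v, depth; };   // pixel coordinates plus triangulated depth
struct Point3D { double x, y, z; };

// Back-project one dot into camera space. fx, fy are focal lengths in pixels and
// cx, cy is the principal point; all are placeholder values from camera calibration.
Point3D backProject(const Dot& d, double fx, double fy, double cx, double cy) {
    return { (d.u - cx) * d.depth / fx, (d.v - cy) * d.depth / fy, d.depth };
}

// Build the cloud from every dot whose depth was successfully triangulated.
std::vector<Point3D> buildPointCloud(const std::vector<Dot>& dots,
                                     double fx, double fy, double cx, double cy) {
    std::vector<Point3D> cloud;
    cloud.reserve(dots.size());
    for (const Dot& d : dots) cloud.push_back(backProject(d, fx, fy, cx, cy));
    return cloud;
}

int main() {
    std::vector<Dot> dots = { {320.0, 240.0, 1.5} };              // one example dot
    auto cloud = buildPointCloud(dots, 800.0, 800.0, 320.0, 240.0);
    std::printf("First point: (%.2f, %.2f, %.2f)\n", cloud[0].x, cloud[0].y, cloud[0].z);
    return 0;
}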

Now, this is the clever bit. You could create a new device from scratch, with its own camera (that is, CMOS sensor + lens assembly), projector, and processor. Alternatively, you could start with an existing device that already has its own camera and processor that it’s using to detect and process traditional 2D red, green, and blue (RGB) images. All you need to do is add the projector and modify the application to swap between the regular image processing software and MagikEye’s point-cloud generation software. Then you can fuse the 2D image with the 3D point cloud, and “Bob’s your uncle” (or aunt, depending on your family dynamic).
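
In structural terms, the integration might look something like the skeleton below: the application alternates between the regular 2D capture pass and a dot-pattern depth pass, then fuses the two. The function names and types here are purely hypothetical placeholders of my own invention; MagikEye’s actual API will look different.

#include <vector>

// Hypothetical integration skeleton: the capture and fusion calls are stubs
// standing in for the existing 2D pipeline and the depth software, not a real API.
struct RGBPixel { unsigned char r, g, b; };
struct Point3D  { double x, y, z; };

std::vector<RGBPixel> captureRgbFrame()   { return {}; }  // stub: existing 2D camera pass
std::vector<Point3D>  capturePointCloud() { return {}; }  // stub: projector on, depth pass

// In a real system, fusion would tag each 3D point with the color of the
// pixel it was back-projected from.
void fuse(const std::vector<RGBPixel>& rgb, const std::vector<Point3D>& cloud) {
    (void)rgb; (void)cloud;
}

int main() {
    for (int frame = 0; frame < 100; ++frame) {   // alternate 2D and 3D passes
        auto rgb   = captureRgbFrame();
        auto cloud = capturePointCloud();
        fuse(rgb, cloud);
    }
    return 0;
}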

We should acknowledge that there are other systems that use a single CMOS sensor in conjunction with structured light. I’m thinking of devices like Microsoft Kinect and Apple Face ID. These often employ random patterns of dots because they provide dense, detailed depth data and are robust in real-world, textured environments. However, these systems consume more power because—the way they work—they require 9 to 10 dots for each measured point. Also, because they are based on a pattern-matching approach, these systems have a large memory footprint and consume a significant amount of processing power.

By comparison, MagikEye’s Invertible Light Technology (ILT) is an innovative approach to depth sensing in 3D imaging systems that is designed to overcome limitations of traditional depth-sensing methods. The pattern of dots that ILT projects onto the scene is specially designed to be “invertible,” meaning that the projected pattern has mathematical properties that allow it to be reconstructed and analyzed with high precision. Furthermore, ILT employs a unique fast linear algorithm. In addition to a low memory footprint, this algorithm requires much less computation and processor power.

Just for giggles and grins, here are a couple of short videos upon which you may feast your orbs.

Well, I for one am very impressed. If you want to learn more, reach out to the guys and gals at MagikEye and tell them, “Max says ‘Hi’” (but be prepared to explain who the “Max” is of whom you speak). 

If you are planning on attending CES 2025, then the chaps and chapesses at MagikEye are inviting interested partners, product designers, and customers to arrange private demonstrations of their enhanced ILT technology. They tell me that, “These one-on-one sessions will provide an in-depth look at how to seamlessly integrate ILT into existing hardware and software platforms and explore its potential across a multitude of applications.”

To schedule a private demo at CES, you can email them (CES2025@magik-eye.com). In the meantime, as always, I await your captivating comments and insightful questions in great anticipation.
