
How it works

The reconstruction is based on the parallax effect and requires two images.

Here's an intuitive demonstration of the parallax effect on an image (looking like a sad Australia):

[Image: parallax effect demonstration]

If the images have a parallel (orthographic) projection and the scene rotates around a fixed point, the distance each point moves between the images is proportional to that point's distance from the rotation center.
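As a rough illustration (assuming a small rotation angle θ about an axis lying in the image plane):

```latex
% Under parallel projection, rotating the scene by a small angle \theta
% about an axis in the image plane shifts a point at depth z (its distance
% from the plane containing the rotation axis) by approximately
\Delta x \approx z \sin\theta \approx z\,\theta
```

The observed shift is proportional to z to first order, which is what makes it possible to recover relative depth from the two images.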

Step 1: Find feature points

To determine the rotation axis, points are detected with the FAST keypoint detector.

This image illustrates points detected by FAST (highlighted in red):

[Image: FAST points]
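Cybervision implements this detection itself; here's a minimal sketch of the same step using OpenCV's FAST detector (the file names and threshold value are placeholder assumptions):

```python
import cv2

# Load both images as grayscale; the file names are placeholders.
img_a = cv2.imread("image_a.jpg", cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread("image_b.jpg", cv2.IMREAD_GRAYSCALE)

# FAST corner detector; the threshold of 20 is an arbitrary example value.
fast = cv2.FastFeatureDetector_create(threshold=20)
keypoints_a = fast.detect(img_a, None)
keypoints_b = fast.detect(img_b, None)

# Highlight the detected points in red, as in the illustration above.
vis = cv2.drawKeypoints(img_a, keypoints_a, None, color=(0, 0, 255))
cv2.imwrite("fast_points.jpg", vis)
```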

Step 2: Match feature points

Feature points are matched by comparing each feature point on image A with every feature point on image B.

Only point matches that have a high correlation rate are kept.

Here's an example with lines connecting all point matches from image A to image B:

[Image: point matches]
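Continuing the sketch above, the brute-force comparison can be illustrated with normalized cross-correlation of small windows around each point (the window radius and correlation threshold are assumptions, not Cybervision's actual values):

```python
import numpy as np

def patch(img, kp, r=7):
    # Extract a (2r+1)x(2r+1) window around a keypoint; None near the border.
    x, y = int(kp.pt[0]), int(kp.pt[1])
    if r <= x < img.shape[1] - r and r <= y < img.shape[0] - r:
        return img[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
    return None

def ncc(p, q):
    # Normalized cross-correlation of two equally sized patches.
    p = p - p.mean()
    q = q - q.mean()
    denom = np.sqrt((p * p).sum() * (q * q).sum())
    return (p * q).sum() / denom if denom > 0 else 0.0

# Compare every point in A with every point in B (O(n^2)).
matches = []
for i, kp_a in enumerate(keypoints_a):
    pa = patch(img_a, kp_a)
    if pa is None:
        continue
    for j, kp_b in enumerate(keypoints_b):
        pb = patch(img_b, kp_b)
        if pb is None:
            continue
        if ncc(pa, pb) > 0.9:  # correlation threshold is an assumption
            matches.append((i, j))
```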

Step 3: Calculate direction

Using the RANSAC method, find the set of point matches that are all consistent with the same fundamental matrix. In some cases there is no clear consensus set (or there are multiple competing sets), which might cause the algorithm to either fail or produce garbage results.

The previous step will give some false positives, which should be filtered out.
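Cybervision uses its own RANSAC implementation; as a sketch of the same idea, OpenCV's RANSAC-based estimator returns both the fundamental matrix and an inlier mask that can be used to drop the false positives (the threshold and confidence values are example assumptions):

```python
import numpy as np
import cv2

# Matched point coordinates from the previous step, shape (N, 2).
pts_a = np.float64([keypoints_a[i].pt for i, _ in matches])
pts_b = np.float64([keypoints_b[j].pt for _, j in matches])

# Estimate the fundamental matrix with RANSAC; the reprojection
# threshold (in pixels) and confidence level are example values.
f_matrix, inlier_mask = cv2.findFundamentalMat(
    pts_a, pts_b, cv2.FM_RANSAC, ransacReprojThreshold=1.0, confidence=0.99
)

# Keep only the consensus set; everything else is a false positive.
inliers_a = pts_a[inlier_mask.ravel() == 1]
inliers_b = pts_b[inlier_mask.ravel() == 1]
```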

Here's an example with lines connecting all filtered FAST points from image A to image B (note the difference with the image from Step 2):

[Image: filtered point matches]

or the same matches, with the lines drawn in image coordinates:

[Image: point matches in image coordinates]

The fundamental matrix constrains the search in the next step, improving both the performance and the reliability of the reconstruction process.
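Concretely, the fundamental matrix F ties each pixel of image A to a single line in image B:

```latex
% For a point x in image A (homogeneous coordinates), any match x'
% in image B must satisfy the epipolar constraint
x'^{\top} F \, x = 0
% so the match can only lie on the epipolar line
l' = F \, x
```

This is what restricts the per-pixel search in Step 4 to a narrow band instead of the whole image.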

Step 4: Cross-correlation

Now that the fundamental matrix has been established, it's time to start the most time-consuming process: for every pixel on image A, find a matching pixel on image B.

This is done by cross-correlating small areas of image A and image B.

Cross-correlating every pixel of image A with every pixel of image B is prohibitively time-consuming. It would also produce a lot of false positives, especially if the image contains repeating textures.

Cybervision uses a better approach and only cross-correlates each pixel from image A with a small area in image B where a match is most likely to be found (a sketch of the search follows this list):

  • Each pixel from image A is searched for only in a narrow "corridor" in image B (a "corridor" is an epipolar line that's a few pixels wide).

    This "corridor" matches the direction calculated in the previous step (determined by the point's epipolar line).

  • A scale pyramid is used to perform cross-correlation on a downscaled image first.

    Then, at each larger scale, the depth (distance) values found for neighboring pixels are used to approximate where a match is most likely to be found.
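A minimal sketch of the corridor search (reusing the ncc helper from the Step 2 sketch; the corridor width, window radius, and reliability cutoff are assumptions, and the scale pyramid is omitted):

```python
import numpy as np

def window(img, x, y, r):
    # (2r+1)x(2r+1) window centered at (x, y); None near the border.
    if r <= x < img.shape[1] - r and r <= y < img.shape[0] - r:
        return img[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
    return None

def match_along_epipolar_line(img_a, img_b, x, y, f_matrix, corridor=3, r=7):
    # Best NCC match for pixel (x, y) of image A, searched only inside
    # a narrow corridor around its epipolar line in image B.
    template = window(img_a, x, y, r)
    if template is None:
        return None
    # Epipolar line l' = F x, written as a*u + b*v + c = 0 in image B.
    a, b, c = f_matrix @ np.array([x, y, 1.0])
    if abs(b) < 1e-9:
        return None  # near-vertical line; a full implementation would iterate over v
    best_score, best_pos = -1.0, None
    for u in range(r, img_b.shape[1] - r):
        v0 = -(a * u + c) / b  # point on the epipolar line at column u
        for dv in range(-corridor, corridor + 1):
            v = int(round(v0)) + dv
            candidate = window(img_b, u, v, r)
            if candidate is None:
                continue
            score = ncc(template, candidate)  # NCC helper from Step 2
            if score > best_score:
                best_score, best_pos = score, (u, v)
    return best_pos if best_score > 0.8 else None  # cutoff is an assumption
```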

After the cross-correlation process completes, the resulting surface will look like this (no reliable correlation could be found in the blank areas):

[Image: reconstructed mesh]

Step 5: Triangulation

Using the qhull library, the points are converted into a 3D mesh.

At this point, the surface can be saved as a Wavefront OBJ file, or interpolated to produce a depth map image.
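scipy's Delaunay triangulation is backed by the same qhull library, so this step can be sketched as follows (surface_points is a hypothetical (N, 3) array of x, y, depth values produced by Step 4):

```python
import numpy as np
from scipy.spatial import Delaunay

# surface_points is a placeholder for the per-pixel results of Step 4.
points = np.asarray(surface_points)

# Triangulate in the image plane (x, y); qhull does the actual work.
tri = Delaunay(points[:, :2])

# Write the mesh as a Wavefront OBJ file (vertex indices are 1-based).
with open("surface.obj", "w") as f:
    for x, y, z in points:
        f.write(f"v {x} {y} {z}\n")
    for i, j, k in tri.simplices:
        f.write(f"f {i + 1} {j + 1} {k + 1}\n")
```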

The end result will look like this:

[Image: interpolated surface]

[Image: reconstructed mesh]
