Before the story unfolds, we need to understand 5 coordinate systems so that we know, step by step, how an object in the 3D world is transformed onto the 2D image plane and finally onto the discrete pixel plane.
The first step in transforming the 3D object onto the pixel plane is converting the object from world coordinates to camera coordinates. As a metaphor, we should see the world, or the object, from the camera's viewpoint. Mathematically, a transformation of coordinates is required. Therefore, the rigid transformation is introduced as follows.
- A rigid transformation only includes rotation and translation, which are denoted $R$ and $t$ respectively. Some other useful facts are (see the detailed properties of the orthogonal matrix):
  - $R$ is an orthogonal matrix whose rows are all unit vectors, so that $R^T = R^{-1}$
  - the rows $r_1, r_2, r_3$ are mutually perpendicular, so that $r_i \cdot r_j = 0$ for $i \neq j$
  - each $r_i$ is a unit vector, so that $|r_i| = 1$ and $r_1^{T}r_1 = r_2^{T}r_2 = r_3^{T}r_3 = 1$
- Mathematically

Given

In homogeneous form as follows:
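In the usual notation (an assumption here: $P_w = (x_w, y_w, z_w)$ for the world point and $P_c = (x_c, y_c, z_c)$ for the camera point), the rigid transformation is commonly written as

$$
P_c = R P_w + t, \qquad
\begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
\begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}
$$

A minimal sketch of this step follows. The helper name `make_rt`, the 90° rotation about the z-axis, and the sample point are illustrative choices, not values from the text.

```python
import numpy as np

def make_rt(R, t):
    """Stack a 3x3 rotation R and a 3-vector t into a 4x4 homogeneous matrix."""
    Rt = np.eye(4)
    Rt[:3, :3] = R
    Rt[:3, 3] = t
    return Rt

# Illustrative rotation: 90 degrees about the z-axis.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 2.0, 3.0])

# Orthogonality check: R^T R = I, which is the same as R^T = R^{-1}.
assert np.allclose(R.T @ R, np.eye(3))

# World point in homogeneous coordinates, mapped into the camera frame.
p_world = np.array([1.0, 0.0, 0.0, 1.0])
p_cam = make_rt(R, t) @ p_world
```

Here the rotation sends $(1, 0, 0)$ to $(0, 1, 0)$, and the translation then shifts it by $t$.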
Unlike image1, image2 projects the object onto a plane in front of the pinhole (the center of projection), which is easier to visualize because the projected triangle is not upside down.
Following the properties of similar triangles:

$$
\frac{x}{f} = \frac{x_c}{z_c}, \qquad \frac{y}{f} = \frac{y_c}{z_c}
$$

Then write it in matrix form:
$$
\begin{align}
\begin{bmatrix}
x \\
y \\
1
\end{bmatrix}
&=
\begin{bmatrix}
f & 0 & 0 \\
0 & f & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\frac{x_c}{z_c} \\
\frac{y_c}{z_c} \\
1
\end{bmatrix} \\
&=
\frac{1}{z_c}
\begin{bmatrix}
f & 0 & 0 \\
0 & f & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
x_c \\
y_c \\
z_c
\end{bmatrix}
\end{align}
$$
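The perspective projection can be sketched in a few lines. The function name `project_pinhole` and the numbers used are illustrative assumptions. The code multiplies by the focal-length matrix and then divides by the depth $z_c$, which yields $x = f x_c / z_c$ and $y = f y_c / z_c$.

```python
import numpy as np

def project_pinhole(p_cam, f):
    """Project a camera-frame point (x_c, y_c, z_c) onto the image plane."""
    x_c, y_c, z_c = p_cam
    if z_c <= 0:
        raise ValueError("the point must lie in front of the camera (z_c > 0)")
    F = np.diag([f, f, 1.0])
    # Homogeneous image point [x, y, 1] after dividing by the depth z_c.
    homog = F @ np.array([x_c, y_c, z_c]) / z_c
    return homog[:2]

# Illustrative values: a point 4 units in front of the camera, f = 0.5.
xy = project_pinhole((2.0, 1.0, 4.0), f=0.5)
```

Doubling $z_c$ while keeping $x_c$ and $y_c$ fixed halves the image coordinates, which is the familiar "farther objects look smaller" effect.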
We represent an image with discrete pixels. However, the image plane represented by $(x, y)$ is continuous, so we also need to define a proper conversion between the two.
- $d_x$ and $d_y$ indicate the physical size of a pixel in the real world, let's say 5 mm/pixel, so that $f_x = f / d_x$ and $f_y = f / d_y$ are expressed in pixels.
Note that the origin of the image plane is at its center, but the origin of the pixel plane is at the top-left corner. Therefore, we also need a translation as follows:
- $v$ and $u$ are the row index (row_id) and the column index (col_id) of the image matrix
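The scaling and translation steps can be sketched together. This assumes the common form $u = x / d_x + c_x$ and $v = y / d_y + c_y$, where $(c_x, c_y)$ is the pixel position of the image-plane origin. The function name `image_to_pixel` and the numbers are illustrative.

```python
def image_to_pixel(x, y, dx, dy, cx, cy):
    """Convert continuous image-plane coordinates to pixel coordinates.

    x, y   : image-plane coordinates (same length unit as dx, dy)
    dx, dy : physical size of one pixel (length unit per pixel)
    cx, cy : pixel position of the image-plane origin (principal point)
    """
    u = x / dx + cx  # column index
    v = y / dy + cy  # row index
    return u, v

# Illustrative values: 5 mm pixels, principal point at (320, 240).
u, v = image_to_pixel(x=10.0, y=-5.0, dx=5.0, dy=5.0, cx=320.0, cy=240.0)
```

The result is a sub-pixel position. Rounding to an integer row and column is a separate step, needed only when indexing the image matrix.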
Note that
Let's recap so far. To go from the world to the pixel plane, we chain eq3, eq6, eq7, eq8 and eq9. We can represent the transformation as follows:
Note that $$ K = \begin{bmatrix} f_x & \gamma & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} $$
If you pay attention to the details, two new symbols are added.
- There is one more coefficient named $\gamma$ in the camera matrix. $\gamma$ is the axis skew, which accounts for the x and y axes of the image plane not being perpendicular. For simplicity we set it to 0, because usually $axis_x \perp axis_y$.
- A column $[0, 0, 0]^T$ is appended to the camera matrix. This modification is introduced so that the matrix multiplication $K[3 \times 4] \times R|t[4 \times 4]$ is well defined.
- $R|t$: extrinsic parameters. Shape: [4x4]
- $K$: intrinsic parameters
- $P$: projection matrix ($P = K[R|t]$)
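Putting the pieces together, the sketch below builds a $3 \times 4$ intrinsic matrix and a $4 \times 4$ extrinsic matrix with the shapes described above, and maps one world point to pixel coordinates. All numeric values (focal lengths, principal point, pose) and the helper name `world_to_pixel` are illustrative assumptions.

```python
import numpy as np

# Illustrative intrinsics: focal lengths in pixels, principal point, zero skew.
fx, fy, cx, cy, gamma = 800.0, 800.0, 320.0, 240.0, 0.0
K = np.array([[fx,  gamma, cx,  0.0],
              [0.0, fy,    cy,  0.0],
              [0.0, 0.0,   1.0, 0.0]])        # shape (3, 4)

# Illustrative extrinsics: no rotation, world origin 5 units in front of the camera.
Rt = np.eye(4)                                # shape (4, 4)
Rt[:3, 3] = [0.0, 0.0, 5.0]

P = K @ Rt                                    # projection matrix, shape (3, 4)

def world_to_pixel(P, p_world):
    """Project a 3D world point to (u, v) with a 3x4 projection matrix."""
    uvw = P @ np.append(p_world, 1.0)         # homogeneous pixel coordinates
    return uvw[:2] / uvw[2]                   # divide by the depth term

uv = world_to_pixel(P, np.array([1.0, 0.5, 5.0]))
```

The division by the third component is the same $1 / z_c$ factor that appears in the pinhole projection step.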
Distortion types:
- ref:
- Radial distortion

  $$
  \begin{aligned}
  x_{dist} &= x + x(k_1r^2 + k_2r^4 + k_3r^6) \\
  y_{dist} &= y + y(k_1r^2 + k_2r^4 + k_3r^6) \\
  r^2 &= x^2 + y^2
  \end{aligned}
  $$

- Tangential distortion

  $$
  \begin{aligned}
  x_{dist} &= x + 2p_1 x y + p_2 (r^2 + 2x^2) \\
  y_{dist} &= y + 2p_2 x y + p_1 (r^2 + 2y^2)
  \end{aligned}
  $$

- Combine them together

  $$
  \begin{aligned}
  x_{dist} &= x + x(k_1r^2 + k_2r^4 + k_3r^6) + 2p_1 x y + p_2 (r^2 + 2x^2) \\
  y_{dist} &= y + y(k_1r^2 + k_2r^4 + k_3r^6) + 2p_2 x y + p_1 (r^2 + 2y^2)
  \end{aligned}
  $$
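The combined model translates directly into code. The sketch below assumes that $(x, y)$ are undistorted image coordinates measured from the principal point and that $r^2 = x^2 + y^2$. The function name `distort` and the coefficient values are illustrative.

```python
def distort(x, y, k=(0.0, 0.0, 0.0), p=(0.0, 0.0)):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to (x, y)."""
    k1, k2, k3 = k
    p1, p2 = p
    r2 = x * x + y * y                          # r^2
    radial = k1 * r2 + k2 * r2**2 + k3 * r2**3  # k1 r^2 + k2 r^4 + k3 r^6
    x_dist = x + x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_dist = y + y * radial + 2 * p2 * x * y + p1 * (r2 + 2 * y * y)
    return x_dist, y_dist

# Illustrative values: a positive k1 pushes the point away from the center.
x_d, y_d = distort(0.5, 0.0, k=(0.1, 0.0, 0.0))
```

With all coefficients set to zero the function returns its input unchanged, which is a quick sanity check for any implementation.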
- ref for details: DLT part1
- The lesson from the above link details the equations below.
Further explanation: each 3D point gives two equations. (x1, y1, z1) ----> u is the first equation; (x1, y1, z1) ----> v is the second equation.
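For reference, the two equations per point are usually written as below. This is a sketch that assumes $m_{ij}$ denotes the entries of a $3 \times 4$ projection matrix and that $(u_1, v_1)$ is the pixel observed for the point $(x_1, y_1, z_1)$.

$$
\begin{aligned}
u_1 &= \frac{m_{11}x_1 + m_{12}y_1 + m_{13}z_1 + m_{14}}{m_{31}x_1 + m_{32}y_1 + m_{33}z_1 + m_{34}} \\
v_1 &= \frac{m_{21}x_1 + m_{22}y_1 + m_{23}z_1 + m_{24}}{m_{31}x_1 + m_{32}y_1 + m_{33}z_1 + m_{34}}
\end{aligned}
$$

Multiplying through by the denominator makes both equations linear in the unknowns $m_{ij}$, with zero on the right-hand side:

$$
\begin{aligned}
m_{11}x_1 + m_{12}y_1 + m_{13}z_1 + m_{14} - u_1(m_{31}x_1 + m_{32}y_1 + m_{33}z_1 + m_{34}) &= 0 \\
m_{21}x_1 + m_{22}y_1 + m_{23}z_1 + m_{24} - v_1(m_{31}x_1 + m_{32}y_1 + m_{33}z_1 + m_{34}) &= 0
\end{aligned}
$$

Stacking these two rows for every point gives a homogeneous system of the form $Am = 0$.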
The reason we did all of this is that matrices are a powerful tool for solving sets of equations. Now stretch way back into your memory of linear algebra. We are usually solving
ref for details: DLT part2
- Over-constrained means that we have more equations than unknowns.
- $m$ is only valid up to scale: if you look at image:H3, all the $m_{ij}$ can be divided by a common scale, because the right-hand side equals zero. We end up with the fact that we don't care about the magnitude of $m$, so we can make it a unit vector.
- So the question is now reduced to: which unit vector $m$ minimizes $\|Am\|$?
- ref for details: eigenvectors and eigenvalues
- The reason why $U$ can be dropped and $V^T$ can be added is that $U$ and $V$ are both orthogonal matrices, composed of mutually perpendicular unit vectors. Multiplying a vector by an orthogonal matrix doesn't change its magnitude, only its orientation.

Just pull out the last column of $V$.
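The SVD step can be checked end to end on synthetic data. The sketch below assumes the $3 \times 4$ projection-matrix form of DLT with two rows of $A$ per 3D-to-2D correspondence. The function name `dlt_projection_matrix`, the matrix `P_true`, and the random points are all made up for illustration.

```python
import numpy as np

def dlt_projection_matrix(points_3d, points_2d):
    """Estimate a 3x4 projection matrix, up to scale, from >= 6 correspondences."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        # Each correspondence contributes one row for u and one row for v.
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    A = np.asarray(rows, dtype=float)
    # The unit vector minimizing ||A m|| is the right singular vector for the
    # smallest singular value: the last row of V^T (last column of V).
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

# Synthetic ground truth (arbitrary values) and noise-free projections.
P_true = np.array([[2.0, 0.0, 0.5, 0.1],
                   [0.0, 2.0, 0.3, 0.2],
                   [0.0, 0.0, 1.0, 5.0]])
rng = np.random.default_rng(0)
pts3d = rng.uniform(-1.0, 1.0, size=(8, 3))
homog = np.hstack([pts3d, np.ones((8, 1))]) @ P_true.T
pts2d = homog[:, :2] / homog[:, 2:]

P_est = dlt_projection_matrix(pts3d, pts2d)
# Remove the scale (and sign) ambiguity before comparing with the ground truth.
P_est = P_est * (P_true[2, 3] / P_est[2, 3])
```

With noisy measurements the recovered matrix is only a least-squares estimate, and normalizing the point coordinates before building $A$ is commonly recommended to improve conditioning.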
- Least-Squares Rigid Motion Using SVD
  - given H and K, compute R and t
- Compute_homography_Matrix
  - compute H
- Camera Calibration (Zhengyou Zhang's Calibration Algorithm): Explanation and Practice, from Theory to Hand-Written C++ (original title: 相机标定(张正友标定算法)解读与实战(从理论到手撕c++))
  - good C++ code to run
- Camera Calibration using Zhang's Method (Cyrill Stachniss)
  - explains Zhang's method in detail
- Direct Linear Transform for Camera Calibration and Localization (Cyrill Stachniss)
  - explains DLT in detail
- Zhengyou Zhang's paper
- Udacity Calibrating Camera
  - details the SVD trick to get H