First of all, thank you very much for sharing such a good and high-quality implementation.
However, there is one part of the code that I don't quite understand.
For backward warping, my understanding is that the depth map predicted for the target view is used to project the target pixels into source-view coordinates, the source image is then sampled at those coordinates to reconstruct the target view, and the loss is computed between the reconstructed target view and the original target view; I take this to be the reason backward warping is used (I also read the other comments and found them very helpful).
What I'm wondering is what the grid passed to F.grid_sample means in this case. As I understand it, the assumption is that the content seen in the target view should appear at exactly these positions in the source view, and it is the GT pose that guarantees this.
If that warping maps from the target to the source using GT poses, what exactly do the resulting coordinates mean? If they are purely source-image coordinates, it feels a little odd that feeding the source image into F.grid_sample with them produces the target view. (By "warping" I mean the photometric-consistency warp often used in self-supervised monocular depth estimation, and by "backward warping" the warp often used in MVSNet to bring the source view into the target view.)
My current reading is that the coordinates produced by going target view -> source view tell us, for each target (reference) pixel, where in the source view to look for the corresponding pixel.
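To make my question concrete, this is roughly how I picture the warping step (a minimal sketch of my own, not your code; src_coords is assumed to come from projecting the target depth through the GT relative pose and the source intrinsics):

```python
import torch
import torch.nn.functional as F

def backward_warp(src_img, src_coords):
    """Sample the source image at per-target-pixel source coordinates.

    src_img:    (B, C, H, W) source-view image
    src_coords: (B, H, W, 2) pixel coordinates in the SOURCE image, one (x, y)
                per TARGET pixel, from projecting the target depth with the GT pose.
    Returns:    (B, C, H, W) reconstruction of the TARGET view.
    """
    B, C, H, W = src_img.shape
    # F.grid_sample expects sampling locations normalized to [-1, 1]
    x = src_coords[..., 0] / (W - 1) * 2 - 1
    y = src_coords[..., 1] / (H - 1) * 2 - 1
    grid = torch.stack((x, y), dim=-1)  # (B, H, W, 2), still indexed by target pixels
    # Output pixel (i, j) is the source image read at src_coords[:, i, j]
    return F.grid_sample(src_img, grid, mode='bilinear',
                         padding_mode='zeros', align_corners=True)
```

So, if I read it correctly, the grid values are positions in the source image, but the output tensor is laid out on the target pixel grid, which is why it is compared against the target view.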
However, I have a question about geometric filtering.
Line 153 of the reproject_with_depth function in eval.py is a bit confusing:
sampled_depth_src = cv2.remap(depth_src, x_src, y_src, interpolation=cv2.INTER_LINEAR)
I follow that the relative pose has already been used to transform from the reference view to the source view, and that the source depth should then be used to go from the source view back to the reference view.
However, I'm a little confused about what coordinate system sampled_depth_src is in, or what it means, since it is created by feeding depth_src back through the grid that was built for the backward warping described above.
As mentioned above, I read the coordinates produced by going target view -> source view as telling me, for each reference pixel, where in the source view to look for the corresponding pixel. What coordinate system is the grid on line 153 in, and what does it represent?
I would appreciate it if you could share your thoughts on that code.
As far as I know, in MVSNet backward warping uses as candidate depths the distances from the reference camera origin to a set of planes swept through the reference frustum. The homogeneous coordinates of the reference-image pixels are back-projected into 3D space at each candidate depth (this gives 3D coordinates). Then, using the relative pose between the reference and source images together with the intrinsics of the source camera, those 3D points from the reference view are projected to 2D coordinates in the source image.
According to the photometric consistency assumption, the features at the reference-image coordinates should match the features at the warped coordinates in the source image. However, the warped coordinates usually do not land exactly on a pixel of the source feature map, so, assuming the source features are locally (linearly) continuous, interpolation is used to obtain the corresponding feature.
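To illustrate, here is a rough sketch of that plane-sweep warping, assuming a pinhole model with reference/source intrinsics K_ref, K_src and a relative pose (R, t) from the reference to the source camera (variable names and shapes are my own, not taken from this repository):

```python
import torch
import torch.nn.functional as F

def warp_src_to_ref(src_feat, K_ref, K_src, R, t, depth_hypos):
    """Warp source features onto the reference pixel grid for each candidate depth.

    src_feat:     (B, C, H, W) source feature map
    K_ref, K_src: (B, 3, 3) reference / source intrinsics
    R, t:         (B, 3, 3), (B, 3, 1) relative pose from reference to source
    depth_hypos:  (B, D) candidate depths for the reference view
    Returns:      (B, C, D, H, W) source features resampled at reference pixels
    """
    B, C, H, W = src_feat.shape
    D = depth_hypos.shape[1]
    dev = src_feat.device

    # Homogeneous pixel coordinates of the reference image: (B, 3, H*W)
    y, x = torch.meshgrid(torch.arange(H, device=dev, dtype=torch.float32),
                          torch.arange(W, device=dev, dtype=torch.float32),
                          indexing='ij')
    pix = torch.stack((x, y, torch.ones_like(x)), dim=0).reshape(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1)

    # Back-project each reference pixel to 3D at every candidate depth,
    # then project the 3D points into the source image.
    rays = torch.inverse(K_ref) @ pix                            # (B, 3, H*W)
    pts = rays.unsqueeze(1) * depth_hypos.view(B, D, 1, 1)       # (B, D, 3, H*W)
    pts_src = R.unsqueeze(1) @ pts + t.unsqueeze(1)              # (B, D, 3, H*W)
    pts_src = K_src.unsqueeze(1) @ pts_src
    xy = pts_src[:, :, :2] / pts_src[:, :, 2:3].clamp(min=1e-6)  # source pixel coords

    # Normalize to [-1, 1] and bilinearly sample the source features
    gx = xy[:, :, 0] / (W - 1) * 2 - 1
    gy = xy[:, :, 1] / (H - 1) * 2 - 1
    grid = torch.stack((gx, gy), dim=-1).view(B, D * H, W, 2)
    warped = F.grid_sample(src_feat, grid, mode='bilinear',
                           padding_mode='zeros', align_corners=True)
    return warped.view(B, C, D, H, W)
```

The warped volume is indexed by reference pixels, so it can be compared directly with the reference features at each depth hypothesis.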
The geometric consistency evaluation is similar; the difference is that the backward warping is applied to the depth map rather than to the feature map, and the assumption is that the depth map is likewise locally (linearly) continuous.
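And here is roughly how I picture the geometric consistency step around that line (a simplified sketch with my own variable names, not the exact code in eval.py): the reference depth projects each reference pixel to (x_src, y_src) in the source image, cv2.remap reads the source depth at those positions, and that sampled depth is then used to reproject back into the reference view:

```python
import cv2
import numpy as np

def reproject_through_source(depth_ref, K_ref, depth_src, K_src, R, t):
    """Project reference pixels into the source view and back again.

    depth_ref, depth_src: (H, W) float32 depth maps
    K_ref, K_src:         (3, 3) intrinsics
    R, t:                 (3, 3), (3,) relative pose from reference to source
    Returns x/y/depth of the reprojected points, all (H, W) and indexed by
    REFERENCE pixels, ready to compare against the original grid and depth_ref.
    """
    H, W = depth_ref.shape
    x_ref, y_ref = np.meshgrid(np.arange(W), np.arange(H))

    # 1) Back-project reference pixels with depth_ref, project into the source view
    pix_ref = np.stack((x_ref, y_ref, np.ones_like(x_ref)), axis=0).reshape(3, -1)
    pts = np.linalg.inv(K_ref) @ (pix_ref * depth_ref.reshape(1, -1))
    pts_src = K_src @ (R @ pts + t.reshape(3, 1))
    x_src = (pts_src[0] / pts_src[2]).reshape(H, W).astype(np.float32)
    y_src = (pts_src[1] / pts_src[2]).reshape(H, W).astype(np.float32)

    # 2) Read the source depth at those source-image positions.
    #    sampled_depth_src is laid out on the REFERENCE pixel grid, but its
    #    values are depths measured in the SOURCE camera frame.
    sampled_depth_src = cv2.remap(depth_src, x_src, y_src,
                                  interpolation=cv2.INTER_LINEAR)

    # 3) Back-project from the source view with that depth, project back to the reference
    pix_src = np.stack((x_src.ravel(), y_src.ravel(), np.ones(H * W)), axis=0)
    pts_back = np.linalg.inv(K_src) @ (pix_src * sampled_depth_src.reshape(1, -1))
    pts_ref = K_ref @ (R.T @ (pts_back - t.reshape(3, 1)))
    depth_reproj = pts_ref[2].reshape(H, W)
    x_reproj = (pts_ref[0] / pts_ref[2]).reshape(H, W)
    y_reproj = (pts_ref[1] / pts_ref[2]).reshape(H, W)
    return x_reproj, y_reproj, depth_reproj
```

If this matches the code, the grid on line 153 is the pair (x_src, y_src): positions expressed in source-image pixel coordinates but indexed by reference pixels, which is why sampled_depth_src can, after reprojection, be compared with depth_ref.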