
Why does translation normalization have a huge impact on the rendering result? #59

Open
Miaosheng1 opened this issue Aug 14, 2024 · 3 comments



Miaosheng1 commented Aug 14, 2024

Hi, I'm training MVSplat to reconstruct street scenes, and I've run into a question:

  • When I normalize the extrinsics (e.g., the translation) using the following code, I get a good result (orange curve); a fuller sketch follows this list:
    scale_factor = 1.0 / np.max(np.abs(world2camera[:, :3, 3]))
    world2camera[:, :3, 3] *= scale_factor
  • When I comment out the normalization code, the training PSNR decreases significantly (blue curve).
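
For reference, a minimal, self-contained version of the normalization step (the function name and the unit-scale target are illustrative; `world2camera` is assumed to be an (N, 4, 4) array of world-to-camera extrinsics):

    import numpy as np

    def normalize_translations(world2camera: np.ndarray) -> tuple[np.ndarray, float]:
        """Rescale camera translations so the largest component has magnitude 1."""
        w2c = world2camera.copy()
        scale_factor = 1.0 / np.max(np.abs(w2c[:, :3, 3]))
        w2c[:, :3, 3] *= scale_factor  # rotations untouched; only the scene scale changes
        return w2c, scale_factor

    # Usage note: any metric quantity (near/far planes, ground-truth depth) must be
    # rescaled by the same factor, e.g. near *= scale_factor; far *= scale_factor.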

The comparison of the training curves is as follows:

[image: training PSNR curves (orange: normalized; blue: unnormalized)]

Can you provide an explanation for this phenomenon?

Rendered depth with normalized translation:

[image: rendered depth, normalized translation]

Rendered depth without translation normalization:

[image: rendered depth, unnormalized translation]

Corresponding image:

[image: corresponding input image]

@donydchen
Owner

Hi @Miaosheng1, sorry for the late reply. I have been busy for the past few weeks.

Glad to see that you're trying to apply MVSplat to other datasets. It looks like the normalization operation significantly affects the depth scale, which leads to the performance difference.

Below are some suggestions that might help identify the main issue:

  • Are you training from scratch or fine-tuning from the released RE10K weights? The depth range in KITTI (I assume the dataset is KITTI?) is quite large. In that case, normalizing the translation might help align it with the RE10K dataset, leading to better performance if you are fine-tuning from the RE10K pre-trained model.
  • What are the near and far values in your settings?
  • What is the typical value of the scale_factor? (See the sketch after this list for a quick way to inspect it.)
  • How is the quality of the depth predicted by the encoder, as opposed to the Gaussian-rendered depth? The encoder's predicted depth reflects the model more directly, so it is better for debugging. You can plot it with the following:
    from PIL import Image

    depth_vis = (
        visualization_dump["depth"].squeeze(-1).squeeze(-1)
    ).cpu().detach()
    for v_idx in range(depth_vis.shape[1]):
        vis_depth = viz_depth_tensor(
            1.0 / depth_vis[0, v_idx], return_numpy=True
        )  # inverse depth
        # save_path = path / scene / f"color/input{v_idx}_depth.png"
        # os.makedirs(os.path.dirname(save_path), exist_ok=True)
        Image.fromarray(vis_depth).save(f"{base}_depth_{v_idx}.png")
  • How does it perform on the test set? And how does it perform in the later training stages (the curve ends at around 3K steps; how about 30K steps)?
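
Regarding the scale_factor and near/far questions above, a quick way to inspect them is sketched below (illustrative only; `world2camera`, `near`, and `far` are assumed to come from your data loader, and none of this is MVSplat API):

    import numpy as np

    def pose_scale_report(world2camera: np.ndarray, near: float, far: float) -> None:
        """Print camera-translation statistics against the configured near/far range."""
        t = world2camera[:, :3, 3]
        # Camera centers in world space: c = -R^T t for a [R|t] world-to-camera pose.
        centers = -np.einsum("nij,nj->ni", world2camera[:, :3, :3].transpose(0, 2, 1), t)
        baselines = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
        print(f"max |t| = {np.max(np.abs(t)):.3f}, max baseline = {baselines.max():.3f}")
        print(f"near/far = {near}/{far} (ratio {far / near:.1f}x)")

If the translations are on the order of hundreds of meters while near/far are tuned for RE10K's roughly unit-scale scenes, the depth candidates swept between near and far will not bracket the true scene depth, which would match the rendered-depth failure shown above.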

With more details on the questions listed above, we might be able to identify the main issue and figure out how to configure the model correctly for your dataset.

@Miaosheng1
Author

Changing the near and far helps to improve the quality.
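
For others hitting the same issue, one possible way to pick near/far after normalization is from depth statistics of the scene; a minimal sketch, where the percentile choice and the `sparse_depth` input (e.g., LiDAR points projected into the image) are illustrative assumptions, not part of MVSplat:

    import numpy as np

    def near_far_from_depth(sparse_depth: np.ndarray, scale_factor: float) -> tuple[float, float]:
        """Derive near/far from robust depth percentiles, in the normalized scale."""
        valid = sparse_depth[sparse_depth > 0]  # keep only pixels with known depth
        near = float(np.percentile(valid, 1)) * scale_factor
        far = float(np.percentile(valid, 99)) * scale_factor
        return near, far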


liucsg commented Sep 25, 2024

Changing the near and far helps to improve the quality.

Could you explain how to apply MVSplat to the KITTI dataset?
