
Why does translation normalization have a huge impact on the rendering result? #59

Open
Miaosheng1 opened this issue Aug 14, 2024 · 3 comments



Miaosheng1 commented Aug 14, 2024

Hi, I'm training MVSplat to reconstruct street scenes, and I've run into a question:

  • When I normalize the extrinsics (e.g., the translation) using the following code, I get a good result (orange curve); a fuller sketch follows this list:
    scale_factor = 1.0 / np.max(np.abs(world2camera[:, :3, 3]))
    world2camera[:, :3, 3] *= scale_factor
  • When I comment out the normalization code, the training PSNR decreases significantly (blue curve).
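
For reference, a minimal, self-contained version of the normalization step (the function name and the unit-scale target are illustrative; `world2camera` is assumed to be an (N, 4, 4) array of world-to-camera extrinsics):

    import numpy as np

    def normalize_translations(world2camera: np.ndarray) -> tuple[np.ndarray, float]:
        """Rescale camera translations so the largest component has magnitude 1."""
        w2c = world2camera.copy()
        scale_factor = 1.0 / np.max(np.abs(w2c[:, :3, 3]))
        w2c[:, :3, 3] *= scale_factor  # rotations untouched; only the scene scale changes
        return w2c, scale_factor

    # Usage note: any metric quantity (near/far planes, ground-truth depth) must be
    # rescaled by the same factor, e.g. near *= scale_factor; far *= scale_factor.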

The comparison of the training curves is as follows:

[image: training PSNR curves (orange: normalized; blue: unnormalized)]

Can you provide an explanation for this phenomenon?

Rendered depth with normalized translation:

[image: rendered depth, normalized translation]

Rendered depth without translation normalization:

[image: rendered depth, unnormalized translation]

Corresponding image:

[image: corresponding input image]

@donydchen
Owner

Hi @Miaosheng1, sorry for the late reply. I have been busy for the past few weeks.

Glad to see that you're trying to apply MVSplat to other datasets. It looks like the normalization operation significantly affects the depth scale, which leads to the performance difference.

Below are some suggestions that might help identify the main issue:

  • Are you training from scratch or fine-tuning from the released RE10K weights? The depth range in KITTI (I assume the dataset is KITTI?) is quite large. In that case, normalizing the translation might help align it with the RE10K dataset, leading to better performance if you are fine-tuning from the RE10K pre-trained model.
  • What are the near and far values in your settings?
  • What is the typical value of the scale_factor? (See the sketch after this list for a quick way to inspect it.)
  • How is the quality of the depth predicted by the encoder, as opposed to the Gaussian-rendered depth? The encoder's predicted depth reflects the model more directly, so it is better for debugging. You can plot it with the following:
    from PIL import Image

    depth_vis = (
        visualization_dump["depth"].squeeze(-1).squeeze(-1)
    ).cpu().detach()
    for v_idx in range(depth_vis.shape[1]):
        vis_depth = viz_depth_tensor(
            1.0 / depth_vis[0, v_idx], return_numpy=True
        )  # inverse depth
        # save_path = path / scene / f"color/input{v_idx}_depth.png"
        # os.makedirs(os.path.dirname(save_path), exist_ok=True)
        Image.fromarray(vis_depth).save(f"{base}_depth_{v_idx}.png")
  • How does it perform on the test set? And how does it perform in the later training stages (the curve ends at around 3K steps; how about 30K steps)?
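
Regarding the scale_factor and near/far questions above, a quick way to inspect them is sketched below (illustrative only; `world2camera`, `near`, and `far` are assumed to come from your data loader, and none of this is MVSplat API):

    import numpy as np

    def pose_scale_report(world2camera: np.ndarray, near: float, far: float) -> None:
        """Print camera-translation statistics against the configured near/far range."""
        t = world2camera[:, :3, 3]
        # Camera centers in world space: c = -R^T t for a [R|t] world-to-camera pose.
        centers = -np.einsum("nij,nj->ni", world2camera[:, :3, :3].transpose(0, 2, 1), t)
        baselines = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
        print(f"max |t| = {np.max(np.abs(t)):.3f}, max baseline = {baselines.max():.3f}")
        print(f"near/far = {near}/{far} (ratio {far / near:.1f}x)")

If the translations are on the order of hundreds of meters while near/far are tuned for RE10K's roughly unit-scale scenes, the depth candidates swept between near and far will not bracket the true scene depth, which would match the rendered-depth failure shown above.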

With more details on the questions listed above, we might be able to identify the main issue and figure out how to configure the model correctly for your dataset.

@Miaosheng1
Author

Changing the near and far helps to improve the quality.
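
For others hitting the same issue, one possible way to pick near/far after normalization is from depth statistics of the scene; a minimal sketch, where the percentile choice and the `sparse_depth` input (e.g., LiDAR points projected into the image) are illustrative assumptions, not part of MVSplat:

    import numpy as np

    def near_far_from_depth(sparse_depth: np.ndarray, scale_factor: float) -> tuple[float, float]:
        """Derive near/far from robust depth percentiles, in the normalized scale."""
        valid = sparse_depth[sparse_depth > 0]  # keep only pixels with known depth
        near = float(np.percentile(valid, 1)) * scale_factor
        far = float(np.percentile(valid, 99)) * scale_factor
        return near, far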


liucsg commented Sep 25, 2024

Changing the near and far helps to improve the quality.

Could you explain how to apply MVSplat to the KITTI dataset?
