Confusion when comparing the code with the paper #25

Hi, authors:

Great work! However, I'm a bit confused about how the description in Appendix A.5 of the paper relates to the code:
<img width="768" alt="image" src="https://github.com/bytedance/MVDream-threestudio/assets/25632694/3e4bc724-bcdb-43b7-9a8f-016d3ea96512">
Here, $x_t$ is referred to as the noisy image. However, in the code:
```python
        if self.cfg.recon_loss:
            # reconstruct x0
            latents_recon = self.model.predict_start_from_noise(
                latents_noisy, t, noise_pred
            )
            # x0-reconstruction loss from Sec 3.2 and Appendix
            loss = (
                0.5
                * F.mse_loss(latents, latents_recon.detach(), reduction="sum")
                / latents.shape[0]
            )
            grad = torch.autograd.grad(loss, latents, retain_graph=True)[0]
```
If I'm reading this correctly, $x_0$ and $x_t$ are actually the VAE-encoded latent and the noisy latent, respectively, rather than images.
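To make the latent-space reading concrete, here is a minimal NumPy sketch of what the snippet computes, assuming `predict_start_from_noise` implements the standard DDPM identity $x_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\epsilon)/\sqrt{\bar\alpha_t}$. It is not the repository's code: it uses NumPy instead of PyTorch, a single scalar `alpha_bar_t` instead of per-timestep indexing, an assumed latent shape `(2, 4, 8, 8)`, and a hypothetical stand-in for `noise_pred`. The gradient is written out analytically, which is what `torch.autograd.grad` returns for this loss when `latents_recon` is detached.

```python
import numpy as np

def predict_start_from_noise(x_t, alpha_bar_t, eps):
    # DDPM forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, solved for x0.
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

def x0_recon_loss_and_grad(latents, latents_recon):
    # latents_recon is treated as a constant, mirroring latents_recon.detach().
    batch = latents.shape[0]
    diff = latents - latents_recon
    loss = 0.5 * np.sum(diff ** 2) / batch  # same as 0.5 * mse_loss(sum) / B
    grad = diff / batch                      # d(loss) / d(latents)
    return loss, grad

rng = np.random.default_rng(0)
# Assumed shape: batch of 2 four-channel 8x8 latents (x_0 lives in VAE latent space).
latents = rng.standard_normal((2, 4, 8, 8))
alpha_bar_t = 0.5  # assumed cumulative alpha for a single timestep
noise = rng.standard_normal(latents.shape)
latents_noisy = np.sqrt(alpha_bar_t) * latents + np.sqrt(1.0 - alpha_bar_t) * noise
# Hypothetical stand-in for the UNet's noise prediction: true noise plus an error term.
noise_pred = noise + 0.1 * rng.standard_normal(latents.shape)

latents_recon = predict_start_from_noise(latents_noisy, alpha_bar_t, noise_pred)
loss, grad = x0_recon_loss_and_grad(latents, latents_recon)
```

Because everything stays in latent space, expanding the algebra shows that the gradient reduces to $\sqrt{(1-\bar\alpha_t)/\bar\alpha_t}\,(\hat\epsilon - \epsilon)/B$, i.e. a weighted noise-prediction residual on the latents, with no VAE decoding involved.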

Some methods, such as HiFA and ReconFusion, do apply the loss in image space, so this distinction may confuse readers.

Could you please clarify whether I'm understanding this correctly? Thanks!
