Thanks for your great work! I have been reading your published paper manuscript as well as the code implementation, and I ran into a question about the loss function used. It would be highly appreciated if you could explain how this works.
Here is the issue. In the paper manuscript, specifically in Eq. (2), the overall training objective of AnyDoor is an MSE loss between the U-Net output and the ground-truth image latents, corresponding to the following figure:
In the code implementation, the loss type is controlled by `self.parameterization`, which is set to `"eps"` by default. It is also not overridden in the configuration file (`configs/anydoor.yaml`).
Therefore, in the `p_losses()` function of `ldm/models/diffusion/ddpm.py` (lines 367 to 411), we can see:
If `self.parameterization == "eps"`, `target` becomes the random Gaussian noise, so the loss is the MSE between the U-Net output and that noise. This conflicts with the objective shown in the paper manuscript.
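The target selection described above can be sketched as follows (paraphrased with the standard latent-diffusion naming, not a verbatim copy of the AnyDoor code; the stand-in objects replace the actual latent tensors):

```python
def select_target(parameterization, x_start, noise):
    """Choose the regression target for the diffusion MSE loss.

    Paraphrased from the target selection inside p_losses() in
    ldm/models/diffusion/ddpm.py (assumed standard latent-diffusion layout).
    """
    if parameterization == "eps":
        return noise      # model is trained to predict the added Gaussian noise
    elif parameterization == "x0":
        return x_start    # model is trained to predict the clean latents directly
    raise NotImplementedError(f"unknown parameterization: {parameterization}")


# stand-in objects; in the real code these are latent tensors of equal shape
x_start, noise = "clean_latents", "gaussian_noise"
assert select_target("eps", x_start, noise) is noise
assert select_target("x0", x_start, noise) is x_start
```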
According to Eq. (2) in the paper manuscript, I suppose that `self.parameterization` should be set to `"x0"`, so that `target` becomes `x_start` and the code implementation aligns with the formula. Is my understanding correct? Please enlighten me if I have gotten anything wrong. Looking forward to your reply.
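For reference, the two candidate targets are related through the standard DDPM forward process, which is why I want to confirm which one the paper intends. A toy scalar sketch (assumed names, independent of the AnyDoor code):

```python
import math

def q_sample(x0, eps, a_bar):
    # standard DDPM forward process: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps
    return math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps

def predict_x0_from_eps(x_t, eps_pred, a_bar):
    # invert the forward process given a noise prediction
    return (x_t - math.sqrt(1.0 - a_bar) * eps_pred) / math.sqrt(a_bar)

x0, eps, a_bar = 0.7, -0.3, 0.5
x_t = q_sample(x0, eps, a_bar)

# a perfect eps prediction recovers x0 exactly: the two targets differ by an
# invertible, timestep-dependent transform, not by the information they carry
assert abs(predict_x0_from_eps(x_t, eps, a_bar) - x0) < 1e-9
```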
Best regards