I tried to implement this for flow models as described in the appendix, but the results collapse completely (exploding images). Did I make a mistake, or is this technique fundamentally incompatible with flow models (which have no renoising step)? Also, the paper doesn't define v_λ.
import torch


def euler_cfgpp_update(
    x_t: torch.Tensor,
    t: float,
    dt: float,
    v_u: torch.Tensor,
    v_c: torch.Tensor,
    lambda_val: float,
) -> torch.Tensor:
    # Unconditional velocity at (x_t, t):
    # v_u = model_uncond(x_t, t)
    # Conditional velocity at (x_t, t):
    # v_c = model_cond(x_t, t)

    # Unconditional "Tweedie" estimate. The paper writes x̃^(∅) = x_t - t * v_u;
    # this code uses (1 - t) in place of t and the opposite sign for v.
    x_null = x_t + (1 - t) * v_u
    # Conditional "Tweedie" estimate (paper: x̃^(c) = x_t - t * v_c).
    x_cond = x_t + (1 - t) * v_c

    # Normal CFG prediction, for comparison:
    # x_cfg = x_t + (1 - t) * (v_u + 2.3 * (v_c - v_u))

    # CFG++ "Tweedie" estimate (interpolation):
    # x̃^(λ) = (1 - λ) * x̃^(∅) + λ * x̃^(c)
    x_cfgpp = x_null + lambda_val * (x_cond - x_null)

    # Next time = t + dt
    t_next = t + dt

    # Euler step for CFG++ (paper form):
    # x_{t1} = x̃^(λ)(x_{t0}) + (x_t - x̃^(∅)(x_{t0})) / t0 * t1
    # Here the division is by (1 - t), so make sure t != 1 to avoid divide-by-zero!
    # eps = 1e-12
    x_next = x_cfgpp + (x_t - x_null) * ((1 - t_next) / (1 - t))

    # Vanilla CFG:
    # x_next = x_cfg + (x_t - x_cfg) * ((1 - t_next) / (1 - (t + eps)))
    return x_next
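
For reference, here is a quick sanity check of the algebra in the update above. Expanding it term by term gives x_next = x_t + dt * v_u + (1 - t) * lambda_val * (v_c - v_u). The sketch below redoes the same arithmetic on plain floats (stand-ins for tensors, with arbitrary made-up values) and compares the two forms. `expanded_update` is a hypothetical helper name used only for this check; it is not from the paper or the repo.

```python
import math


def euler_cfgpp_update(x_t, t, dt, v_u, v_c, lambda_val):
    """Same arithmetic as the update in this issue, on plain floats."""
    x_null = x_t + (1 - t) * v_u
    x_cond = x_t + (1 - t) * v_c
    x_cfgpp = x_null + lambda_val * (x_cond - x_null)
    t_next = t + dt
    return x_cfgpp + (x_t - x_null) * ((1 - t_next) / (1 - t))


def expanded_update(x_t, t, dt, v_u, v_c, lambda_val):
    """Hypothetical helper: the same step after expanding the algebra."""
    # Plain Euler step on v_u plus a guidance term with no dt factor.
    return x_t + dt * v_u + (1 - t) * lambda_val * (v_c - v_u)


# Arbitrary scalar values, not taken from any real model.
args = dict(x_t=0.3, t=0.2, dt=0.05, v_u=-1.1, v_c=0.7, lambda_val=0.6)
stepwise = euler_cfgpp_update(**args)
expanded = expanded_update(**args)
print(stepwise, expanded)
```

In this expanded form the guidance term (1 - t) * lambda_val * (v_c - v_u) is not multiplied by dt, so its size per step does not shrink as the number of steps grows. This check only confirms the algebra; it does not show whether that term is what causes the collapse.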