Description
Hi, this is impressive work introducing diffusion models into E2E-AD. I would like to ask some questions.
Referenced core code:

```python
step_num = 2
step_ratio = 20 / step_num
roll_timesteps = (np.arange(0, step_num) * step_ratio).round()[::-1].copy().astype(np.int64)
roll_timesteps = torch.from_numpy(roll_timesteps).to(device)
for k in roll_timesteps[:]:
    ...
    timesteps = k  # actually k is 10 and 0
    ...
    img = self.diffusion_scheduler.step(model_output=x_start, timestep=k, sample=img).prev_sample
    ...
```
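For reference, the schedule construction can be reproduced standalone (a minimal sketch; `step_num` and the constant `20` are taken verbatim from the quoted code):

```python
import numpy as np

# Reproduce the roll_timesteps construction from the quoted code.
step_num = 2
step_ratio = 20 / step_num  # 10.0
roll_timesteps = (np.arange(0, step_num) * step_ratio).round()[::-1].copy().astype(np.int64)
print(roll_timesteps.tolist())  # [10, 0]
```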
In the code above, `roll_timesteps` is actually `[10, 0]`, so across the loop `self.diffusion_scheduler.step` is called with `timestep=10` and then `timestep=0`.
Considering that `self.diffusion_scheduler.num_train_timesteps` is 1000 and `self.diffusion_scheduler.num_inference_steps` is also 1000, the timestep interval assumed by the scheduler at inference is 1.
The code in `DDIMScheduler.step` for calculating `prev_timestep` is:

```python
# 1. get previous step value (=t-1)
prev_timestep = timestep - self.config.num_train_timesteps // self.num_inference_steps
```
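Concretely, under this config the stride is 1, so the two calls map to the following `prev_timestep` values (a minimal sketch of the arithmetic above):

```python
# Stride used inside DDIMScheduler.step under the config described above.
num_train_timesteps = 1000
num_inference_steps = 1000
stride = num_train_timesteps // num_inference_steps  # 1000 // 1000 = 1

# prev_timestep for each timestep actually visited by the loop.
prev = {t: t - stride for t in (10, 0)}
print(prev)  # {10: 9, 0: -1}
```

If I read the diffusers source correctly, a negative `prev_timestep` falls back to `final_alpha_cumprod`, so the second call behaves like a terminal step.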
The `prev_timestep` for the first denoise step (with k=10) is 9, so the denoised sample is at timestep 9.
The second denoise step (with k=0) then effectively just returns the predicted `pose_reg` result.
My question: the denoising step is executed 2 times, but the visited timesteps are not consecutive. For example, a consecutive schedule would be 10 -> 9 -> 8 in DDPM, or 10 -> 8 -> 6 -> 4 -> ... in DDIM with a `timestep_spacing` interval of 2.
Could you explain why the denoising loop is not a standard DDIM process, and how much impact this has on the results?
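For comparison, a sketch of what a standard 2-step schedule would look like, assuming diffusers' default "leading" `timestep_spacing` in `DDIMScheduler.set_timesteps`, where the spacing between visited timesteps matches the stride used inside `step()`:

```python
import numpy as np

# Standard "leading" timestep spacing for a 2-step DDIM schedule,
# as in diffusers' DDIMScheduler.set_timesteps.
num_train_timesteps = 1000
num_inference_steps = 2
step_ratio = num_train_timesteps // num_inference_steps  # 500
timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].astype(np.int64)
print(timesteps.tolist())  # [500, 0]
# Inside step(), prev_timestep = t - 500, so the chain is 500 -> 0 -> (final),
# i.e. consecutive in the scheduler's own spacing, unlike the 10 -> 9 jump above.
```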
@LegendBC