Description
Describe the bug
I tried to train a diffusion pipeline consisting of a UNet1DModel and a DDPMScheduler, using the AdamW optimizer and mse_loss. No matter what I tried, the loss never dropped below 0.5. As a sanity check, I replaced the UNet1DModel with a UNet2DModel, which performed significantly better. The training data is all zeros, so the two pipelines should learn to produce silence and a blank image, respectively. Since the model is the only part that changed, something seems to be wrong with UNet1DModel. #3203 also mentions problems with UNet1DModel, but I have already tried training my model with different learning rates using hyperparameter optimization (HPO).
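To put the 0.5 plateau in context, the following is a minimal sketch of reference losses for all-zero training data. It does not use diffusers. It assumes a linear beta schedule with `beta_start=1e-4` and `beta_end=0.02` over 1000 steps, which is believed to match the `DDPMScheduler` defaults but is not verified here. The function name `baseline_losses` and its parameters are hypothetical. With a zero clean sample, the noisy input is just the noise scaled by `sqrt(1 - alphas_cumprod[t])`, so three trivial predictors can be compared: always predicting zero, returning the input unchanged, and rescaling the input by the timestep-dependent factor.

```python
import torch


def baseline_losses(num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02,
                    n_samples=100_000, seed=0):
    """Reference MSE values for noise prediction when the clean sample is zero.

    Assumption: linear beta schedule (hypothetical stand-in for the scheduler).
    """
    generator = torch.Generator().manual_seed(seed)
    betas = torch.linspace(beta_start, beta_end, num_train_timesteps,
                           dtype=torch.float64)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    # Sample random timesteps and unit Gaussian noise, as in a training loop.
    t = torch.randint(0, num_train_timesteps, (n_samples,), generator=generator)
    noise = torch.randn(n_samples, generator=generator, dtype=torch.float64)

    # Forward diffusion with a zero clean sample: only the noise term remains.
    sigma = torch.sqrt(1.0 - alphas_cumprod[t])
    noisy = sigma * noise

    return {
        # Model that ignores its input and outputs zero.
        "predict_zero": (noise ** 2).mean().item(),
        # Model that returns its input unchanged.
        "predict_input": ((noisy - noise) ** 2).mean().item(),
        # Model that undoes the timestep-dependent scaling exactly.
        "rescale_by_timestep": ((noisy / sigma - noise) ** 2).mean().item(),
    }


if __name__ == "__main__":
    for name, value in baseline_losses().items():
        print(f"{name}: {value:.4f}")
```

Comparing a model's loss curve against these values indicates whether it is using its input and its timestep embedding at all, or is closer to one of the trivial predictors.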
Reproduction
import torch
from diffusers import UNet1DModel, DDPMScheduler, UNet2DModel
from diffusers.utils.torch_utils import randn_tensor
from torch.nn.functional import mse_loss
import matplotlib.pyplot as plt


def test_diffusers(dimensions: int):
    device = torch.device("cpu")
    sample_size = 32
    generator = torch.Generator(device=device)
    noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
    if dimensions == 1:
        model = UNet1DModel(
            sample_size=sample_size,
            in_channels=1,
            out_channels=1,
            block_out_channels=(
                64,
            ),
            down_block_types=(
                "DownBlock1D",
            ),
            up_block_types=(
                "UpBlock1D",
            ),
        ).to(device)
        shape = (16, 1, sample_size)
    elif dimensions == 2:
        model = UNet2DModel(
            sample_size=sample_size,
            in_channels=1,
            out_channels=1,
            block_out_channels=(
                64,
            ),
            down_block_types=(
                "DownBlock2D",
            ),
            up_block_types=(
                "UpBlock2D",
            ),
        ).to(device)
        shape = (16, 1, sample_size, sample_size)
    else:
        raise Exception("only 1D and 2D are supported")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    losses = []
    for i in list(range(100)):
        audio = torch.zeros(shape, device=device)
        noise = randn_tensor(shape, generator=generator, device=device, dtype=audio.dtype)
        batch_size = audio.shape[0]
        time_steps = torch.randint(0, noise_scheduler.num_train_timesteps, (batch_size,), device=device).long()
        noisy_voice = noise_scheduler.add_noise(audio, noise, time_steps)
        target = noise
        pred = model(noisy_voice, time_steps, return_dict=False)[0]
        loss = mse_loss(pred, target)
        loss.backward(loss)
        losses.append(loss.item())
        optimizer.step()
        optimizer.zero_grad()
    return losses


if __name__ == "__main__":
    losses_1d = test_diffusers(1)
    losses_2d = test_diffusers(2)
    plt.plot(losses_1d)
    plt.plot(losses_2d)
    plt.legend(["1D", "2D"], loc="upper right")
    plt.savefig(f"plot.png")
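One detail of the loop above that may be worth ruling out is the call `loss.backward(loss)`. The first argument of `Tensor.backward` is the upstream gradient, so passing the loss itself scales every parameter gradient by the current loss value, compared with a plain `loss.backward()`. The sketch below illustrates this on a two-element tensor. The helper `grads_for` is hypothetical, and it passes `loss.detach()`, which should give the same gradient values as passing `loss`. Because the same call is used for both the 1D and the 2D run, this alone would not explain the gap between them.

```python
import torch
from torch.nn.functional import mse_loss


def grads_for(pass_loss_as_gradient: bool):
    """Return (loss, gradient) for a tiny MSE problem.

    Hypothetical helper: contrasts backward() with backward(gradient=loss).
    """
    w = torch.tensor([1.0, 2.0], requires_grad=True)
    target = torch.zeros(2)
    loss = mse_loss(w, target)  # (1^2 + 2^2) / 2 = 2.5

    if pass_loss_as_gradient:
        # Upstream gradient equals the loss value, so d(loss)/dw is scaled by it.
        loss.backward(gradient=loss.detach())
    else:
        # Upstream gradient defaults to 1 for a scalar loss.
        loss.backward()
    return loss.item(), w.grad.clone()


if __name__ == "__main__":
    _, plain = grads_for(False)
    _, scaled = grads_for(True)
    print(plain)   # tensor([1., 2.])
    print(scaled)  # tensor([2.5000, 5.0000])
```

Rerunning the reproduction with a plain `loss.backward()` would show whether this scaling has any effect on the reported curves.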
Logs
System Info
diffusers: 0.32.2
torch: 2.6.0
Python: 3.10.14
OS: Manjaro Linux
CPU: AMD Ryzen 5 1600X
GPU: Nvidia RTX 3090 24GB
RAM: 32 GB
Who can help?
No response