
UNet1DModel does not converge #11171

Open
@MrInformatic


Describe the bug

I tried to train a diffusion pipeline built from UNet1DModel and DDPMScheduler, using the AdamW optimizer and mse_loss. No matter what I tried, I never got the model to produce a loss below 0.5. As a sanity check, I also replaced the UNet1DModel with a UNet2DModel, which performed significantly better. Both pipelines should learn to produce silence or a blank image, respectively. Something seems to be wrong with UNet1DModel, since it is the only part that changed. #3203 also mentions problems with UNet1DModel, and I have already tried training with different learning rates via HPO.
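
To make the expected behavior concrete: since the clean sample is all zeros, add_noise produces x_t = sqrt(1 - alpha_bar_t) * eps, so the target noise is exactly recoverable from the noisy input and the timestep, and a working model should be able to push the MSE close to zero. A quick closed-form check (a sketch, not a training run):

import torch
from diffusers import DDPMScheduler
from torch.nn.functional import mse_loss

scheduler = DDPMScheduler(num_train_timesteps=1000)
time_steps = torch.tensor([500])
noise = torch.randn(1, 1, 32)
noisy = scheduler.add_noise(torch.zeros(1, 1, 32), noise, time_steps)

# For zero data, noisy = sqrt(1 - alpha_bar_t) * noise, so dividing recovers the noise.
alpha_bar = scheduler.alphas_cumprod[time_steps].view(-1, 1, 1)
noise_hat = noisy / (1 - alpha_bar).sqrt()
print(mse_loss(noise_hat, noise))  # ~0 up to numerical error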

Reproduction

import torch
from diffusers import UNet1DModel, DDPMScheduler, UNet2DModel
from diffusers.utils.torch_utils import randn_tensor
from torch.nn.functional import mse_loss
import matplotlib.pyplot as plt

def test_diffusers(dimensions: int):
    device = torch.device("cpu")
    sample_size = 32
    generator = torch.Generator(device=device)

    noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

    if dimensions == 1:
        model = UNet1DModel(
            sample_size=sample_size,
            in_channels=1,
            out_channels=1,
            block_out_channels=(64,),
            down_block_types=("DownBlock1D",),
            up_block_types=("UpBlock1D",),
        ).to(device)

        shape = (16, 1, sample_size)
    elif dimensions == 2:
        model = UNet2DModel(
            sample_size=sample_size,
            in_channels=1,
            out_channels=1,
            block_out_channels=(64,),
            down_block_types=("DownBlock2D",),
            up_block_types=("UpBlock2D",),
        ).to(device)

        shape = (16, 1, sample_size, sample_size)
    else:
        raise ValueError("only 1D and 2D are supported")

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    losses = []
    for _ in range(100):
        audio = torch.zeros(shape, device=device)
        noise = randn_tensor(shape, generator=generator, device=device, dtype=audio.dtype)

        batch_size = audio.shape[0]
        time_steps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (batch_size,), device=device).long()

        noisy_voice = noise_scheduler.add_noise(audio, noise, time_steps)

        target = noise

        pred = model(noisy_voice, time_steps, return_dict=False)[0]

        loss = mse_loss(pred, target)
        loss.backward()
        losses.append(loss.item())

        optimizer.step()
        optimizer.zero_grad()

    return losses

if __name__ == "__main__":
    losses_1d = test_diffusers(1)
    losses_2d = test_diffusers(2)

    plt.plot(losses_1d)
    plt.plot(losses_2d)
    plt.legend(["1D", "2D"], loc="upper right")

    plt.savefig("plot.png")

[Plot: per-step training loss for both runs; the 2D model's loss decreases steadily while the 1D model's loss plateaus around 0.5.]
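
One more diagnostic that may help narrow this down (a sketch, assuming the plateau comes from missing timestep conditioning): check whether the 1D model's output depends on the timestep at all. As far as I can tell, DownBlock1D and UpBlock1D do not consume the time embedding, in which case the model cannot learn the timestep-dependent noise scale and gets stuck near an average prediction.

import torch
from diffusers import UNet1DModel

# Same configuration as in the reproduction above.
model = UNet1DModel(
    sample_size=32,
    in_channels=1,
    out_channels=1,
    block_out_channels=(64,),
    down_block_types=("DownBlock1D",),
    up_block_types=("UpBlock1D",),
)

x = torch.randn(2, 1, 32)
out_early = model(x, torch.tensor([1, 1]), return_dict=False)[0]
out_late = model(x, torch.tensor([999, 999]), return_dict=False)[0]

# True here would mean the output is independent of the timestep.
print(torch.allclose(out_early, out_late))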

Logs

System Info

diffusers: 0.32.2
torch: 2.6.0
Python: 3.10.14
OS: Manjaro Linux
CPU: AMD Ryzen 5 1600X
GPU: Nvidia RTX 3090 24GB
RAM: 32 GB

Who can help?

No response
