UNet1DModel does not converge #11171

Open
MrInformatic opened this issue Mar 29, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@MrInformatic

Describe the bug

I tried to train a diffusion pipeline consisting of a UNet1DModel and a DDPMScheduler, using the AdamW optimizer and mse_loss. No matter what I tried, the loss never dropped below 0.5. As a sanity check, I replaced the UNet1DModel with a UNet2DModel, which performed significantly better. Both pipelines should produce silence or a blank image, respectively. Since the model is the only part that changed, something seems to be wrong with UNet1DModel. #3203 also mentions problems with UNet1DModel, but I have already tried training my model with different learning rates using hyperparameter optimization (HPO).

Reproduction

import torch
from diffusers import UNet1DModel, DDPMScheduler, UNet2DModel
from diffusers.utils.torch_utils import randn_tensor
from torch.nn.functional import mse_loss
import matplotlib.pyplot as plt

def test_diffusers(dimensions: int):
    device = torch.device("cpu")
    sample_size = 32
    generator = torch.Generator(device=device)

    noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

    if dimensions == 1:
        model = UNet1DModel(
            sample_size=sample_size,
            in_channels=1,
            out_channels=1,
            block_out_channels=(
                64,
            ),
            down_block_types=(
                "DownBlock1D",
            ),
            up_block_types=(
                "UpBlock1D",
            ),
        ).to(device)

        shape = (16, 1, sample_size)
    elif dimensions == 2:
        model = UNet2DModel(
            sample_size=sample_size,
            in_channels=1,
            out_channels=1,
            block_out_channels=(
                64,
            ),
            down_block_types=(
                "DownBlock2D",
            ),
            up_block_types=(
                "UpBlock2D",
            ),
        ).to(device)

        shape = (16, 1, sample_size, sample_size)
    else:
        raise ValueError("only 1D and 2D are supported")

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    losses = []
    for _ in range(100):
        audio = torch.zeros(shape, device=device)
        noise = randn_tensor(shape, generator=generator, device=device, dtype=audio.dtype)

        batch_size = audio.shape[0]
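        # Read the step count from the scheduler config rather than via deprecated direct attribute access
        time_steps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (batch_size,), device=device).long()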

        noisy_voice = noise_scheduler.add_noise(audio, noise, time_steps)

        target = noise

        pred = model(noisy_voice, time_steps, return_dict=False)[0]

        loss = mse_loss(pred, target)
        # A scalar loss needs no gradient argument; passing the loss would scale all gradients by its value
        loss.backward()
        losses.append(loss.item())

        optimizer.step()
        optimizer.zero_grad()

    return losses

if __name__ == "__main__":
    losses_1d = test_diffusers(1)
    losses_2d = test_diffusers(2)

    plt.plot(losses_1d)
    plt.plot(losses_2d)
    plt.legend(["1D", "2D"], loc="upper right")

    plt.savefig("plot.png")
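
As a reference point for reading the loss curves, the following is a minimal sketch of what the training target looks like when the clean sample is all zeros, as in the reproduction. It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps (believed to match the DDPMScheduler defaults, but treat that as an assumption) and uses only the standard library. The helper name `alpha_bar` and the baseline names are hypothetical and not part of diffusers. With x_0 = 0, the noisy input is x_t = sqrt(1 - alpha_bar_t) * noise, so the ideal prediction is just the input rescaled by a timestep-dependent factor.

```python
import math


def alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed defaults)."""
    values, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        values.append(prod)
    return values


abar = alpha_bar()

# With x_0 = 0 and unit-variance noise, the expected per-element MSE of
# three simple predictors, averaged over uniformly sampled timesteps:
#   predict zeros            -> E[noise^2] = 1
#   predict the input x_t    -> E[(1 - sqrt(1 - abar_t))^2]
#   predict x_t / sqrt(1 - abar_t) -> 0 (the ideal, timestep-aware rescaling)
mse_predict_zero = 1.0
mse_predict_input = sum((1.0 - math.sqrt(1.0 - a)) ** 2 for a in abar) / len(abar)
mse_ideal = 0.0

print(f"predict zeros: {mse_predict_zero:.4f}")
print(f"predict input: {mse_predict_input:.4f}")
print(f"ideal rescale: {mse_ideal:.4f}")
```

Comparing a measured loss against these baselines indicates whether a model has learned anything beyond outputting zeros or copying its input; it does not by itself explain why the two models behave differently.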

Image

Logs

System Info

diffusers: 0.32.2
torch: 2.6.0
Python: 3.10.14
OS: Manjaro Linux
CPU: AMD Ryzen 5 1600X
GPU: Nvidia RTX 3090 24GB
RAM: 32 GB

Who can help?

No response

MrInformatic added the bug label on Mar 29, 2025