
UNet1DModel does not converge #11171

Open
@MrInformatic


Describe the bug

I tried to train a diffusion pipeline built from UNet1DModel and DDPMScheduler, using the AdamW optimizer and mse_loss. No matter what I tried, I never got the model to produce a loss below 0.5. As a sanity check, I also replaced the UNet1DModel with a UNet2DModel, which performed significantly better. Both pipelines should learn to produce silence or a blank image, respectively. Something seems to be wrong with UNet1DModel, since it is the only part that changed. #3203 also mentions problems with UNet1DModel, and I have already tried training with different learning rates via HPO.
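
To make the expected behavior concrete: since the clean sample is all zeros, add_noise produces x_t = sqrt(1 - alpha_bar_t) * eps, so the target noise is exactly recoverable from the noisy input and the timestep, and a working model should be able to push the MSE close to zero. A quick closed-form check (a sketch, not a training run):

import torch
from diffusers import DDPMScheduler
from torch.nn.functional import mse_loss

scheduler = DDPMScheduler(num_train_timesteps=1000)
time_steps = torch.tensor([500])
noise = torch.randn(1, 1, 32)
noisy = scheduler.add_noise(torch.zeros(1, 1, 32), noise, time_steps)

# For zero data, noisy = sqrt(1 - alpha_bar_t) * noise, so dividing recovers the noise.
alpha_bar = scheduler.alphas_cumprod[time_steps].view(-1, 1, 1)
noise_hat = noisy / (1 - alpha_bar).sqrt()
print(mse_loss(noise_hat, noise))  # ~0 up to numerical error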

Reproduction

import torch
from diffusers import UNet1DModel, DDPMScheduler, UNet2DModel
from diffusers.utils.torch_utils import randn_tensor
from torch.nn.functional import mse_loss
import matplotlib.pyplot as plt

def test_diffusers(dimensions: int):
    device = torch.device("cpu")
    sample_size = 32
    generator = torch.Generator(device=device)

    noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

    if dimensions == 1:
        model = UNet1DModel(
            sample_size=sample_size,
            in_channels=1,
            out_channels=1,
            block_out_channels=(64,),
            down_block_types=("DownBlock1D",),
            up_block_types=("UpBlock1D",),
        ).to(device)

        shape = (16, 1, sample_size)
    elif dimensions == 2:
        model = UNet2DModel(
            sample_size=sample_size,
            in_channels=1,
            out_channels=1,
            block_out_channels=(64,),
            down_block_types=("DownBlock2D",),
            up_block_types=("UpBlock2D",),
        ).to(device)

        shape = (16, 1, sample_size, sample_size)
    else:
        raise ValueError("only 1D and 2D are supported")

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    losses = []
    for _ in range(100):
        audio = torch.zeros(shape, device=device)
        noise = randn_tensor(shape, generator=generator, device=device, dtype=audio.dtype)

        batch_size = audio.shape[0]
        time_steps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (batch_size,), device=device).long()

        noisy_voice = noise_scheduler.add_noise(audio, noise, time_steps)

        target = noise

        pred = model(noisy_voice, time_steps, return_dict=False)[0]

        loss = mse_loss(pred, target)
        loss.backward()
        losses.append(loss.item())

        optimizer.step()
        optimizer.zero_grad()

    return losses

if __name__ == "__main__":
    losses_1d = test_diffusers(1)
    losses_2d = test_diffusers(2)

    plt.plot(losses_1d)
    plt.plot(losses_2d)
    plt.legend(["1D", "2D"], loc="upper right")

    plt.savefig("plot.png")

[Plot: per-step training loss for both runs; the 2D model's loss decreases steadily while the 1D model's loss plateaus around 0.5.]
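
One more diagnostic that may help narrow this down (a sketch, assuming the plateau comes from missing timestep conditioning): check whether the 1D model's output depends on the timestep at all. As far as I can tell, DownBlock1D and UpBlock1D do not consume the time embedding, in which case the model cannot learn the timestep-dependent noise scale and gets stuck near an average prediction.

import torch
from diffusers import UNet1DModel

# Same configuration as in the reproduction above.
model = UNet1DModel(
    sample_size=32,
    in_channels=1,
    out_channels=1,
    block_out_channels=(64,),
    down_block_types=("DownBlock1D",),
    up_block_types=("UpBlock1D",),
)

x = torch.randn(2, 1, 32)
out_early = model(x, torch.tensor([1, 1]), return_dict=False)[0]
out_late = model(x, torch.tensor([999, 999]), return_dict=False)[0]

# True here would mean the output is independent of the timestep.
print(torch.allclose(out_early, out_late))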

Logs

System Info

diffusers: 0.32.2
torch: 2.6.0
Python: 3.10.14
OS: Manjaro Linux
CPU: AMD Ryzen 5 1600X
GPU: Nvidia RTX 3090 24GB
RAM: 32 GB

Who can help?

No response
