⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886% #11665

misrasaurabh1 · 2025-06-05T20:47:53Z

Saurabh's comments - This is used frequently in every encode and decode call. So speeding this up would be very helpful.
Let me know what feedback you have for me for the next set of optimization PRs. I want to ensure an easy merge experience for you.

📄 886% (8.86x) speedup for `AutoencoderKLWan.clear_cache` in `src/diffusers/models/autoencoders/autoencoder_kl_wan.py`

⏱️ Runtime : 1.60 milliseconds → 162 microseconds (best of 5 runs)

📝 Explanation and details

Key optimizations:

Compute the number of WanCausalConv3d modules in each model (encoder/decoder) only once during initialization, store in self._cached_conv_counts. This removes unnecessary repeated tree traversals at every clear_cache call, which was the main bottleneck (from profiling).
The internal helper _count_conv3d_fast is optimized via a generator expression with sum for efficiency.

All comments from the original code are preserved, except for updated or removed local docstrings/comments relevant to changed lines.
Function signatures and outputs remain unchanged.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 28 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests Details

from typing import Any, List, Tuple

# imports
import pytest  # used for our unit tests
import torch
import torch.nn as nn
from src.diffusers.models.autoencoders.autoencoder_kl_wan import \
    AutoencoderKLWan

# function to test
# Copyright 2025 The Wan Team and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Minimal stubs for required classes
class WanCausalConv3d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size)

class WanEncoder3d(nn.Module):
    def __init__(self, base_dim, z_dim_2, dim_mult, num_res_blocks, attn_scales, temperal_downsample, dropout):
        super().__init__()
        # For testability, create a number of WanCausalConv3d modules
        self.layers = nn.ModuleList([
            WanCausalConv3d(base_dim, z_dim_2, 1)
            for _ in range(len(dim_mult))
        ])
    def modules(self):
        # Return self and all submodules for isinstance checks
        yield self
        for layer in self.layers:
            yield layer

class WanDecoder3d(nn.Module):
    def __init__(self, base_dim, z_dim, dim_mult, num_res_blocks, attn_scales, temperal_upsample, dropout):
        super().__init__()
        # For testability, create a number of WanCausalConv3d modules
        self.layers = nn.ModuleList([
            WanCausalConv3d(base_dim, z_dim, 1)
            for _ in range(len(dim_mult))
        ])
    def modules(self):
        # Return self and all submodules for isinstance checks
        yield self
        for layer in self.layers:
            yield layer

# Minimal stubs for mixins and utils
class ConfigMixin:
    pass
def register_to_config(fn):
    return fn
class FromOriginalModelMixin:
    pass
class PushToHubMixin:
    pass
CONFIG_NAME = "config.json"
def deprecate(*args, **kwargs):
    pass

class ModelMixin(nn.Module, PushToHubMixin):
    config_name = CONFIG_NAME
    _automatically_saved_args = ["_diffusers_version", "_class_name", "_name_or_path"]
    _supports_gradient_checkpointing = False
    _keys_to_ignore_on_load_unexpected = None
    _no_split_modules = None
    _keep_in_fp32_modules = None
    _skip_layerwise_casting_patterns = None
    _supports_group_offloading = True

    def __init__(self):
        super().__init__()
        self._gradient_checkpointing_func = None

    def __getattr__(self, name: str) -> Any:
        is_in_config = "_internal_dict" in self.__dict__ and hasattr(self.__dict__["_internal_dict"], name)
        is_attribute = name in self.__dict__
        if is_in_config and not is_attribute:
            deprecation_message = f"Accessing config attribute `{name}` directly via '{type(self).__name__}' object attribute is deprecated. Please access '{name}' over '{type(self).__name__}'s config object instead, e.g. 'unet.config.{name}'."
            deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False, stacklevel=3)
            return self._internal_dict[name]
        return super().__getattr__(name)
from src.diffusers.models.autoencoders.autoencoder_kl_wan import \
    AutoencoderKLWan

# =========================
# Unit tests for clear_cache
# =========================

# --------
# Basic Test Cases
# --------

def test_clear_cache_basic_initialization():
    """Basic: After calling clear_cache, attributes are set with correct types and lengths."""
    model = AutoencoderKLWan()
    # Before clear_cache, attributes should not exist
    for attr in ['_conv_num', '_conv_idx', '_feat_map', '_enc_conv_num', '_enc_conv_idx', '_enc_feat_map']:
        pass
    model.clear_cache()









def test_clear_cache_does_not_affect_other_attributes():
    """Misc: clear_cache should not modify unrelated attributes."""
    model = AutoencoderKLWan()
    model.some_attr = 123
    model.clear_cache()



from typing import Any, List, Tuple

# imports
import pytest  # used for our unit tests
import torch
from src.diffusers.models.autoencoders.autoencoder_kl_wan import \
    AutoencoderKLWan

# function to test
# Copyright 2025 The Wan Team and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Minimal stubs for missing dependencies
class ConfigMixin:
    pass

def register_to_config(fn):
    return fn

class FromOriginalModelMixin:
    pass

class PushToHubMixin:
    pass

CONFIG_NAME = "config.json"

def deprecate(*args, **kwargs):
    pass

# Minimal stub for WanCausalConv3d
class WanCausalConv3d(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = torch.nn.Conv3d(in_channels, out_channels, kernel_size)

# Minimal stub for WanEncoder3d
class WanEncoder3d(torch.nn.Module):
    def __init__(self, base_dim, z_dim_2, dim_mult, num_res_blocks, attn_scales, temperal_downsample, dropout):
        super().__init__()
        # For testing, just add a few WanCausalConv3d layers
        self.layers = torch.nn.ModuleList([
            WanCausalConv3d(base_dim, base_dim, 1) for _ in range(len(dim_mult))
        ])
    def modules(self):
        # Return self and submodules for isinstance checks
        yield self
        for l in self.layers:
            yield l

# Minimal stub for WanDecoder3d
class WanDecoder3d(torch.nn.Module):
    def __init__(self, base_dim, z_dim, dim_mult, num_res_blocks, attn_scales, temperal_upsample, dropout):
        super().__init__()
        self.layers = torch.nn.ModuleList([
            WanCausalConv3d(base_dim, base_dim, 1) for _ in range(len(dim_mult))
        ])
    def modules(self):
        yield self
        for l in self.layers:
            yield l

class ModelMixin(torch.nn.Module, PushToHubMixin):
    config_name = CONFIG_NAME
    _automatically_saved_args = ["_diffusers_version", "_class_name", "_name_or_path"]
    _supports_gradient_checkpointing = False
    _keys_to_ignore_on_load_unexpected = None
    _no_split_modules = None
    _keep_in_fp32_modules = None
    _skip_layerwise_casting_patterns = None
    _supports_group_offloading = True

    def __init__(self):
        super().__init__()
        self._gradient_checkpointing_func = None

    def __getattr__(self, name: str) -> Any:
        is_in_config = "_internal_dict" in self.__dict__ and hasattr(self.__dict__["_internal_dict"], name)
        is_attribute = name in self.__dict__
        if is_in_config and not is_attribute:
            deprecation_message = f"Accessing config attribute `{name}` directly via '{type(self).__name__}' object attribute is deprecated. Please access '{name}' over '{type(self).__name__}'s config object instead, e.g. 'unet.config.{name}'."
            deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False, stacklevel=3)
            return self._internal_dict[name]
        return super().__getattr__(name)
from src.diffusers.models.autoencoders.autoencoder_kl_wan import \
    AutoencoderKLWan

# unit tests

# ----------- BASIC TEST CASES -----------

def test_clear_cache_basic_attributes():
    """
    Test that clear_cache creates the expected attributes with correct types and values on a default model.
    """
    model = AutoencoderKLWan()
    # Before calling clear_cache, these attributes should not exist
    for attr in ['_conv_num', '_conv_idx', '_feat_map', '_enc_conv_num', '_enc_conv_idx', '_enc_feat_map']:
        pass
    model.clear_cache()

def test_clear_cache_multiple_calls_idempotent():
    """
    Test that calling clear_cache multiple times does not cause errors and always resets the attributes.
    """
    model = AutoencoderKLWan()
    model.clear_cache()
    # Mutate the attributes
    model._conv_idx.append(1)
    model._feat_map[0] = "something"
    model._enc_feat_map[1] = "else"
    model.clear_cache()

def test_clear_cache_with_no_layers():
    """
    Test a model whose encoder and decoder have no WanCausalConv3d layers.
    """
    class EmptyEncoder(torch.nn.Module):
        def modules(self):  # no submodules
            yield self
    class EmptyDecoder(torch.nn.Module):
        def modules(self):
            yield self
    model = AutoencoderKLWan()
    model.encoder = EmptyEncoder()
    model.decoder = EmptyDecoder()
    model.clear_cache()

# ----------- EDGE TEST CASES -----------

def test_clear_cache_encoder_decoder_different_counts():
    """
    Test when encoder and decoder have different numbers of WanCausalConv3d layers.
    """
    class CustomEncoder(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.c1 = WanCausalConv3d(1, 1, 1)
            self.c2 = WanCausalConv3d(1, 1, 1)
        def modules(self):
            yield self
            yield self.c1
            yield self.c2
    class CustomDecoder(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.c1 = WanCausalConv3d(1, 1, 1)
        def modules(self):
            yield self
            yield self.c1
    model = AutoencoderKLWan()
    model.encoder = CustomEncoder()
    model.decoder = CustomDecoder()
    model.clear_cache()

def test_clear_cache_non_wanconv_layers():
    """
    Test that clear_cache ignores layers that are not WanCausalConv3d.
    """
    class DummyLayer(torch.nn.Module):
        pass
    class EncoderWithMix(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.c1 = WanCausalConv3d(1, 1, 1)
            self.d1 = DummyLayer()
        def modules(self):
            yield self
            yield self.c1
            yield self.d1
    class DecoderWithMix(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.d1 = DummyLayer()
        def modules(self):
            yield self
            yield self.d1
    model = AutoencoderKLWan()
    model.encoder = EncoderWithMix()
    model.decoder = DecoderWithMix()
    model.clear_cache()

def test_clear_cache_overwrites_existing_attributes():
    """
    Test that clear_cache overwrites previously set attributes, even if they have wrong types.
    """
    model = AutoencoderKLWan()
    # Set wrong types
    model._conv_num = "wrong"
    model._conv_idx = "wrong"
    model._feat_map = "wrong"
    model._enc_conv_num = "wrong"
    model._enc_conv_idx = "wrong"
    model._enc_feat_map = "wrong"
    model.clear_cache()

def test_clear_cache_encoder_decoder_are_same_object():
    """
    Test that clear_cache works if encoder and decoder refer to the same object instance.
    """
    class Shared(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.c1 = WanCausalConv3d(1, 1, 1)
        def modules(self):
            yield self
            yield self.c1
    shared = Shared()
    model = AutoencoderKLWan()
    model.encoder = shared
    model.decoder = shared
    model.clear_cache()

def test_clear_cache_with_large_feat_map():
    """
    Test that clear_cache can handle a large number of WanCausalConv3d layers (edge of allowed scale).
    """
    class LargeEncoder(torch.nn.Module):
        def __init__(self, n):
            super().__init__()
            self.layers = torch.nn.ModuleList([WanCausalConv3d(1, 1, 1) for _ in range(n)])
        def modules(self):
            yield self
            for l in self.layers:
                yield l
    class LargeDecoder(torch.nn.Module):
        def __init__(self, n):
            super().__init__()
            self.layers = torch.nn.ModuleList([WanCausalConv3d(1, 1, 1) for _ in range(n)])
        def modules(self):
            yield self
            for l in self.layers:
                yield l
    n = 999  # just under 1000
    model = AutoencoderKLWan()
    model.encoder = LargeEncoder(n)
    model.decoder = LargeDecoder(n)
    model.clear_cache()

# ----------- LARGE SCALE TEST CASES -----------

def test_clear_cache_performance_large_scale():
    """
    Test clear_cache performance and correctness with maximum allowed (999) layers.
    """
    class ManyLayers(torch.nn.Module):
        def __init__(self, n):
            super().__init__()
            self.layers = torch.nn.ModuleList([WanCausalConv3d(1, 1, 1) for _ in range(n)])
        def modules(self):
            yield self
            for l in self.layers:
                yield l
    n = 999
    model = AutoencoderKLWan()
    model.encoder = ManyLayers(n)
    model.decoder = ManyLayers(n)
    import time
    start = time.time()
    model.clear_cache()
    elapsed = time.time() - start

def test_clear_cache_no_side_effects_on_other_attributes():
    """
    Test that clear_cache does not modify unrelated attributes.
    """
    model = AutoencoderKLWan()
    model.some_random_attr = 12345
    model.clear_cache()

def test_clear_cache_handles_inheritance_and_subclassing():
    """
    Test that clear_cache works in a subclass that overrides encoder/decoder with custom logic.
    """
    class CustomAE(AutoencoderKLWan):
        def __init__(self):
            super().__init__()
            class Dummy(torch.nn.Module):
                def modules(self):
                    yield self
            self.encoder = Dummy()
            self.decoder = Dummy()
    model = CustomAE()
    model.clear_cache()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-AutoencoderKLWan.clear_cache-mb6bxvte and push.

**Key optimizations:** - Compute the number of `WanCausalConv3d` modules in each model (`encoder`/`decoder`) **only once during initialization**, store in `self._cached_conv_counts`. This removes unnecessary repeated tree traversals at every `clear_cache` call, which was the main bottleneck (from profiling). - The internal helper `_count_conv3d_fast` is optimized via a generator expression with `sum` for efficiency. All comments from the original code are preserved, except for updated or removed local docstrings/comments relevant to changed lines. **Function signatures and outputs remain unchanged.**

…che-mb6bxvte

a-r-r-o-w · 2025-06-16T21:01:59Z

@bot /style

github-actions · 2025-06-16T21:02:51Z

Style fixes have been applied. View the workflow run here.

…che-mb6bxvte

a-r-r-o-w

Thanks, nice changes!

HuggingFaceDocBuilderDev · 2025-06-16T21:10:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

a-r-r-o-w

for some reason, my actual review comments did not go through 🤔

src/diffusers/models/autoencoders/autoencoder_kl_wan.py

Co-authored-by: Aryan <[email protected]>

misrasaurabh1 · 2025-06-16T23:46:46Z

accepted your code review suggestions. This should be ready to merge

…che-mb6bxvte

a-r-r-o-w · 2025-06-18T02:51:17Z

@bot /style

github-actions · 2025-06-18T02:52:10Z

Style fixes have been applied. View the workflow run here.

codeflash-ai bot and others added 2 commits May 27, 2025 09:44

Merge branch 'main' into codeflash/optimize-AutoencoderKLWan.clear_ca…

8a161a3

…che-mb6bxvte

DN6 requested a review from a-r-r-o-w June 9, 2025 03:27

Apply style fixes

bc0c0b7

Merge branch 'main' into codeflash/optimize-AutoencoderKLWan.clear_ca…

1e5c706

…che-mb6bxvte

a-r-r-o-w approved these changes Jun 16, 2025

View reviewed changes

a-r-r-o-w reviewed Jun 16, 2025

View reviewed changes

src/diffusers/models/autoencoders/autoencoder_kl_wan.py Outdated Show resolved Hide resolved

src/diffusers/models/autoencoders/autoencoder_kl_wan.py Outdated Show resolved Hide resolved

Apply suggestions from code review

ed48f85

Co-authored-by: Aryan <[email protected]>

Merge branch 'main' into codeflash/optimize-AutoencoderKLWan.clear_ca…

7b656c0

…che-mb6bxvte

Apply style fixes

4e77165

a-r-r-o-w merged commit 5ce4814 into huggingface:main Jun 18, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886% #11665

⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886% #11665

Uh oh!

misrasaurabh1 commented Jun 5, 2025

Uh oh!

a-r-r-o-w commented Jun 16, 2025

Uh oh!

github-actions bot commented Jun 16, 2025

Uh oh!

a-r-r-o-w left a comment

Uh oh!

HuggingFaceDocBuilderDev commented Jun 16, 2025

Uh oh!

a-r-r-o-w left a comment

Uh oh!

Uh oh!

Uh oh!

misrasaurabh1 commented Jun 16, 2025

Uh oh!

a-r-r-o-w commented Jun 18, 2025

Uh oh!

github-actions bot commented Jun 18, 2025

Uh oh!

Uh oh!

Uh oh!

⚡️ Speed up method AutoencoderKLWan.clear_cache by 886% #11665

⚡️ Speed up method AutoencoderKLWan.clear_cache by 886% #11665

Uh oh!

Conversation

misrasaurabh1 commented Jun 5, 2025

📄 886% (8.86x) speedup for AutoencoderKLWan.clear_cache in src/diffusers/models/autoencoders/autoencoder_kl_wan.py

📝 Explanation and details

Uh oh!

a-r-r-o-w commented Jun 16, 2025

Uh oh!

github-actions bot commented Jun 16, 2025

Uh oh!

a-r-r-o-w left a comment

Choose a reason for hiding this comment

Uh oh!

HuggingFaceDocBuilderDev commented Jun 16, 2025

Uh oh!

a-r-r-o-w left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

misrasaurabh1 commented Jun 16, 2025

Uh oh!

a-r-r-o-w commented Jun 18, 2025

Uh oh!

github-actions bot commented Jun 18, 2025

Uh oh!

Uh oh!

Uh oh!

⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886% #11665

⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886% #11665

📄 886% (8.86x) speedup for `AutoencoderKLWan.clear_cache` in `src/diffusers/models/autoencoders/autoencoder_kl_wan.py`