### Description

### Checklist
- The issue exists after disabling all extensions
- The issue exists on a clean installation of webui
- The issue is caused by an extension, but I believe it is caused by a bug in the webui
- The issue exists in the current version of the webui
- The issue has not been reported before recently
- The issue has been reported before but has not been fixed yet
### What happened?

**Environment:**
- GPU: AMD Radeon RX 9070 XT with 16GB GDDR6 VRAM
- Operating System: Windows 11 24H2 (Version 10.0.26100.4061, per the console log below)
- AMD Driver Version: 25.5.1
- Python Version: 3.10.6 (within WebUI venv)
- Stable Diffusion WebUI DirectML Version: v1.10.1-amd-36-g679c645e
**Problem Description:**

Despite `torch-directml` successfully detecting my AMD GPU, the Stable Diffusion WebUI consistently falls back to the CPU for SDXL image generation. The GPU remains at 0% utilization while the CPU is pinned at 100%, and system RAM usage is extremely high (15 GB+ for SDXL). This results in excessively long generation times (e.g., 11-30 minutes for a 1024x1024 SDXL image).
The core issue appears to be a persistent failure during model loading related to copying tensors, specifically the VAE, to the GPU device.
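For context, here is a minimal sketch of the operation that appears to be failing, independent of the WebUI (run inside the venv; the `Conv2d` stand-in and its shapes are purely illustrative, not the actual VAE code):

```python
import torch
import torch_directml

dml = torch_directml.device()  # typically "privateuseone:0"

# Illustrative stand-in for a VAE submodule: create it on the DirectML device,
# then copy checkpoint-style weights into it the way load_state_dict does.
conv = torch.nn.Conv2d(4, 4, kernel_size=1).to(dml)
ckpt = {"weight": torch.randn(4, 4, 1, 1), "bias": torch.randn(4)}
conv.load_state_dict(ckpt)  # should succeed; the reported failure is the
print(conv.weight.device)   # "meta tensor" variant of this copy step
```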
**Steps to Reproduce:**

1. Launch Stable Diffusion WebUI via `webui-user.bat`.
2. Select an SDXL model (e.g., `sd_xl_base_1.0.safetensors`).
3. Ensure a compatible `sdxl_vae.safetensors` is selected in the UI (placed in `models/VAE`).
4. Set the image resolution to 1024x1024.
5. Enter any simple prompt (e.g., "a cat").
6. Initiate image generation.
**Expected Behavior:**
The AMD Radeon RX 9070 XT GPU should be fully utilized (high 3D/Compute usage in Task Manager, significant VRAM utilization) to process the SDXL model, resulting in generation times significantly faster than CPU-only processing (e.g., typically within 3-8 minutes for 1024x1024 SDXL on DirectML for similar hardware).
**Observed Behavior:**
During SDXL image generation (1024x1024):
- GPU Usage (Task Manager Performance Tab - 3D/Compute): Consistently 0%.
- Dedicated GPU Memory (VRAM): Remains very low (e.g., 1.9GB out of 16GB available).
- CPU Usage: Pinned at 100%.
- System RAM Usage: Extremely high (e.g., over 15GB).
- Generation Time: Excessively long (e.g., 11 minutes for a 20-step 1024x1024 SDXL image, with estimates up to 30 minutes for other runs).
- The final generated image is correct, indicating the VAE eventually functions despite the underlying loading issue.
**Relevant Logs/Errors:**
During initial model loading, the following errors were consistently observed in the console (these are from earlier attempts, but they describe the underlying problem behind the CPU fallback):
```
While copying the parameter named "first_stage_model.decoder.up.3.block.2.norm2.weight", whose dimensions in the model are torch.Size([512]) and whose dimensions in the checkpoint are torch.Size([512]), an exception occurred : ('Cannot copy out of meta tensor; no data!',).
... (many similar lines for first_stage_model.decoder parameters) ...
While copying the parameter named "first_stage_model.post_quant_conv.bias", whose dimensions in the model are torch.Size([4]) and whose dimensions in the checkpoint are torch.Size([4]), an exception occurred : ('Cannot copy out of meta tensor; no data!',).
```
(Note: while the "meta tensor" errors may not appear in every recent log, the symptoms (CPU pinned at 100%, GPU at 0%) indicate that this underlying failure to load model data onto the GPU is still the root cause.)
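For anyone triaging: that error text matches what PyTorch raises when a copy tries to read from a meta tensor (a tensor that carries shape and dtype but no storage). A minimal repro of just that failure mode, independent of the WebUI:

```python
import torch

src = torch.empty(512, device="meta")  # meta tensor: shape and dtype only, no data
dst = torch.empty(512)

try:
    dst.copy_(src)  # there is no data to read on the meta device
except Exception as e:
    print(e)  # "Cannot copy out of meta tensor; no data!"
```

If that is what is happening here, the VAE parameters are presumably still sitting on the meta device at the moment the WebUI tries to materialize them, which would explain the silent fall-back to CPU.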
Additionally, the following non-critical warning appears during model loading but does not prevent the model from eventually loading:
```
Repository Not Found for url: https://huggingface.co/None/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
Invalid username or password.
...
Failed to create model quickly; will retry using slow method.
Model loaded in 18.7s
```
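(The "Failed to create model quickly" line is the WebUI's fast-path loader giving up and re-running a full initialization; judging by the traceback in the console log below, the fast path is what hits the `https://huggingface.co/None/...` URL. A toy sketch of that control flow, not the actual `modules/sd_models.py` code:)

```python
def create_model(fast: bool) -> str:
    if fast:
        # the fast path skips weight init but may still look up remote config metadata
        raise OSError("Repository Not Found for url: https://huggingface.co/None/...")
    return "model"

try:
    model = create_model(fast=True)
except OSError:
    print("Failed to create model quickly; will retry using slow method.")
    model = create_model(fast=False)  # slow path: full initialization
```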
`webui-user.bat` content:

```bat
@echo off
set HSA_OVERRIDE_GFX_VERSION=10.3.0
set TORCH_COMMAND=pip install torch-directml
git pull
set COMMANDLINE_ARGS=--autolaunch --skip-torch-cuda-test --no-half --no-half-vae
call webui.bat
```
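(Side note from my own triage: the startup log below prints "NVIDIA driver was found." and selects CUDAExecutionProvider, so the launcher may never actually be told to use DirectML. Assuming this fork still supports the `--use-directml` flag, a variant worth testing would be:)

```bat
@echo off
rem HSA_OVERRIDE_GFX_VERSION is a ROCm (Linux) variable and should have no effect under DirectML
set TORCH_COMMAND=pip install torch-directml
set COMMANDLINE_ARGS=--use-directml --autolaunch --skip-torch-cuda-test --no-half --no-half-vae
call webui.bat
```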
**Troubleshooting Performed (and their results):**

- Initial Diagnosis: Identified extremely slow SDXL generation (20+ minutes for 1024x1024) and suspected CPU fallback based on Task Manager observations.
- `xformers` Removal: Previously encountered `NotImplementedError: No operator found for memory_efficient_attention_forward`. Uninstalled `xformers` via `pip uninstall xformers --yes`.
  - Result: Eliminated the `NotImplementedError`, but the CPU fallback for SDXL persisted.
- `COMMANDLINE_ARGS` Adjustments: Ensured `webui-user.bat` included `--autolaunch --skip-torch-cuda-test --no-half`; later added `--no-half-vae`.
  - Result: No change in GPU utilization for SDXL.
- `torch-directml` Device Detection: Ran Python commands within the `venv`:
  - `import torch_directml`
  - `print(torch_directml.is_available())` -> Output: `True`
  - `print(torch_directml.device_name(0))` -> Output: `AMD Radeon RX 9070 XT`
  - Result: Confirmed that `torch-directml` can detect and identify the GPU. (See also the compute sanity check sketched below.)
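Detection alone doesn't prove that work actually runs on the GPU, so a quick compute sanity check in the same venv may be more telling (a hedged sketch; the first DirectML call includes warm-up overhead, hence the separate warm-up run):

```python
import time
import torch
import torch_directml

dml = torch_directml.device()
a = torch.randn(2048, 2048)

t0 = time.time()
(a @ a).sum().item()    # CPU matmul for comparison
cpu_s = time.time() - t0

ad = a.to(dml)
(ad @ ad).sum().item()  # warm-up run on the DirectML device
t0 = time.time()
(ad @ ad).sum().item()  # .item() copies the result back, syncing the device
dml_s = time.time() - t0

print(f"cpu: {cpu_s:.3f}s  dml: {dml_s:.3f}s")  # dml should win clearly on an RX 9070 XT
```

Watching Task Manager's GPU graphs while this runs would show whether DirectML work reaches the card at all.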
- SD 1.5 Test: Performed a generation with the built-in SD 1.5 model at 512x512.
- Result: Generated a "perfect" image in approx. 2 minutes. While this is faster than SDXL, it's still slower than optimal for SD 1.5 on this hardware (expected 15-45s). CPU was still observed as highly active.
- Clean Reinstallation of PyTorch/DirectML:
  - Uninstalled `torch`, `torchvision`, `torchaudio`, and `torch-directml`.
  - Reinstalled using the simplified command: `pip install torch-directml`.
  - Result: Installation completed successfully.
- SDXL Test after Reinstall: Performed a 1024x1024 SDXL generation using `sd_xl_base_1.0.safetensors` and `sdxl_vae.safetensors` (100MB version).
  - Result: No change. GPU remained at 0% utilization, CPU pinned at 100%, high system RAM usage. Generation time approx. 11 minutes.
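(For completeness, a quick check of what the reinstall actually left in the venv; the expectations in the comments follow from the console log below, where torch reports it was not compiled with CUDA enabled:)

```python
import torch
import torch_directml

print(torch.__version__)              # 2.4.1 here; PyPI Windows wheels of torch are CPU-only
print(torch.cuda.is_available())      # False is expected: DirectML does not use the CUDA backend
print(torch_directml.device_name(0))  # AMD Radeon RX 9070 XT
```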
### Steps to reproduce the problem
1. Ensure you have the `stable-diffusion-webui-directml` repository cloned to your system.
2. Set your `webui-user.bat` file to the following:

   ```bat
   @echo off
   set HSA_OVERRIDE_GFX_VERSION=10.3.0
   set TORCH_COMMAND=pip install torch-directml
   git pull
   set COMMANDLINE_ARGS=--autolaunch --skip-torch-cuda-test --no-half --no-half-vae
   call webui.bat
   ```

3. Ensure `torch-directml` and its dependencies are correctly installed by manually running `.\venv\Scripts\activate` and then `pip install torch-directml` in the `stable-diffusion-webui-directml` directory (after uninstalling previous versions if necessary).
4. Download the official `sd_xl_base_1.0.safetensors` model and place it in `models/Stable-diffusion/`.
5. Download the official `sdxl_vae.safetensors` and place it in `models/VAE/` (create the folder if it doesn't exist).
6. Double-click `webui-user.bat` to launch the WebUI.
7. In the WebUI interface:
   - Select `sd_xl_base_1.0.safetensors` as the main Stable Diffusion checkpoint.
   - Explicitly select `sdxl_vae.safetensors` from the VAE dropdown.
   - Set the image resolution to 1024x1024.
   - Enter a simple prompt (e.g., "a cat").
8. Click "Generate".
9. While generation is in progress, open Windows Task Manager (Ctrl + Shift + Esc), go to the "Performance" tab, and observe your GPU (specifically the "3D" or "Compute" graphs), CPU, and system RAM usage.
### What should have happened?
The AMD Radeon RX 9070 XT GPU should be primarily utilized for the image generation process. During generation:
- GPU Usage (Task Manager Performance Tab - 3D/Compute): Should show significant activity (e.g., 50% or higher utilization).
- Dedicated GPU Memory (VRAM): Should show high utilization (e.g., above 10GB for 1024x1024 SDXL).
- CPU Usage: Should be lower, handling coordination and data pre-processing, but not pinned at 100%.
- System RAM Usage: Should be lower, as the model and tensors should reside in VRAM.
- Generation Time: Should be significantly faster than CPU-only processing, typically completing a 20-step 1024x1024 SDXL image within 3-8 minutes on similar DirectML-enabled hardware.
### What browsers do you use to access the UI ?
Mozilla Firefox
### Sysinfo

### Console logs

```
Microsoft Windows [Version 10.0.26100.4061]
(c) Microsoft Corporation. All rights reserved.
C:\Users\thesa>cd C:\stable-diffusion-webui-directml
C:\stable-diffusion-webui-directml>.\venv\Scripts\activate
(venv) C:\stable-diffusion-webui-directml>pip uninstall torch torchvision torchaudio --yes
Found existing installation: torch 2.4.1
Uninstalling torch-2.4.1:
Successfully uninstalled torch-2.4.1
Found existing installation: torchvision 0.19.1
Uninstalling torchvision-0.19.1:
Successfully uninstalled torchvision-0.19.1
WARNING: Skipping torchaudio as it is not installed.
(venv) C:\stable-diffusion-webui-directml>pip uninstall torch-directml --yes
Found existing installation: torch-directml 0.2.5.dev240914
Uninstalling torch-directml-0.2.5.dev240914:
Successfully uninstalled torch-directml-0.2.5.dev240914
(venv) C:\stable-diffusion-webui-directml>pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2 torch-directml
Looking in indexes: https://download.pytorch.org/whl/rocm5.4.2
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch
(venv) C:\stable-diffusion-webui-directml>pip uninstall torch torchvision torchaudio --yes
WARNING: Skipping torch as it is not installed.
WARNING: Skipping torchvision as it is not installed.
WARNING: Skipping torchaudio as it is not installed.
(venv) C:\stable-diffusion-webui-directml>pip uninstall torch-directml --yes
WARNING: Skipping torch-directml as it is not installed.
(venv) C:\stable-diffusion-webui-directml>pip install torch-directml
Collecting torch-directml
Using cached torch_directml-0.2.5.dev240914-cp310-cp310-win_amd64.whl.metadata (6.2 kB)
Collecting torch==2.4.1 (from torch-directml)
Using cached torch-2.4.1-cp310-cp310-win_amd64.whl.metadata (27 kB)
Collecting torchvision==0.19.1 (from torch-directml)
Using cached torchvision-0.19.1-cp310-cp310-win_amd64.whl.metadata (6.1 kB)
Requirement already satisfied: filelock in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.4.1->torch-directml) (3.18.0)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.4.1->torch-directml) (4.13.2)
Requirement already satisfied: sympy in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.4.1->torch-directml) (1.14.0)
Requirement already satisfied: networkx in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.4.1->torch-directml) (3.4.2)
Requirement already satisfied: jinja2 in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.4.1->torch-directml) (3.1.6)
Requirement already satisfied: fsspec in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from torch==2.4.1->torch-directml) (2025.5.0)
Requirement already satisfied: numpy in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from torchvision==0.19.1->torch-directml) (1.26.2)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from torchvision==0.19.1->torch-directml) (9.5.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from jinja2->torch==2.4.1->torch-directml) (2.1.5)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\stable-diffusion-webui-directml\venv\lib\site-packages (from sympy->torch==2.4.1->torch-directml) (1.3.0)
Using cached torch_directml-0.2.5.dev240914-cp310-cp310-win_amd64.whl (9.0 MB)
Using cached torch-2.4.1-cp310-cp310-win_amd64.whl (199.4 MB)
Using cached torchvision-0.19.1-cp310-cp310-win_amd64.whl (1.3 MB)
Installing collected packages: torch, torchvision, torch-directml
Successfully installed torch-2.4.1 torch-directml-0.2.5.dev240914 torchvision-0.19.1
(venv) C:\stable-diffusion-webui-directml>deactivate
C:\stable-diffusion-webui-directml>webui-user.bat
Already up to date.
venv "C:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
NVIDIA driver was found.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.10.1-amd-36-g679c645e
Commit hash: 679c645ec84e40dd14d527dbeb03fab259087187
WARNING: you should not skip torch test unless you want CPU to work.
C:\stable-diffusion-webui-directml\venv\lib\site-packages\onnxscript\converter.py:816: FutureWarning: 'onnxscript.values.Op.param_schemas' is deprecated in version 0.1 and will be removed in the future. Please use '.op_signature' instead.
param_schemas = callee.param_schemas()
C:\stable-diffusion-webui-directml\venv\lib\site-packages\onnxscript\converter.py:816: FutureWarning: 'onnxscript.values.OnnxFunction.param_schemas' is deprecated in version 0.1 and will be removed in the future. Please use '.op_signature' instead.
param_schemas = callee.param_schemas()
C:\stable-diffusion-webui-directml\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
Launching Web UI with arguments: --autolaunch --skip-torch-cuda-test --no-half --no-half-vae
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
ONNX: version=1.22.0 provider=CUDAExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
Loading weights [31e35c80fc] from C:\stable-diffusion-webui-directml\models\Stable-diffusion\sd_xl_base_1.0.safetensors
Running on local URL: http://127.0.0.1:7860
Creating model from config: C:\stable-diffusion-webui-directml\repositories\generative-models\configs\inference\sd_xl_base.yaml
To create a public link, set `share=True` in `launch()`.
Startup time: 12.7s (prepare environment: 19.7s, initialize shared: 0.8s, load scripts: 0.7s, create ui: 0.5s, gradio launch: 0.5s).
creating model quickly: OSError
Traceback (most recent call last):
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 409, in hf_raise_for_status
response.raise_for_status()
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\requests\models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\utils\hub.py", line 342, in cached_file
resolved_file = hf_hub_download(
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py", line 1008, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py", line 1115, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py", line 1643, in _raise_on_head_call_error
raise head_call_error
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py", line 1531, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py", line 1448, in get_hf_file_metadata
r = _request_wrapper(
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py", line 286, in _request_wrapper
response = _request_wrapper(
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py", line 310, in _request_wrapper
hf_raise_for_status(response)
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 459, in hf_raise_for_status
raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-682ec8dc-5d76afce6cadc6ad5a4fc169;6d81d0b7-e77f-4183-af74-72ba481f67f5)
Repository Not Found for url: https://huggingface.co/None/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
Invalid username or password.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\thesa\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File "C:\Users\thesa\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\thesa\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\stable-diffusion-webui-directml\modules\initialize.py", line 149, in load_model
shared.sd_model # noqa: B018
File "C:\stable-diffusion-webui-directml\modules\shared_items.py", line 190, in sd_model
return modules.sd_models.model_data.get_sd_model()
File "C:\stable-diffusion-webui-directml\modules\sd_models.py", line 693, in get_sd_model
load_model()
File "C:\stable-diffusion-webui-directml\modules\sd_models.py", line 831, in load_model
sd_model = instantiate_from_config(sd_config.model, state_dict)
File "C:\stable-diffusion-webui-directml\modules\sd_models.py", line 775, in instantiate_from_config
return constructor(**params)
File "C:\stable-diffusion-webui-directml\repositories\generative-models\sgm\models\diffusion.py", line 61, in __init__
self.conditioner = instantiate_from_config(
File "C:\stable-diffusion-webui-directml\repositories\generative-models\sgm\util.py", line 175, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "C:\stable-diffusion-webui-directml\repositories\generative-models\sgm\modules\encoders\modules.py", line 88, in __init__
embedder = instantiate_from_config(embconfig)
File "C:\stable-diffusion-webui-directml\repositories\generative-models\sgm\util.py", line 175, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "C:\stable-diffusion-webui-directml\repositories\generative-models\sgm\modules\encoders\modules.py", line 361, in __init__
self.transformer = CLIPTextModel.from_pretrained(version)
File "C:\stable-diffusion-webui-directml\modules\sd_disable_initialization.py", line 68, in CLIPTextModel_from_pretrained
res = self.CLIPTextModel_from_pretrained(None, *model_args, config=pretrained_model_name_or_path, state_dict={}, **kwargs)
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\modeling_utils.py", line 262, in _wrapper
return func(*args, **kwargs)
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\modeling_utils.py", line 3540, in from_pretrained
resolved_config_file = cached_file(
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\utils\hub.py", line 365, in cached_file
raise EnvironmentError(
OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
Failed to create model quickly; will retry using slow method.
C:\stable-diffusion-webui-directml\modules\safe.py:156: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
return unsafe_torch_load(filename, *args, **kwargs)
Applying attention optimization: InvokeAI... done.
Model loaded in 20.4s (load weights from disk: 0.7s, create model: 10.3s, apply weights to model: 7.0s, apply float(): 1.8s, calculate empty prompt: 0.4s).
Loading VAE weights specified in settings: C:\stable-diffusion-webui-directml\models\VAE\fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors
Applying attention optimization: InvokeAI... done.
VAE weights loaded.
Calculating sha256 for C:\stable-diffusion-webui-directml\models\VAE\fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors: 235745af8d86bf4a4c1b5b4f529868b37019a10f7c0b2e79ad0abca3a22bc6e1
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [09:38<00:00, 28.92s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [09:31<00:00, 28.55s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [09:31<00:00, 29.34s/it]
```

### Additional information
No response