[Bug]: ZLUDA doesn't use AMD GPU, only runs on CPU. EDIT: speed comparison DirectML, ZLUDA, ROCK for gfx1201 #614

### Checklist

- [ ] The issue exists after disabling all extensions
- [x] The issue exists on a clean installation of webui
- [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
- [x] The issue exists in the current version of the webui
- [ ] The issue has not been reported before recently
- [ ] The issue has been reported before but has not been fixed yet

### What happened?

The ZLUDA version uses only the CPU. Even after patching ROCm to detect and use gfx1201 (RX 9070) and setting the environment variable `HIP_VISIBLE_DEVICES=1`, generation still runs on the CPU only (R7 7700); both gfx1036 and gfx1201 stay idle.
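To rule out the variable simply not reaching the Python process, a small check like the one below could be run from the webui's venv. This is only a sketch: the helper name `visible_device_env` is made up for illustration, and whether the ZLUDA launch path honours either variable is an assumption that still needs verifying.

```python
import os

# Variables commonly consulted when selecting a GPU. Which of them the
# ZLUDA launch path actually honours is an assumption to verify.
DEVICE_ENV_VARS = ("HIP_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES")


def visible_device_env(environ=None):
    """Return the device-selection variables as seen by this process."""
    env = os.environ if environ is None else environ
    # A value of None means the variable is not set at all.
    return {name: env.get(name) for name in DEVICE_ENV_VARS}


if __name__ == "__main__":
    for name, value in visible_device_env().items():
        print(f"{name} = {value!r}")
```

If `HIP_VISIBLE_DEVICES` prints as `None` here, the value set in the shell never reached Python, which would be worth fixing before looking further.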

### Steps to reproduce the problem

1. Install a clean copy of stable-diffusion-webui-amdgpu.
2. Follow the instructions here: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides#amd-automatic1111-with-zluda
3. Patch ROCm 6.2.4 to use gfx1201 (RX 9070 and 9070 XT) using the method described here: https://github.com/IAHispano/Applio/issues/1005#issue-2936981353
4. Launch `webui-user.bat` with `COMMANDLINE_ARGS=--use-zluda --update-check --skip-ort --no-half`.
5. Start a generation. Windows Task Manager shows that only the CPU is used.
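Besides Task Manager, the same thing can be checked from Python inside the venv. The sketch below only uses standard `torch.cuda` calls; the function name `torch_device_report` is hypothetical, and the script returns a placeholder result when PyTorch is not importable.

```python
import importlib.util


def torch_device_report():
    """Summarize what PyTorch reports about CUDA-style devices."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this interpreter (wrong venv?).
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    import torch

    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    names = [torch.cuda.get_device_name(i) for i in range(count)]
    return {"torch_installed": True, "cuda_available": available, "devices": names}


if __name__ == "__main__":
    print(torch_device_report())
```

An empty `devices` list with `cuda_available` set to `False` would match the CPU-only behaviour described in step 5.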

### What should have happened?

It should fully utilize the gfx1201 GPU (RX 9070).

### What browsers do you use to access the UI?

Google Chrome

### Sysinfo

[sysinfo.json](https://github.com/user-attachments/files/20824726/sysinfo.json)

### Console logs

```Shell
venv "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-37-g721f6391
Commit hash: 721f6391993ac63fd246603735e2eb2e719ffac0
ROCm: agents=['gfx1201']
ROCm: version=6.2, using agent gfx1201
ZLUDA support: experimental
ZLUDA load: path='D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\.zluda' nightly=False
Skipping onnxruntime installation.
You are up to date with the most recent release.
D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\__init__.py:936: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10\cuda\CUDAFunctions.cpp:109.)
  r = torch._C._cuda_getDeviceCount() if nvml_count < 0 else nvml_count
D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --update-check --skip-ort --no-half
Warning: caught exception 'CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.', memory monitor disabled
Loading weights [6ce0161689] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Creating model from config: D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 6.6s (prepare environment: 8.5s, initialize shared: 0.1s, other imports: 0.2s, load scripts: 0.3s, create ui: 0.5s, gradio launch: 0.1s).
Applying attention optimization: InvokeAI... done.
Model loaded in 1.6s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 0.6s).
  4%|██▉                                                                                | 1/28 [00:30<13:41, 30.41s/it]
T
```
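The relevant parts of the log above are the detected ROCm agent, the CUDA initialization warning, and the per-step time. As a rough sketch, they could be pulled out of a saved log like this; `summarize_log` is a hypothetical helper, the sample lines are abbreviated from the log, and the `s/it` pattern assumes tqdm reports seconds per iteration (it switches to `it/s` on faster runs).

```python
import re

CUDA_INIT_WARNING = "CUDA initialization:"
ZERO_DEVICES = "Setting the available devices to be zero"


def summarize_log(log_text):
    """Extract the ROCm agents, CUDA-init failure, and step time from a log."""
    agents_match = re.search(r"ROCm: agents=\[(.*?)\]", log_text)
    agents = re.findall(r"'([^']+)'", agents_match.group(1)) if agents_match else []
    step_match = re.search(r"(\d+(?:\.\d+)?)s/it", log_text)
    return {
        "agents": agents,
        # Both markers together mean PyTorch fell back to zero devices.
        "cuda_init_failed": CUDA_INIT_WARNING in log_text and ZERO_DEVICES in log_text,
        "seconds_per_it": float(step_match.group(1)) if step_match else None,
    }


# Abbreviated lines taken from the console log in this issue.
sample = "\n".join([
    "ROCm: agents=['gfx1201']",
    "UserWarning: CUDA initialization: CUDA unknown error - ... "
    "Setting the available devices to be zero.",
    "  4%|...| 1/28 [00:30<13:41, 30.41s/it]",
])
print(summarize_log(sample))
```

In this log the agent is detected but CUDA initialization fails, so the webui runs with zero devices, which is consistent with the roughly 30 s per step.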

### Additional information

_No response_
