Unexpected error from cudaGetDeviceCount() #240

Aniket22156 · 2025-01-15T13:02:32Z

This is for bugs only

Did you already ask in the discord?

Yes

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes

Describe the bug

(venv) root@27e72a4032d1:/workspace/ai-toolkit# python run.py config/examples/train_lora_flux_24gb.yaml
Running 1 job
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/timm/models/registry.py:4: FutureWarning: Importing from timm.models.registry is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_11m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_11m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_384 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_384. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_512 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
{
    "type": "sd_trainer",
    "training_folder": "output",
    "performance_log_every": 1000,
    "device": "cuda:0",
    "trigger_word": "DishaPatani",
    "network": {
        "type": "lora",
        "linear": 16,
        "linear_alpha": 16
    },
    "save": {
        "dtype": "float16",
        "save_every": 200,
        "max_step_saves_to_keep": 8,
        "push_to_hub": false
    },
    "datasets": [
        {
            "folder_path": "workspace/Disha",
            "caption_ext": "txt",
            "caption_dropout_rate": 0.05,
            "shuffle_tokens": false,
            "cache_latents_to_disk": true,
            "resolution": [
                512,
                768,
                1024
            ]
        }
    ],
    "train": {
        "batch_size": 1,
        "steps": 3000,
        "gradient_accumulation_steps": 1,
        "train_unet": true,
        "train_text_encoder": false,
        "gradient_checkpointing": true,
        "noise_scheduler": "flowmatch",
        "optimizer": "adamw8bit",
        "lr": 0.0001,
        "ema_config": {
            "use_ema": true,
            "ema_decay": 0.99
        },
        "dtype": "bf16"
    },
    "model": {
        "name_or_path": "black-forest-labs/FLUX.1-dev",
        "is_flux": true,
        "quantize": true
    },
    "sample": {
        "sampler": "flowmatch",
        "sample_every": 250,
        "width": 1024,
        "height": 1024,
        "prompts": [
            "DishaPatani standing at beach in bikini during sunset looking at viewer",
            "DishaPatani holding a coffee cup, in a beanie, sitting at a cafe",
            "DishaPatani playing the guitar, on stage, singing a song, laser lights, punk rocker",
            "DishaPatani holding a sign'I am not real' standing in bedroom at night"
        ],
        "neg": "",
        "seed": 88888888,
        "walk_seed": true,
        "guidance_scale": 4,
        "sample_steps": 20
    }
}
Using EMA
/workspace/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py:61: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
  self.scaler = torch.cuda.amp.GradScaler()
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/amp/grad_scaler.py:132: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
  warnings.warn(

#############################################

Running job: disha

#############################################

Running 1 process
Loading Flux model
Loading transformer
Error running job: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW

========================================
Result:

0 completed jobs
1 failure
========================================
Traceback (most recent call last):
  File "/workspace/ai-toolkit/run.py", line 90, in <module>
    main()
  File "/workspace/ai-toolkit/run.py", line 86, in main
    raise e
  File "/workspace/ai-toolkit/run.py", line 78, in main
    job.run()
  File "/workspace/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/workspace/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 1298, in run
    self.sd.load_model()
  File "/workspace/ai-toolkit/toolkit/stable_diffusion_model.py", line 562, in load_model
    transformer.to(torch.device(self.quantize_device), dtype=dtype)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 1090, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
           ^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
(venv) root@27e72a4032d1:/workspace/ai-toolkit#
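
The log shows PyTorch failing inside `torch._C._cuda_init()` with error 804, which usually points at a mismatch between the CUDA libraries visible in the container and the host GPU or driver rather than at ai-toolkit itself. A minimal diagnostic sketch that could help narrow this down is below. It is not part of ai-toolkit: the helper name `collect_cuda_diagnostics` and the dictionary keys are hypothetical, and it assumes `nvidia-smi` is on the `PATH` if a driver is installed.

```python
import importlib.util
import shutil
import subprocess


def collect_cuda_diagnostics():
    """Collect basic CUDA facts without raising if torch or the driver is missing.

    Hypothetical helper for illustration only; not an ai-toolkit API.
    """
    info = {
        "torch_version": None,     # installed PyTorch version, if any
        "torch_cuda_build": None,  # CUDA version PyTorch was built against
        "cuda_available": False,   # what torch.cuda.is_available() reports
        "gpu_and_driver": None,    # "name, driver_version" lines from nvidia-smi
    }

    # Only import torch if it is installed, so the script runs anywhere.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            info["torch_version"] = torch.__version__
            info["torch_cuda_build"] = torch.version.cuda
            info["cuda_available"] = bool(torch.cuda.is_available())
        except Exception as exc:  # a broken install should not abort the report
            info["torch_version"] = f"import failed: {exc}"

    # Ask the driver directly; this does not go through PyTorch at all.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            result = subprocess.run(
                [smi, "--query-gpu=name,driver_version", "--format=csv,noheader"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            if result.returncode == 0:
                info["gpu_and_driver"] = result.stdout.strip() or None
        except (OSError, subprocess.TimeoutExpired):
            pass

    return info


if __name__ == "__main__":
    for key, value in collect_cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `gpu_and_driver` is filled in but `cuda_available` is `False`, the driver can see the GPU while PyTorch cannot initialize CUDA, which is consistent with the 804 error above. Comparing `torch_cuda_build` with the CUDA version the host driver supports is a reasonable next step.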
