Unexpected error from cudaGetDeviceCount() #240

Open
Aniket22156 opened this issue Jan 15, 2025 · 0 comments

This is for bugs only

Did you already ask in the discord?

Yes

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes

Describe the bug

(venv) root@27e72a4032d1:/workspace/ai-toolkit# python run.py config/examples/train_lora_flux_24gb.yaml
Running 1 job
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/timm/models/registry.py:4: FutureWarning: Importing from timm.models.registry is deprecated, please import via timm.models
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_11m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_11m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_384 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_384. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_512 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
{
"type": "sd_trainer",
"training_folder": "output",
"performance_log_every": 1000,
"device": "cuda:0",
"trigger_word": "DishaPatani",
"network": {
"type": "lora",
"linear": 16,
"linear_alpha": 16
},
"save": {
"dtype": "float16",
"save_every": 200,
"max_step_saves_to_keep": 8,
"push_to_hub": false
},
"datasets": [
{
"folder_path": "workspace/Disha",
"caption_ext": "txt",
"caption_dropout_rate": 0.05,
"shuffle_tokens": false,
"cache_latents_to_disk": true,
"resolution": [
512,
768,
1024
]
}
],
"train": {
"batch_size": 1,
"steps": 3000,
"gradient_accumulation_steps": 1,
"train_unet": true,
"train_text_encoder": false,
"gradient_checkpointing": true,
"noise_scheduler": "flowmatch",
"optimizer": "adamw8bit",
"lr": 0.0001,
"ema_config": {
"use_ema": true,
"ema_decay": 0.99
},
"dtype": "bf16"
},
"model": {
"name_or_path": "black-forest-labs/FLUX.1-dev",
"is_flux": true,
"quantize": true
},
"sample": {
"sampler": "flowmatch",
"sample_every": 250,
"width": 1024,
"height": 1024,
"prompts": [
"DishaPatani standing at beach in bikini during sunset looking at viewer",
"DishaPatani holding a coffee cup, in a beanie, sitting at a cafe",
"DishaPatani playing the guitar, on stage, singing a song, laser lights, punk rocker",
"DishaPatani holding a sign'I am not real' standing in bedroom at night"
],
"neg": "",
"seed": 88888888,
"walk_seed": true,
"guidance_scale": 4,
"sample_steps": 20
}
}
Using EMA
/workspace/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py:61: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
self.scaler = torch.cuda.amp.GradScaler()
/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/amp/grad_scaler.py:132: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(

#############################################

Running job: disha

#############################################

Running 1 process
Loading Flux model
Loading transformer
Error running job: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW

========================================
Result:

  • 0 completed jobs
  • 1 failure
========================================
Traceback (most recent call last):
  File "/workspace/ai-toolkit/run.py", line 90, in <module>
    main()
  File "/workspace/ai-toolkit/run.py", line 86, in main
    raise e
  File "/workspace/ai-toolkit/run.py", line 78, in main
    job.run()
  File "/workspace/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/workspace/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 1298, in run
    self.sd.load_model()
  File "/workspace/ai-toolkit/toolkit/stable_diffusion_model.py", line 562, in load_model
    transformer.to(torch.device(self.quantize_device), dtype=dtype)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 1090, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
           ^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
(venv) root@27e72a4032d1:/workspace/ai-toolkit#
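
For anyone trying to reproduce this outside of ai-toolkit: the failure in the log above happens when torch first initializes CUDA, before any Flux weights are moved to the GPU. Below is a rough sketch of a standalone check that surfaces the same error without loading the model. The helper names `parse_cuda_error` and `cuda_preflight` are made up for illustration and are not part of ai-toolkit or torch; the regex only assumes the "Error <code>: <message>" shape seen in the log. As far as I know, code 804 corresponds to `cudaErrorCompatNotSupportedOnDevice` in the CUDA runtime, but treat that mapping as an assumption and check it against the CUDA docs.

```python
import re
from typing import Optional, Tuple

# Matches the "Error <code>: <message>" fragment seen in the log above.
# The optional " (Triggered internally ..." suffix is dropped from the message.
_CUDA_ERROR_RE = re.compile(
    r"Error (\d+): (.+?)(?: \(Triggered internally|$)", re.MULTILINE
)


def parse_cuda_error(text: str) -> Optional[Tuple[int, str]]:
    """Return (code, message) from a CUDA init error string, or None."""
    match = _CUDA_ERROR_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def cuda_preflight() -> str:
    """Hypothetical helper: report whether torch can initialize CUDA.

    Always returns a short status string instead of raising.
    """
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    if torch.cuda.is_available():
        return (
            f"CUDA OK: {torch.cuda.device_count()} device(s), "
            f"torch built for CUDA {torch.version.cuda}"
        )

    # is_available() only warns, so force initialization to get the real error.
    try:
        torch.cuda.init()
    except Exception as exc:
        parsed = parse_cuda_error(str(exc))
        if parsed is not None:
            code, message = parsed
            return f"CUDA init failed with error {code}: {message}"
        return f"CUDA init failed: {exc}"
    return "CUDA unavailable, but initialization raised no error"


if __name__ == "__main__":
    print(cuda_preflight())
```

Running this inside the same container and venv, alongside the output of `nvidia-smi`, should show whether the problem is in the driver and container setup rather than in the training config.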