Any optimizations that can be done 30+GB VRAM during training #3
Comments
Hi @gt732, I updated the last reply in #2 (comment); GPU memory usage can even be reduced to 8GB.
@YuliangXiu thanks for the update. I’ll try following the documentation and see if I can reduce the VRAM usage during training.
@YuliangXiu I was able to get the initial model trained using the following arguments. The VRAM usage spiked between 8-16GB.

accelerate launch multi_concepts/train.py \
--pretrained_model_name_or_path $BASE_MODEL \
--project_name ${SUBJECT_NAME} \
--instance_data_dir ${INPUT_DIR} \
--output_dir ${EXP_DIR} \
--class_data_dir data/multi_concepts_data \
--train_batch_size 1 \
--phase1_train_steps 1000 \
--phase2_train_steps 4000 \
--lr_step_rules "1:2000,0.1" \
--initial_learning_rate 5e-4 \
--learning_rate 2e-6 \
--prior_loss_weight 1.0 \
--syn_loss_weight "2.0,2.0" \
--mask_loss_weight 1.0 \
--lambda_attention 1e-2 \
--img_log_steps 1000 \
--checkpointing_steps 1000 \
--use_view_prompt \
--log_checkpoints \
--boft_block_num=8 \
--boft_block_size=0 \
--boft_n_butterfly_factor=1 \
--lora_r=32 \
--enable_xformers_memory_efficient_attention \
--use_peft ${peft_type} \
--wandb_mode "offline" \
--use_view_prompt \
--do_not_apply_masked_prior \
--mixed_precision fp16 \
--gradient_checkpointing \
--use_8bit_adam \
--set_grads_to_none

Now the last challenge is getting this step to run:

python cores/main_mc.py \
--config configs/tech_mc_geometry.yaml \
--exp_dir ${EXP_DIR} \
--sub_name ${SUBJECT_NAME} \
--use_peft ${peft_type} \
--use_shape_description

I'm running into compilation issues when running the trainer. I tried many different fixes, but something is wrong with the gcc libraries in my conda env. This is being tested on Windows 10 WSL, Ubuntu 24.04.1 LTS.

ERROR

Memory usage statistics:
Maximum number of tetrahedra: 5333413
Maximum number of tet blocks (blocksize = 8188): 652
Approximate memory for tetrahedral mesh (bytes): 752,983,904
Approximate memory for extra pointers (bytes): 12,066,080
Approximate memory for algorithms (bytes): 134,400
Approximate memory for working arrays (bytes): 210,901,848
Approximate total used memory (bytes): 976,086,232
shape of vertices: (834173, 3), shape of grids: (4986585, 4)
MESA: error: ZINK: failed to choose pdev
glx: failed to create drisw screen
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [03:15<00:00, 5.11it/s]
fitted mesh with num_vertex 481862, num_faces 890574
[INFO] loading stable diffusion...
[INFO] using hugging face custom model key: results/human/yuliang
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 16.43it/s]
Added 7 tokens
[INFO] loaded PEFT adapters!
[INFO] loaded stable diffusion!
get rgb text prompt
get normal text prompt
[INFO] Trainer: df | 2024-10-03_10-02-14 | cuda | fp32 | results/human/yuliang/geometry
[INFO] #parameters: 11480403
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
++> Evaluate results/human/yuliang/geometry at epoch 0 ...
0% 0/10 [00:00<?, ?it/s]/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:663: UserWarning: Graph break due to unsupported builtin _gridencoder.PyCapsule.grid_encode_forward. This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use torch.compiler.allow_in_graph.
torch._dynamo.utils.warn_once(msg)
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libc.so.6 when searching for /lib/x86_64-linux-gnu/libc.so.6
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find /lib/x86_64-linux-gnu/libc.so.6
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libc.so.6 when searching for /lib/x86_64-linux-gnu/libc.so.6
collect2: error: ld returned 1 exit status
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libc.so.6 when searching for /lib/x86_64-linux-gnu/libc.so.6
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find /lib/x86_64-linux-gnu/libc.so.6
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libc.so.6 when searching for /lib/x86_64-linux-gnu/libc.so.6
collect2: error: ld returned 1 exit status
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libc.so.6 when searching for /lib/x86_64-linux-gnu/libc.so.6
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find /lib/x86_64-linux-gnu/libc.so.6
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libc.so.6 when searching for /lib/x86_64-linux-gnu/libc.so.6
collect2: error: ld returned 1 exit status
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libc.so.6 when searching for /lib/x86_64-linux-gnu/libc.so.6
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find /lib/x86_64-linux-gnu/libc.so.6
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
/home/scheme/anaconda3/envs/PuzzleAvatar/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libc.so.6 when searching for /lib/x86_64-linux-gnu/libc.so.6
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
File "/home/scheme/PuzzleAvatar/cores/main_mc.py", line 379, in <module>
trainer.train(train_loader, valid_loader, max_epoch)
File "/home/scheme/PuzzleAvatar/cores/lib/trainer.py", line 723, in train
self.evaluate_one_epoch(valid_loader)
File "/home/scheme/PuzzleAvatar/cores/lib/trainer.py", line 1032, in evaluate_one_epoch
preds, preds_depth, preds_normal, preds_alpha, loss = self.eval_step(data)
File "/home/scheme/PuzzleAvatar/cores/lib/trainer.py", line 594, in eval_step
outputs = self.model(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
return fn(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1116, in __call__
return self._torchdynamo_orig_callable(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 948, in __call__
result = self._inner_convert(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 472, in __call__
return _compile(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_utils_internal.py", line 84, in wrapper_function
return StrobelightCompileTimeProfiler.profile_compile_time(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
return func(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 817, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner
out_code = transform_code_object(code, transform)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object
transformations(instructions, code_options)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 178, in _fn
return fn(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 582, in transform
tracer.run()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run
super().run()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 893, in run
while self.step():
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 805, in step
self.dispatch_table[inst.opcode](self, inst)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 497, in wrapper
return handle_graph_break(self, inst, speculation.reason)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 566, in handle_graph_break
self.output.compile_subgraph(self, reason=reason)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1123, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1318, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1409, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1390, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/__init__.py", line 1951, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx
return aot_autograd(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 69, in __call__
cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified
compiled_fn, _ = create_aot_dispatcher_function(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 687, in create_aot_dispatcher_function
compiled_fn, fw_metadata = compiler_fn(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 168, in aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1410, in fw_compiler_base
return inner_compile(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 84, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 527, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 831, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1749, in compile_to_fn
return self.compile_to_module().call
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1678, in compile_to_module
self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1638, in codegen
self.scheduler.codegen()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 2741, in codegen
self.get_backend(device).codegen_node(node)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line 69, in codegen_node
return self._triton_scheduling.codegen_node(node)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/codegen/simd.py", line 1148, in codegen_node
return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/codegen/simd.py", line 1317, in codegen_node_schedule
src_code = kernel.codegen_kernel()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/codegen/triton.py", line 2159, in codegen_kernel
**self.inductor_meta_common(),
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/_inductor/codegen/triton.py", line 2047, in inductor_meta_common
"backend_hash": torch.utils._triton.triton_hash_with_backend(),
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/utils/_triton.py", line 63, in triton_hash_with_backend
backend = triton_backend()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/torch/utils/_triton.py", line 49, in triton_backend
target = driver.active.get_current_target()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
self._initialize_obj()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
return actives[0]()
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
self.utils = CudaUtils() # TODO: make static
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/runtime/build.py", line 48, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
CalledProcessError: Command '['/home/scheme/anaconda3/envs/PuzzleAvatar/bin/x86_64-conda-linux-gnu-cc', '/tmp/tmp6m8__m0s/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmp6m8__m0s/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/backends/nvidia/lib', '-L/usr/lib/wsl/lib', '-L/lib/x86_64-linux-gnu', '-I/home/scheme/anaconda3/envs/PuzzleAvatar/lib/python3.10/site-packages/triton/backends/nvidia/include', '-I/tmp/tmp6m8__m0s', '-I/home/scheme/anaconda3/envs/PuzzleAvatar/include/python3.10']' returned non-zero exit status 1.
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
0% 0/10 [00:03<?, ?it/s]

ENVIRONMENT INFO

System Information
OS Information
Distributor ID: Ubuntu

Python Version
Python 3.10.14

Conda Environment Packages
packages in environment at /home/scheme/anaconda3/envs/PuzzleAvatar:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge

CUDA Version
nvcc: NVIDIA (R) Cuda compiler driver

PyTorch Version and CUDA Support
PyTorch Version: 2.4.0+cu121

GCC Version
gcc (conda-forge gcc 11.2.0-16) 11.2.0

glibc Version
ldd (Ubuntu GLIBC 2.39-0ubuntu8.3) 2.39

Installed PyTorch Packages
pytorch-lightning 2.1.0

pip freeze Output
absl-py==2.1.0

NVIDIA Driver Version
Thu Oct 3 13:53:44 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.51.01 Driver Version: 565.90 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:2B:00.0 On | N/A |
| 0% 32C P8 34W / 350W | 1394MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 33 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
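For anyone hitting the same linker error: the log shows the conda-provided `x86_64-conda-linux-gnu-cc` (gcc 11.2.0) linking against the system `/lib/x86_64-linux-gnu/libc.so.6` (glibc 2.39) and rejecting its `.relr.dyn` section, which likely means the conda linker is too old for that libc. The sketch below is only a diagnostic aid, not the fix that was confirmed in this thread. It assumes the build step honours `$CC` first and otherwise falls back to `gcc` or `clang` on `PATH`; the helper names `pick_c_compiler` and `looks_like_conda_toolchain` are hypothetical.

```python
import os
import shutil


def pick_c_compiler(env=None):
    """Return the C compiler a build step would likely invoke.

    Assumption: $CC takes priority, then gcc, then clang on PATH.
    """
    env = os.environ if env is None else env
    cc = env.get("CC")
    if cc:
        return cc
    return shutil.which("gcc") or shutil.which("clang")


def looks_like_conda_toolchain(cc_path):
    """Heuristic: conda-forge compilers carry 'conda-linux-gnu' in their name."""
    return cc_path is not None and "conda-linux-gnu" in os.path.basename(cc_path)


if __name__ == "__main__":
    cc = pick_c_compiler()
    print("compiler:", cc)
    if looks_like_conda_toolchain(cc):
        # Possible workaround (untested here): point CC at the system compiler,
        # e.g. `export CC=/usr/bin/gcc`, before launching cores/main_mc.py.
        print("conda toolchain detected; a system gcc may link against glibc 2.39 cleanly")
```

If the script reports a conda compiler, trying the system compiler through `CC` is one option; another is the eager fallback that the error message itself suggests (`torch._dynamo.config.suppress_errors = True`).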
How about removing the |
NICE!!!!! That worked! I'll keep you posted!
Great, congrats @gt732! Could you please submit a pull request to reduce VRAM usage and maybe other small changes to run the code? This would be really helpful for users with limited GPU resources. Thanks so much.
Hi, I added --gradient_checkpointing, but it reports an error. Have you encountered the same problem? Thanks!
Hi,
Are there any optimizations or settings I can change to get this running on my 3090 24GB? I'm using the photos of Yuliang included in the demo to test the code.
Thanks!
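As a rough illustration of why one of the memory-related flags listed in this thread, `--use_8bit_adam`, can help on a 24GB card: Adam keeps two moment estimates per trainable parameter, so storing them in 8 bits instead of 32 bits cuts that part of the footprint to about a quarter. The sketch below is back-of-the-envelope arithmetic only. The parameter count is a hypothetical placeholder, not a number measured for this project, and it ignores quantization metadata, activations, and weights.

```python
def adam_state_bytes(num_trainable_params, bytes_per_state):
    """Adam keeps two moment estimates per trainable parameter."""
    return 2 * num_trainable_params * bytes_per_state


# Hypothetical trainable-parameter count, chosen only for illustration.
n_params = 100_000_000

fp32_states = adam_state_bytes(n_params, 4)  # 32-bit moments
int8_states = adam_state_bytes(n_params, 1)  # 8-bit moments, metadata ignored

print(f"32-bit Adam states: {fp32_states / 2**20:.0f} MiB")
print(f"8-bit Adam states:  {int8_states / 2**20:.0f} MiB")
```

The other flags (`--mixed_precision fp16`, `--gradient_checkpointing`, `--enable_xformers_memory_efficient_attention`) target activations and attention memory rather than optimizer state, so their savings depend on batch size and resolution and are not captured by this estimate.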