Environment issue #179

Closed
mefor44 opened this issue Jan 29, 2025 · 4 comments

@mefor44

mefor44 commented Jan 29, 2025

Could you please share more details about your environment? I tried to run training following your instructions, but it failed with an error (pasted below). I am running the code on Ubuntu 22 with CUDA 11.8 and Python 3.11. I installed PyTorch 2.6 and then installed the remaining dependencies with `pip3 install gin-config absl-py scikit-learn scipy matplotlib numpy apex hypothesis pandas fbgemm_gpu iopath tensorboard`.

Error message:
`WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
Initialize _item_emb.weight as truncated normal: torch.Size([695763, 64]) params
Skipping init for _embedding_module._item_emb.weight
Initialize _input_features_preproc._pos_emb.weight as xavier normal: torch.Size([61, 64]) params
Skipping init for _hstu._attention_layers.0._uvqk
Skipping init for _hstu._attention_layers.0._rel_attn_bias._ts_w
Skipping init for _hstu._attention_layers.0._rel_attn_bias._pos_w
Skipping init for _hstu._attention_layers.0._o.weight
Skipping init for _hstu._attention_layers.0._o.bias
Skipping init for _hstu._attention_layers.1._uvqk
Skipping init for _hstu._attention_layers.1._rel_attn_bias._ts_w
Skipping init for _hstu._attention_layers.1._rel_attn_bias._pos_w
Skipping init for _hstu._attention_layers.1._o.weight
Skipping init for _hstu._attention_layers.1._o.bias
Skipping init for _hstu._attention_layers.2._uvqk
Skipping init for _hstu._attention_layers.2._rel_attn_bias._ts_w
Skipping init for _hstu._attention_layers.2._rel_attn_bias._pos_w
Skipping init for _hstu._attention_layers.2._o.weight
Skipping init for _hstu._attention_layers.2._o.bias
Skipping init for _hstu._attention_layers.3._uvqk
Skipping init for _hstu._attention_layers.3._rel_attn_bias._ts_w
Skipping init for _hstu._attention_layers.3._rel_attn_bias._pos_w
Skipping init for _hstu._attention_layers.3._o.weight
Skipping init for _hstu._attention_layers.3._o.bias
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
WARNING:root:Could not the library 'fbgemm_gpu_py.so': /home/mateusz.marzec/.pyenv/versions/3.11.9/envs/generative-recommenders/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb. This may be expected depending on the FBGEMM_GPU variant.
Traceback (most recent call last):
File "/home/mateusz.marzec/custom/generative-recommenders/main.py", line 85, in <module>
main()
File "/home/mateusz.marzec/custom/generative-recommenders/main.py", line 81, in main
app.run(_main)
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
^^^^^^^^^^
File "/home/mateusz.marzec/custom/generative-recommenders/main.py", line 72, in _main
mp.spawn(
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 282, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 238, in start_processes
while not context.join():
^^^^^^^^^^^^^^
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 189, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 76, in _wrap
fn(i, *args)
File "/home/mateusz.marzec/custom/generative-recommenders/main.py", line 65, in mp_train_fn
train_fn(rank, world_size, master_port)
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/gin/config.py", line 1605, in gin_wrapper
utils.augment_exception_message_and_reraise(e, err_str)
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
raise proxy.with_traceback(exception.__traceback__) from None
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/gin/config.py", line 1582, in gin_wrapper
return fn(*new_args, **new_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mateusz.marzec/custom/generative-recommenders/generative_recommenders/trainer/train.py", line 333, in train_fn
eval_dict = eval_metrics_v2_from_tensors(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/mateusz.marzec/custom/generative-recommenders/generative_recommenders/data/eval.py", line 103, in eval_metrics_v2_from_tensors
shared_input_embeddings = model.encode(
^^^^^^^^^^^^^
File "/home/mateusz.marzec/custom/generative-recommenders/generative_recommenders/modeling/sequential/hstu.py", line 799, in encode
return self._encode(
^^^^^^^^^^^^^
File "/home/mateusz.marzec/custom/generative-recommenders/generative_recommenders/modeling/sequential/hstu.py", line 760, in _encode
encoded_seq_embeddings, cache_states = self.generate_user_embeddings(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mateusz.marzec/custom/generative-recommenders/generative_recommenders/modeling/sequential/hstu.py", line 696, in generate_user_embeddings
x_offsets=torch.ops.fbgemm.asynchronous_complete_cumsum(past_lengths),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mateusz.marzec/.pyenv/versions/generative-recommenders/lib/python3.11/site-packages/torch/_ops.py", line 1170, in __getattr__
raise AttributeError(
AttributeError: '_OpNamespace' 'fbgemm' object has no attribute 'asynchronous_complete_cumsum'
In call to configurable 'train_fn' (<function train_fn at 0x7fccf13e82c0>)`
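
The traceback above ends in `torch.ops.fbgemm.asynchronous_complete_cumsum` not being registered, after `fbgemm_gpu_py.so` failed to load with an undefined symbol. A minimal sketch for checking the same condition outside of training is shown below. The helper name `fbgemm_op_status` and the shape of the returned dictionary are made up for illustration and are not part of this repository; the sketch assumes only that importing `fbgemm_gpu` is what registers the `fbgemm` operators.

```python
import importlib


def fbgemm_op_status(op_name="asynchronous_complete_cumsum"):
    """Report installed versions and whether torch.ops.fbgemm.<op_name> resolves.

    Hypothetical helper for illustration; not part of generative-recommenders.
    """
    status = {
        "torch": None,
        "torch_cuda": None,
        "fbgemm_gpu": None,
        "op_available": False,
    }
    try:
        torch = importlib.import_module("torch")
    except Exception as exc:  # torch is missing or fails to import
        status["error"] = f"torch import failed: {exc}"
        return status
    status["torch"] = torch.__version__
    # CUDA version the torch build was compiled against (None on CPU-only builds).
    status["torch_cuda"] = getattr(torch.version, "cuda", None)

    try:
        # Assumption: importing fbgemm_gpu loads its shared libraries,
        # which in turn registers the operators under torch.ops.fbgemm.
        fbgemm_gpu = importlib.import_module("fbgemm_gpu")
    except Exception as exc:
        status["error"] = f"fbgemm_gpu import failed: {exc}"
        return status
    status["fbgemm_gpu"] = getattr(fbgemm_gpu, "__version__", "unknown")

    try:
        getattr(torch.ops.fbgemm, op_name)
        status["op_available"] = True
    except (AttributeError, RuntimeError) as exc:
        # Same failure mode as the AttributeError in the traceback.
        status["error"] = str(exc)
    return status


if __name__ == "__main__":
    for key, value in fbgemm_op_status().items():
        print(f"{key}: {value}")
```

If `op_available` is `False` while both packages import, the installed `fbgemm_gpu` binary most likely does not match the installed PyTorch/CUDA build, which is consistent with the undefined-symbol warnings in the log.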

@jiaqizhai
Contributor

Could you try `pip3 install -r requirements.txt` directly?

@mefor44
Author

mefor44 commented Jan 29, 2025

I tried it, but it leads to a Python version conflict: #177. Which Python version do you use?

@jiaqizhai
Contributor

jiaqizhai commented Jan 30, 2025

Fix: #180

@mefor44
Author

mefor44 commented Jan 31, 2025

With the new requirements from #180 I still hit the same issue. However, switching from CUDA 11.8 to CUDA 12.4 solved the problem! I saw you added the CUDA version to the README, thanks!
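
For anyone hitting the same error, a small sketch of a pre-flight check on the CUDA version is shown below. It compares the CUDA version that the installed PyTorch build reports (`torch.version.cuda`) against 12.4, the version that worked in this thread. The function names are hypothetical, and whether the build-time CUDA version is the relevant one in every setup is an assumption; the thread only confirms that moving from 11.8 to 12.4 fixed this particular environment.

```python
def cuda_build_matches(build_cuda, expected="12.4"):
    """Return True if a CUDA version string matches `expected` on major.minor.

    `build_cuda` may be None (for example, a CPU-only PyTorch build).
    Hypothetical helper for illustration only.
    """
    if build_cuda is None:
        return False
    return build_cuda.split(".")[:2] == expected.split(".")[:2]


def installed_torch_cuda():
    """Return the CUDA version of the installed torch build, or None."""
    try:
        import torch
    except ImportError:
        return None
    return getattr(torch.version, "cuda", None)


if __name__ == "__main__":
    found = installed_torch_cuda()
    if cuda_build_matches(found):
        print(f"torch is built for CUDA {found}")
    else:
        print(f"torch CUDA build is {found!r}; this thread was resolved with 12.4")
```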
