Description
Fatal Python error: Segmentation fault
Current thread 0x000074aa85dae740 (most recent call first):
File "", line 219 in _call_with_frames_removed
File "", line 1166 in create_module
File "", line 556 in module_from_spec
File "", line 657 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "", line 219 in _call_with_frames_removed
File "", line 1042 in _handle_fromlist
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/ortools/graph/pywrapgraph.py", line 13 in <module>
File "", line 219 in _call_with_frames_removed
File "", line 843 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "", line 219 in _call_with_frames_removed
File "", line 1042 in _handle_fromlist
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/openlanev2/evaluation/f_score.py", line 40 in <module>
File "", line 219 in _call_with_frames_removed
File "", line 843 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/openlanev2/evaluation/evaluate.py", line 26 in <module>
File "", line 219 in _call_with_frames_removed
File "", line 843 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/openlanev2/evaluation/__init__.py", line 1 in <module>
File "", line 219 in _call_with_frames_removed
File "", line 843 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/bydpc/lxy_ws/map_topo/TopoNet/projects/toponet/datasets/openlanev2_subset_A_dataset.py", line 20 in <module>
File "", line 219 in _call_with_frames_removed
File "", line 843 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/bydpc/lxy_ws/map_topo/TopoNet/projects/toponet/datasets/__init__.py", line 2 in <module>
File "", line 219 in _call_with_frames_removed
File "", line 843 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/bydpc/lxy_ws/map_topo/TopoNet/projects/toponet/__init__.py", line 1 in <module>
File "", line 219 in _call_with_frames_removed
File "", line 843 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "", line 1014 in _gcd_import
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/importlib/__init__.py", line 127 in import_module
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/mmcv/utils/misc.py", line 73 in import_modules_from_strings
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/mmcv/utils/config.py", line 343 in fromfile
File "tools/train.py", line 171 in main
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361 in wrapper
File "tools/train.py", line 316 in <module>
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 0 (pid: 99244) of binary: /home/bydpc/anaconda3/envs/toponet/bin/python
/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning:
CHILD PROCESS FAILED WITH NO ERROR_FILE
CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 99244 (local_rank 0) FAILED (exitcode -11)
Error msg: Signal 11 (SIGSEGV) received by PID 99244
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:
from torch.distributed.elastic.multiprocessing.errors import record
@record
def trainer_main(args):
    # do train
warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/torch/distributed/run.py", line 702, in <module>
main()
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
return f(*args, **kwargs)
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/torch/distributed/run.py", line 698, in main
run(args)
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/bydpc/anaconda3/envs/toponet/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
tools/train.py FAILED
==================================================
Root Cause:
[0]:
time: 2025-03-17_13:06:05
rank: 0 (local_rank: 0)
exitcode: -11 (pid: 99244)
error_file: <N/A>
msg: "Signal 11 (SIGSEGV) received by PID 99244"
Other Failures:
<NO_OTHER_FAILURES>
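I only have one GPU, so I ran ./tools/dist_train.sh 1, but the process crashed with the segmentation fault shown above. The innermost frame outside the import machinery is the import in ortools/graph/pywrapgraph.py, reached through openlanev2/evaluation/f_score.py. Can anyone help me fix this?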
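
To check whether the crash can be reproduced without the training launcher, a minimal sketch like the one below might help. It imports a module in a fresh interpreter with faulthandler enabled and reports the exit code. The helper name `try_import` is hypothetical, and the module names in the usage loop are simply taken from the traceback above; this only isolates where the import fails and is not a fix.

```python
import subprocess
import sys


def try_import(module_name: str) -> int:
    """Import `module_name` in a separate Python process and return its exit code.

    Running the import in a child process means a segmentation fault there
    does not take down the calling script. On POSIX systems a negative return
    code -N means the child was killed by signal N (for example -11 for SIGSEGV).
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show whatever the child wrote to stderr (traceback or faulthandler dump).
        print(f"{module_name}: exit code {result.returncode}")
        print(result.stderr.strip())
    return result.returncode


if __name__ == "__main__":
    # Module names copied from the traceback above, innermost first (assumption:
    # they are importable by name in the same environment that crashed).
    for name in ("ortools.graph.pywrapgraph", "openlanev2.evaluation"):
        try_import(name)
```

If the first import already exits with -11 on its own, the problem is likely in the ortools installation rather than in TopoNet or the distributed launcher.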