
[Bug]: profiler crashes when profiling with torch multiprocessing #759

Open
@itej89

Description


Describe the bug

I tried to profile a script whose main block looks like this:

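import torch.multiprocessing

if __name__ == '__main__':
    num_processes = 8
    # spawn() calls test_loop(rank, num_processes) in each of the 8 worker processes
    torch.multiprocessing.spawn(test_loop, args=(num_processes, ), nprocs=num_processes)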
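For context on what the profiled command actually runs: `torch.multiprocessing.spawn(fn, args, nprocs)` starts `nprocs` new Python processes and calls `fn(i, *args)` in the i-th one, so with `nprocs=8` there are eight worker processes in addition to the parent `python3 ./test.py`. The stdlib-only sketch below (no torch or GPU needed; `spawn_invocations` is a hypothetical helper, not part of PyTorch) just lists the arguments each worker's `test_loop` would receive.

```python
from typing import Any, List, Tuple


def spawn_invocations(args: Tuple[Any, ...], nprocs: int) -> List[Tuple[Any, ...]]:
    """Argument tuples torch.multiprocessing.spawn would pass to fn, one per worker.

    spawn() calls fn(i, *args) in worker i, for i in range(nprocs).
    """
    return [(i, *args) for i in range(nprocs)]


if __name__ == "__main__":
    num_processes = 8
    # One line per worker process, e.g. test_loop(0, 8) ... test_loop(7, 8)
    for call_args in spawn_invocations((num_processes,), num_processes):
        print("test_loop%r" % (call_args,))
```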

Command:

/opt/rocm/bin/rocprof-compute profile -n perf_data -- python3 ./test.py

Error:
The profiler crashes after one iteration with the message "An instance of rocprof is already running".

Workaround (a single-rank hack) for now:

# Rank 0: launch under the profiler
/opt/rocm/bin/rocprof-compute profile -n perf_data -- python3 ./test.py

# Remaining ranks: close and relaunch the following for every iteration of /opt/rocm/bin/rocprof-compute
python3 ./test.py
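The workaround above needs `test.py` to be launchable one rank at a time. One possible way to structure that, sketched here under assumptions: `test.py` checks for the standard `torch.distributed` `env://` variables (`RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`) and runs a single rank when `RANK` is set, otherwise it spawns all ranks as before. The helpers `launch_plan`, `rank_env` and `remaining_rank_envs`, plus the default address and port, are hypothetical, and `test_loop` is assumed to initialize its process group from these variables. The sketch is stdlib-only and only decides what to run; it does not call torch.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical defaults for the torch.distributed env:// rendezvous.
DEFAULT_MASTER_ADDR = "127.0.0.1"
DEFAULT_MASTER_PORT = "29500"


def rank_env(base: Mapping[str, str], rank: int, world_size: int) -> Dict[str, str]:
    """Environment for launching one rank as its own `python3 ./test.py` process."""
    env = dict(base)
    env.update(
        RANK=str(rank),
        WORLD_SIZE=str(world_size),
        # Keep caller-provided rendezvous settings; fall back to the defaults above.
        MASTER_ADDR=env.get("MASTER_ADDR", DEFAULT_MASTER_ADDR),
        MASTER_PORT=env.get("MASTER_PORT", DEFAULT_MASTER_PORT),
    )
    return env


def remaining_rank_envs(base: Mapping[str, str], world_size: int) -> List[Dict[str, str]]:
    """Environments for ranks 1..world_size-1, i.e. the ranks run outside the profiler."""
    return [rank_env(base, r, world_size) for r in range(1, world_size)]


def launch_plan(env: Mapping[str, str], num_processes: int = 8) -> Tuple[str, Optional[int], int]:
    """Run one rank if RANK is set, otherwise spawn all ranks as in the original script."""
    if "RANK" in env:
        return "single", int(env["RANK"]), int(env.get("WORLD_SIZE", num_processes))
    return "spawn", None, num_processes


if __name__ == "__main__":
    mode, rank, world_size = launch_plan(os.environ)
    # Hypothetical dispatch inside test.py:
    #   mode == "single": test_loop(rank, world_size)
    #   mode == "spawn":  torch.multiprocessing.spawn(test_loop, args=(world_size,), nprocs=world_size)
    print(mode, rank, world_size)
```

With this structure, rank 0 could be started as `RANK=0 WORLD_SIZE=8 /opt/rocm/bin/rocprof-compute profile -n perf_data -- python3 ./test.py`, and ranks 1 to 7 as plain `python3 ./test.py` processes whose environments come from `remaining_rank_envs`.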

Linux Distribution

Ubuntu 22.04

ROCm Compute Profiler Version

3.0.0

GPU

MI300X

ROCm Version

No response

Cluster name (if applicable)

No response

Reproducer

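A script whose main block looks like this:

import torch.multiprocessing

if __name__ == '__main__':
    num_processes = 8
    # spawn() calls test_loop(rank, num_processes) in each of the 8 worker processes
    torch.multiprocessing.spawn(test_loop, args=(num_processes, ), nprocs=num_processes)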

Command:

/opt/rocm/bin/rocprof-compute profile -n perf_data -- python3 ./test.py

Expected behavior

No response

Relevant log output

Screenshots

No response

Additional Context

No response

Labels: bug, triage