
[Bug]: Freeze in kaleido #799

Open
@etiennemlb

Description


Describe the bug

The profiler freezes when trying to generate the roofline plots with Plotly/Kaleido. Command used:

rocprof-compute profile -n roof0 --roof-only --device 0 --kernel-names -- myprocess

After the roofline is computed, the profiler hangs. Interrupting it with Ctrl+C produces the following stack trace:

^CTraceback (most recent call last):
  File "bin/rocprof-compute", line 156, in <module>
    main()
  File "bin/rocprof-compute", line 144, in main
    rocprof_compute.run_profiler()
  File "libexec/rocprofiler-compute/utils/utils.py", line 53, in wrap_function
    result = function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libexec/rocprofiler-compute/rocprof_compute_base.py", line 294, in run_profiler
    self.__soc[self.__mspec.gpu_arch].post_profiling()
  File "libexec/rocprofiler-compute/utils/utils.py", line 53, in wrap_function
    result = function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libexec/rocprofiler-compute/rocprof_compute_soc/soc_gfx90a.py", line 109, in post_profiling
    self.roofline_obj.post_processing()
  File "libexec/rocprofiler-compute/roofline.py", line 445, in post_processing
    self.standalone_roofline()
  File "libexec/rocprofiler-compute/utils/utils.py", line 53, in wrap_function
    result = function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libexec/rocprofiler-compute/roofline.py", line 382, in standalone_roofline
    self.empirical_roofline(ret_df=t_df)
  File "libexec/rocprofiler-compute/utils/utils.py", line 53, in wrap_function
    result = function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libexec/rocprofiler-compute/roofline.py", line 158, in empirical_roofline
    ml_combo_fig_fp32_fp64.write_image(
  File "python-libs/plotly/basedatatypes.py", line 3895, in write_image
    return pio.write_image(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python-libs/plotly/io/_kaleido.py", line 510, in write_image
    img_data = to_image(
               ^^^^^^^^^
  File "python-libs/plotly/io/_kaleido.py", line 398, in to_image
    img_bytes = scope.transform(
                ^^^^^^^^^^^^^^^^
  File "python-libs/kaleido/scopes/plotly.py", line 153, in transform
    response = self._perform_transform(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "python-libs/kaleido/scopes/base.py", line 293, in _perform_transform
    self._ensure_kaleido()
  File "python-libs/kaleido/scopes/base.py", line 192, in _ensure_kaleido
    startup_response_string = self._proc.stdout.readline().decode('utf-8')
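The last frame suggests where the process is stuck: Kaleido's `_ensure_kaleido` calls `self._proc.stdout.readline()` on the subprocess it launched and waits for a startup line. If that subprocess never writes one, `readline()` blocks indefinitely, which matches the observed freeze. The sketch below is not Kaleido code; `read_startup_line` is a hypothetical, POSIX-only helper showing how such a read could be bounded with `select`, so that a silent child raises `TimeoutError` instead of hanging.

```python
import os
import select
import subprocess
import sys
import time


def read_startup_line(cmd, timeout):
    """Start `cmd` and wait at most `timeout` seconds for its first stdout line.

    Hypothetical helper, not Kaleido's code. Unlike a bare
    proc.stdout.readline(), it raises TimeoutError instead of blocking
    forever when the child never prints. POSIX-only (select on a pipe).
    Returns (proc, line); bytes read past the first newline are discarded.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    buf = b""
    deadline = time.monotonic() + timeout
    while b"\n" not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # The child stayed silent: kill it rather than wait forever.
            proc.kill()
            proc.wait()
            proc.stdout.close()
            raise TimeoutError(f"no startup line from {cmd[0]} within {timeout}s")
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            continue  # select timed out; the loop re-checks the deadline
        chunk = os.read(fd, 4096)
        if not chunk:
            break  # EOF: the child exited without completing a line
        buf += chunk
    return proc, buf.split(b"\n", 1)[0].decode("utf-8")


if __name__ == "__main__":
    # Demo: a child that sleeps without printing stands in for a stuck renderer.
    try:
        read_startup_line([sys.executable, "-c", "import time; time.sleep(60)"], timeout=2)
    except TimeoutError as exc:
        print(exc)
```

Bounding the wait this way turns a silent hang into an explicit error, which would make this kind of failure easier to report and diagnose.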

Linux Distribution

OS: NAME="Red Hat Enterprise Linux" VERSION="8.10 (Ootpa)"

ROCm Compute Profiler Version

6.4.0

GPU

MI250X

ROCm Version

6.4.0

Cluster name (if applicable)

Frontier

Reproducer
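One possible way to check whether the hang comes from Kaleido alone, outside rocprof-compute, is to render a trivial Plotly figure in a separate interpreter with a time limit. The sketch below assumes Plotly and Kaleido are importable from the Python interpreter that runs it (rocprof-compute appears to ship its own `python-libs`, so `PYTHONPATH` may need to point there). `run_snippet_with_timeout` and `KALEIDO_CHECK` are hypothetical names.

```python
import subprocess
import sys

# Renders a trivial figure through the same Kaleido path as write_image
# (write_image calls to_image), with no rocprof-compute code involved.
KALEIDO_CHECK = (
    "import plotly.graph_objects as go\n"
    "fig = go.Figure(go.Scatter(x=[1, 2, 3], y=[1, 4, 9]))\n"
    "fig.to_image(format='png')\n"
)


def run_snippet_with_timeout(code, timeout):
    """Run `code` in a fresh Python interpreter; return 'ok', 'error' or 'timeout'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child interpreter before raising.
        return "timeout"
    return "ok" if result.returncode == 0 else "error"
```

For example, if `run_snippet_with_timeout(KALEIDO_CHECK, timeout=60)` returns `"timeout"` on a Frontier node, that would suggest the freeze reproduces without rocprof-compute. Output is discarded to keep the sketch short; removing `stderr=subprocess.DEVNULL` shows the error message when the result is `"error"`. Only the interpreter is killed on timeout, so a helper process started by Kaleido may be left running.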

Expected behavior

No response

Relevant log output

Screenshots

No response

Additional Context

No response
