"ze_peak" freezes on DG1 with latest drm-tip kernel + drivers #20

@eero-t

Setup:

  • HW: CML-S / DG1 (0x4905)
  • OS: Ubuntu 22.04
  • Kernel: "drm-tip" head from yesterday
  • UMD: Latest releases of compute stack components, built with LLVM 12
  • App: "ze_peak" from level-zero-tests head

Bug:

./ze_peak freezes with 99% CPU usage after showing:
Single Precision Compute (GFLOPS)

(That is, the half-precision and global bandwidth tests that run before it worked fine.)

It can be quit with ^C, so it is not stuck in a 100% CPU loop.

Gdb shows:

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
0x00007f6fbca28cab in sched_yield () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f6fbca28cab in sched_yield () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f6fbc27cd63 in ?? () from target:/usr/local/lib/libze_intel_gpu.so.1
#2  0x00007f6fbc0572c2 in ?? () from target:/usr/local/lib/libze_intel_gpu.so.1
#3  0x0000564de2c87d3f in ?? ()
#4  0x0000564de2c88653 in ?? ()
#5  0x0000564de2c94ba1 in ?? ()
#6  0x0000564de2c86104 in ?? ()
#7  0x00007f6fbc949d90 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#8  0x00007f6fbc949e40 in __libc_start_main () from target:/lib/x86_64-linux-gnu/libc.so.6
#9  0x0000564de2c862e5 in ?? ()

perf showed most of the time being spent inside libze_intel_gpu.so.1, so this could be a driver issue, but I thought it better to start from the app.
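As a rough illustration of why these symptoms would fit a polling wait, the Python sketch below spins on a condition that never becomes true and calls `os.sched_yield()` on each iteration. This is an assumption based only on frame #0 of the backtrace; the driver's actual wait code is not visible here. `wait_with_yield` and the `is_done` callback are hypothetical names for this example, not part of Level Zero or the driver.

```python
import os
import time


def wait_with_yield(is_done, timeout_s):
    """Poll is_done() until it returns True or timeout_s elapses.

    Hypothetical helper: returns (completed, spins), where spins is the
    number of times the loop yielded the CPU before giving up or finishing.
    """
    deadline = time.monotonic() + timeout_s
    spins = 0
    while not is_done():
        if time.monotonic() >= deadline:
            return False, spins
        os.sched_yield()  # give up the time slice, but stay runnable
        spins += 1
    return True, spins


# A condition that never signals, standing in for a GPU job that never
# completes (assumption: this is what the frozen test is waiting on).
start_cpu = time.process_time()
completed, spins = wait_with_yield(lambda: False, timeout_s=0.2)
cpu_used = time.process_time() - start_cpu
print(f"completed={completed} spins={spins} cpu_seconds={cpu_used:.3f}")
```

On an otherwise idle core, a loop like this keeps the process runnable almost all the time, so it shows up near full CPU usage in tools like top while still responding to ^C. The sketch adds a timeout only so that it terminates; a wait without one would spin indefinitely if the condition is never met.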

ze_image_copy, ze_nano and ze_pingpong work fine. ze_bandwidth gets slower and slower, and I did not wait for it to complete.
