Can you please identify what in nekRS triggers this?
… On 25 Feb 2021, at 16:07, zjin-lcf ***@***.***> wrote:
I am not sure whether anyone else has encountered the following error when running the ethier example built with the OCCA HIP backend. Thanks.
nrsmpi ethier 1
...
key: SCALAR00 INITIAL GUESS DEFAULT, value: EXTRAPOLATION
key: SCALAR00 PRECONDITIONER, value: JACOBI
key: SCALAR00 SOLVER TOLERANCE, value: 1.000000e-12
key: SCALAR00 DIFFUSIVITY, value: 1.000000e-02
key: SCALAR00 DENSITY, value: 1.000000e+00
key: SCALAR01 INITIAL GUESS DEFAULT, value: EXTRAPOLATION
key: SCALAR01 PRECONDITIONER, value: JACOBI
key: SCALAR01 SOLVER TOLERANCE, value: 1.000000e-12
key: SCALAR01 DIFFUSIVITY, value: 1.000000e-02
key: SCALAR01 DENSITY, value: 1.000000e+00
key: SCALAR SOLVER, value: PCG
key: SCALAR BASIS, value: NODAL
key: SCALAR DISCRETIZATION, value: CONTINUOUS
key: BUILD ONLY, value: FALSE
key: DATA FILE, value: /path/to/nekRS-HIP/examples/ethier/.cache/udf/udf.okl
key: CI-MODE, value: 0
device memory usage: 0.0635219 GB
initialization took 289.073 s
timestepping for 100 steps ...
**:0:rocdevice.cpp :2303: 747494304564 us: Device::callbackQueue aborting with status: 0x100f**
[92464] *** Process received signal ***
[node:92464] Signal: Aborted (6)
[node:92464] Signal code: (-6)
[node:92464] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b5b86a4c630]
[node:92464] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b5b86c8f3d7]
[node:92464] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b5b86c90ac8]
[node:92464] [ 3] /opt/rocm-4.0.0/hip/lib/libamdhip64.so.4(+0x19e82b)[0x2b5b8807582b]
[node:92464] [ 4] /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1(+0x306bf)[0x2b5b89a396bf]
[node:92464] [ 5] /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1(+0x6569b)[0x2b5b89a6e69b]
[node:92464] [ 6] /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1(+0x166c7)[0x2b5b89a1f6c7]
[node:92464] [ 7] /lib64/libpthread.so.0(+0x7ea5)[0x2b5b86a44ea5]
[node:92464] [ 8] /lib64/libc.so.6(clone+0x6d)[0x2b5b86d579fd]
[node:92464] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node node exited on signal 6 (Aborted).
--------------------------------------------------------------------------