Clean up cmake for Omega on Aurora#229

Merged
philipwjones merged 1 commit into E3SM-Project:develop from amametjanov:aurora/cleanup-cmake
Aug 19, 2025

@amametjanov
Member

Move the CMAKE_CXX_STANDARD property before the CMake project() call.

[BFB]
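
For readers unfamiliar with the ordering in question, here is a minimal sketch. The diff itself is not shown in this conversation, so the script below only writes a hypothetical CMakeLists.txt to a scratch directory and reports where the two statements land. The project name `DemoProject`, the minimum CMake version, and the value 17 are assumptions for illustration, not taken from Omega.

```shell
#!/bin/bash
# Sketch only: generate a hypothetical CMakeLists.txt in a scratch directory.
# Project name, CMake version, and the C++17 value are assumptions.
demo_dir=$(mktemp -d)
cat > "${demo_dir}/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.21)

# Set the language standard before project() is called
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

project(DemoProject LANGUAGES CXX)
EOF

# Line numbers of the two statements; the first should be the smaller one.
std_line=$(grep -n '^set(CMAKE_CXX_STANDARD ' "${demo_dir}/CMakeLists.txt" | cut -d: -f1)
project_line=$(grep -n '^project(' "${demo_dir}/CMakeLists.txt" | cut -d: -f1)
echo "CMAKE_CXX_STANDARD set on line ${std_line}, project() on line ${project_line}"
```

The same line-number check could be pointed at a real CMakeLists.txt to confirm the ordering after a rebase.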

Checklist

  • Testing
    • A comment in the PR documents testing used to verify the changes including any tests that are added/modified/impacted.
    • Unit tests have passed. Please provide a relevant CDash build entry for verification.

@amametjanov amametjanov added the Omega, clean up, and CMake (CMake-related issues) labels on May 16, 2025
@amametjanov
Member Author

amametjanov commented May 16, 2025

Testing:

100% tests passed, 0 tests failed out of 31

How-to test on Aurora:

#!/bin/bash
cd components/omega/
rm -rf build && mkdir build && cd build
export PARMETIS_ROOT=/lus/flare/projects/E3SM_Dec/soft/polaris/aurora/spack/dev_polaris_0_8_0_oneapi-ifx_mpich/var/spack/environments/dev_polaris_0_8_0_oneapi-ifx_mpich/.spack-env/view

cmake \
   -DOMEGA_CIME_COMPILER=oneapi-ifxgpu \
   -DOMEGA_CIME_MACHINE=aurora \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT} \
   -DOMEGA_BUILD_TEST=ON -Wno-dev \
   -S /home/azamatm/repos/Omega/components/omega -B . 2>&1 | tee cmake.1.out

ln -isf /lus/flare/projects/E3SM_Dec/inputdata/ocn/mpas-o/oQU240/ocean.QU.240km.151209.nc test/OmegaMesh.nc
ln -isf /lus/flare/projects/E3SM_Dec/inputdata/ocn/mpas-o/polaris_cache/global_convergence/icos/cosine_bell/Icos480/init/initial_state.230220.nc test/OmegaSphereMesh.nc
ln -isf /lus/flare/projects/E3SM_Dec/inputdata/ocn/mpas-o/polaris_cache/global_convergence/icos/cosine_bell/Icos480/init/PlanarPeriodic48x48.nc test/OmegaPlanarMesh.nc

./omega_build.sh 2>&1 | tee omega_build.sh.1.out

qsub -q debug -l walltime=00:30:00 -A E3SM_Dec -l select=1,filesystems=home:flare -I

cd $PBS_O_WORKDIR
./omega_ctest.sh 2>&1 | tee omega_ctest.sh.1.out
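
The pass/fail figure quoted under "Testing" is the CTest summary line. As a convenience, a small helper could check the saved log for that line instead of reading it by eye. The function name `ctest_log_passed` is hypothetical and not part of the Omega scripts. The log file name is the `tee` target used in the steps above.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if a CTest log reports zero failed tests.
# It looks for the summary line, e.g. "100% tests passed, 0 tests failed out of 31".
ctest_log_passed() {
    local log_file="$1"
    grep -Eq '^100% tests passed, 0 tests failed out of [0-9]+' "${log_file}"
}

# Usage sketch: only runs the check if the log from the steps above exists.
if [ -f omega_ctest.sh.1.out ]; then
    if ctest_log_passed omega_ctest.sh.1.out; then
        echo "all tests passed"
    else
        echo "some tests failed; inspect omega_ctest.sh.1.out"
    fi
fi
```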

@mark-petersen mark-petersen requested a review from grnydawn August 6, 2025 15:12
@grnydawn

@amametjanov , I tried to build the aurora/cleanup-cmake branch on Aurora and got the following error:

CMake Warning (dev) at /lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL/e3smcase/cmake_macros/oneapi-ifxgpu_aurora.cmake:8 (string):
  Syntax error in cmake code at

    /lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL/e3smcase/cmake_macros/oneapi-ifxgpu_aurora.cmake:8

  when parsing string

     -\-intel -fsycl -fsycl-targets=spir64_gen -mlong-double-64 -Xsycl-target-backend \"-device 12.60.7\"

  Invalid escape sequence \-

  Policy CMP0010 is not set: Bad variable reference syntax is an error.  Run
  "cmake --help-policy CMP0010" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
Call Stack (most recent call first):
  /lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL/e3smcase/Macros.cmake:29 (include)
  OmegaBuild.cmake:155 (include)
  OmegaBuild.cmake:164 (read_cime_config)
  CMakeLists.txt:39 (init_standalone_build)
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Error at OmegaBuild.cmake:181 (message):
  C compiler, '', is not found.
Call Stack (most recent call first):
  CMakeLists.txt:39 (init_standalone_build)

Did you see this error before?
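
The warning above flags `\-` as an invalid escape sequence inside a quoted CMake string. One quick way to locate such sequences in a generated macro file might be a fixed-string grep, sketched below. The function name `find_backslash_hyphen` and the scratch file contents, including the `FLAGS` variable, are hypothetical. Only the `-\-intel -fsycl` fragment comes from the warning text.

```shell
#!/bin/bash
# Hypothetical helper: list lines in a CMake file that contain a backslash
# immediately followed by a hyphen, the sequence CMake flagged as invalid.
find_backslash_hyphen() {
    local cmake_file="$1"
    # -F treats the pattern as a fixed string; -e keeps it from being read as an option.
    grep -nF -e '\-' "${cmake_file}"
}

# Usage sketch with a scratch file that reproduces the flagged fragment.
scratch=$(mktemp)
printf '%s\n' 'string(APPEND FLAGS " -\-intel -fsycl")' > "${scratch}"
find_backslash_hyphen "${scratch}"
```

The function returns a non-zero status when no such sequence is present, so it can also serve as a guard in a configure script.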

Move CMAKE_CXX_STANDARD property before cmake project call
@amametjanov amametjanov force-pushed the aurora/cleanup-cmake branch from afec985 to efec5f7 Compare August 13, 2025 21:54
@amametjanov
Member Author

The error was due to an older E3SM config (one without #232).
I rebased this branch onto the latest develop, which has the latest cmake config, and re-checked:

100% tests passed, 0 tests failed out of 33

@grnydawn

@amametjanov , thanks for the update. I was able to build without errors after the update. However, I encountered a failure in one of the Omega ctest suites, IOSTREAM_TEST, with the following error message:

/lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL/e3smcase/.env_mach_specific.sh: line 31: export: `--gpu-bind': not a valid identifier
/lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL/e3smcase/.env_mach_specific.sh: line 31: export: `list:0.0:0.1:1.0:1.1:2.0:2.1:3.0:3.1:4.0:4.1:5.0:5.1': not a valid identifier
/lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL/e3smcase/.env_mach_specific.sh: line 31: export: `--mem-bind': not a valid identifier
/lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL/e3smcase/.env_mach_specific.sh: line 31: export: `list:0:0:0:0:0:0:1:1:1:1:1:1': not a valid identifier
Test project /lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL
    Start 19: IOSTREAM_TEST
1/1 Test #19: IOSTREAM_TEST ....................***Failed   19.15 sec
x4402c5s2b0n0.hsn.cm.aurora.alcf.anl.gov 0: PIO: FATAL ERROR: Aborting... FATAL ERROR: NetCDF: HDF error (file = ocn.hifreq.0001-06.nc) (/lus/flare/projects/E3SM_Dec/youngsun/repos/github/Omega.Az/externals/scorpio/src/clib/pioc_support.cpp: 5411)
x4402c5s2b0n0.hsn.cm.aurora.alcf.anl.gov 0: PIO: WARNING: Opening file (ocn.hifreq.0001-06.nc) with iotype=3 (PIO_IOTYPE_NETCDF4C) failed (ierr=-101, NetCDF: HDF error). Retrying with iotype=PIO_IOTYPE_NETCDF
x4402c5s2b0n0.hsn.cm.aurora.alcf.anl.gov 0: Obtained 10 stack frames.
./testIOStream.exe() [0x63cdbd]
./testIOStream.exe() [0x63cff6]
./testIOStream.exe() [0x63d34c]
./testIOStream.exe() [0x6430af]
./testIOStream.exe() [0x6433e5]
./testIOStream.exe() [0x474e8c]
./testIOStream.exe() [0x4a7ba4]
./testIOStream.exe() [0x4ac82a]
./testIOStream.exe() [0x43c38f]
/lib64/libc.so.6(__libc_start_main+0xef) [0x147c013d724d]
Abort(-1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
x4402c5s2b0n0.hsn.cm.aurora.alcf.anl.gov 0: Rank 0 aborted with code -1: application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
x4402c5s2b0n0.hsn.cm.aurora.alcf.anl.gov: rank 0 invalid status -1
x4402c5s2b0n0.hsn.cm.aurora.alcf.anl.gov: rank 1 died from signal 15


0% tests passed, 1 tests failed out of 1

Label Time Summary:
Omega-0    =  19.15 sec*proc (1 test)
SYCL       =  19.15 sec*proc (1 test)

Total Test time (real) =  19.18 sec

The following tests FAILED:
	 19 - IOSTREAM_TEST (Failed)
Errors while running CTest

While I am not sure whether this error is related to this PR, could you take a look at the issue on Aurora?
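
Separately from the I/O failure, the "not a valid identifier" lines at the top of the log are what bash prints when `export` receives extra words, which typically happens when a value containing spaces is left unquoted. The real line 31 of `.env_mach_specific.sh` is not shown in this thread, so the following is only a sketch of that general behavior. The variable name `DEMO_BIND_ARGS` and the shortened values are hypothetical.

```shell
#!/bin/bash
# Hypothetical variable name; the real line 31 of .env_mach_specific.sh is not shown here.

# Unquoted (left commented out): bash splits the value on spaces and treats each
# extra word as a name to export, so "list:0" is rejected as "not a valid identifier".
# export DEMO_BIND_ARGS=--mem-bind list:0

# Quoted: the whole string is assigned to a single variable.
export DEMO_BIND_ARGS="--gpu-bind list:0.0:0.1 --mem-bind list:0:0"
echo "${DEMO_BIND_ARGS}"
```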

@amametjanov
Member Author

The error is not related to this PR.
That file ocn.hifreq.0001-06.nc is written out using PIO_IOTYPE_NETCDF4C, which has run-time seg-fault issues in other tests on Aurora.
To fix this, I propose changing the default format to PIO_IOTYPE_PNETCDF at https://github.com/E3SM-Project/Omega/blob/develop/components/omega/src/base/IO.h#L74 . That format is more robust and performant across machines, as discussed in https://e3sm.atlassian.net/wiki/spaces/EIDMG/pages/769130507/Picking+a+netcdf+type+for+all+input+files .

@grnydawn

grnydawn commented Aug 19, 2025

@amametjanov , I agree that the file I/O issue is not related to this PR, and I think this PR is ready to be merged.

However, I still encounter an issue even after setting the default I/O mode to PIO_IOTYPE_PNETCDF in IO.h. I wonder if this issue also persists on Aurora. FYI, I ran this test on an interactive node, and I’m not sure if that makes any difference to the test result.

Test project /lus/flare/projects/E3SM_Dec/youngsun/temp/omega/standalone_oneapi-ifxgpu_SYCL
    Start 19: IOSTREAM_TEST
1/1 Test #19: IOSTREAM_TEST ....................***Failed    3.43 sec
x4216c1s3b0n0.hsn.cm.aurora.alcf.anl.gov 0: PIO: FATAL ERROR: Aborting... An error occured, Waiting on pending requests on file (ocn.hist.0001-02-01_00:00:00.nc, ncid=19) failed (Number of pending requests on file = 1, Number of variables with pending requests = 1, Number of request blocks = 1, Current block being waited on = 0, Number of requests in current block = 1).. NetCDF: Operation not allowed in define mode (err=-39). Aborting since the error handler was set to PIO_INTERNAL_ERROR... (/lus/flare/projects/E3SM_Dec/youngsun/repos/github/Omega.Az/externals/scorpio/src/clib/pio_darray_int.cpp: 2192)
x4216c1s3b0n0.hsn.cm.aurora.alcf.anl.gov 0: Obtained 10 stack frames.
./testIOStream.exe() [0x63cdbd]
./testIOStream.exe() [0x63cff6]
./testIOStream.exe() [0x63d1e7]
./testIOStream.exe() [0x670c91]
./testIOStream.exe() [0x63c5d4]
./testIOStream.exe() [0x63ca52]
./testIOStream.exe() [0x477f8d]
./testIOStream.exe() [0x4b579f]
./testIOStream.exe() [0x4a9025]
./testIOStream.exe() [0x4ac82a]
Abort(-1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
x4216c1s3b0n0.hsn.cm.aurora.alcf.anl.gov 0: Rank 0 aborted with code -1: application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
x4216c1s3b0n0.hsn.cm.aurora.alcf.anl.gov: rank 0 invalid status -1
x4216c1s3b0n0.hsn.cm.aurora.alcf.anl.gov: rank 1 died from signal 15

@philipwjones , What are your thoughts on changing the default I/O mode to PIO_IOTYPE_PNETCDF in IO.h, as Az suggested?

@philipwjones

Yes, I’m fine with changing the default IO type - the current choice was somewhat arbitrary.

@grnydawn grnydawn left a comment

Tests passed on Frontier and Perlmutter. All tests except IOSTREAM_TEST passed on Aurora. The issue with IOSTREAM_TEST does not appear to be related to this PR.

@philipwjones philipwjones merged commit eceab7a into E3SM-Project:develop Aug 19, 2025
1 check passed
@amametjanov amametjanov deleted the aurora/cleanup-cmake branch August 21, 2025 02:07