Skip to content

Fix use of CUDA CCs in LLVM easyblock and cleanup #3755

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Jun 4, 2025

(created using eb --new-pr)

  • Relax condition for passing CUDA CCs to allow use with LLVM 18
  • Rename usepolly to use_polly consistent with other options
  • Don't warn for mod files for non-minimal builds
  • Use get_cuda_cc_template_value and list_to_cmake_arg to avoid comments and simplify code
  • Reduce use of runtimes_cmake_args (Variables prefixed with e.g. LIBOMPTARGET get passed directly and should rather be passed to the main configure)
  • Fix/Enhance some comments
  • Refactor and fix check for OpenMP files in sanity check (temporary variable was assigned to final variable inside nvptx_target_cond block)
  • Add warning for unknown CPU arch (from CLang easyblock)
  • refactorings to super() call, f-Strings and list changes

Test easyconfigs: easybuilders/easybuild-easyconfigs#23028

Requires:

Comment on lines 690 to 691
if gpu_archs:
self.runtimes_cmake_args['LIBOMPTARGET_DEVICE_ARCHITECTURES'] = '%s' % '|'.join(gpu_archs)
self._cmakeopts['LIBOMPTARGET_DEVICE_ARCHITECTURES'] = ';'.join(gpu_archs)
Copy link
Contributor

@Thyre Thyre Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is going to work, and I would be extremely hesitant towards reverting this.
The sanity check may pass, simply because all architectures are being built, but this is not what we want.

See e.g. #3675 (comment) for LLVM 19. I think this is also something where @bedroge had issues with, which we resolved with the currently implemented approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually see the opposite: I did a test with the CLang and the LLVM easyblock of the "same" easyconfig and did a diff:

Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_35.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_37.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_50.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_52.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_53.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_60.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_61.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_62.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_70.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_72.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_75.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_86.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_87.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_89.bc.
Nur in /tmp/install-old/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0/lib64: libomptarget-nvptx-sm_90.bc.

I inspected the CMake files and every CMake value with prefix LIBOMPTARGET get automatically passed through to the sub-builds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were significant changes with LLVM 19 & LLVM 20, which might influence the results a bit.
I just checked my LLVM 19 & 20 installs on my machines, and with the currently implemented approach, only the passed architectures are built.

I don't have an LLVM 18.1.8 installation with the new EasyBlock available to compare to, and unfortunately jsc-zen3 hasn't either. Maybe we missed something for this version in particular.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a companion PR for testing with some easyconfigs of various versions. Build is running, will check

IIRC LLVM 20 doesn't use those anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking a Clang installation done with the old EasyBlock just before EB 5.0.0 (4.9.5dev), I can see that all architectures are built:

jreuter@ZAM054  /opt/EasyBuild/apps/software/Clang/18.1.8-GCCcore-13.3.0-CUDA-12.6.0  find . -name "*.bc"
./lib/libomptarget-amdgpu-gfx1150.bc
./lib/libomptarget-amdgpu-gfx900.bc
./lib/libomptarget-amdgpu-gfx941.bc
./lib/libomptarget-amdgpu-gfx1032.bc
./lib/libomptarget-amdgpu-gfx902.bc
./lib/libomptarget-nvptx-sm_89.bc
./lib/libomptarget-amdgpu-gfx801.bc
./lib/libomptarget-amdgpu-gfx1103.bc
./lib/libomptarget-amdgpu-gfx1010.bc
./lib/libomptarget-nvptx-sm_37.bc
./lib/libomptarget-nvptx-sm_80.bc
./lib/libomptarget-nvptx-sm_86.bc
./lib/libomptarget-nvptx-sm_61.bc
./lib/libomptarget-nvptx-sm_70.bc
./lib/libomptarget-nvptx-sm_50.bc
./lib/libomptarget-amdgpu-gfx1030.bc
./lib/libomptarget-nvptx-sm_62.bc
./lib/libomptarget-nvptx-sm_75.bc
./lib/libomptarget-nvptx-sm_72.bc
./lib/libomptarget-amdgpu-gfx1031.bc
./lib/libomptarget-amdgpu-gfx90c.bc
./lib/libomptarget-nvptx-sm_87.bc
./lib/libomptarget-nvptx-sm_60.bc
./lib/libomptarget-amdgpu-gfx1035.bc
./lib/libomptarget-amdgpu-gfx803.bc
./lib/libomptarget-nvptx-sm_35.bc
./lib/libomptarget-nvptx-sm_53.bc
./lib/libomptarget-amdgpu-gfx942.bc
./lib/libomptarget-amdgpu-gfx940.bc
./lib/libomptarget-amdgpu-gfx1101.bc
./lib/libomptarget-amdgpu-gfx1100.bc
./lib/libomptarget-amdgpu-gfx90a.bc
./lib/libomptarget-amdgpu-gfx906.bc
./lib/libomptarget-amdgpu-gfx701.bc
./lib/libomptarget-amdgpu-gfx700.bc
./lib/libomptarget-amdgpu-gfx908.bc
./lib/libomptarget-amdgpu-gfx1033.bc
./lib/libomptarget-amdgpu-gfx1151.bc
./lib/libomptarget-amdgpu-gfx1034.bc
./lib/libomptarget-amdgpu-gfx1036.bc
./lib/libomptarget-nvptx-sm_90.bc
./lib/libomptarget-amdgpu-gfx1102.bc
./lib/libomptarget-nvptx-sm_52.bc

Compared to LLVM 19.1.1 with the latest EasyBlock:

 jreuter@ZAM054  /opt/EasyBuild/apps/software/LLVM/19.1.1-GCCcore-13.3.0  find . -name "*.bc"
./lib/x86_64-unknown-linux-gnu/libomptarget-nvptx-sm_80.bc
./lib/x86_64-unknown-linux-gnu/libomptarget-nvptx-sm_75.bc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I might have made a mistake during the comparison. I'm doing a rebuild of both versions now.

The idea was to simplify this and make it easier to understand in the future. To me this runtimes_cmake_args thing was highly suspicious/confusing especially with the pipe separator and it isn't clear either if and why this argument isn't required for the "main" build.
In the source you can see that there are some variables
and the LIBOMP* prefix passed through by default. And according to git blame this has been virtually always the case.
So I'm confident this is the best way, especially as I couldn't find any examples doing it the current way which makes me thing this is "safer".

I think building a few versions (see the EC PR) with this should be enough to check that this works for all 3 major versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked LLVM 18.1.8 with your changes, and there it seems to work with your solution as well.

$ find . -name "*.bc"
./lib/libomptarget-nvptx-sm_75.bc
./lib/libomptarget-amdgpu-gfx1101.bc
./lib/libomptarget-nvptx-sm_80.bc

Lets check the other ones for safety as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've redone the tests with Clang 18.1.8 once with the clang easyblock and once with the updated llvm easyblock, both with CCC=7.0,8.0, and can still see that the clang easyblock builds all architectures. Diff of the install folder restricted to added files and ignoring the lib64 symlink:

Nur in Clang-LLVM/bin: count.
Nur in Clang-LLVM/bin: FileCheck.
Nur in Clang-LLVM/bin: lli-child-target.
Nur in Clang-LLVM/bin: llvm-jitlink-executor.
Nur in Clang-LLVM/bin: llvm-PerfectShuffle.
Nur in Clang-LLVM/bin: not.
Nur in Clang-LLVM/bin: obj2yaml.
Nur in Clang-LLVM/bin: split-file.
Nur in Clang-LLVM/bin: UnicodeNameMappingGenerator.
Nur in Clang-LLVM/bin: yaml2obj.
Nur in Clang-LLVM/bin: yaml-bench.
Nur in Clang-Clang/easybuild/reprod/easyblocks: clang.py.
Nur in Clang-LLVM/easybuild/reprod/easyblocks: llvm.py.
Nur in Clang-LLVM/include: x86_64-pc-linux.
Nur in Clang-LLVM/include: x86_64-pc-linux-gnu.
Nur in Clang-LLVM/lib/clang/18/lib: x86_64-pc-linux.
Nur in Clang-LLVM/lib/clang/18/lib: x86_64-pc-linux-gnu.
Nur in Clang-Clang/lib: libgomp.so.
Nur in Clang-Clang/lib: libiomp5.so.
Nur in Clang-LLVM/lib: libLLVM-18.so.
Nur in Clang-LLVM/lib: libLLVM.so.
Nur in Clang-LLVM/lib: libLLVM.so.18.1.
Nur in Clang-LLVM/lib: libMLIR.so.
Nur in Clang-LLVM/lib: libMLIR.so.18.1.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1010.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1030.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1031.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1032.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1033.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1034.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1035.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1036.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1100.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1101.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1102.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1103.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1150.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx1151.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx700.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx701.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx801.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx803.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx900.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx902.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx906.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx908.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx90a.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx90c.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx940.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx941.bc.
Nur in Clang-Clang/lib: libomptarget-amdgpu-gfx942.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_35.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_37.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_50.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_52.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_53.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_60.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_61.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_62.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_72.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_75.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_86.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_87.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_89.bc.
Nur in Clang-Clang/lib: libomptarget-nvptx-sm_90.bc.
Nur in Clang-LLVM/lib: x86_64-pc-linux.
Nur in Clang-LLVM/lib: x86_64-pc-linux-gnu.

libgomp and libiomp are symlinks to libomp. Not sure why LLVM doesn't create them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might worth checking if this has some influence:

Nur in Clang-Clang/lib: libgomp.so.
Nur in Clang-Clang/lib: libiomp5.so.

One can select the OpenMP library with -fopenmp=[...] IIRC.
If choosing libgomp still works, I wouldn't care that the files are missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That option seemed to have been considered in an ancient Clang version but dropped. Given that those are symlinks I don't see a reason to not install them just in case.

Comment on lines 501 to 505
self.runtimes_cmake_args['LIBOMPTARGET_PLUGINS_TO_BUILD'] = '%s' % '|'.join(self.offload_targets)
self._cmakeopts['LIBOMPTARGET_PLUGINS_TO_BUILD'] = ';'.join(self.offload_targets)
Copy link
Contributor

@Thyre Thyre Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment.
I don't see why we should revert this when the current approach is proven to work just fine.

We've tested this very extensively already, both with CUDA & ROCm

Copy link
Contributor

@Thyre Thyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding offload targets, see above.
The remainder looks good to me at a quick first glance ( except the failing CI of course 😄 )

@Thyre
Copy link
Contributor

Thyre commented Jun 4, 2025

Test report by @Thyre

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
zam226 - Linux Ubuntu 22.04, x86_64, 12th Gen Intel(R) Core(TM) i7-12700, Python 3.10.12
See https://gist.github.com/Thyre/28a9e078282f4a68ac8b3061c00ca6f0 for a full test report.


The CMake option changes break builds where offload architectures need to be passed.

@Flamefire
Copy link
Contributor Author

I had messed up quoting one argument. Shall we introduce a function that does "'%s'" % ';'.join(some-list) e.g. as CMakeMake::list_to_cmake_arg?

@Flamefire
Copy link
Contributor Author

As for the CI failure: I'd rather make the function more flexible and use that: easybuilders/easybuild-framework#4913

IMO this is better than having to repeat our logic of determining the CCCs in many places. I had set the build option locally which is why I didn't notice it.
Objections?

@Thyre
Copy link
Contributor

Thyre commented Jun 4, 2025

Both sound like a good idea 👍

@Flamefire Flamefire changed the title Fix use of CUDA CCs in LLVM easyblock and cleanup Clean up LLVM easyblock Jun 4, 2025
@Flamefire Flamefire changed the title Clean up LLVM easyblock Fix use of CUDA CCs in LLVM easyblock and cleanup Jun 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants