Skip to content

INCOMPLETE Corrections for NOACC component type when targeting GPU / OpenACC #110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 10, 2025

Conversation

willend
Copy link
Contributor

@willend willend commented Feb 6, 2025

#109

@g5t I've tested this functional at least for ISIS_CRISP.instr, built a patched mccode-antlr and generated attached c-code locally on my Mac, transported to my 8-way GPU machine and ran on all the GPUS in parallel there:

Screenshot 2025-02-06 at 19 38 31

ISIS_CRISP.c.txt

For a reason I do not yet fully understand the code generated for NCrystal_example (perhaps 2 instances of the NCrystal_sample?) still fails to compile w/OpenACC.
NCrystal_example.c.txt

@willend willend changed the title Corrections for NOACC component type when targeting GPU / OpenACC INCOMPLETE Corrections for NOACC component type when targeting GPU / OpenACC Feb 6, 2025
@willend
Copy link
Contributor Author

willend commented Feb 6, 2025

Wrong observation on issue , 2. is still present :-)

Another difference relevant for NCrystal_example.instr is that the two NCrystal_sample/NOACC comps feature SPLITs

@willend
Copy link
Contributor Author

willend commented Feb 6, 2025

Manually

  1. Moving the SPLITs to proceeding arms
  2. adding CPU COMPONENT to the NCrystal instances
  3. patching the c-code to correct
// CPU
COMPONENT ....
  1. Adding -DFUNNEL to the compilation defines
    allows NCrystal_example to also run:
    Screenshot 2025-02-06 at 20 03 43

@willend
Copy link
Contributor Author

willend commented Feb 6, 2025

@g5t the only missing element now seems to be to raise -DFUNNEL "globally" once a NOACC component is found and pass to the compiler.

  • In McStas this is by adding to DEPENDENCY - any way to do that directly from within src/mccode_antlr/translators/c_raytrace.py?

@g5t
Copy link
Collaborator

g5t commented Feb 6, 2025

Would both a component instance with CPU and a component definition with NOACC introduce the -DFUNNEL dependency line in the same way?

@g5t
Copy link
Collaborator

g5t commented Feb 6, 2025

-DFUNNEL is now added to the compiler flags if any component instance is CPU only.
I can only run the new OpenACC tests with -acc=host so I can't really test if the change works.

@willend Could you run the new tests on your -acc=gpu capable machine? E.g.,

pytest tests/runtime/test_raytrace.py

@willend
Copy link
Contributor Author

willend commented Feb 7, 2025

Pytest output shows a few issues - most if not all related to h5 stuff:
pytest.log

I simply built and installed into my "usual" McStas dev env on the machine, i.e. not completely "clean". Do you want another attempt from a completely fresh environment @g5t?

Success: Current code is functional for e.g. NCrystal_example, if cogen and compilation is done "by hand".

One more smaller wish - usability oriented - wrt. integration with e.g. mcrun:

Could you add a "CFLAGS" line ala the standard mcstas cogen? - Then mcrun will pick up dependencies automatically? :-)

mcstas-antlr output:

(mcstas-3.x-dev/miniconda3) NCrystal_example $ mcstas-antlr -t NCrystal_example.instr -I$MCSTAS
No initialization present?

-----------------------------------------------------------

Generating single GPU kernel or single CPU section layout:
-> SPLIT 10 at component monochromator
-> SPLIT 10 at component powder_sample

-----------------------------------------------------------

Generating GPU/CPU -DFUNNEL layout:
-> GPU kernel from component origin
-> GPU kernel from component source
-> GPU kernel from component mono_arm
Component monochromator is NOACC, CPUONLY=True
->FUNNEL mode enabled, SPLIT within buffer.
-> SPLIT within buffer at component monochromator
Component powder_sample is NOACC, CPUONLY=True
->FUNNEL mode enabled, SPLIT within buffer.
-> SPLIT within buffer at component powder_sample

-----------------------------------------------------------

mcstas output:

(mcstas-3.x-dev/miniconda3) NCrystal_example $ mcstas -t NCrystal_example.instr -I$MCSTAS
Info:    'NCrystal_sample' is a contributed component.

-----------------------------------------------------------

Generating single GPU kernel or single CPU section layout: 
-> SPLIT N at component monochromator
-> SPLIT N at component powder_sample

-----------------------------------------------------------

Generating GPU/CPU -DFUNNEL layout:
Component monochromator is NOACC, CPUONLY=1
-> FUNNEL mode enabled, SPLIT within buffer.
-> SPLIT within buffer at component monochromator
-> GPU kernel from component mono_out
Component powder_sample is NOACC, CPUONLY=1
-> FUNNEL mode enabled, SPLIT within buffer.
-> SPLIT within buffer at component powder_sample
-> GPU kernel from component powder_pattern_detc

-----------------------------------------------------------
CFLAGS= -Wl,-rpath,CMD(ncrystal-config --show libdir) -Wl,CMD(ncrystal-config --show libpath) -ICMD(ncrystal-config --show includedir) -DFUNNEL 

If you add the above CFLAGS-output to mcstas-antlr I could in return implement an --cogen switch for mcrun and we could claim "full compatibility"? ;-)

@g5t
Copy link
Collaborator

g5t commented Feb 7, 2025

Your HDF5/h5py problems were fixed by putting everything into a single virtual environment?

I know it's not what you're after, but if you copy NCrystal_example.instr to your working directory (or give it's full path), the following should work 🤓

 mcrun-antlr NCrystal_example.instr -n 1 sample_cfg='Y2O3_sg206_Yttrium_Oxide.ncmat;density=0.6x'

(with no C file to worry about re-generating or accidentally modifying)

As for the CFLAGS line

  1. is it important that it's the last STDOUT line? Or are you detecting it via a literal search for the keyword at the start of any line?
  2. Would it be bad for mcrun if the CMD (and other) special statements are already resolved when printed?

@willend
Copy link
Contributor Author

willend commented Feb 7, 2025

Your HDF5/h5py problems were fixed by putting everything into a single virtual environment?

Nope, au contraire: I suspect the issues arose from installing in a "dirty" env. But it could also be something else e.g. missing/unresolved dependencies (and there is no reason to suspect things should not function once "deployed" correctly e.g. via conda-forge).

I know it's not what you're after, but if you copy NCrystal_example.instr to your working directory (or give it's full path), the following should work 🤓

mcrun-antlr NCrystal_example.instr -n 1 sample_cfg='Y2O3_sg206_Yttrium_Oxide.ncmat;density=0.6x'
(with no C file to worry about re-generating or accidentally modifying)

True, but what I am after is something else: Compatibility with the legacy tools that people are familiar with. 😜

As for the CFLAGS line

is it important that it's the last STDOUT line? Or are you detecting it via a literal search for the keyword at the start of any line?

There is a regex match for (a single) ^CFLAGS= line that can in principle exist anywhere in stdout. Not sure multiple would work (out of the box).

Would it be bad for mcrun if the CMD (and other) special statements are already resolved when printed?

Nope that should be no issue. Resolved result from mcstas-antlr internals should be just fine.

@willend
Copy link
Contributor Author

willend commented Feb 8, 2025

Found another minor thing:
It does not seem mcstas-antlr honours DEPENDENCY from the instr itself? Cf templateSANS_Mantid - or is it simply the interpretation of @NEXUSFLAGS@ that is missing?

(base) box:Conditional_test peterwillendrup$ grep DEPENDENCY templateSANS_Mantid.instr 
DEPENDENCY " @NEXUSFLAGS@ "
(base) box:Conditional_test peterwillendrup$ mcrun-antlr templateSANS_Mantid.instr 
No initialization present?

-----------------------------------------------------------

Generating single GPU kernel or single CPU section layout:

-----------------------------------------------------------

Generating GPU/CPU -DFUNNEL layout:
-> GPU kernel from component a

-----------------------------------------------------------
2025-02-08 14:53:43.065 | INFO     | mccode_antlr.compiler.c:_compile_instrument:159 - Compile templateSANS_Mantid
2025-02-08 14:53:43.065 | ERROR    | mccode_antlr.run.runner:mccode_run_cmd:290 - Interactive parameter entry does not currently work
2025-02-08 14:53:43.065 | INFO     | mccode_antlr.run.runner:mccode_run_cmd:291 - Execute `templateSANS_Mantid.out --list-parameters` to check expected parameters
(base) CIN-969631:Conditional_test peterwillendrup$ ./templateSANS_Mantid.out --help
templateSANS_Mantid (/Users/peterwillendrup/tmp/Conditional_test/templateSANS_Mantid.instr) instrument simulation, generated with McStas (1970-01-01)
Usage: ./templateSANS_Mantid.out [options] [parm=value ...]
Options are:
  -s SEED   --seed=SEED      Set random seed (must be != 0)
  -n COUNT  --ncount=COUNT   Set number of particles to simulate.
  -d DIR    --dir=DIR        Put all data files in directory DIR.
  -t        --trace          Enable trace of neutrons through instrument.
                             (Use -t=2 or --trace=2 for modernised mcdisplay rendering)
  -g        --gravitation    Enable gravitation for all trajectories.
  --no-output-files          Do not write any data files.
  -h        --help           Show this help message.
  -i        --info           Detailed instrument information.
  --list-parameters          Print the instrument parameters to standard out
  --meta-list                Print names of components which defined metadata
  --meta-defined COMP[:NAME] Print component defined metadata names, or (0,1) if NAME provided
  --meta-type COMP:NAME      Print metadata format type specified in definition
  --meta-data COMP:NAME      Print the metadata text
  --source                   Show the instrument code which was compiled.

  --bufsiz                   Monitor_nD list/buffer-size (default: 1000000)
  --format=FORMAT            Output data files using FORMAT=MCSTAS

Instrument parameters are:
  lambda          (double) [default='6']
  dlambda         (double) [default='0.05']
  r               (double) [default='150']
  PHI             (double) [default='0.001']
  Delta_Rho       (double) [default='0.6']
  sigma_abs       (double) [default='0.0']
Known signals are: USR1 (status) USR2 (save) TERM (save and exit)

(If successfully compiled with the @NEXUSFLAGS@ the binary would respond with
--format=FORMAT Output data files using FORMAT=MCSTAS NEXUS)

@g5t
Copy link
Collaborator

g5t commented Feb 9, 2025

There should be CFLAGS= output now; can you check that it works for you?

The @NEXUSFLAGS@ / wrong --format problem seems like a separate issue to deal with.

@willend
Copy link
Contributor Author

willend commented Feb 10, 2025

CFLAGS= output confirmed functional! 👍 - But it looks like the -DFUNNEL coming from NOACC seems not included in that output?

And agreed DEPENDENCY from instrument level is another separate matter. Running mcstas-antlr with (standard) mcrun and --format=NeXus produces a binary linked correctly that produces NeXus

@g5t
Copy link
Collaborator

g5t commented Feb 10, 2025

I've updated the tests to check explicitly for CFLAGS=.*-DFUNNEL.* in the STDOUT produced while translating each (NO)ACC instrument. I'm still limited to --acc=host, so maybe I've missed something. Maybe you can check on your machine?

pytest tests/runtime/test_raytrace.py 

@willend
Copy link
Contributor Author

willend commented Feb 10, 2025

Some weirdness is still going on, but I think I have to conclude it is at my end...

CFLAGS from mcstas-antlr at the end here:

mcrun -c NCrystal_example.instr --openacc
INFO: No output directory specified (--dir)
INFO: Using directory: "NCrystal_example_20250210_164115"
INFO: Regenerating c-file: NCrystal_example.c
WARNING: Full-path code-generator "/u/data/pkwi/McStas/mcstas/3.x-dev/bin/mcstas-antlr -I$MCSTAS" not found!!
WARNING: Attempting replacement by "mcstas-antlr -I$MCSTAS"
No initialization present?

-----------------------------------------------------------

Generating single GPU kernel or single CPU section layout:
-> SPLIT 10 at component monochromator
-> SPLIT 10 at component powder_sample

-----------------------------------------------------------

Generating GPU/CPU -DFUNNEL layout:
-> GPU kernel from component origin
-> GPU kernel from component source
-> GPU kernel from component mono_arm
Component monochromator is NOACC, CPUONLY=True
->FUNNEL mode enabled, SPLIT within buffer.
-> SPLIT within buffer at component monochromator
Component powder_sample is NOACC, CPUONLY=True
->FUNNEL mode enabled, SPLIT within buffer.
-> SPLIT within buffer at component powder_sample

-----------------------------------------------------------
CFLAGS= -DFUNNEL -Wl,-rpath,/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/lib -Wl,/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/lib/libNCrystal.so.3.8.2 -I/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/include

But is then apparently missing from the final compilation command:

(cut away everything in between.... )

nvlink error   : Undefined reference to 'ncrystal_samplescatter' in '/tmp/nvcwDoPfc53eutQC.o'
nvlink error   : Undefined reference to 'ncrystal_crosssection_nonoriented' in '/tmp/nvcwDoPfc53eutQC.o'
nvlink error   : Undefined reference to 'ncrystal_crosssection' in '/tmp/nvcwDoPfc53eutQC.o'
pgacclnk: child process exit status 2: /opt/nvidia/hpc_sdk/Linux_x86_64/25.1/compilers/bin/tools/nvdd
INFO: call to nvc failed with Command 'nvc -o ./NCrystal_example.out ./NCrystal_example.c -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/lib -Wl,-rpath-link,/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/lib -L/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/lib -mtune=haswell -fPIC -O2 -isystem /u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/include -fast -Minfo=accel -acc=gpu -gpu=mem:managed -DOPENACC -Wl,-rpath,/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/lib -Wl,/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/lib/libNCrystal.so.3.8.2 -I/u/data/pkwi/McStas/mcstas/3.x-dev/miniconda3/include' returned non-zero exit status 2.

@willend
Copy link
Contributor Author

willend commented Feb 10, 2025

Yay! Confirmed to function on my 8-way GPU box!

Copy link
Collaborator

@g5t g5t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving-in the translation-time STDOUT CFLAGS= line because sometimes I find it necessary to compile a translated file by hand, and not needing to expand the special flags + copy & pasting sounds like a good idea.

@g5t g5t merged commit c103b7e into main Feb 10, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants