Use consistent names for internal nvcc files #2383

Open: trxcllnt wants to merge 8 commits into main from fix/consistent-nvcc-internal-file-names

trxcllnt
Contributor

@trxcllnt trxcllnt commented Apr 15, 2025

This PR extracts the fixes from #2356 per @drahnr's request.

This PR branched from/is a follow-up to #2382. See the diff of the two branches here.

The names of nvcc's internal files depend on the compile flag and the device architectures. nvcc generates different names for the .cpp1.ii, .cudafe1.c, .cudafe1.stub.c, .cudafe1.gpu, .ptx, and .cubin files depending on whether the compile flag is -ptx, -cubin, or -c, and on whether there are one or many -gencode arguments. It also includes or omits the --gen_module_id_file flag in the cicc invocation depending on whether the compile flag is -ptx, -cubin, or -c.

Some examples:

# compile flag: -ptx, single arch
$ nvcc -x cu -ptx x.cu -o x.cu.o -gencode=arch=compute_60,code=[compute_60,sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --gen_module_id_file --module_id_file_name "/x/x.module_id"  "/x/x.cpp1.ii" -o "x.cu.o"

# compile flag: -cubin, single arch
$ nvcc -x cu -cubin x.cu -o x.cu.o -gencode=arch=compute_60,code=[sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --gen_module_id_file --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.cudafe1.c" --stub_file_name "/x/x.cudafe1.stub.c" --gen_device_file_name "/x/x.cudafe1.gpu"  "/x/x.cpp1.ii" -o "/x/x.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.ptx"  -o "x.cu.o"

# compile flag: -c, single arch
$ nvcc -x cu -c x.cu -o x.cu.o -gencode=arch=compute_60,code=[compute_60,sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.cudafe1.c" --stub_file_name "/x/x.cudafe1.stub.c" --gen_device_file_name "/x/x.cudafe1.gpu"  "/x/x.cpp1.ii" -o "/x/x.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.ptx"  -o "/x/x.sm_60.cubin" 

# compile flag: -c, multiple archs
$ nvcc -x cu -c x.cu -o x.cu.o -gencode=arch=compute_60,code=[sm_60] -gencode=arch=compute_70,code=[compute_70,sm_70] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.compute_60.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.compute_60.cudafe1.c" --stub_file_name "/x/x.compute_60.cudafe1.stub.c" --gen_device_file_name "/x/x.compute_60.cudafe1.gpu"  "/x/x.compute_60.cpp1.ii" -o "/x/x.compute_60.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.compute_60.ptx"  -o "/x/x.compute_60.cubin" 
#$ gcc -E ... "x.cu" -o "/x/x.compute_70.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.compute_70.cudafe1.c" --stub_file_name "/x/x.compute_70.cudafe1.stub.c" --gen_device_file_name "/x/x.compute_70.cudafe1.gpu"  "/x/x.compute_70.cpp1.ii" -o "/x/x.compute_70.ptx"
#$ ptxas -arch=sm_70 -m64  "/x/x.compute_70.ptx"  -o "/x/x.compute_70.sm_70.cubin" 

From the above, we observe that:

  • .cpp1.ii, .cudafe1.c, .cudafe1.stub.c, .cudafe1.gpu, and .ptx files are either:
    • x.<suffix>
    • x.compute_XX.<suffix>
  • .cubin files are either:
    • x.cubin
    • x.compute_XX.cubin
    • x.compute_XX.sm_XX.cubin
  • without -c, the cicc command includes --gen_module_id_file
  • with -c, the cicc command omits --gen_module_id_file
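
As an illustration of the .cubin forms listed above, here is a minimal Rust sketch that splits a cubin file name into its optional `compute_XX` and `sm_XX` components. The `CubinName` struct and `parse_cubin_name` function are hypothetical names made up for this example and are not taken from the PR. The sketch also accepts `x.sm_XX.cubin`, which appears in the single-arch `-c` dry-run output.

```rust
/// The pieces of a `.cubin` file name, as seen in the dry-run output above.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct CubinName<'a> {
    stem: &'a str,
    compute: Option<&'a str>,
    sm: Option<&'a str>,
}

/// Split `name` into stem, optional `compute_XX`, and optional `sm_XX`.
/// Returns `None` if the name does not end in `.cubin`.
fn parse_cubin_name(name: &str) -> Option<CubinName<'_>> {
    let mut rest = name.strip_suffix(".cubin")?;
    let mut compute = None;
    let mut sm = None;

    // Peel off an optional trailing `.sm_XX` component.
    if let Some((head, tail)) = rest.rsplit_once('.') {
        if tail.starts_with("sm_") {
            sm = Some(tail);
            rest = head;
        }
    }
    // Then peel off an optional trailing `.compute_XX` component.
    if let Some((head, tail)) = rest.rsplit_once('.') {
        if tail.starts_with("compute_") {
            compute = Some(tail);
            rest = head;
        }
    }
    Some(CubinName { stem: rest, compute, sm })
}
```

Calling `parse_cubin_name("x.compute_70.sm_70.cubin")` yields the stem `x` with both components set, while `parse_cubin_name("x.cubin")` yields the stem with neither.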

This PR hashes all the cudafe++, cicc, and ptxas arguments to avoid collisions, but nvcc's inconsistent file naming then causes cache misses where there should be hits. For simplicity, I updated the renaming logic to always use the longest form of each name (i.e. x.compute_XX.ptx, x.compute_XX.sm_XX.cubin) and to always add the --gen_module_id_file flag to cicc invocations.
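
The normalization described above could look roughly like the following Rust sketch. It is not the code in src/compiler/nvcc.rs: the `Kind` enum, `canonical_name`, and `with_gen_module_id_file` are hypothetical names, and inserting the flag directly before `--module_id_file_name` is an assumption that simply mirrors the order in the nvcc dry-run output.

```rust
/// Intermediate file kinds whose names nvcc varies (per the examples above).
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Cpp1Ii,
    CudafeC,
    CudafeStubC,
    CudafeGpu,
    Ptx,
    Cubin,
}

impl Kind {
    fn suffix(self) -> &'static str {
        match self {
            Kind::Cpp1Ii => "cpp1.ii",
            Kind::CudafeC => "cudafe1.c",
            Kind::CudafeStubC => "cudafe1.stub.c",
            Kind::CudafeGpu => "cudafe1.gpu",
            Kind::Ptx => "ptx",
            Kind::Cubin => "cubin",
        }
    }
}

/// Always produce the longest form of a name: `x.compute_XX.<suffix>`,
/// or `x.compute_XX.sm_XX.cubin` for cubins.
fn canonical_name(stem: &str, compute: u32, sm: u32, kind: Kind) -> String {
    match kind {
        Kind::Cubin => format!("{}.compute_{}.sm_{}.cubin", stem, compute, sm),
        other => format!("{}.compute_{}.{}", stem, compute, other.suffix()),
    }
}

/// Ensure the cicc argument list contains `--gen_module_id_file` exactly once.
/// Assumption: the flag goes right before `--module_id_file_name` when that
/// argument is present, otherwise at the front.
fn with_gen_module_id_file(mut args: Vec<String>) -> Vec<String> {
    const FLAG: &str = "--gen_module_id_file";
    if args.iter().any(|a| a == FLAG) {
        return args;
    }
    let pos = args
        .iter()
        .position(|a| a == "--module_id_file_name")
        .unwrap_or(0);
    args.insert(pos, FLAG.to_string());
    args
}
```

With this shape, a `-ptx`, `-cubin`, or `-c` compilation of the same source for the same architecture maps to the same intermediate names and the same cicc flags, which is what lets a later compilation reuse entries cached by an earlier one.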

@trxcllnt
Contributor Author

cc: @robertmaynard for review

These changes ensure cache hits for compilations which are subsets of previously cached compilations

* Normalize cudafe++, ptx, and cubin names regardless of whether the compilation flag is `-c`, `-ptx`, `-cubin`, or whether there are one or many `-gencode` flags
* Include the compiler `hash_key` in the output dir for internal nvcc files to guarantee stability and uniqueness
* Fix cache error due to hash collision from not hashing all the PTX and cubin flags

Add more multi-arch tests to ensure combining cached/new PTX and cubins doesn't produce corrupted objects
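
To make the two hashing-related bullets above concrete, here is a small Rust sketch under stated assumptions. It uses the standard library's `DefaultHasher` purely as a stand-in for whatever digest the cache actually uses, and `hash_all_args` and `internal_output_dir` are hypothetical helper names, not functions from this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Feed every argument into the hash so that two invocations differing in
/// any flag (for example `-arch=sm_60` vs `-arch=sm_70`) get different keys.
/// `DefaultHasher` is a stand-in here, not the cache's real digest.
fn hash_all_args(args: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for arg in args {
        // Hashing each string separately keeps ["ab", "c"] distinct from ["a", "bc"].
        arg.hash(&mut hasher);
    }
    hasher.finish()
}

/// Place internal nvcc files under a directory named after the compiler's
/// hash key, so different compilers never share intermediate files.
fn internal_output_dir(base: &Path, hash_key: &str) -> PathBuf {
    base.join(hash_key)
}
```

Leaving a flag out of the hash is what allows two different ptxas or cicc invocations to collide on one cache entry; hashing the full argument list removes that possibility at the cost of needing stable file names, which the normalization in this PR provides.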
@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from fc85929 to 051fec9 Compare April 28, 2025 18:55
@codecov-commenter

codecov-commenter commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 25.72298% with 488 lines in your changes missing coverage. Please review.

Project coverage is 69.57%. Comparing base (9fb942e) to head (ffe68b9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/compiler/nvcc.rs | 0.00% | 453 Missing ⚠️ |
| src/compiler/cicc.rs | 0.00% | 24 Missing ⚠️ |
| src/compiler/msvc.rs | 0.00% | 5 Missing ⚠️ |
| src/compiler/cudafe.rs | 0.00% | 1 Missing ⚠️ |
| src/compiler/diab.rs | 0.00% | 1 Missing ⚠️ |
| src/compiler/nvhpc.rs | 0.00% | 1 Missing ⚠️ |
| src/compiler/ptxas.rs | 0.00% | 1 Missing ⚠️ |
| src/compiler/tasking_vx.rs | 0.00% | 1 Missing ⚠️ |
| tests/harness/client.rs | 99.25% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2383      +/-   ##
==========================================
- Coverage   71.41%   69.57%   -1.84%     
==========================================
  Files          65       45      -20     
  Lines       36349    26521    -9828     
==========================================
- Hits        25960    18453    -7507     
+ Misses      10389     8068    -2321     

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from 7ba3fae to ffe68b9 Compare April 29, 2025 15:51
@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from ffe68b9 to 3c08b18 Compare April 29, 2025 16:28
Output(PathBuf),
PassThrough(OsString),
UnhashedFlag,
ExtraOutput(PathBuf),
Contributor Author

I renamed this from UnhashedOutput to ExtraOutput to reflect that these argument types are now hashed.

3 participants