
Use consistent names for internal nvcc files #2383


Open: 1 commit to be merged into main from fix/consistent-nvcc-internal-file-names

trxcllnt
Contributor

@trxcllnt trxcllnt commented Apr 15, 2025

This PR extracts the fixes from #2356 per @drahnr's request.

This PR branched from/is a follow-up to #2382. See the diff of the two branches here.

The names of the internal files depend on the compilation flag and the device architectures. nvcc generates different names for the .cpp1.ii, .cudafe1.c, .cudafe1.stub.c, .cudafe1.gpu, .ptx, and .cubin files depending on whether the compile flag is -ptx, -cubin, or -c, and on whether there is one -gencode argument or many. It also includes or omits the --gen_module_id_file flag in the cicc invocation depending on whether the compile flag is -ptx, -cubin, or -c.

Some examples:

# compile flag: -ptx, single arch
$ nvcc -x cu -ptx x.cu -o x.cu.o -gencode=arch=compute_60,code=[compute_60,sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --gen_module_id_file --module_id_file_name "/x/x.module_id"  "/x/x.cpp1.ii" -o "x.cu.o"

# compile flag: -cubin, single arch
$ nvcc -x cu -cubin x.cu -o x.cu.o -gencode=arch=compute_60,code=[sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --gen_module_id_file --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.cudafe1.c" --stub_file_name "/x/x.cudafe1.stub.c" --gen_device_file_name "/x/x.cudafe1.gpu"  "/x/x.cpp1.ii" -o "/x/x.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.ptx"  -o "x.cu.o"

# compile flag: -c, single arch
$ nvcc -x cu -c x.cu -o x.cu.o -gencode=arch=compute_60,code=[compute_60,sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.cudafe1.c" --stub_file_name "/x/x.cudafe1.stub.c" --gen_device_file_name "/x/x.cudafe1.gpu"  "/x/x.cpp1.ii" -o "/x/x.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.ptx"  -o "/x/x.sm_60.cubin" 

# compile flag: -c, multiple archs
$ nvcc -x cu -c x.cu -o x.cu.o -gencode=arch=compute_60,code=[sm_60] -gencode=arch=compute_70,code=[compute_70,sm_70] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.compute_60.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.compute_60.cudafe1.c" --stub_file_name "/x/x.compute_60.cudafe1.stub.c" --gen_device_file_name "/x/x.compute_60.cudafe1.gpu"  "/x/x.compute_60.cpp1.ii" -o "/x/x.compute_60.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.compute_60.ptx"  -o "/x/x.compute_60.cubin" 
#$ gcc -E ... "x.cu" -o "/x/x.compute_70.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.compute_70.cudafe1.c" --stub_file_name "/x/x.compute_70.cudafe1.stub.c" --gen_device_file_name "/x/x.compute_70.cudafe1.gpu"  "/x/x.compute_70.cpp1.ii" -o "/x/x.compute_70.ptx"
#$ ptxas -arch=sm_70 -m64  "/x/x.compute_70.ptx"  -o "/x/x.compute_70.sm_70.cubin" 

From the above, we observe that:

  • .cpp1.ii, .cudafe1.c, .cudafe1.stub.c, .cudafe1.gpu, and .ptx files are either:
    • x.<suffix>
    • x.compute_XX.<suffix>
  • .cubin files are either:
    • x.cubin
    • x.compute_XX.cubin
    • x.compute_XX.sm_XX.cubin
  • without -c, the cicc command includes --gen_module_id_file
  • with -c, the cicc command omits --gen_module_id_file

This PR hashes all the cudafe++, cicc, and ptxas arguments to avoid collisions, but nvcc's inconsistent file naming leads to cache misses when there should be hits. For simplicity, I updated the renaming logic to always use the longest form of each name (i.e. x.compute_XX.ptx, x.compute_XX.sm_XX.cubin), and to always add the --gen_module_id_file flag to cicc invocations.
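
The renaming described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `normalized_name` and `with_gen_module_id_file` are hypothetical helpers, the argument list is assumed to exclude the program name, and placing the flag at the front of the list is an arbitrary choice made for the sketch.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not sccache's actual API): build the longest-form
/// name for an nvcc intermediate file, regardless of which shorter form
/// nvcc itself would have chosen.
///
/// `stem` is the input file stem (e.g. "x"), `compute` the virtual arch
/// (e.g. "compute_70"), `sm` the real arch for cubins (e.g. "sm_70"), and
/// `suffix` one of "cpp1.ii", "cudafe1.c", "ptx", "cubin", and so on.
fn normalized_name(
    dir: &Path,
    stem: &str,
    compute: &str,
    sm: Option<&str>,
    suffix: &str,
) -> PathBuf {
    let name = match (suffix, sm) {
        // cubins carry both the virtual and the real arch: x.compute_XX.sm_XX.cubin
        ("cubin", Some(sm)) => format!("{stem}.{compute}.{sm}.cubin"),
        // everything else: x.compute_XX.<suffix>
        _ => format!("{stem}.{compute}.{suffix}"),
    };
    dir.join(name)
}

/// Hypothetical helper: make sure `--gen_module_id_file` is present exactly
/// once in a cicc argument list, so -c, -ptx, and -cubin compilations
/// produce the same cicc command.
fn with_gen_module_id_file(mut args: Vec<String>) -> Vec<String> {
    const FLAG: &str = "--gen_module_id_file";
    if !args.iter().any(|a| a == FLAG) {
        // Assumption for this sketch: flag order does not matter to cicc.
        args.insert(0, FLAG.to_string());
    }
    args
}
```

With this, `normalized_name(Path::new("/x"), "x", "compute_70", Some("sm_70"), "cubin")` yields `/x/x.compute_70.sm_70.cubin` whether the original compilation used one `-gencode` flag or several.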

@trxcllnt
Contributor Author

cc: @robertmaynard for review

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from fc85929 to 051fec9 Compare April 28, 2025 18:55
@codecov-commenter

codecov-commenter commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 85.21257% with 80 lines in your changes missing coverage. Please review.

Project coverage is 71.69%. Comparing base (a43cade) to head (f736d4c).

Files with missing lines        Patch %    Lines
src/compiler/nvcc.rs            83.93%     76 Missing ⚠️
src/compiler/diab.rs             0.00%      1 Missing ⚠️
src/compiler/msvc.rs             0.00%      1 Missing ⚠️
src/compiler/nvhpc.rs            0.00%      1 Missing ⚠️
src/compiler/tasking_vx.rs       0.00%      1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2383      +/-   ##
==========================================
+ Coverage   71.58%   71.69%   +0.11%     
==========================================
  Files          65       65              
  Lines       36214    36430     +216     
==========================================
+ Hits        25923    26120     +197     
- Misses      10291    10310      +19     

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch 2 times, most recently from ffe68b9 to 3c08b18 Compare April 29, 2025 16:28
Output(PathBuf),
PassThrough(OsString),
UnhashedFlag,
ExtraOutput(PathBuf),
Contributor Author

I renamed this from UnhashedOutput -> ExtraOutput to reflect that these argument types are now hashed.

@sylvestre
Collaborator

I wish this kind of change were done in a separate PR.
As it stands, I can neither:

  • squash this PR into a single commit, given that it contains some unrelated changes
  • merge it as-is, since it has commits like "revert unrelated changes"

Maybe split this PR into several

Note: this is why it takes time to merge your PRs. Splitting them into smaller PRs would make our lives much easier.

@trxcllnt
Contributor Author

@sylvestre I am fine with either squashing or merging. If you'd prefer to squash, are there files you'd like me to revert? If merge, I can rebase out the follow-up commits.

@drahnr
Collaborator

drahnr commented May 16, 2025

Maybe split this PR into several

would be my personal preference.

@sylvestre
Collaborator

Same, smaller PRs would be ideal :)

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from caf8955 to e599942 Compare May 20, 2025 21:41
@trxcllnt
Contributor Author

I rebased on main and squashed the changes in trxcllnt@ac44e9a, trxcllnt@3c08b18, and trxcllnt@caf8955 into a single commit.

I can make a separate PR with the test changes after this one. How does that sound?

// up in the preprocessed output, so using random tmpdir paths leads to
// erroneous cache misses.
let out_dir = env::temp_dir().join("sccache_nvcc").join({
// Combine `hash_key` with the output path in case
Collaborator

Can we rename hash_key to something less ambiguous? Here it acts as a stabilizer for the temp dir subdirectory, and the name hash_key does not convey that. It also needs docs in the trait definition.

Contributor Author

I called it hash_key because it's the key produced by the call to generate_hash_key() for this nvcc compilation. I am fine with calling it something else; it just seemed the most descriptive and consistent name.
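
For readers following this thread, here is a minimal sketch of the idea under discussion: deriving a stable, unique subdirectory under the system temp dir from the compilation's hash key combined with the output path. It is an illustration only, not the code in this PR. `stable_out_dir` is a hypothetical name, and `DefaultHasher` is a stand-in for whatever hashing the real implementation uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::env;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical helper: map (hash_key, output path) to a directory under
/// `<tmp>/sccache_nvcc/`. The same inputs always give the same directory,
/// so paths embedded in preprocessed output stay identical across runs.
fn stable_out_dir(hash_key: &str, output: &Path) -> PathBuf {
    // DefaultHasher::new() uses fixed keys, so it is deterministic for a
    // given Rust release. Its algorithm is unspecified and may change
    // between releases, which is acceptable for a sketch but is an
    // assumption a real cache would need to revisit.
    let mut hasher = DefaultHasher::new();
    hash_key.hash(&mut hasher);
    output.hash(&mut hasher);
    env::temp_dir()
        .join("sccache_nvcc")
        .join(format!("{:016x}", hasher.finish()))
}
```

Two calls with the same key and output path return the same directory, while a different output path (or a different key) lands in a different one.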

@trxcllnt
Contributor Author

I do plan to update the system.rs tests to account for the new numbers. It just takes a few hours in the current state, and I've been busy with other things lately.

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from e599942 to 4d47627 Compare May 27, 2025 21:21
@trxcllnt
Contributor Author

trxcllnt commented Jun 4, 2025

I updated the tests in system.rs so they're all now passing.

The Rust v1.75.0 jobs are failing to cargo install grcov, but that looks to be happening in other PRs and is unrelated to the changes here. Is there a cargo flag or environment variable we can set to allow unstable features when installing grcov?

@drahnr
Collaborator

drahnr commented Jun 10, 2025

CC @sylvestre re grcov

@sylvestre
Collaborator

It is a change in the dependency tree of grcov.

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from 4d47627 to 4aa034e Compare June 10, 2025 22:54
These changes ensure cache hits for compilations which are subsets of previously cached compilations

* Normalize cudafe++, ptx, and cubin names regardless of whether the compilation flag is `-c`, `-ptx`, `-cubin`, or whether there are one or many `-gencode` flags
* Include the compiler `hash_key` in the output dir for internal nvcc files to guarantee stability and uniqueness
* Fix cache error due to hash collision from not hashing all the PTX and cubin flags
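
To illustrate the last point, the sketch below shows why every argument needs to go into the key: two ptxas invocations that differ only in one flag must not share a cache entry. This is not sccache's hashing code. `command_key` is a hypothetical name and `DefaultHasher` is a stand-in for the real hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: derive a key from a tool name and all of its
/// arguments. Leaving any argument out risks two different commands
/// mapping to the same key.
fn command_key(tool: &str, args: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    tool.hash(&mut hasher);
    for arg in args {
        // `str::hash` also writes a terminator, so ["ab", "c"] and
        // ["a", "bc"] feed different byte streams to the hasher.
        arg.hash(&mut hasher);
    }
    hasher.finish()
}
```

Here `command_key("ptxas", &["-arch=sm_60", "-m64"])` and `command_key("ptxas", &["-arch=sm_70", "-m64"])` produce different keys, whereas a key built only from the input file name would treat them as the same compilation.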
@sylvestre sylvestre force-pushed the fix/consistent-nvcc-internal-file-names branch from 4aa034e to f736d4c Compare June 20, 2025 16:22