Use consistent names for internal nvcc files [DO NOT MERGE] #2383

Open: trxcllnt wants to merge 2 commits into main from fix/consistent-nvcc-internal-file-names

trxcllnt
Contributor

@trxcllnt trxcllnt commented Apr 15, 2025

This PR extracts the fixes from #2356 per @drahnr's request.

This PR branched from/is a follow-up to #2382. See the diff of the two branches here.

The names for the internal files depend on the compilation flag and device architectures. nvcc generates a different name for the .cpp1.ii, .cudafe1.c, .cudafe1.stub.c, .cudafe1.gpu, .ptx, and .cubin files when the compile flag is -ptx, -cubin, or -c, and also on whether there's one vs. many -gencode arguments. Additionally, it will either include or omit the --gen_module_id_file flag from the cicc invocation based on whether the compile flag is -ptx, -cubin, or -c.

Some examples:

# compile flag: -ptx, single arch
$ nvcc -x cu -ptx x.cu -o x.cu.o -gencode=arch=compute_60,code=[compute_60,sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --gen_module_id_file --module_id_file_name "/x/x.module_id"  "/x/x.cpp1.ii" -o "x.cu.o"

# compile flag: -cubin, single arch
$ nvcc -x cu -cubin x.cu -o x.cu.o -gencode=arch=compute_60,code=[sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --gen_module_id_file --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.cudafe1.c" --stub_file_name "/x/x.cudafe1.stub.c" --gen_device_file_name "/x/x.cudafe1.gpu"  "/x/x.cpp1.ii" -o "/x/x.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.ptx"  -o "x.cu.o"

# compile flag: -c, single arch
$ nvcc -x cu -c x.cu -o x.cu.o -gencode=arch=compute_60,code=[compute_60,sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.cudafe1.c" --stub_file_name "/x/x.cudafe1.stub.c" --gen_device_file_name "/x/x.cudafe1.gpu"  "/x/x.cpp1.ii" -o "/x/x.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.ptx"  -o "/x/x.sm_60.cubin" 

# compile flag: -c, multiple archs
$ nvcc -x cu -c x.cu -o x.cu.o -gencode=arch=compute_60,code=[sm_60] -gencode=arch=compute_70,code=[compute_70,sm_70] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.compute_60.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.compute_60.cudafe1.c" --stub_file_name "/x/x.compute_60.cudafe1.stub.c" --gen_device_file_name "/x/x.compute_60.cudafe1.gpu"  "/x/x.compute_60.cpp1.ii" -o "/x/x.compute_60.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.compute_60.ptx"  -o "/x/x.compute_60.cubin" 
#$ gcc -E ... "x.cu" -o "/x/x.compute_70.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.compute_70.cudafe1.c" --stub_file_name "/x/x.compute_70.cudafe1.stub.c" --gen_device_file_name "/x/x.compute_70.cudafe1.gpu"  "/x/x.compute_70.cpp1.ii" -o "/x/x.compute_70.ptx"
#$ ptxas -arch=sm_70 -m64  "/x/x.compute_70.ptx"  -o "/x/x.compute_70.sm_70.cubin" 

From the above, we observe that:

  • .cpp1.ii, .cudafe1.c, .cudafe1.stub.c, .cudafe1.gpu, and .ptx files are either:
    • x.<suffix>
    • x.compute_XX.<suffix>
  • .cubin files are one of:
    • x.cubin
    • x.sm_XX.cubin
    • x.compute_XX.cubin
    • x.compute_XX.sm_XX.cubin
  • without -c, the cicc command includes --gen_module_id_file
  • with -c, the cicc command omits --gen_module_id_file

This PR hashes all the cudafe++, cicc, and ptxas arguments to avoid collisions, but nvcc's inconsistent file naming leads to cache misses when there should be hits. So for simplicity I updated the renaming logic to rename to the longest form of each (i.e. x.compute_XX.ptx, x.compute_XX.sm_XX.cubin), and always add the --gen_module_id_file flag to cicc invocations.
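As a rough illustration of the "longest form" renaming described above, here is a minimal Rust sketch. The helper name `normalize_nvcc_name` and its signature are hypothetical and not taken from this PR; the sketch also assumes the caller already knows the `compute_XX` and `sm_XX` strings for the invocation.

```rust
/// Suffixes of the intermediate files listed in the PR description.
const INTERMEDIATE_SUFFIXES: [&str; 5] = [
    ".cpp1.ii",
    ".cudafe1.stub.c",
    ".cudafe1.c",
    ".cudafe1.gpu",
    ".ptx",
];

/// Hypothetical helper (not from the PR): rewrites an nvcc-generated file name
/// to its longest form, e.g. `x.ptx` -> `x.compute_60.ptx` and
/// `x.sm_60.cubin` -> `x.compute_60.sm_60.cubin`.
/// Returns `None` for file names with an unrecognized suffix.
fn normalize_nvcc_name(file_name: &str, compute: &str, sm: &str) -> Option<String> {
    let compute_ext = format!(".{compute}");
    let sm_ext = format!(".{sm}");

    if let Some(base) = file_name.strip_suffix(".cubin") {
        // Drop any arch components that are already present, then re-add both.
        let base = base.strip_suffix(sm_ext.as_str()).unwrap_or(base);
        let base = base.strip_suffix(compute_ext.as_str()).unwrap_or(base);
        return Some(format!("{base}{compute_ext}{sm_ext}.cubin"));
    }

    for suffix in INTERMEDIATE_SUFFIXES {
        if let Some(base) = file_name.strip_suffix(suffix) {
            let base = base.strip_suffix(compute_ext.as_str()).unwrap_or(base);
            return Some(format!("{base}{compute_ext}{suffix}"));
        }
    }
    None
}
```

Names that are already in the longest form pass through unchanged, so the function can be applied to every intermediate file regardless of which compile flag produced it.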

@trxcllnt
Contributor Author

cc: @robertmaynard for review

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from fc85929 to 051fec9 Compare April 28, 2025 18:55
@codecov-commenter

codecov-commenter commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 85.32110% with 80 lines in your changes missing coverage. Please review.

Project coverage is 71.70%. Comparing base (a43cade) to head (751cc7c).
Report is 2 commits behind head on main.

Files with missing lines      Patch %   Lines
src/compiler/nvcc.rs          83.93%    76 Missing ⚠️
src/compiler/diab.rs           0.00%     1 Missing ⚠️
src/compiler/msvc.rs           0.00%     1 Missing ⚠️
src/compiler/nvhpc.rs          0.00%     1 Missing ⚠️
src/compiler/tasking_vx.rs     0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2383      +/-   ##
==========================================
+ Coverage   71.58%   71.70%   +0.11%     
==========================================
  Files          65       65              
  Lines       36214    36434     +220     
==========================================
+ Hits        25923    26124     +201     
- Misses      10291    10310      +19     

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch 2 times, most recently from ffe68b9 to 3c08b18 Compare April 29, 2025 16:28
@sylvestre
Collaborator

I wish this kind of change were done in a separate PR.
I can't do either of the following:

  • squash this PR into a single commit, given some unrelated changes
  • merge it, as it has some commits like "revert unrelated changes"

Maybe split this PR into several

Note: this is why it takes time to merge your PRs; splitting them into smaller PRs would make our lives much easier

@trxcllnt
Contributor Author

@sylvestre I am fine with either squashing or merging. If you'd prefer to squash, are there files you'd like me to revert? If merge, I can rebase out the follow-up commits.

@drahnr
Collaborator

drahnr commented May 16, 2025

Maybe split this PR into several

would be my personal preference.

@sylvestre
Collaborator

Same, smaller PRs would be ideal :)

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from caf8955 to e599942 Compare May 20, 2025 21:41
@trxcllnt
Contributor Author

I rebased on main and squashed the changes in trxcllnt@ac44e9a, trxcllnt@3c08b18, and trxcllnt@caf8955 into a single commit.

I can make a separate PR with the test changes after this one. How does that sound?

@trxcllnt
Contributor Author

I do plan to update the system.rs tests to account for the new numbers. It just takes a few hours in the current state, and I've been busy with other things lately.

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from e599942 to 4d47627 Compare May 27, 2025 21:21
@trxcllnt
Contributor Author

trxcllnt commented Jun 4, 2025

I updated the tests in system.rs so they're all now passing.

The rust v1.75.0 jobs are failing to cargo install grcov, but that looks to be happening in other PRs and is unrelated to the changes here. Is there a cargo flag or envvar we can set to allow unstable features when installing grcov?

@drahnr
Collaborator

drahnr commented Jun 10, 2025

CC @sylvestre re grcov

@sylvestre
Collaborator

It is a change in the dependency tree of grcov.

@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from 4d47627 to 4aa034e Compare June 10, 2025 22:54
@sylvestre sylvestre force-pushed the fix/consistent-nvcc-internal-file-names branch from 4aa034e to f736d4c Compare June 20, 2025 16:22
args.splice(idx..(idx + 1), []);
}
// Fix for CTK < 12.0:
// Remove `--gen_module_id_file` if cudafe++ already does it
Collaborator

I am not sure we want to encode cudafe++-specific behaviour. I presume cudafe++ is just one example, and the fix is adjusting the path.

If this is done though, then any custom logic for output dependencies might depend on that. Could you elaborate if you see any further side effects of this?

Contributor Author

@trxcllnt trxcllnt Jun 27, 2025

This logic normalizes the command order and arguments due to differing behavior between CUDA <=11 and >=12.

CUDA <=11 nvcc generates the commands in an order like this:

cicc -arch=60 --module_id_file_name foo.module_id --gen_module_id_file ...
...
cicc -arch=70 --module_id_file_name foo.module_id ...
...
cudafe++ --module_id_file_name foo.module_id ...

In the above, the first cicc call creates foo.module_id on disk due to --gen_module_id_file, which is consumed by the later cicc and cudafe++ calls. This is problematic, since we want to potentially run both cicc calls in parallel.

CUDA >= 12 nvcc generates the cudafe++ command first, and puts the --gen_module_id_file flag on it.

So here we reorganize and splice in/out arguments so the commands are always of the CUDA >=12 form.

any custom logic for output dependencies might depend on that

Since we're ensuring the command arguments are consistent, we should always get the same outputs for all our cudafe++, cicc, and ptxas invocations, even though nvcc's original commands are dependent on the -gencode flags.
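To make the normalization described in this reply concrete, here is a minimal Rust sketch under stated assumptions. It models each subcommand as a `Vec<String>` whose first element is the tool path; that representation and the names `is_cudafe` and `normalize_module_id` are hypothetical and not taken from the PR. It moves `--gen_module_id_file` onto the cudafe++ command and orders cudafe++ first, matching the CUDA >=12 form described above.

```rust
const GEN_MODULE_ID: &str = "--gen_module_id_file";

/// Assumption: the tool is identified by the end of argv[0]. A Windows path
/// ending in `cudafe++.exe` would need extra handling.
fn is_cudafe(cmd: &[String]) -> bool {
    cmd.first().map_or(false, |exe| exe.ends_with("cudafe++"))
}

/// Hypothetical sketch: rewrite a list of nvcc subcommands so that only
/// cudafe++ carries `--gen_module_id_file`, and cudafe++ runs first.
fn normalize_module_id(mut cmds: Vec<Vec<String>>) -> Vec<Vec<String>> {
    for cmd in cmds.iter_mut() {
        // Remove the flag wherever nvcc put it...
        cmd.retain(|arg| arg != GEN_MODULE_ID);
        // ...and re-add it right after the executable for cudafe++ only.
        if is_cudafe(cmd) {
            cmd.insert(1, GEN_MODULE_ID.to_string());
        }
    }
    // Stable sort: cudafe++ commands first, all others keep their order.
    cmds.sort_by_key(|cmd| !is_cudafe(cmd));
    cmds
}
```

With the flag confined to cudafe++, no cicc call creates the module ID file, so in this sketch the cicc calls no longer depend on each other.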

// of whether the compilation flag is `-c`, `-ptx`, or `-cubin`

// e.g. test_a.compute_XX.cpp1.ii
let mut nidx = args.len() - 3;
Collaborator

Did we check anywhere that this holds?

Contributor Author

Yes, the last three arguments to cicc are always input.cpp1.ii -o output.ptx
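One way to check that invariant instead of relying on `args.len() - 3` (which would underflow for fewer than three arguments) is a slice pattern. This is only a sketch; the function name `cicc_input_output` is hypothetical and not part of the PR.

```rust
/// Hypothetical helper: returns `(input, output)` only when the argument list
/// really ends in `<input>.cpp1.ii -o <output>`, and `None` otherwise.
fn cicc_input_output(args: &[String]) -> Option<(&str, &str)> {
    match args {
        // `..` matches any number of leading arguments, including zero.
        [.., input, flag, output]
            if flag.as_str() == "-o" && input.ends_with(".cpp1.ii") =>
        {
            Some((input.as_str(), output.as_str()))
        }
        _ => None,
    }
}
```

A caller could turn the `None` case into an error, so an unexpected cicc command line is reported rather than causing a panic.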

let mut nidx = args.len() - 3;
let name = args[nidx].clone();
// test_a.compute_XX.cpp1.ii -> test_a.compute_XX
let name = name.split(".cpp1.ii").next().unwrap();
Collaborator

Suggested change
- let name = name.split(".cpp1.ii").next().unwrap();
+ let name = name.rsplit_once(".cpp1.ii").ok_or_else(|| { err })?.0;

Collaborator

Q: How should .cpp1.ii.cpp1.ii be split?

Contributor Author

Here the name is something like x.compute_60.cpp1.ii, so splitting on the suffix and taking the first value gives just the name x.compute_60.
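For comparison, here is a small Rust sketch of how the two approaches behave on the inputs discussed in this thread. The function names are made up for illustration; `str::split` and `str::strip_suffix` are standard library methods.

```rust
/// The approach in the PR: everything before the first `.cpp1.ii`.
/// `split` always yields at least one item, so `unwrap` cannot panic here,
/// but a name without the suffix is returned unchanged.
fn stem_via_split(name: &str) -> &str {
    name.split(".cpp1.ii").next().unwrap()
}

/// An alternative: remove exactly one trailing `.cpp1.ii`, and report
/// a missing suffix as `None` instead of passing the name through.
fn stem_via_strip_suffix(name: &str) -> Option<&str> {
    name.strip_suffix(".cpp1.ii")
}
```

For `x.compute_60.cpp1.ii`, both return `x.compute_60`. For a doubled suffix such as `a.cpp1.ii.cpp1.ii`, the `split` version returns `a` while `strip_suffix` returns `a.cpp1.ii`. For a name without the suffix, `split` silently returns the whole name and `strip_suffix` returns `None`.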

Collaborator

Is that how it's intended to work?

Contributor Author

Yes, this is intended. This block is part of ensuring consistent cicc arguments, regardless of which -gencode flags are or aren't passed.

The input file name is something like foo.compute_75.cpp1.ii, so the three flags we splice in are:

--gen_c_file_name foo.compute_75.cudafe1.c \
--stub_file_name foo.compute_75.cudafe1.stub.c \
--gen_device_file_name foo.compute_75.cudafe1.gpu

So the logic here extracts the foo.compute_75 component from the input file name and uses it as the prefix of the file name passed to each of these three flags.
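A minimal Rust sketch of that splice, assuming the cicc arguments end in `<input>.cpp1.ii -o <output>` as stated earlier in this thread. The function name `splice_cudafe_flags` is hypothetical, and the PR's actual code may differ.

```rust
/// Hypothetical sketch: derive the prefix from the `.cpp1.ii` input and insert
/// the three file-name flags just before it. Returns `None` (leaving `args`
/// untouched) if the argument list does not have the expected shape.
fn splice_cudafe_flags(args: &mut Vec<String>) -> Option<()> {
    // Leave the arguments alone if nvcc already supplied the flags.
    if args.iter().any(|arg| arg == "--gen_c_file_name") {
        return Some(());
    }
    // Index of the input file: third from the end.
    let n = args.len().checked_sub(3)?;
    let prefix = args[n].strip_suffix(".cpp1.ii")?.to_string();
    let flags = [
        "--gen_c_file_name".to_string(),
        format!("{prefix}.cudafe1.c"),
        "--stub_file_name".to_string(),
        format!("{prefix}.cudafe1.stub.c"),
        "--gen_device_file_name".to_string(),
        format!("{prefix}.cudafe1.gpu"),
    ];
    // An empty range inserts without removing anything.
    args.splice(n..n, flags);
    Some(())
}
```

The early return makes the function idempotent, so it can be applied to cicc commands from any of the `-c`, `-ptx`, or `-cubin` cases.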

Collaborator

@drahnr drahnr left a comment

A few minor nits, otherwise good to go! Thank you!

These changes ensure cache hits for compilations which are subsets of previously cached compilations

* Normalize cudafe++, ptx, and cubin names regardless of whether the compilation flag is `-c`, `-ptx`, `-cubin`, or whether there are one or many `-gencode` flags
* Include the compiler `hash_key` in the output dir for internal nvcc files to guarantee stability and uniqueness
* Fix cache error due to hash collision from not hashing all the PTX and cubin flags
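The last two points could be sketched roughly as follows. This is an illustration only: it uses the standard library's `DefaultHasher` as a stand-in for whatever hash the project actually uses, and the names `args_hash` and `internal_dir` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hash every argument of a subcommand, so two invocations that differ in any
/// flag (for example `-arch=sm_60` vs. `-arch=sm_70`) get different keys.
/// Assumption: `DefaultHasher` is only a placeholder for a real cache hash.
fn args_hash(args: &[String]) -> String {
    let mut hasher = DefaultHasher::new();
    args.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Place internal nvcc files under a directory derived from the hash key, so
/// files from different compilations do not overwrite each other.
fn internal_dir(base: &Path, hash_key: &str) -> PathBuf {
    base.join(hash_key)
}
```

A caller might combine the two as `internal_dir(&tmp, &args_hash(&args))`, where `tmp` is whatever scratch directory is in use.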
@trxcllnt trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from f736d4c to 30e18c9 Compare June 27, 2025 06:46
"[{}]: maybe_keep_temps_then_clean move {src:?} -> {dst:?}",
output_path.display()
);
fs::rename(src, dst)?;
Collaborator

Can we ensure it's the same filesystem? Or have a fallback? Otherwise rename will fail.

Contributor Author

@trxcllnt trxcllnt Jun 27, 2025

How would we check if src and dst are on the same file system? Should I use fs::copy() and fs::remove_file() instead of fs::rename()?
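For reference, the copy-and-remove fallback mentioned here could look roughly like the sketch below. The helper name `move_file` is hypothetical and this is not what the PR does; note that it falls back on any `rename` error, not only on a cross-filesystem one.

```rust
use std::{fs, io, path::Path};

/// Hypothetical helper: try a cheap rename first, and if that fails (for
/// example because `src` and `dst` are on different filesystems), copy the
/// file and then delete the original.
fn move_file(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::rename(src, dst) {
        Ok(()) => Ok(()),
        Err(_) => {
            // If `src` is missing or unreadable, the copy fails as well and
            // its error is returned to the caller.
            fs::copy(src, dst)?;
            fs::remove_file(src)
        }
    }
}
```

A stricter version could inspect the error returned by `rename` and only fall back for a cross-device failure. The fallback is also not atomic, which may matter if another process reads `dst` while it is being written.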

Collaborator

Revisiting, it's ok for the time being

@trxcllnt
Contributor Author

trxcllnt commented Jun 27, 2025

Not sure what's going on with Windows2022, but it looks like clang-cl was updated recently?

I can try restricting those jobs to CUDA 12.5 instead of 12.8. edit: this isn't the issue.

I see Found clang++ in the logs of the failing tests, so it appears clang-cl isn't being detected as msvc-clang here.

I ran CI on a branch to print out the version: Found clang++ (version: "Clang 19.1.5")

Changing the test to search for clang-cl on Windows fails with the same error. This feels like a bug in the latest VS release (or the recent GHA Windows runner image update).

Collaborator

@drahnr drahnr left a comment

@sylvestre LGTM, you might want to give it a quick pass

@trxcllnt trxcllnt changed the title from "Use consistent names for internal nvcc files" to "Use consistent names for internal nvcc files [DO NOT MERGE]" Jul 7, 2025
@trxcllnt
Contributor Author

trxcllnt commented Jul 7, 2025

Don't merge this yet, I need to make a few updates.
