Use consistent names for internal `nvcc` files [DO NOT MERGE] #2383

trxcllnt · 2025-04-15T19:16:45Z

This PR extracts the fixes from #2356 per @drahnr's request.

This PR branched from/is a follow-up to #2382. See the diff of the two branches here.

The names for the internal files depend on the compilation flag and device architectures. nvcc generates a different name for the .cpp1.ii, .cudafe1.c, .cudafe1.stub.c, .cudafe1.gpu, .ptx, and .cubin files when the compile flag is -ptx, -cubin, or -c, and also on whether there's one vs. many -gencode arguments. Additionally, it will either include or omit the --gen_module_id_file flag from the cicc invocation based on whether the compile flag is -ptx, -cubin, or -c.

Some examples:

# compile flag: -ptx, single arch
$ nvcc -x cu -ptx x.cu -o x.cu.o -gencode=arch=compute_60,code=[compute_60,sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --gen_module_id_file --module_id_file_name "/x/x.module_id"  "/x/x.cpp1.ii" -o "x.cu.o"

# compile flag: -cubin, single arch
$ nvcc -x cu -cubin x.cu -o x.cu.o -gencode=arch=compute_60,code=[sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --gen_module_id_file --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.cudafe1.c" --stub_file_name "/x/x.cudafe1.stub.c" --gen_device_file_name "/x/x.cudafe1.gpu"  "/x/x.cpp1.ii" -o "/x/x.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.ptx"  -o "x.cu.o"

# compile flag: -c, single arch
$ nvcc -x cu -c x.cu -o x.cu.o -gencode=arch=compute_60,code=[compute_60,sm_60] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.cudafe1.c" --stub_file_name "/x/x.cudafe1.stub.c" --gen_device_file_name "/x/x.cudafe1.gpu"  "/x/x.cpp1.ii" -o "/x/x.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.ptx"  -o "/x/x.sm_60.cubin" 

# compile flag: -c, multiple archs
$ nvcc -x cu -c x.cu -o x.cu.o -gencode=arch=compute_60,code=[sm_60] -gencode=arch=compute_70,code=[compute_70,sm_70] --dryrun --keep --keep-dir /x 2>&1 | grep -P '(cpp1\.ii|\.ptx|\.cubin)'
#$ gcc -E ... "x.cu" -o "/x/x.compute_60.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.compute_60.cudafe1.c" --stub_file_name "/x/x.compute_60.cudafe1.stub.c" --gen_device_file_name "/x/x.compute_60.cudafe1.gpu"  "/x/x.compute_60.cpp1.ii" -o "/x/x.compute_60.ptx"
#$ ptxas -arch=sm_60 -m64  "/x/x.compute_60.ptx"  -o "/x/x.compute_60.cubin" 
#$ gcc -E ... "x.cu" -o "/x/x.compute_70.cpp1.ii" 
#$ "$CICC_PATH/cicc" ... --module_id_file_name "/x/x.module_id" --gen_c_file_name "/x/x.compute_70.cudafe1.c" --stub_file_name "/x/x.compute_70.cudafe1.stub.c" --gen_device_file_name "/x/x.compute_70.cudafe1.gpu"  "/x/x.compute_70.cpp1.ii" -o "/x/x.compute_70.ptx"
#$ ptxas -arch=sm_70 -m64  "/x/x.compute_70.ptx"  -o "/x/x.compute_70.sm_70.cubin"

From the above, we observe that:

- .cpp1.ii, .cudafe1.c, .cudafe1.stub.c, .cudafe1.gpu, and .ptx files are either:
  - x.<suffix>
  - x.compute_XX.<suffix>
- .cubin files are either:
  - x.cubin
  - x.compute_XX.cubin
  - x.compute_XX.sm_XX.cubin
- without -c, the cicc command includes --gen_module_id_file
- with -c, the cicc command omits --gen_module_id_file

This PR hashes all the cudafe++, cicc, and ptxas arguments to avoid collisions, but nvcc's inconsistent file naming leads to cache misses when there should be hits. So for simplicity I updated the renaming logic to rename to the longest form of each (i.e. x.compute_XX.ptx, x.compute_XX.sm_XX.cubin), and always add the --gen_module_id_file flag to cicc invocations.
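The renaming rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `normalize_internal_name` is a hypothetical helper, and it assumes the compute and sm architectures share the same number.

```rust
/// Hypothetical helper: rewrite an nvcc internal file name to its longest form.
/// `stem` is the source name without extension (e.g. "x"); `arch` is e.g. "60".
/// Assumption: the compute_XX and sm_XX architectures use the same number.
fn normalize_internal_name(file_name: &str, stem: &str, arch: &str) -> String {
    // Everything after "<stem>." is the nvcc-chosen suffix, possibly already
    // prefixed with "compute_XX." and/or "sm_XX.".
    let rest = file_name
        .strip_prefix(stem)
        .and_then(|r| r.strip_prefix('.'))
        .unwrap_or(file_name);
    // Drop any arch components nvcc already added, keeping the bare suffix.
    let compute = format!("compute_{arch}.");
    let sm = format!("sm_{arch}.");
    let rest = rest.strip_prefix(compute.as_str()).unwrap_or(rest);
    let rest = rest.strip_prefix(sm.as_str()).unwrap_or(rest);
    if rest == "cubin" {
        // cubin files always get both the compute and sm components.
        format!("{stem}.compute_{arch}.sm_{arch}.cubin")
    } else {
        // Every other internal file gets only the compute component.
        format!("{stem}.compute_{arch}.{rest}")
    }
}
```

With this shape, `x.ptx` and `x.compute_60.ptx` both map to `x.compute_60.ptx`, so a single-arch and a multi-arch build would agree on the name.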

trxcllnt · 2025-04-15T19:28:53Z

Looks like integration tests are failing due to https://github.blog/changelog/2025-03-20-notification-of-upcoming-breaking-changes-in-github-actions/#decommissioned-cache-service-brownouts

trxcllnt · 2025-04-18T16:45:57Z

cc: @robertmaynard for review

codecov-commenter · 2025-04-28T18:59:16Z

Codecov Report

Attention: Patch coverage is 85.32110% with 80 lines in your changes missing coverage. Please review.

Project coverage is 71.70%. Comparing base (a43cade) to head (751cc7c).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/compiler/nvcc.rs	83.93%	76 Missing ⚠️
src/compiler/diab.rs	0.00%	1 Missing ⚠️
src/compiler/msvc.rs	0.00%	1 Missing ⚠️
src/compiler/nvhpc.rs	0.00%	1 Missing ⚠️
src/compiler/tasking_vx.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2383      +/-   ##
==========================================
+ Coverage   71.58%   71.70%   +0.11%     
==========================================
  Files          65       65              
  Lines       36214    36434     +220     
==========================================
+ Hits        25923    26124     +201     
- Misses      10291    10310      +19

sylvestre · 2025-05-10T06:56:33Z

I wish this kind of change were done in a separate PR.
I can't either:

- squash this PR into a single commit, given some unrelated changes
- merge it, as it has some commits like "revert unrelated changes"

Maybe split this PR into several.

Note: this is why it takes time to merge your PRs; splitting them into smaller PRs would make our life much easier.

trxcllnt · 2025-05-12T17:28:32Z

@sylvestre I am fine with either squashing or merging. If you'd prefer to squash, are there files you'd like me to revert? If merge, I can rebase out the follow-up commits.

drahnr · 2025-05-16T20:25:44Z

> Maybe split this PR into several

would be my personal preference.

sylvestre · 2025-05-20T07:47:04Z

same, smaller PR would be ideal :)

trxcllnt · 2025-05-20T21:44:55Z

I rebased on main and squashed the changes in trxcllnt@ac44e9a, trxcllnt@3c08b18, and trxcllnt@caf8955 into a single commit.

I can make a separate PR with the test changes after this one. How does that sound?

src/compiler/nvcc.rs

trxcllnt · 2025-05-23T16:33:34Z

I do plan to update the system.rs tests to account for the new numbers; it just takes a few hours in the current state, and I've been busy with other things lately.

trxcllnt · 2025-06-04T23:30:25Z

I updated the tests in system.rs so they're all now passing.

The rust v1.75.0 jobs are failing to cargo install grcov, but that looks to be happening in other PRs and is unrelated to the changes here. Is there a cargo flag or envvar we can set to allow unstable features when installing grcov?

drahnr · 2025-06-10T20:56:24Z

CC @sylvestre re grcov

sylvestre · 2025-06-10T20:58:12Z

It is a change in the dep tree of grcov.

drahnr · 2025-06-23T14:34:42Z

src/compiler/nvcc.rs

-                            args.splice(idx..(idx + 1), []);
-                        }
+                // Fix for CTK < 12.0:
+                // Remove `--gen_module_id_file` if cudafe++ already does it


I am not sure we want to encode cudafe++-specific behaviour. I presume cudafe++ is just one example, and the fix is adjusting the path.

If this is done though, then any custom logic for output dependencies might depend on that. Could you elaborate if you see any further side effects of this?

This logic normalizes the command order and arguments due to differing behavior between CUDA <=11 and >=12.

CUDA <=11 nvcc generates the commands in an order like this:

cicc -arch=60 --module_id_file_name foo.module_id --gen_module_id_file ...
...
cicc -arch=70 --module_id_file_name foo.module_id ...
...
cudafe++ --module_id_file_name foo.module_id ...

In the above, the first cicc call creates foo.module_id on disk due to --gen_module_id_file, which is consumed by the later cicc and cudafe++ calls. This is problematic, since we want to potentially run both cicc calls in parallel.

CUDA >= 12 nvcc generates the cudafe++ command first, and puts the --gen_module_id_file flag on it.

So here we reorganize and splice in/out arguments so the commands are always of the CUDA >=12 form.

> any custom logic for output dependencies might depend on that

Since we're ensuring the command arguments are consistent, we should always get the same outputs for all our cudafe++, cicc, and ptxas invocations, even though nvcc's original commands are dependent on the -gencode flags.
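A rough sketch of that normalization, for illustration only: the `Command` struct and `normalize_module_id` function are hypothetical and not the types used in nvcc.rs, and the sketch models only the cicc and cudafe++ steps, ignoring the preprocessing commands that precede them.

```rust
/// Hypothetical model of one sub-command from `nvcc --dryrun`.
#[derive(Debug, Clone, PartialEq)]
struct Command {
    exe: String,
    args: Vec<String>,
}

const GEN_FLAG: &str = "--gen_module_id_file";

/// Rewrite commands into the CUDA >= 12 shape described above:
/// cudafe++ comes first and is the only command carrying `--gen_module_id_file`.
fn normalize_module_id(mut cmds: Vec<Command>) -> Vec<Command> {
    // Stable sort: cudafe++ commands first, everything else in original order.
    cmds.sort_by_key(|c| c.exe != "cudafe++");
    for cmd in &mut cmds {
        // Drop the flag wherever nvcc happened to put it...
        cmd.args.retain(|a| a.as_str() != GEN_FLAG);
        // ...and re-add it only on cudafe++.
        if cmd.exe == "cudafe++" {
            cmd.args.insert(0, GEN_FLAG.to_string());
        }
    }
    cmds
}
```

Once no cicc call creates foo.module_id, the cicc calls no longer depend on each other and could run in parallel after cudafe++ finishes.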

drahnr · 2025-06-23T14:36:33Z

src/compiler/nvcc.rs

+                // of whether the compilation flag is `-c`, `-ptx`, or `-cubin`
+
+                // e.g. test_a.compute_XX.cpp1.ii
+                let mut nidx = args.len() - 3;


Did we check anywhere this holds?

Yes, the last three arguments to cicc are always input.cpp1.ii -o output.ptx
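If an explicit check were wanted, one possible shape is sketched below. `cicc_input_index` is a hypothetical name, not something in the PR; it returns the index only when the trailing arguments really look like `<input>.cpp1.ii -o <output>`.

```rust
/// Returns the index of the input file if `args` ends with
/// `<input>.cpp1.ii -o <output>`, or `None` otherwise.
fn cicc_input_index(args: &[String]) -> Option<usize> {
    // checked_sub avoids the underflow panic that `args.len() - 3`
    // would hit on an argument list shorter than three entries.
    let nidx = args.len().checked_sub(3)?;
    let looks_right = args[nidx].ends_with(".cpp1.ii") && args[nidx + 1] == "-o";
    looks_right.then_some(nidx)
}
```

A caller could then skip the splicing (or return an error) when this yields `None`, instead of indexing blindly.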

drahnr · 2025-06-23T14:42:47Z

src/compiler/nvcc.rs

+                let mut nidx = args.len() - 3;
+                let name = args[nidx].clone();
+                // test_a.compute_XX.cpp1.ii -> test_a.compute_XX
+                let name = name.split(".cpp1.ii").next().unwrap();


Suggested change

-                let name = name.split(".cpp1.ii").next().unwrap();
+                let name = name.rsplit_once().ok_or_else(|| { err })?.0;

Q: How should .cpp1.ii.cpp1.ii be split?

Here the name is something like x.compute_60.cpp1.ii, so splitting on the suffix and taking the first value gives just the name x.compute_60.

Is that how it's intended to work?

Yes, this is intended. This block is part of ensuring consistent cicc arguments, regardless of which -gencode flags are or aren't passed.

The input file name is something like foo.compute_75.cpp1.ii, so the three flags we splice in are:

--gen_c_file_name foo.compute_75.cudafe1.c \
--stub_file_name foo.compute_75.cudafe1.stub.c \
--gen_device_file_name foo.compute_75.cudafe1.gpu

So the logic here extracts the foo.compute_75 component from the input file name and uses it as the prefix of the file name passed to each of these three flags.
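As a minimal sketch of the prefix extraction (the helper name `cudafe_flags` is hypothetical, and this variant uses `strip_suffix` rather than the `split(...).next()` call in the diff):

```rust
/// Hypothetical helper: derive the three cicc output-file flags from the
/// preprocessed input name, e.g. "x.compute_60.cpp1.ii".
/// Returns `None` if the input does not end in ".cpp1.ii".
fn cudafe_flags(input: &str) -> Option<Vec<String>> {
    // strip_suffix removes exactly one trailing ".cpp1.ii", so an input like
    // "a.cpp1.ii.cpp1.ii" keeps "a.cpp1.ii" as its prefix.
    let prefix = input.strip_suffix(".cpp1.ii")?;
    Some(vec![
        "--gen_c_file_name".to_string(),
        format!("{prefix}.cudafe1.c"),
        "--stub_file_name".to_string(),
        format!("{prefix}.cudafe1.stub.c"),
        "--gen_device_file_name".to_string(),
        format!("{prefix}.cudafe1.gpu"),
    ])
}
```

The two approaches differ only on the odd `.cpp1.ii.cpp1.ii` case raised in the review: `split(".cpp1.ii").next()` would yield `a`, while `strip_suffix` yields `a.cpp1.ii` and reports a missing suffix as `None` instead of silently returning the whole string.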

drahnr

A few minor nits, otherwise good to go! Thank you!

These changes ensure cache hits for compilations which are subsets of previously cached compilations

* Normalize cudafe++, ptx, and cubin names regardless of whether the compilation flag is `-c`, `-ptx`, `-cubin`, or whether there are one or many `-gencode` flags
* Include the compiler `hash_key` in the output dir for internal nvcc files to guarantee stability and uniqueness
* Fix cache error due to hash collision from not hashing all the PTX and cubin flags

drahnr · 2025-06-27T14:02:37Z

src/compiler/nvcc.rs

+                                "[{}]: maybe_keep_temps_then_clean move {src:?} -> {dst:?}",
+                                output_path.display()
+                            );
+                            fs::rename(src, dst)?;


Can we ensure it's the same filesystem? Or have a fallback? Otherwise rename will fail.

How would we check if src and dst are on the same file system? Should I use fs::copy() and fs::remove_file() instead of fs::rename()?

Revisiting, it's ok for the time being
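For reference, if a fallback is wanted later, one common pattern is to try the rename and fall back to copy plus remove on failure, rather than detecting the filesystem up front. A minimal sketch, where `move_file` is a hypothetical helper and not part of this PR:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Move `src` to `dst`, falling back to copy + remove when the rename fails
/// (for example because the two paths are on different filesystems).
fn move_file(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::rename(src, dst) {
        Ok(()) => Ok(()),
        Err(_) => {
            // Assumption: any rename error is worth one copy attempt; a
            // stricter version could inspect the error kind first.
            fs::copy(src, dst)?;
            fs::remove_file(src)
        }
    }
}
```

The fallback is not atomic, so a crash between the copy and the remove could leave both files on disk.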

trxcllnt · 2025-06-27T17:39:13Z

Not sure what's going on with Windows2022, but it looks like clang-cl was updated recently?

~~I can try restricting those jobs to CUDA 12.5 instead of 12.8.~~ edit: this isn't the issue.

I see Found clang++ in the logs of the failing tests, so it appears clang-cl isn't being detected as msvc-clang here.

I ran CI on a branch to print out the version: Found clang++ (version: "Clang 19.1.5")

Changing the test to search for clang-cl on Windows fails with the same error. This feels like a bug in the latest VS release (or the recent GHA Windows runner image update).

drahnr

@sylvestre LGTM, you might want to give it a quick pass

trxcllnt · 2025-07-07T17:06:56Z

Don't merge this yet, I need to make a few updates.

trxcllnt mentioned this pull request Apr 15, 2025

Refactor system/dist/CUDA tests #2382

Open

trxcllnt mentioned this pull request Apr 15, 2025

Support nvcc --device-debug flag #2384

Open

robertmaynard approved these changes Apr 21, 2025

trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from fc85929 to 051fec9 Compare April 28, 2025 18:55

trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch 2 times, most recently from ffe68b9 to 3c08b18 Compare April 29, 2025 16:28

trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from caf8955 to e599942 Compare May 20, 2025 21:41

trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from e599942 to 4d47627 Compare May 27, 2025 21:21

trxcllnt force-pushed the fix/consistent-nvcc-internal-file-names branch from 4d47627 to 4aa034e Compare June 10, 2025 22:54

sylvestre force-pushed the fix/consistent-nvcc-internal-file-names branch from 4aa034e to f736d4c Compare June 20, 2025 16:22