-
Notifications
You must be signed in to change notification settings - Fork 584
Use consistent names for internal nvcc
files [DO NOT MERGE]
#2383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Use consistent names for internal nvcc
files [DO NOT MERGE]
#2383
Conversation
Looks like integration tests are failing due to https://github.blog/changelog/2025-03-20-notification-of-upcoming-breaking-changes-in-github-actions/#decommissioned-cache-service-brownouts |
cc: @robertmaynard for review |
fc85929
to
051fec9
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2383 +/- ##
==========================================
+ Coverage 71.58% 71.70% +0.11%
==========================================
Files 65 65
Lines 36214 36434 +220
==========================================
+ Hits 25923 26124 +201
- Misses 10291 10310 +19 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ffe68b9
to
3c08b18
Compare
i wish this kind of changes would be done in a separate PR
Maybe split this PR into several Note: it is why it takes time to merge your PR, split them into smaller PR would make our life much easier |
@sylvestre I am fine with either squashing or merging. If you'd prefer to squash, are there files you'd like me to revert? If merge, I can rebase out the follow-up commits. |
would be my personal preference. |
same, smaller PR would be ideal :) |
caf8955
to
e599942
Compare
I rebased on I can make a separate PR with the test changes after this one. How does that sound? |
I do plan to update the |
e599942
to
4d47627
Compare
I updated the tests in The rust v1.75.0 jobs are failing to |
CC @sylvestre re grcov |
it is change in the dep tree of grcov |
4d47627
to
4aa034e
Compare
4aa034e
to
f736d4c
Compare
args.splice(idx..(idx + 1), []); | ||
} | ||
// Fix for CTK < 12.0: | ||
// Remove `--gen_module_id_file` if cudafe++ already does it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we want to encode cudafe++
specific behaviour. I pressume cudafe++
is just one example, and the fix is adjusting the path.
If this is done though, then any custom logic for output dependencies might depend on that. Could you elaborate if you see any further side effects of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic normalizes the command order and arguments due to differing behavior between CUDA <=11 and >=12.
CUDA <=11 nvcc generates the commands in an order like this:
cicc -arch=60 --module_id_file_name foo.module_id --gen_module_id_file ...
...
cicc -arch=70 --module_id_file_name foo.module_id ...
...
cudafe++ --module_id_file_name foo.module_id ...
In the above, the first cicc
call creates foo.module_id
on disk due to --gen_module_id_file
, which is consumed by the later cicc
and cudafe++
calls. This is problematic, since we want to potentially run both cicc
calls in parallel.
CUDA >= 12 nvcc generates the cudafe++
command first, and puts the --gen_module_id_file
flag on it.
So here we reorganize and splice in/out arguments so the commands are always of the CUDA >=12 form.
any custom logic for output dependencies might depend on that
Since we're ensuring the command arguments are consistent, we should always get the same outputs for all our cudafe++
, cicc
, and ptxas
invocations, even though nvcc's original commands are dependent on the -gencode
flags.
// of whether the compilation flag is `-c`, `-ptx`, or `-cubin` | ||
|
||
// e.g. test_a.compute_XX.cpp1.ii | ||
let mut nidx = args.len() - 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we check anywhere this holds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the last three arguments to cicc are always input.cpp1.ii -o output.ptx
let mut nidx = args.len() - 3; | ||
let name = args[nidx].clone(); | ||
// test_a.compute_XX.cpp1.ii -> test_a.compute_XX | ||
let name = name.split(".cpp1.ii").next().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let name = name.split(".cpp1.ii").next().unwrap(); | |
let name = name.rsplit_once().ok_or_else(|| { err })?.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: .cpp1.ii.cpp1.ii
should be split how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the name is something like x.compute_60.cpp1.ii
, so splitting on the suffix and taking the first value gives just the name x.compute_60
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that intended how it's supposed to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is intended. This block is part of ensuring consistent cicc
arguments, regardless of which -gencode
flags are or aren't passed.
The input file name is something like foo.compute75.cpp1.ii
, so the three flags we splice in are:
--gen_c_file_name foo.compute75.cudafe1.c \
--stub_file_name foo.compute75.cudafe1.stub.c \
--gen_device_file_name foo.compute75.cudafe1.gpu
So the logic here extracts the foo.compute75
component from the input file name and uses it as the prefix for the file name for each of these three flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor nits, otherwise good to go! Thank you!
These changes ensure cache hits for compilations which are subsets of previously cached compilations * Normalize cudafe++, ptx, and cubin names regardless of whether the compilation flag is `-c`, `-ptx`, `-cubin`, or whether there are one or many `-gencode` flags * Include the compiler `hash_key` in the output dir for internal nvcc files to guarantee stability and uniqueness * Fix cache error due to hash collision from not hashing all the PTX and cubin flags
f736d4c
to
30e18c9
Compare
"[{}]: maybe_keep_temps_then_clean move {src:?} -> {dst:?}", | ||
output_path.display() | ||
); | ||
fs::rename(src, dst)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we ensure it's the same filesystem? Or have a fallback? Otherwise rename
will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we check if src
and dst
are on the same file system? Should I use fs::copy()
and fs::remove_file()
instead of fs::rename()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revisiting, it's ok for the time being
Not sure what's going on with Windows2022, but it looks like
I see I ran CI on a branch to print out the version: Changing the test to search for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sylvestre LGTM, you might want to give it a quick pass
nvcc
filesnvcc
files [DO NOT MERGE]
Don't merge this yet, I need to make a few updates. |
This PR extracts the fixes from #2356 per @drahnr's request.
This PR branched from/is a follow-up to #2382. See the diff of the two branches here.
The names for the internal files depend on the compilation flag and device architectures.
nvcc
generates a different name for the.cpp1.ii
,.cudafe1.c
,.cudafe1.stub.c
,.cudafe1.gpu
,.ptx
, and.cubin
files when the compile flag is-ptx
,-cubin
, or-c
, and also on whether there's one vs. many-gencode
arguments. Additionally, it will either include or omit the--gen_module_id_file
flag from thecicc
invocation based on whether the compile flag is-ptx
,-cubin
, or-c
.Some examples:
From the above, we observe that:
.cpp1.ii
,.cudafe1.c
,.cudafe1.stub.c
,.cudafe1.gpu
, and.ptx
files are either:x.<suffix>
x.compute_XX.<suffix>
.cubin
files are either:x.cubin
x.compute_XX.cubin
x.compute_XX.sm_XX.cubin
-c
, thecicc
command includes--gen_module_id_file
-c
, thecicc
command omits--gen_module_id_file
This PR hashes all the
cudafe++
,cicc
, andptxas
arguments to avoid collisions, but nvcc's inconsistent file naming leads to cache misses when there should be hits. So for simplicity I updated the renaming logic to rename to the longest form of each (i.e.x.compute_XX.ptx
,x.compute_XX.sm_XX.cubin
), and always add the--gen_module_id_file
flag tocicc
invocations.