-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: avoid unnecessary array copies through np.ndarray.astype where possible #4496
ENH: avoid unnecessary array copies through np.ndarray.astype where possible #4496
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires some careful review which I intend to give it. Thank you!
@@ -153,7 +153,7 @@ def _get_cut_mask(self, grid): | |||
for mi, (i, pos) in enumerate(zip(pids, self.positions[points_in_grid])): | |||
if not points_in_grid[i]: | |||
continue | |||
ci = ((pos - grid.LeftEdge) / grid.dds).astype("int") | |||
ci = ((pos - grid.LeftEdge) / grid.dds).astype("int64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular one doesn't necessarily need promotion to int64 (although I recognize numpy may do it internally) because it's a relative offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doens't change behaviour, I'm just making the size explicit rather than implicit.
No need to review this before I'm done stabilising tests, but thank you for showing interest in it ! |
33b3c1b
to
949cd3e
Compare
949cd3e
to
1914908
Compare
…(...).value -> arr.to_value(...))
1914908
to
162cdfb
Compare
force-pushed to resolve merge conflicts |
I think this looks good, and I've read through it and think it's fine. I would like another pass at it before it gets merged, but I think it's on the right track. |
@yt-fido test this please |
@yt-fido test this please |
@matthewturk any chance you could have a look soon ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, and I don't see any behavior changes to worry about.
@@ -128,7 +128,7 @@ def check_tree(self): | |||
nre = self.ds.arr(node.get_right_edge(), units="code_length") | |||
li = np.rint((nle - gle) / dds).astype("int32") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outside of scope for this PR but my eyebrows raised a little seeing this casting to int32 instead of int64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, fortunately these variables don't seem to be actually used (except for that one assert np.all(dims > 0)
which is runtime-skippable)
PR Summary
np.ndarray.astype
returns a copy of an array by default. In many cases, copies can be avoided by using the target datatype from the start.This is a thorough, yet conservative refactor: I've stayed away from cases where precision was truncated after a calculation, so as to not introduce new floating point errors.
Additionally, I've changed a couple places where we were casting to
"int"
orint
to use"int64"
instead, which is identical on POSIX platform but improve consistency on Windows, whereastype("int")
might be equivalent toastype("int32")
(I couldn't find a reference to back this up, so I may be misremebering).