-
Notifications
You must be signed in to change notification settings - Fork 1
Dataset caching #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: wacasoft
Are you sure you want to change the base?
Dataset caching #40
Conversation
offset += _padded_size(attr_dict['datatype_size'], padding_multiple) | ||
|
||
# Read the dataspace information | ||
shape, maxshape = determine_data_shape(buffer, offset) | ||
items = int(np.prod(shape)) | ||
items = prod(shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this depends on shape being an integer, so you can remove the redundancy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a bit of premature optimisation - math.prod
is way faster than np.prod
fh = open(self._filename, 'rb') | ||
try: | ||
# Try s3 being an s3fs.S3File object | ||
fh = fh.s3.open(self._filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be worth putting a much larger comment section in advance of this block explaining the various entities in play. I already can't remember what is going on here and why and a minute of trying to chase it down didn't enlighten me :-).
Tthen, instead of trying if it is an object by trying the method might be better to use the isinstance to check whether it is a particular type of entity that way the code will be much clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Would this be a bit better?
if self.posix:
fh = open(self._filename, 'rb')
else:
try:
# Try s3 being an s3fs.S3File object
fh = fh.s3.open(self._filename)
except AttributeError:
raise SomeSortOfError("unknown file object type")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's better, but is there a reason why we wouldn't do:
if self.posix:
fh=open(self._filename,'rb')
elif isinstance(fh,s3fs.S3File):
fh=fh.s3.open(self._filename)
else:
raise SomeSortofError(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it boils down to if we want to "Look Before You Leap", or else "Easier to Ask Forgiveness". I'm fine with either approach (assuming that there are no other libraries that share the s3fs
API) and the former (i.e. the code you suggested) is certainly more explicit.
See NCAS-CMS/h5netcdf#8 for details