Dataset caching #40


Open: wants to merge 7 commits into base wacasoft
Conversation

davidhassell (Collaborator)

See NCAS-CMS/h5netcdf#8 for details

@davidhassell davidhassell requested a review from bnlawrence April 25, 2025 16:04
offset += _padded_size(attr_dict['datatype_size'], padding_multiple)

# Read the dataspace information
shape, maxshape = determine_data_shape(buffer, offset)
-items = int(np.prod(shape))
+items = prod(shape)
Collaborator


I guess this depends on the elements of shape being integers, so you can remove the redundant int()?

Collaborator Author


Just a bit of premature optimisation: math.prod is much faster than np.prod for a small tuple like shape.
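A minimal sketch of why the int() cast becomes redundant with math.prod (the shape tuple here is illustrative, not from the PR):

```python
from math import prod

# math.prod works directly on a shape tuple and returns a plain int,
# so the int(...) wrapper that np.prod needed (np.prod returns a NumPy
# scalar) is unnecessary.
shape = (12, 64, 128)
items = prod(shape)
print(items)        # 98304
print(type(items))  # <class 'int'>

# An empty shape (a scalar dataset) still behaves sensibly:
print(prod(()))     # 1, the multiplicative identity
```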

fh = open(self._filename, 'rb')
try:
    # Try s3 being an s3fs.S3File object
    fh = fh.s3.open(self._filename)
Collaborator


I think it might be worth putting a much larger comment section in advance of this block explaining the various entities in play. I already can't remember what is going on here and why, and a minute of trying to chase it down didn't enlighten me :-).

Then, instead of testing whether it is the right kind of object by trying the method, it might be better to use isinstance to check whether it is a particular type of entity; that way the code will be much clearer?

Collaborator Author


I see. Would this be a bit better?

            if self.posix:
                fh = open(self._filename, 'rb')
            else:
                try:
                    # Try s3 being an s3fs.S3File object
                    fh = fh.s3.open(self._filename)
                except AttributeError:
                    raise SomeSortOfError("unknown file object type")

Collaborator


That's better, but is there a reason why we wouldn't do:

if self.posix:
    fh = open(self._filename, 'rb')
elif isinstance(fh, s3fs.S3File):
    fh = fh.s3.open(self._filename)
else:
    raise SomeSortOfError(...)

?

Collaborator Author


I suppose it boils down to whether we want to "Look Before You Leap" or "Easier to Ask Forgiveness". I'm fine with either approach (assuming that there are no other libraries that share the s3fs API), and the former (i.e. the code you suggested) is certainly more explicit.
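A self-contained sketch of the two idioms being compared, using a stand-in class rather than a real s3fs.S3File (the class, helper names, and error type are illustrative, not from the PR):

```python
class FakeS3File:
    """Stand-in for s3fs.S3File: exposes a .s3 filesystem handle."""
    class _FS:
        def open(self, path):
            return f"s3-handle:{path}"
    s3 = _FS()

def reopen_eafp(fh, filename):
    # "Easier to Ask Forgiveness": attempt the s3fs API, catch failure.
    try:
        return fh.s3.open(filename)
    except AttributeError:
        raise TypeError("unknown file object type")

def reopen_lbyl(fh, filename):
    # "Look Before You Leap": check the type explicitly first.
    if isinstance(fh, FakeS3File):
        return fh.s3.open(filename)
    raise TypeError("unknown file object type")

fh = FakeS3File()
print(reopen_eafp(fh, "data.nc"))  # s3-handle:data.nc
print(reopen_lbyl(fh, "data.nc"))  # s3-handle:data.nc
```

Both fail identically on an unknown object; the isinstance version is more explicit but, as noted above, would reject any other library that duck-types the s3fs API.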
