Dataset caching #40


Open: wants to merge 7 commits into base wacasoft
Conversation

davidhassell (Collaborator)

See NCAS-CMS/h5netcdf#8 for details

@davidhassell davidhassell requested a review from bnlawrence April 25, 2025 16:04
offset += _padded_size(attr_dict['datatype_size'], padding_multiple)

# Read the dataspace information
shape, maxshape = determine_data_shape(buffer, offset)
-items = int(np.prod(shape))
+items = prod(shape)
Collaborator


I guess this depends on the elements of shape being integers, so you can remove the redundant int()?

Collaborator Author


Just a bit of premature optimisation: math.prod is much faster than np.prod for a small tuple like shape.
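A minimal sketch of why the int() cast becomes redundant with math.prod (the shape tuple here is illustrative, not from the PR):

```python
from math import prod

# math.prod works directly on a shape tuple and returns a plain int,
# so the int(...) wrapper that np.prod needed (np.prod returns a NumPy
# scalar) is unnecessary.
shape = (12, 64, 128)
items = prod(shape)
print(items)        # 98304
print(type(items))  # <class 'int'>

# An empty shape (a scalar dataset) still behaves sensibly:
print(prod(()))     # 1, the multiplicative identity
```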

fh = open(self._filename, 'rb')
try:
    # Try s3 being an s3fs.S3File object
    fh = fh.s3.open(self._filename)
Collaborator


I think it might be worth putting a much larger comment section in advance of this block explaining the various entities in play. I already can't remember what is going on here and why, and a minute of trying to chase it down didn't enlighten me :-).

Then, instead of testing whether it is the right kind of object by trying the method, it might be better to use isinstance to check whether it is a particular type of entity; that way the code will be much clearer?

Collaborator Author


I see. Would this be a bit better?

            if self.posix:
                fh = open(self._filename, 'rb')
            else:
                try:
                    # Try s3 being an s3fs.S3File object
                    fh = fh.s3.open(self._filename)
                except AttributeError:
                    raise SomeSortOfError("unknown file object type")

Collaborator


That's better, but is there a reason why we wouldn't do:

if self.posix:
    fh = open(self._filename, 'rb')
elif isinstance(fh, s3fs.S3File):
    fh = fh.s3.open(self._filename)
else:
    raise SomeSortOfError(...)

?

Collaborator Author


I suppose it boils down to whether we want to "Look Before You Leap" or "Easier to Ask Forgiveness". I'm fine with either approach (assuming that there are no other libraries that share the s3fs API), and the former (i.e. the code you suggested) is certainly more explicit.
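A self-contained sketch of the two idioms being compared, using a stand-in class rather than a real s3fs.S3File (the class, helper names, and error type are illustrative, not from the PR):

```python
class FakeS3File:
    """Stand-in for s3fs.S3File: exposes a .s3 filesystem handle."""
    class _FS:
        def open(self, path):
            return f"s3-handle:{path}"
    s3 = _FS()

def reopen_eafp(fh, filename):
    # "Easier to Ask Forgiveness": attempt the s3fs API, catch failure.
    try:
        return fh.s3.open(filename)
    except AttributeError:
        raise TypeError("unknown file object type")

def reopen_lbyl(fh, filename):
    # "Look Before You Leap": check the type explicitly first.
    if isinstance(fh, FakeS3File):
        return fh.s3.open(filename)
    raise TypeError("unknown file object type")

fh = FakeS3File()
print(reopen_eafp(fh, "data.nc"))  # s3-handle:data.nc
print(reopen_lbyl(fh, "data.nc"))  # s3-handle:data.nc
```

Both fail identically on an unknown object; the isinstance version is more explicit but, as noted above, would reject any other library that duck-types the s3fs API.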
