Support for Opaque datasets #114
Merged
Changes from 3 commits

Commits (9):

- d012c07: Initial test framework for opaque data (passes with notimplementederror)
- 7abc0a5: All the core functionality necessary for #88. Need to do some more te…
- 536e193: More tests. Documentation
- c074e83: Some more tests to make coverage happy
- 6a8c005: Apparently you need to commit all the files you have changed. Who knew?
- 0a0149f: Update pyfive/datatype_msg.py
- 54aaff8 (bnlawrence): Bug fixes, and more coverage, following great review by @kmuehlbauer.
- 1f02728: Fixed a comment that shouldn't have survived.
- 7cb0531: Cleaning up
              
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -7,4 +7,7 @@ Getting started | ||
| Installation <installation> | ||
| Usage <usage> | ||
| Enumerations <enums> | ||
| Opaque Datasets <opaque> | ||
|  | ||
|  | ||
|  | ||
  
    
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| Opaque Datasets | ||
| --------------- | ||
|  | ||
| HDF5 allows datasets to be created with opaque datatypes. In these | ||
| datasets the data is stored as a sequence of bytes, and HDF5 itself | ||
| attaches no interpretation to those bytes. This is not a commonly used | ||
| feature of HDF5, but some applications rely on it. The `h5py` package supports | ||
| reading and writing opaque datatypes, and `pyfive` supports reading them. | ||
|  | ||
| This implementation has only been tested with opaque datatypes that | ||
| were created using `h5py`. | ||
|  | ||
| Such opaque data is transparently read into a numpy array of the | ||
| same dtype as was used to write it. Users should not need to do | ||
| anything special to read the data, but may need to do something | ||
| special with the data to interpret it once read. | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | 
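To illustrate what "a sequence of bytes, with no interpretation" means in practice, here is a minimal numpy-only sketch. It uses neither `pyfive` nor `h5py` and is not how either library is implemented; it only shows that the raw bytes of a `datetime64` array become dates again once they are reinterpreted with the original dtype, which is the interpretation step a reader of opaque data has to supply.

```python
import numpy as np

# Values mirroring the datetime64 fixture in the PR's test module
values = np.array(
    ["2019-09-22T17:38:30", "2020-01-01T00:00:00", "2025-10-04T12:00:00"],
    dtype="datetime64[s]",
)

# Opaque storage keeps only the bytes: 8 bytes per datetime64[s] element.
raw = values.tobytes()
print(len(raw))  # 24

# Reinterpreting the bytes with the original dtype recovers the dates exactly.
restored = np.frombuffer(raw, dtype="datetime64[s]")
print(restored[0])  # 2019-09-22T17:38:30

# Reinterpreting with a different dtype still "works", but yields the
# underlying int64 seconds since the Unix epoch rather than dates.
as_ints = np.frombuffer(raw, dtype=np.int64)
print(as_ints[0] == values[0].astype(np.int64))  # True
```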
  
    
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| import os | ||
|  | ||
| import numpy as np | ||
| import pytest | ||
| from numpy.testing import assert_array_equal | ||
|  | ||
| import pyfive | ||
| import h5py | ||
|  | ||
|  | ||
| def test_opaque_dataset_hdf5(name, data): | ||
|  | ||
| # Verify that h5py can read this file before we do | ||
| # our own test. If this fails, pyfive cannot be | ||
| # expected to get it right. | ||
|  | ||
| (ordinary_data, string_data, opdata) = data | ||
|  | ||
| with h5py.File(name, "r") as f: | ||
| dset = f["opaque_datetimes"] | ||
| assert_array_equal(dset[...], opdata.astype(h5py.opaque_dtype(opdata.dtype))) | ||
|  | ||
| # Now see if pyfive can do the right thing | ||
| with pyfive.File(name) as hfile: | ||
| # check data | ||
| dset = hfile["opaque_datetimes"] | ||
| # pyfive returns the raw bytes that h5py wrote, but when the | ||
| # opaque type is tagged with NUMPY (as it should be here), pyfive | ||
| # automatically converts it back to the original numpy dtype. | ||
| assert_array_equal(dset[...], opdata) | ||
|  | ||
| # make sure the other things are fine | ||
| assert_array_equal(hfile['string_data'][...], string_data) | ||
| assert_array_equal(hfile['ordinary_data'][...], ordinary_data) | ||
|  | ||
| assert pyfive.check_opaque_dtype(dset.dtype) is True | ||
| assert pyfive.check_enum_dtype(dset.dtype) is None | ||
| assert pyfive.check_opaque_dtype(hfile['ordinary_data'].dtype) is False | ||
| assert pyfive.check_dtype(opaque=hfile['ordinary_data'].dtype) is False | ||
| assert pyfive.check_dtype(opaque=hfile['opaque_datetimes'].dtype) is True | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | ||
| @pytest.fixture(scope='module') | ||
| def data(): | ||
| """Provide integer, byte-string and datetime64 array data.""" | ||
| ordinary_data = np.array([1, 2, 3], dtype='i4') | ||
| string_data = np.array([b'one', b'two', b'three'], dtype='S5') | ||
| opaque_data = np.array([ | ||
| np.datetime64("2019-09-22T17:38:30"), | ||
| np.datetime64("2020-01-01T00:00:00"), | ||
| np.datetime64("2025-10-04T12:00:00"), | ||
| ]) | ||
|  | ||
| data = (ordinary_data, string_data, opaque_data) | ||
|  | ||
| return data | ||
|  | ||
|  | ||
| @pytest.fixture(scope='module') | ||
| def name(data): | ||
| """Create an HDF5 file with datetime64 data stored as opaque, plus string and integer datasets.""" | ||
| name = os.path.join(os.path.dirname(__file__), "opaque_datetime.hdf5") | ||
|  | ||
| (ordinary_data, string_data, opdata) = data | ||
|  | ||
| # Convert the dtype to an opaque version (as per the h5py docs). | ||
| # AFAIK this just adds {'h5py_opaque': True} to the dtype metadata, | ||
| # without which h5py cannot write the data. | ||
|  | ||
| opaque_data = opdata.astype(h5py.opaque_dtype(opdata.dtype)) | ||
|  | ||
| # Put some other datasets in the file too, so that we can exercise | ||
| # some of the other code paths. | ||
|  | ||
| with h5py.File(name, "w") as f: | ||
| f["opaque_datetimes"] = opaque_data | ||
| f['string_data'] = string_data | ||
| f['ordinary_data'] = ordinary_data | ||
|  | ||
| return name | 
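The fixture's comment suggests that `h5py.opaque_dtype` marks a dtype as opaque by adding `{'h5py_opaque': True}` to its numpy metadata. Assuming that convention, a boolean check like the `check_opaque_dtype` assertions in the test could be sketched as below. `make_opaque_dtype` and `is_opaque_dtype` are hypothetical names for illustration only; this is not the actual pyfive or h5py implementation.

```python
import numpy as np


def make_opaque_dtype(dtype):
    """Hypothetical helper: tag a dtype as opaque via numpy dtype metadata.

    Assumes the {'h5py_opaque': True} convention described in the test
    comment; this is not the real h5py.opaque_dtype.
    """
    return np.dtype(dtype, metadata={"h5py_opaque": True})


def is_opaque_dtype(dtype):
    """Hypothetical check: True only if the dtype carries the opaque tag."""
    meta = np.dtype(dtype).metadata  # None when no metadata is attached
    return bool(meta is not None and meta.get("h5py_opaque", False))


opaque_dt = make_opaque_dtype("datetime64[s]")
print(is_opaque_dtype(opaque_dt))       # True
print(is_opaque_dtype(np.dtype("i4")))  # False
print(opaque_dt.kind)                   # M (the tag leaves the dtype kind unchanged)
```

Keeping the tag in metadata rather than in a separate dtype means the array still behaves as an ordinary `datetime64` array once read, which matches the test's expectation that the opaque dataset compares equal to the original data.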
      