Improve float16 performance #2154
Comments
I implemented a demo using just the C API, available here: https://github.com/bhawkins/demo_hdf5_c4. Profiling confirms that the slowdown is indeed in the H5Tconv.c conversion code cited in the issue description.

As context, this issue is highly relevant to an upcoming NASA mission called NISAR. It is an imaging radar that will soon produce a freely available, global dataset of several petabytes. The data has high dynamic range and high entropy, so float16 encoding is an appealing way to reduce file sizes. Software support for float16 varies, and in several scenarios the obvious or default behavior is to use the HDF5 API to convert to float32 on read, which gets bogged down as in the demos above. This is notably the behavior of GDAL, which forms the basis of a wide variety of GIS applications. So while it is possible to work around this problem on an ad hoc basis in each application, simply making the libhdf5 code path faster would have a potentially wide-ranging benefit.
We plan to add native float16 support, but it probably won't be ready until 1.14.5.
We just had some recent interest in this in JuliaIO/HDF5.jl#341 (comment). It would be great if there were native float16 and bfloat16 support.
Hi @bhawkins and @mkitti, if you happen to get the chance it would be appreciated if you could look over the RFC for 16-bit float (and complex number) support at https://forum.hdfgroup.org/t/hdf5-rfc-adding-support-for-16-bit-floating-point-and-complex-number-datatypes-to-hdf5/11975 and give any feedback that you may have in that forum thread. Thanks!
Using HDF5 to read data stored as 16-bit floating point into a 32-bit buffer is extremely slow, around 16x slower than an equivalent conversion in numpy. I uploaded a demo here. For simplicity I used h5py, but one can obtain the same result using the HDF5 C API. HDF5 also seems to discard any payload bits in NaN values.

I suspect the slowdown is due to the very general implementation for custom float types in HDF5 here:

hdf5/src/H5Tconv.c, lines 4267 to 4271 in 306db40

versus the float16-specific handling in numpy.
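For comparison, numpy's float16-to-float32 conversion is a vectorized one-liner, and it keeps NaN payload bits intact. A minimal sketch (the payload value 0x201 is an arbitrary illustration, not something from the demos above):

```python
import numpy as np

# Build a float16 quiet NaN carrying payload bits 0x201 in its 10-bit mantissa.
h = np.array([0x7E01], dtype=np.uint16).view(np.float16)

# numpy's float16 -> float32 conversion (vectorized, hardware-backed where available).
f = h.astype(np.float32)

# The NaN survives, with its payload shifted into the top bits of the wider mantissa.
bits = f.view(np.uint32)
assert np.isnan(f[0])
assert (int(bits[0]) & 0x007FFFFF) >> 13 == 0x201
```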
The case I really care about involves a structured data type (for complex values), which is 44x slower than a numpy workaround. That demo is available here, though I haven't isolated a cause for that extra factor of 3x.
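The numpy workaround amounts to reading the raw float16 values, widening them in numpy, and reinterpreting the result as complex. A minimal sketch, where the in-memory array stands in for data read straight from the file and the interleaved real/imaginary layout is an assumption about the stored structured type:

```python
import numpy as np

# Interleaved (re, im) float16 pairs, as assumed to be stored on disk.
raw = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float16)

# Widen with numpy's fast conversion, then reinterpret the float32 pairs
# as complex64 without copying.
z = raw.astype(np.float32).view(np.complex64)
# z is [1+2j, 3+4j]
```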
It seems like ideally there'd be an H5T__conv_half_single routine that uses hardware to convert from _Float16 (example). I guess this might require adding a native_half type, which seems like a big job. Or maybe just a special case in H5T__conv_f_f?
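A float16 special case would boil down to the bit-level widening below, which hardware conversion instructions (or _Float16 arithmetic in C) perform in a single step per element. This is a numpy sketch for illustration only, not HDF5 code; it assumes IEEE 754 half-precision input and omits subnormal handling for brevity:

```python
import numpy as np

def half_to_float_bits(h):
    """Widen an array of IEEE half bits (uint16) to float bits (uint32).

    Sketch only: subnormal halves are left unhandled.
    """
    h = h.astype(np.uint32)
    sign = (h & 0x8000) << 16
    exp = (h >> 10) & 0x1F
    man = h & 0x03FF
    out = np.zeros_like(h)
    # Normal numbers: rebias exponent from 15 to 127, shift mantissa up 13 bits.
    normal = (exp > 0) & (exp < 0x1F)
    out[normal] = sign[normal] | ((exp[normal] + 112) << 23) | (man[normal] << 13)
    # Inf / NaN: max exponent field, payload shifted up (and thus preserved).
    special = exp == 0x1F
    out[special] = sign[special] | 0x7F800000 | (man[special] << 13)
    # Signed zeros.
    zero = (exp == 0) & (man == 0)
    out[zero] = sign[zero]
    return out

h = np.array([0x3C00, 0xC000, 0x7C00], dtype=np.uint16)  # 1.0, -2.0, +inf
f = half_to_float_bits(h).view(np.float32)               # [1.0, -2.0, inf]
```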