Improve float16 performance #2154
Comments
I implemented a demo using just the C API, available here: https://github.com/bhawkins/demo_hdf5_c4. Profiling confirms that the slowdown is indeed in the H5Tconv.c conversion code cited in the issue description.

As context, this issue is highly relevant to an upcoming NASA mission called NISAR. It is an imaging radar that will soon produce a freely available, global dataset of several petabytes. The data has high dynamic range and high entropy, so float16 encoding is an appealing way to reduce file sizes. Software support for float16 varies, and in several scenarios the obvious or default behavior is to use the HDF5 API to convert to float32 on read, which gets bogged down as in the demos above. This is notably the behavior of GDAL, which forms the basis of a wide variety of GIS applications. So while it is possible to work around this problem on an ad hoc basis in each application, simply making the libhdf5 code path faster would have a potentially wide-ranging benefit.
We plan to add native float16 support, but it probably won't be ready until 1.14.5.
We just had some recent interest in this in JuliaIO/HDF5.jl#341 (comment). It would be great if there were native float16 and bfloat16 support.
Hi @bhawkins and @mkitti, if you happen to get the chance it would be appreciated if you could look over the RFC for 16-bit float (and complex number) support at https://forum.hdfgroup.org/t/hdf5-rfc-adding-support-for-16-bit-floating-point-and-complex-number-datatypes-to-hdf5/11975 and give any feedback that you may have in that forum thread. Thanks!
Using HDF5 to read data stored as 16-bit floating point into a 32-bit buffer is extremely slow, around 16x slower than an equivalent conversion in numpy. I uploaded a demo here. For simplicity I used h5py, but one can obtain the same result using the HDF5 C API. HDF5 also seems to discard any payload bits in NaN values.

I suspect the slowdown is due to the very general implementation for custom float types in HDF5 here:

hdf5/src/H5Tconv.c, lines 4267 to 4271 in 306db40

versus the float16-specific handling in numpy.
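For comparison, numpy's float16-to-float32 conversion is a vectorized one-liner, and it keeps NaN payload bits intact. A minimal sketch (the payload value 0x201 is an arbitrary illustration, not something from the demos above):

```python
import numpy as np

# Build a float16 quiet NaN carrying payload bits 0x201 in its 10-bit mantissa.
h = np.array([0x7E01], dtype=np.uint16).view(np.float16)

# numpy's float16 -> float32 conversion (vectorized, hardware-backed where available).
f = h.astype(np.float32)

# The NaN survives, with its payload shifted into the top bits of the wider mantissa.
bits = f.view(np.uint32)
assert np.isnan(f[0])
assert (int(bits[0]) & 0x007FFFFF) >> 13 == 0x201
```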
The case I really care about involves a structured data type (for complex values), which is 44x slower than a numpy workaround. That demo is available here, though I haven't isolated a cause for that extra factor of 3x.
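The numpy workaround amounts to reading the raw float16 values, widening them in numpy, and reinterpreting the result as complex. A minimal sketch, where the in-memory array stands in for data read straight from the file and the interleaved real/imaginary layout is an assumption about the stored structured type:

```python
import numpy as np

# Interleaved (re, im) float16 pairs, as assumed to be stored on disk.
raw = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float16)

# Widen with numpy's fast conversion, then reinterpret the float32 pairs
# as complex64 without copying.
z = raw.astype(np.float32).view(np.complex64)
# z is [1+2j, 3+4j]
```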
It seems like ideally there'd be an H5T__conv_half_single routine that uses hardware to convert from _Float16 (example). I guess this might require adding a native_half type, which seems like a big job. Or maybe just a special case in H5T__conv_f_f?
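A float16 special case would boil down to the bit-level widening below, which hardware conversion instructions (or _Float16 arithmetic in C) perform in a single step per element. This is a numpy sketch for illustration only, not HDF5 code; it assumes IEEE 754 half-precision input and omits subnormal handling for brevity:

```python
import numpy as np

def half_to_float_bits(h):
    """Widen an array of IEEE half bits (uint16) to float bits (uint32).

    Sketch only: subnormal halves are left unhandled.
    """
    h = h.astype(np.uint32)
    sign = (h & 0x8000) << 16
    exp = (h >> 10) & 0x1F
    man = h & 0x03FF
    out = np.zeros_like(h)
    # Normal numbers: rebias exponent from 15 to 127, shift mantissa up 13 bits.
    normal = (exp > 0) & (exp < 0x1F)
    out[normal] = sign[normal] | ((exp[normal] + 112) << 23) | (man[normal] << 13)
    # Inf / NaN: max exponent field, payload shifted up (and thus preserved).
    special = exp == 0x1F
    out[special] = sign[special] | 0x7F800000 | (man[special] << 13)
    # Signed zeros.
    zero = (exp == 0) & (man == 0)
    out[zero] = sign[zero]
    return out

h = np.array([0x3C00, 0xC000, 0x7C00], dtype=np.uint16)  # 1.0, -2.0, +inf
f = half_to_float_bits(h).view(np.float32)               # [1.0, -2.0, inf]
```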