Description
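The current `gpu_tensor_hash` implemented in #5974 has the following drawbacks:
- `add` by itself is not a very good reduction method (for example, a plain sum ignores element order, so tensors whose elements are permuted hash to the same value)
- it will perform an on-CPU reduction, which is not very performant for large tensors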
TODO
- Rewrite a performant and robust tensor hash function
- Test the performance, consistency and correctness of the hash function against real data
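
As a starting point for discussion, here is a minimal pure-Python reference sketch of one possible direction. It is not the implementation from #5974 and not a proposed final design. The idea is to scramble each element together with its index before adding, so the hash becomes order-sensitive while the combine step stays a commutative 64-bit add that could be reduced in any order on the device. The names `plain_sum_hash`, `position_mixed_hash`, and the `start` parameter are hypothetical. The mixer is the SplitMix64 finalizer, and float32 elements are assumed.

```python
import struct

MASK64 = (1 << 64) - 1


def _f32_bits(v: float) -> int:
    """Return the raw 32-bit pattern of a float32 value."""
    return struct.unpack("<I", struct.pack("<f", v))[0]


def _mix64(z: int) -> int:
    """SplitMix64 finalizer: a bijective 64-bit scrambler."""
    z &= MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def plain_sum_hash(values) -> int:
    """Hypothetical baseline: add raw bit patterns (ignores element order)."""
    return sum(_f32_bits(v) for v in values) & MASK64


def position_mixed_hash(values, start: int = 0) -> int:
    """Order-sensitive hash: mix (bits, index) per element, then add mod 2**64.

    `start` is the global index of values[0], so a chunk of a larger
    tensor can be hashed independently and the partial results added.
    Assumes fewer than 2**32 elements in total.
    """
    total = 0
    for i, v in enumerate(values, start):
        key = (_f32_bits(v) << 32) | (i & 0xFFFFFFFF)
        total = (total + _mix64(key)) & MASK64
    return total
```

Because the per-element work is independent and the combine step is an add modulo 2**64, partial hashes of chunks can be summed in any order. That property is what would let a real version run as an on-device parallel reduction instead of an on-CPU one. A consistency test could compare a device implementation against a reference like this on real tensors. Note that hashing bit patterns treats `0.0` and `-0.0` as different values.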
Reference
You can refer here for inspiration