[Perf] improve the hash kernel for mm #8054

Open
@mickqian

Description

The current gpu_tensor_hash implemented in #5974 has the following drawbacks:

  1. Addition (add) on its own is a weak reduction: it is order-insensitive, so different arrangements of the same partial values produce the same result.
  2. The reduction is performed on the CPU, which is slow for large tensors.
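
To illustrate the first drawback, here is a minimal pure-Python sketch. It is not the code from #5974; `add_reduce` and `fnv1a_reduce` are hypothetical names used only for illustration. The second function is an FNV-1a-style fold applied per 64-bit word (real FNV-1a processes bytes), chosen here only as one example of an order-sensitive reduction.

```python
MASK64 = (1 << 64) - 1
FNV_OFFSET = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3        # standard 64-bit FNV prime


def add_reduce(values):
    """Order-insensitive reduction: sum of all values modulo 2**64."""
    total = 0
    for v in values:
        total = (total + v) & MASK64
    return total


def fnv1a_reduce(values):
    """Order-sensitive fold: XOR each word in, then multiply by the prime."""
    h = FNV_OFFSET
    for v in values:
        h ^= v & MASK64
        h = (h * FNV_PRIME) & MASK64
    return h


a = [1, 2, 3, 4]
b = [4, 3, 2, 1]  # same values, different order

# Plain addition cannot tell the two inputs apart.
print(add_reduce(a) == add_reduce(b))      # True
# The order-sensitive fold distinguishes them.
print(fnv1a_reduce(a) == fnv1a_reduce(b))  # False
```

A sequential fold like this does not parallelize directly, so a GPU kernel would need a different scheme (for example, mixing each element with its position before a parallel reduction); that design choice is left open here.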

TODO

  1. Rewrite a performant and robust tensor hash function
  2. Test the performance, consistency and correctness of the hash function against real data
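
The checks in item 2 could be sketched roughly as below. This is a hedged outline, not an existing test in the repository: `reference_hash` and `check_hash` are hypothetical names, BLAKE2b from the standard library stands in for the hash under test, and the synthetic byte payload stands in for real multimodal data.

```python
import hashlib
import time


def reference_hash(data: bytes) -> int:
    """Stand-in hash under test: 64-bit BLAKE2b digest of the raw bytes."""
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def check_hash(hash_fn, payload: bytes) -> dict:
    """Run basic consistency and sensitivity checks and time one call."""
    start = time.perf_counter()
    first = hash_fn(payload)
    elapsed = time.perf_counter() - start

    flipped = bytearray(payload)
    flipped[0] ^= 0x01  # flip a single bit of the input

    return {
        "consistent": first == hash_fn(payload),         # same input, same hash
        "sensitive": first != hash_fn(bytes(flipped)),   # changed input, new hash
        "seconds": elapsed,                              # wall time of one call
    }


payload = bytes(range(256)) * 1024  # 256 KiB synthetic payload
report = check_hash(reference_hash, payload)
print(report)
```

For a GPU hash, the timing would also need device synchronization before reading the clock, and the payloads should come from real image or video tensors rather than synthetic bytes.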

Reference

See here for inspiration.
