Feature request: a decorator for fusing the operations inside a function, similar to the [cupy.fuse](https://docs.cupy.dev/en/stable/reference/generated/cupy.fuse.html) decorator, so that the fused function can reuse cache and reduce kernel launch latency.
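For reference, here is a minimal sketch of how `cupy.fuse` is used in CuPy, to illustrate the kind of behavior being requested. The function name `squared_diff` and the NumPy fallback are illustrative assumptions, not part of any existing API here. The fallback only lets the snippet run on machines without CuPy or a GPU, and performs no fusion.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
    xp, fuse = cp, cp.fuse
except Exception:
    # Assumption for illustration: fall back to NumPy with a no-op decorator.
    xp = np

    def fuse(*args, **kwargs):
        return lambda f: f


@fuse()
def squared_diff(x, y):
    # With CuPy, the subtraction and the multiplication are fused into
    # one elementwise kernel instead of being launched separately.
    return (x - y) * (x - y)


x = xp.arange(5, dtype=xp.float32)
y = xp.full(5, 2, dtype=xp.float32)
result = squared_diff(x, y)
print(result)
```

Without fusion, each arithmetic operation would typically launch its own kernel and write an intermediate array to memory. A comparable decorator here would ideally give the same call-site ergonomics.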