Implement batched GPU transforms #232

ziw-liu · 2025-03-27T17:50:35Z

> This might be because MONAI transforms are not batched (executed in a loop), and CPU/GPU sync could be taking much longer than the actual compute.

Benchmark of 3D random affine in 6a88ec4 (10 runs, milliseconds):

| Device | MONAI (sequential, ms) | Kornia (batched, ms) | Speedup (MONAI / Kornia) |
| --- | --- | --- | --- |
| Zen 2 CPU (1 thread) | 9160 | 3800 | 2.4 |
| Zen 2 CPU (16 threads) | 7320 | 556 | 13.2 |
| A40 GPU | 2620 | 210 | 12.5 |
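
For illustration, the difference between a per-sample loop and a single batched call can be sketched in plain PyTorch. This is not the MONAI or Kornia code that was benchmarked: the helper names (`random_affine_matrices`, `warp_sequential`, `warp_batched`) are hypothetical, and the "random affine" is simplified to a random translation. It only shows that both paths produce the same result while the batched path issues one `affine_grid`/`grid_sample` call for the whole batch.

```python
import torch
import torch.nn.functional as F


def random_affine_matrices(batch_size, max_shift=0.1):
    """Return (B, 3, 4) matrices: identity plus a random translation.

    Simplified stand-in for a full random affine (no rotation, scale, or shear).
    """
    theta = torch.eye(3, 4).repeat(batch_size, 1, 1)
    shift = (torch.rand(batch_size, 3) * 2 - 1) * max_shift
    theta[:, :, 3] = shift  # last column holds the translation
    return theta


def warp_sequential(volumes, theta):
    """Warp each (C, D, H, W) sample in a Python loop, one call per sample."""
    out = []
    for vol, mat in zip(volumes, theta):
        grid = F.affine_grid(mat[None], [1, *vol.shape], align_corners=False)
        out.append(F.grid_sample(vol[None], grid, align_corners=False)[0])
    return torch.stack(out)


def warp_batched(volumes, theta):
    """Warp the whole (B, C, D, H, W) batch with a single call."""
    grid = F.affine_grid(theta, list(volumes.shape), align_corners=False)
    return F.grid_sample(volumes, grid, align_corners=False)


device = "cuda" if torch.cuda.is_available() else "cpu"
volumes = torch.rand(4, 1, 8, 16, 16, device=device)  # toy batch of 3D volumes
theta = random_affine_matrices(4).to(device)

seq = warp_sequential(volumes, theta)
bat = warp_batched(volumes, theta)
```

Both functions apply the same per-sample matrices, so the outputs should agree up to floating-point tolerance; only the number of kernel launches differs.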

Originally posted by @ziw-liu in #218 (comment)

ziw-liu · 2025-03-27T17:51:12Z