Open
Description
For any kernel running on an OpenCL-enabled device, the following should hold to achieve optimal performance:
If each EU can run n warps concurrently, the ideal number of warps per EU is:
n * ((mem_ops * (flops / bandwidth)) / compute_ops)
Here mem_ops and compute_ops are the kernel's memory and compute operation counts, and flops and bandwidth are the device's peak figures.
However, working this out requires three things that OpenCL does not provide:
- FLOPS
- Bandwidth
- Register file size (which limits how many warps can be resident on an EU)
Is there an existing extension, or an idea for a future one, that would provide these three details for a given device?
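For illustration, here is a minimal sketch of how the formula above could be evaluated once the peak FLOPS and bandwidth are known from some other source, such as a vendor datasheet. The function name `ideal_warps_per_eu` and its parameter names are hypothetical and not part of any OpenCL API. It also assumes that `flops` and `bandwidth` are expressed in compatible per-second units, so that their ratio is the number of compute operations the device can perform per memory operation.

```c
#include <assert.h>

/* Hypothetical helper (not an OpenCL API): evaluates
 *   n * ((mem_ops * (flops / bandwidth)) / compute_ops)
 *
 * n           - warps each EU can run concurrently
 * mem_ops     - memory operations issued by the kernel
 * compute_ops - compute operations issued by the kernel
 * flops       - assumed peak compute throughput of the device
 * bandwidth   - assumed peak memory throughput of the device
 */
double ideal_warps_per_eu(double n, double mem_ops, double compute_ops,
                          double flops, double bandwidth)
{
    /* Both divisors must be positive for the formula to be meaningful. */
    assert(bandwidth > 0.0);
    assert(compute_ops > 0.0);

    /* Compute operations the device can perform per memory operation. */
    double machine_balance = flops / bandwidth;

    return n * ((mem_ops * machine_balance) / compute_ops);
}
```

As a usage example, calling `ideal_warps_per_eu(7.0, 4.0, 64.0, 1000.0, 100.0)` gives 4.375: the flops/bandwidth ratio is 10, so 4 memory operations correspond to 40 compute operations' worth of time, which is 0.625 of the kernel's 64 compute operations, scaled by n = 7. The two throughput arguments are exactly the values the issue asks OpenCL to expose, so in this sketch they have to be supplied by hand.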