Achieving Max Performance #1441

@EJain-Dev

For any kernel running on an OpenCL-enabled device, the following condition should hold to achieve optimal performance:

If each EU runs n warps concurrently, the number of warps per EU should ideally equal:
n * ((mem_ops * (flops / bandwidth)) / compute_ops)
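As a minimal sketch, the expression above can be evaluated like this. The units are assumptions, since none are stated here: `mem_ops` is taken as bytes moved per work-item, `compute_ops` as floating-point operations per work-item, `flops` as peak FLOP/s, and `bandwidth` as peak bytes/s. The function name `target_warps_per_eu` is hypothetical.

```c
#include <assert.h>

/* Evaluates n * ((mem_ops * (flops / bandwidth)) / compute_ops).
 * Assumed units (not given in the post):
 *   mem_ops     bytes moved per work-item
 *   compute_ops floating-point operations per work-item
 *   flops       peak device throughput in FLOP/s
 *   bandwidth   peak memory bandwidth in bytes/s
 */
static double target_warps_per_eu(double n, double mem_ops, double compute_ops,
                                  double flops, double bandwidth)
{
    /* Guard against division by zero in either ratio. */
    assert(bandwidth > 0.0);
    assert(compute_ops > 0.0);

    /* FLOPs the device could execute in the time it takes to move one byte. */
    double flops_per_byte = flops / bandwidth;

    /* Memory traffic of the kernel expressed in "FLOP equivalents". */
    double mem_cost_in_flops = mem_ops * flops_per_byte;

    return n * (mem_cost_in_flops / compute_ops);
}
```

For example, with made-up numbers (n = 2, 8 bytes and 20 FLOPs per work-item, 1e12 FLOP/s, 1e11 bytes/s), calling `target_warps_per_eu(2.0, 8.0, 20.0, 1.0e12, 1.0e11)` evaluates 2 * ((8 * 10) / 20).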

However, evaluating this requires three device properties that OpenCL does not expose:

  1. FLOPS
  2. Bandwidth
  3. Register File Size

Is there an existing extension, or an idea for a future one, that would provide these three details for a given device?
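To illustrate the gap, here is a rough sketch of the closest estimate I can see from core OpenCL alone. `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY` (reported in MHz) can be queried with `clGetDeviceInfo`, but the per-compute-unit FLOPs per cycle cannot, so the caller has to supply it from vendor documentation. The function name `estimate_peak_flops` and its third parameter are hypothetical.

```c
#include <assert.h>

/* Rough peak FLOP/s estimate.
 *   compute_units          value of CL_DEVICE_MAX_COMPUTE_UNITS
 *   clock_mhz              value of CL_DEVICE_MAX_CLOCK_FREQUENCY (MHz)
 *   flops_per_cu_per_cycle NOT queryable through core OpenCL; an assumed,
 *                          caller-supplied figure (e.g. from a datasheet)
 */
static double estimate_peak_flops(unsigned compute_units, unsigned clock_mhz,
                                  double flops_per_cu_per_cycle)
{
    assert(flops_per_cu_per_cycle > 0.0);

    /* Convert MHz to cycles per second. */
    double cycles_per_second = (double)clock_mhz * 1.0e6;

    return (double)compute_units * cycles_per_second * flops_per_cu_per_cycle;
}
```

The third argument is exactly the kind of value that has to be hard-coded per device today, and there is no comparable starting point for bandwidth or register file size.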
