Achieving Max Performance

For any kernel running on an OpenCL-enabled device, the following should hold to achieve optimal performance:

```
If each EU runs n warps concurrently, the ideal number of warps per EU is:
n * ((mem_ops * (flops / bandwidth)) / compute_ops)
```
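A small sketch may make the formula above easier to reason about. It transcribes the expression as written into a function. The device figures (1 TFLOP/s peak, 100 GB/s bandwidth) and kernel counts (4 memory ops, 20 compute ops, n = 8) are hypothetical values chosen only for illustration.

```python
def target_warps_per_eu(n, mem_ops, compute_ops, flops, bandwidth):
    """Direct transcription of the formula:
    n * ((mem_ops * (flops / bandwidth)) / compute_ops)
    """
    # Ratio of the device's peak compute rate to its peak memory bandwidth.
    machine_balance = flops / bandwidth
    return n * ((mem_ops * machine_balance) / compute_ops)

# Hypothetical device: 1 TFLOP/s peak, 100 GB/s bandwidth -> balance of 10.
# Hypothetical kernel: 4 memory ops per 20 compute ops, n = 8 warps per EU.
print(target_warps_per_eu(8, 4, 20, 1.0e12, 100.0e9))  # 16.0
```

With these numbers the kernel does 2 "balance-weighted" memory ops per compute op, so the target is twice n.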

However, this requires three quantities that OpenCL does not expose:
1. FLOPS
2. Bandwidth
3. Register File Size
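As a rough illustration of why the first two values are hard to obtain, the sketch below estimates them from spec-sheet figures. Core OpenCL does report the compute-unit count (`CL_DEVICE_MAX_COMPUTE_UNITS`) and the clock in MHz (`CL_DEVICE_MAX_CLOCK_FREQUENCY`), but the lanes per compute unit, FLOPs per lane per cycle, memory bus width, and transfer rate used here are assumptions that would have to come from vendor documentation. All numeric inputs are hypothetical. Register file size has no comparable estimate and is likewise only available from vendor material.

```python
def peak_flops(compute_units, lanes_per_cu, flops_per_lane_per_cycle, clock_mhz):
    # compute_units and clock_mhz mirror CL_DEVICE_MAX_COMPUTE_UNITS and
    # CL_DEVICE_MAX_CLOCK_FREQUENCY (MHz); lanes_per_cu and
    # flops_per_lane_per_cycle are NOT queryable and are assumed here.
    return compute_units * lanes_per_cu * flops_per_lane_per_cycle * clock_mhz * 1e6

def peak_bandwidth(bus_width_bits, transfers_per_sec):
    # Neither input is exposed by core OpenCL; both are spec-sheet assumptions.
    return bus_width_bits / 8 * transfers_per_sec

# Hypothetical device: 24 compute units, 8 lanes each, FMA counted as 2 FLOPs, 1000 MHz.
flops = peak_flops(24, 8, 2, 1000)   # 3.84e11 FLOP/s
bw = peak_bandwidth(128, 3.2e9)      # 5.12e10 bytes/s
print(flops / bw)                    # 7.5
```

The resulting ratio is the `flops / bandwidth` term of the formula; any error in the assumed vendor figures carries straight through to the warp target.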

Is there an existing extension, or a proposal for a future one, that exposes these three values for a given device?

Achieving Max Performance #1441

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Achieving Max Performance #1441

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions