CUDA backend performance tuning #30

Open

We need to investigate the best strategy for performance tuning in the CUDA backend.

One knob is the launch configuration: the trade-off between the thread block size (threads per block) and the number of blocks in the grid.
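
As a starting point for that investigation, here is a minimal sketch of a sweep over this knob. It is not the backend's actual code: `copy_kernel` is a hypothetical grid-stride copy standing in for a pack kernel, `num_blocks_for` is an illustrative helper, and the buffer size, block sizes, and the "8 blocks per SM" grid are arbitrary assumptions. It times each configuration with CUDA events and prints the runtime's occupancy-based suggestion for comparison.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,            \
                    cudaGetErrorString(err_));                            \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

/* Hypothetical stand-in for a pack kernel: a grid-stride copy, so any
 * (grid, block) combination covers all n elements. */
__global__ void copy_kernel(const double *in, double *out, size_t n)
{
    size_t stride = (size_t) blockDim.x * gridDim.x;
    for (size_t i = (size_t) blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = in[i];
}

/* Blocks needed for one element per thread, capped at max_blocks. */
static int num_blocks_for(size_t n, int block_size, int max_blocks)
{
    size_t needed = (n + (size_t) block_size - 1) / (size_t) block_size;
    return needed > (size_t) max_blocks ? max_blocks : (int) needed;
}

/* Average time in milliseconds of one launch, after a warm-up launch. */
static float time_config(const double *in, double *out, size_t n, int grid, int block)
{
    const int iters = 10;
    cudaEvent_t start, stop;
    float ms = 0.0f;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));
    copy_kernel<<<grid, block>>>(in, out, n);   /* warm-up */
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(start));
    for (int i = 0; i < iters; i++)
        copy_kernel<<<grid, block>>>(in, out, n);
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));
    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    return ms / iters;
}

int main(void)
{
    const size_t n = (size_t) 1 << 24;          /* assumed problem size */
    const int block_sizes[] = { 64, 128, 256, 512, 1024 };
    double *in, *out;
    int sm_count, min_grid, suggested_block;

    CUDA_CHECK(cudaMalloc(&in, n * sizeof(double)));
    CUDA_CHECK(cudaMalloc(&out, n * sizeof(double)));
    CUDA_CHECK(cudaMemset(in, 0, n * sizeof(double)));
    CUDA_CHECK(cudaDeviceGetAttribute(&sm_count, cudaDevAttrMultiProcessorCount, 0));

    for (int block : block_sizes) {
        /* Two grid choices per block size: cover every element, or use a
         * small grid (assumed 8 blocks per SM) and rely on the stride loop. */
        int full = num_blocks_for(n, block, 1 << 20);
        int small = num_blocks_for(n, block, 8 * sm_count);
        printf("block=%4d grid=%7d: %.3f ms\n", block, full,
               time_config(in, out, n, full, block));
        printf("block=%4d grid=%7d: %.3f ms\n", block, small,
               time_config(in, out, n, small, block));
    }

    /* The runtime's occupancy heuristic, for comparison with the sweep. */
    CUDA_CHECK(cudaOccupancyMaxPotentialBlockSize(&min_grid, &suggested_block,
                                                  copy_kernel, 0, 0));
    printf("occupancy suggestion: block=%d, min grid=%d\n", suggested_block, min_grid);

    CUDA_CHECK(cudaFree(in));
    CUDA_CHECK(cudaFree(out));
    return 0;
}
```

A plain copy is bandwidth-bound, so the results may not carry over to kernels that handle strided or nested datatypes; the same sweep would need to be repeated on the real kernels before drawing conclusions.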

Milestone

yaksa-1.0b2
