Conversation
        _func(UCX_PERF_CMD_PUT_SINGLE, __VA_ARGS__); \
        break; \
    case UCX_PERF_CMD_PUT_SINGLE_V2: \
        _func(UCX_PERF_CMD_PUT_SINGLE_V2, __VA_ARGS__); \
I have a general suggestion here:
Could we utilize the fall through feature of the switch statement to avoid repetition?
Something like:
#define UCX_PERF_SWITCH_CMD(_cmd, _func, ...) \
    switch (_cmd) { \
    case UCX_PERF_CMD_PUT_SINGLE: \
    case UCX_PERF_CMD_PUT_SINGLE_V2: \
    case UCX_PERF_CMD_PUT_MULTI: \
    case UCX_PERF_CMD_PUT_PARTIAL: \
        _func(_cmd, __VA_ARGS__); \
        break; \
    default: \
        ucs_error("Unsupported cmd: %d", _cmd); \
        break; \
    }
The same could be applied to #define UCX_PERF_SWITCH_LEVEL.
The problem is that _cmd's value eventually gets used as a template parameter in the kernel launch:
_kernel<_level, _cmd><<<_blocks, _threads, _shared_size>>>(__VA_ARGS__);
and nvcc requires template parameters to be compile-time constants, but when using fall-through, _cmd remains a runtime variable.
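To illustrate the point above, here is a minimal host-only C++ sketch (no CUDA involved). All names in it (perf_cmd_t, run_kernel, PERF_SWITCH_CMD, LAUNCH, dispatch) are hypothetical stand-ins rather than the actual UCX definitions; run_kernel plays the role of the templated kernel. Because every case passes a literal enumerator to _func, the template argument is a constant expression, which is what the per-case repetition buys.

```cpp
#include <cstdio>

// Hypothetical stand-in for the perftest command enum (not the real UCX one).
enum perf_cmd_t { CMD_AM, CMD_PUT_SINGLE, CMD_PUT_SINGLE_V2, CMD_PUT_MULTI };

// Stand-in for the templated kernel: the command is a non-type template
// parameter, so it must be a constant expression at the call site.
template <perf_cmd_t cmd> int run_kernel(int arg)
{
    return static_cast<int>(cmd) * 100 + arg;
}

// Each case hands a literal enumerator to _func, so the template argument
// is known at compile time even though _cmd itself is a runtime value.
#define PERF_SWITCH_CMD(_cmd, _func, ...) \
    switch (_cmd) { \
    case CMD_PUT_SINGLE: \
        _func(CMD_PUT_SINGLE, __VA_ARGS__); \
        break; \
    case CMD_PUT_SINGLE_V2: \
        _func(CMD_PUT_SINGLE_V2, __VA_ARGS__); \
        break; \
    case CMD_PUT_MULTI: \
        _func(CMD_PUT_MULTI, __VA_ARGS__); \
        break; \
    default: \
        std::fprintf(stderr, "Unsupported cmd: %d\n", static_cast<int>(_cmd)); \
        break; \
    }

// Stand-in for the kernel launch macro: instantiates the template with the
// constant passed by the switch.
#define LAUNCH(_cmd_const, _result, _arg) \
    _result = run_kernel<_cmd_const>(_arg)

int dispatch(perf_cmd_t cmd, int arg)
{
    int result = -1; // stays -1 for unsupported commands
    PERF_SWITCH_CMD(cmd, LAUNCH, result, arg);
    // With a shared fall-through body, the call would expand to
    // run_kernel<cmd>(arg), which does not compile because cmd is a
    // runtime variable.
    return result;
}
```

In this sketch, dispatch(CMD_PUT_SINGLE_V2, 7) instantiates run_kernel<CMD_PUT_SINGLE_V2>, while an unhandled command such as CMD_AM reaches the default branch and leaves the result at -1.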
8271a71 to 9f60574
6b2e563 to f4e9803
elems[i].length = perf.params.msg_size_list[i];
offset += elems[i].length;

/* local elements - API v2 */
Maybe initialize the mem list elements on demand, according to the test being run, rather than always initializing both.
size_t count = data_count + (has_counter(perf) ? 1 : 0);
size_t offset = 0;
ucp_device_mem_list_elem_t elems[count];
ucp_device_mem_list_elem_t local_elems[count];
With the latest API we can use the existing ucp_device_mem_list_elem_t elems[count] to create all handle types.
    throw std::runtime_error("Failed to create memory list");
}

ucp_device_mem_list_params_t local_params;
Maybe create the mem list handle on demand, according to the test being run, rather than always creating both.
UCX_PERF_CMD_AM,
UCX_PERF_CMD_PUT,
UCX_PERF_CMD_PUT_SINGLE,
UCX_PERF_CMD_PUT_SINGLE_V2,
Can we maybe use the existing UCX_PERF_CMD_PUT instead of UCX_PERF_CMD_PUT_SINGLE_V2?
UCX_PERF_CMD_PUT is used for the host put tests, but maybe we can also use it for the device put test if we can differentiate between them by the -a option.
remote_params.elements = remote_elems;

deadline = ucs_get_time() + ucs_time_from_sec(60.0);
do {
Maybe use a helper function to improve code reuse and separation of concerns?
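One possible shape for such a helper, sketched in plain C++ with std::chrono instead of ucs_get_time() so it stands alone. The name poll_until and its parameters are hypothetical, and the body of the actual retry loop is not visible in this diff, so this only shows the deadline pattern: try a readiness check, run a progress step between attempts, and give up once the timeout has passed.

```cpp
#include <chrono>

// Hypothetical helper: calls `ready` until it returns true or `timeout`
// elapses, invoking `progress` between attempts. The loop body always runs
// at least once, like the do/while in the diff.
template <typename Ready, typename Progress>
bool poll_until(Ready ready, Progress progress,
                std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;

    do {
        if (ready()) {
            return true;
        }
        progress(); // e.g. drive the worker so the operation can complete
    } while (std::chrono::steady_clock::now() < deadline);

    // One last check, since the final progress() call may have completed it.
    return ready();
}
```

A caller could then wrap each creation attempt in a lambda passed as ready and use a 60-second timeout to match the deadline in the diff, so every handle type reuses the same loop.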
What?
Added perftest for device api v2