-
Notifications
You must be signed in to change notification settings - Fork 497
Select device lanes #11015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Select device lanes #11015
Conversation
WalkthroughAdds a lane descriptor and comparator; updates lane lookup to collect and sort RMA lanes by bandwidth (tie: device name), track the best non‑RMA lane, choose/promote lanes (prefer non‑RMA at Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as Caller
participant Lookup as ucp_device_mem_list_lane_lookup
participant Collector as LaneCollector
participant Sorter as RMA_Sorter
participant Selector as FinalSelector
Caller->>Lookup: request lanes for mem-list
Note right of Lookup: iterate candidates, capture node_local_id
Lookup->>Collector: collect RMA entries (bw, lane, dev_name)
Lookup->>Collector: track best non‑RMA lane & bw
alt RMA entries exist
Collector->>Sorter: sort by bw desc (tie: dev_name)
Sorter-->>Selector: sorted RMA list
Selector->>Lookup: choose index = node_local_id % rma_count
else
Collector-->>Selector: no RMA lanes
end
alt non‑RMA exists
Selector->>Lookup: place non‑RMA at lanes[0]
Selector->>Lookup: place chosen RMA at lanes[1] or promote if bw higher
Note right of Lookup: trace/log non‑RMA and RMA choices
else
Selector->>Lookup: use chosen RMA as lanes[0]
Note right of Lookup: trace/log RMA choice
end
alt no valid lanes
Lookup-->>Caller: return early (no lanes)
else
Lookup-->>Caller: return selected lanes[]
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/ucp/core/ucp_device.c (2)
283-302: Lane descriptor and comparator look correct; consider NULL‑safety fordev_nameThe struct and comparator logic (descending bandwidth, then device name) are straightforward and match the intended ordering. One minor robustness detail:
ucp_device_lane_compareunconditionally callsstrcmp(lane_a->dev_name, lane_b->dev_name). If any transport ever leavesdev_nameasNULL, this will crash insideqsort.If the contract guarantees
dev_nameis neverNULL, it would be good to document or assert it here:- return strcmp(lane_a->dev_name, lane_b->dev_name); + ucs_assert(lane_a->dev_name != NULL); + ucs_assert(lane_b->dev_name != NULL); + return strcmp(lane_a->dev_name, lane_b->dev_name);Otherwise, consider falling back to an empty string or a stable placeholder when
dev_name == NULL.
310-320: Lane selection / load‑balancing logic is sound; a couple of clarity tweaks would helpThe updated
ucp_device_mem_list_lane_lookuplogic correctly:
- Tracks the best “non‑RMA” lane separately (
best_non_rma_lane/best_non_rma_bw).- Accumulates other candidate lanes in
best_lanes[], sorts them by bandwidth+dev_name, and picks one bynode_local_id % num_valid_lanesfor load balancing.- Prefers the RMA lane over the non‑RMA lane when its bandwidth is higher, while still filling at most two entries in
lanes[].A few small refinements could improve readability and diagnostics without changing behavior:
- Make
is_non_rmastrictly boolean for better tracesCurrently:
is_non_rma = (ucp_ep_md_attr(ep, lane)->flags & UCT_MD_FLAG_INVALIDATE_RMA); ... " ... non_rma=%d", ..., is_non_rma);
is_non_rmawill log the raw bitmask, not 0/1. Making it boolean keeps the trace easier to read:- is_non_rma = (ucp_ep_md_attr(ep, lane)->flags & - UCT_MD_FLAG_INVALIDATE_RMA); + is_non_rma = !!(ucp_ep_md_attr(ep, lane)->flags & + UCT_MD_FLAG_INVALIDATE_RMA);
- Avoid relying on
best_non_rma_bw == -1.0as a sentinelThe final selection uses
best_lanes[lane].bandwidth > best_non_rma_bw, wherebest_non_rma_bwis initialized to-1.0. It works, but the coupling between a magic value and control flow is not obvious. A slightly clearer variant would key off the presence ofbest_non_rma_laneinstead:- if (best_lanes[lane].bandwidth > best_non_rma_bw) { - lanes[1] = lanes[0]; - lanes[0] = best_lanes[lane].lane; - } else { - lanes[1] = best_lanes[lane].lane; - } + if (best_non_rma_lane == UCP_NULL_LANE || + best_lanes[lane].bandwidth > best_non_rma_bw) { + lanes[1] = best_non_rma_lane; + lanes[0] = best_lanes[lane].lane; + } else { + lanes[1] = best_lanes[lane].lane; + }Functionally equivalent, but easier to reason about.
- Confirm the “non‑RMA” criterion matches intent
You’re classifying lanes as “non‑RMA” based on
UCT_MD_FLAG_INVALIDATE_RMA:is_non_rma = !!(ucp_ep_md_attr(ep, lane)->flags & UCT_MD_FLAG_INVALIDATE_RMA);This looks intentional but is slightly non‑obvious naming‑wise (a flag mentioning RMA being used to define “non‑RMA”). It might be worth double‑checking that this flag set is exactly the set of lanes you want to prioritize as
lanes[0]before RMA lanes, or at least adding a brief comment tying the flag to the “non‑RMA” concept.Also applies to: 361-405
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/ucp/core/ucp_device.c(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: UCX PR (Static_check Static checks)
- GitHub Check: UCX PR (Codestyle ctags check)
- GitHub Check: UCX PR (Codestyle commit title)
- GitHub Check: UCX PR (Codestyle codespell check)
- GitHub Check: UCX PR (Codestyle AUTHORS file update check)
- GitHub Check: UCX PR (Codestyle format code)
- GitHub Check: UCX release DRP (Prepare CheckRelease)
- GitHub Check: UCX release (Prepare CheckRelease)
- GitHub Check: UCX snapshot (Prepare Check)
8caa8b4 to
41ffa64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/ucp/core/ucp_device.c (1)
401-406: Consider explicit null-lane check for clarity.While the current logic is correct (comparing against
best_non_rma_bw = -1.0when no non-RMA lane exists), the code would be clearer with an explicit check.Apply this diff for improved clarity:
- if (best_lanes[lane].bandwidth > best_non_rma_bw) { + if ((best_non_rma_lane == UCP_NULL_LANE) || + (best_lanes[lane].bandwidth > best_non_rma_bw)) { lanes[1] = lanes[0]; lanes[0] = best_lanes[lane].lane; } else {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/ucp/core/ucp_device.c(2 hunks)
🔇 Additional comments (5)
src/ucp/core/ucp_device.c (5)
283-287: LGTM! Clean descriptor struct.The struct is well-defined for collecting lane information. The
const char *dev_namepointer is appropriate for referencing stable transport resource names.
361-379: Good separation of non-RMA and RMA lanes.The logic correctly distinguishes between non-RMA lanes (can use copy operations) and RMA lanes (require memory registration). Tracking the best non-RMA lane while collecting all RMA lanes for load balancing is the right approach.
392-408: Excellent load balancing implementation!The load balancing logic is well-designed:
- Sorting ensures deterministic ordering across all processes
- The modulo operation (
node_local_id % num_valid_lanes) distributes different processes to different NICs- Lane promotion based on bandwidth comparison optimizes for performance
The logic correctly handles all scenarios (no non-RMA, no RMA, both present).
310-310: No issues found. Thenode_local_idfield is always initialized during context creation with a default value of 0 (seeucp_context.cline 2065), and can be overridden via environment variables (lines 2177-2179). By the time an endpoint accesses this field in line 310, the context is fully initialized.
289-302: <!-- [scratchpad_start]
[task review_file_1/1 review_segment_2/3]
Continue searching for ucp_ep_get_tl_rsc definition and dev_name initialization.[observations]
- Previous search found usages but not definition
- usages show: ucp_ep_get_tl_rsc(ep, lane)->dev_name at line 377
- usages show: ucp_ep_get_tl_rsc(ep, lane)->sys_device at line 344
- This suggests return type has fields: dev_name, sys_device
- Likely defined in header file (not .c file)
[analysis]
- Need to search header files (.h) for the function definition and return type
- Structure might be uct_tl_resource_desc_t or similar based on macro use
[pending]
- Find function signature in header files
- Understand return type and whether dev_name is guaranteed non-NULL
- Check if return value can be NULL
[actions]
Search for:
- ucp_ep_get_tl_rsc in header files (.h)
- uct_tl_resource_desc_t or similar structure definition
- Check if dev_name field is documented or guaranteed
src/ucp/core/ucp_device.c
Outdated
|
|
||
| static int ucp_device_lane_compare(const void *a, const void *b) | ||
| { | ||
| const ucp_device_lane_info_t *lane_a = (const ucp_device_lane_info_t*)a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally casting from void * is redundant
| ucp_lane_index_t lanes[UCP_DEVICE_MEM_LIST_MAX_EPS]) | ||
| { | ||
| double best_bw[UCP_DEVICE_MEM_LIST_MAX_EPS] = {-1., -1.}; | ||
| unsigned long node_local_id = ep->worker->context->config.node_local_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
align
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I run format script and it kept it as is.
src/ucp/core/ucp_device.c
Outdated
| /* Sort RMA lanes */ | ||
| qsort(best_lanes, num_valid_lanes, sizeof(ucp_device_lane_info_t), | ||
| ucp_device_lane_compare); | ||
| lane = node_local_id % num_valid_lanes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove extra spaces
| bandwidth = ucp_worker_iface_bandwidth(ep->worker, | ||
| ucp_ep_get_rsc_index(ep, lane)); | ||
|
|
||
| is_rma = !!(ucp_ep_md_attr(ep, lane)->flags & UCT_MD_FLAG_NEED_MEMH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- not sure we need the separation between RMA and non-RMA lanes
- we could just save the indexes of the valid lanes and then sort, and not save all lanes properties in the array. Use ucs_qsort_r to pass context to get the rest of the info needed for comparison.
- can use ucs_array
What?
Select best device lanes and keep NICs load balanced.
Why?
Multi process applications wants to spread the load between the NICs.
Today it can be achieved only by setting UCX_NET_DEVICES but it require user to be topo aware.
How?
Use user hint to select NIC and keep load balanced.
Summary by CodeRabbit