TL/UCP: add support for onesided dynamic segments #1149
base: master
Can one of the admins verify this patch?
8510a91 to 2d9a7b5
ae91958 to ff393c7
@wfaderhold21 Let's target this for the v1.6 release. @janjust @Sergei-Lebedev can we review this in the upcoming weeks?
a6a7ce5 to 659098c
@janjust @Sergei-Lebedev @nsarka @ikryukov I believe I've addressed the feedback thus far. Please let me know if you have any further comments or concerns.
@janjust @nsarka @ikryukov @Sergei-Lebedev any comments on this?
715356b to 5c5253e
@janjust @Sergei-Lebedev @nsarka @ikryukov any feedback would be appreciated.
b789f8f to c6b020a
TL/UCP: alltoall onesided convert to dyn seg
REVIEW: fix clang-tidy
REVIEW: address feedback
REVIEW: cleanup rebase
REVIEW: address feedback
REVIEW: fix clang-tidy
c6b020a to 99d384e
Greptile Overview
Greptile Summary
This PR introduces dynamic memory segment support for TL/UCP onesided collectives, enabling implicit memory handle creation at collective initialization rather than requiring pre-mapped memory regions. Three new internal interfaces (ucc_tl_ucp_dynamic_segment_{init|exchange|finalize}) handle on-demand registration of source/destination buffers, exchange of memory handles via service collectives (allgather), and cleanup on completion. The implementation removes static memory segment arrays (va_base, base_length) from team structures and adds dynamic segment metadata to task structures, allowing algorithms to defer registration until collective invocation. When memory is already mapped through context creation or explicit ucc_mem_map calls, the dynamic segment logic becomes a no-op, preserving backward compatibility. The changes primarily affect alltoall/alltoallv onesided algorithms, with corresponding test infrastructure updates to validate both pre-mapped and dynamic segment code paths.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| src/components/tl/ucp/tl_ucp_coll.c | 2/5 | Adds core dynamic segment implementation with memory handle creation/exchange/cleanup; contains critical memory leaks and use-after-free bugs in error paths |
| src/components/tl/ucp/alltoall/alltoall_onesided.c | 2/5 | Integrates dynamic segments into alltoall algorithm; missing memory handle swap for GET operations and unclear finalization status handling |
| src/components/tl/ucp/alltoallv/alltoallv_onesided.c | 2/5 | Adds global source memory handle support and enforces global destination handles; accesses flags field without mask validation |
| src/components/tl/ucp/tl_ucp_context.c | 2/5 | Fixes memory cleanup in unmap functions; introduces use-after-free bug at lines 827-830 where freed data is accessed |
| src/components/tl/ucp/tl_ucp_sendrecv.h | 4/5 | Refactors one-sided memory resolution with new segment resolution helper; thread-safety of concurrent rkey unpacking needs verification |
| src/components/tl/ucp/tl_ucp.h | 4/5 | Removes static segment arrays from team structure and adds public memory mapping API; architectural change affects all onesided algorithms |
| src/components/tl/ucp/tl_ucp_task.h | 4/5 | Adds task flag and embedded dynamic segment structure for per-collective memory management; well-designed integration with existing infrastructure |
| src/components/tl/ucp/tl_ucp_coll.h | 3/5 | Declares new dynamic segment interfaces with inline test helper; hard-coded threshold and alltoall-specific parameter may need refinement |
| src/components/tl/ucp/tl_ucp_team.c | 3/5 | Removes static memory info initialization from team creation; onesided algorithms must now use dynamic segments or context-level handles |
| test/mpi/test_alltoallv.cc | 4/5 | Restructures buffer allocation to use pre-mapped buffers for onesided operations and dynamic allocation otherwise; correctly implements dynamic segment support |
| test/gtest/coll/test_alltoall.cc | 2/5 | Adds dynamic segment test case but overrides flags after initialization, potentially creating inconsistent state |
| test/mpi/test_mpi.h | 5/5 | Adds memory handle tracking fields and dynamic segment flag to test infrastructure; clean and well-integrated additions |
| test/mpi/test_case.cc | 5/5 | Adds "DynSeg" label to test output for dynamic segment visibility; correct implementation following existing pattern |
| test/gtest/common/test_ucc.cc | 5/5 | Adds zero-initialization for memory map array to prevent undefined behavior; defensive programming improvement |
| test/mpi/test_mpi.cc | 5/5 | Test infrastructure changes supporting dynamic segment feature (review not detailed in file summaries) |
| src/components/tl/ucp/tl_ucp.c | 5/5 | Updates EXPORTED_MEMORY_HANDLE config from string to numeric boolean; cosmetic consistency improvement |
Confidence score: 2/5
- This PR contains critical memory management bugs, including use-after-free errors and memory leaks, that will cause crashes or undefined behavior in production code.
- The score reflects multiple severe issues: a use-after-free in `tl_ucp_context.c` lines 827-830, memory leaks in `tl_ucp_coll.c` error paths (lines 228-229), use-after-free during cleanup loops (lines 594, 620-627), and a missing memory handle swap for GET operations in `alltoall_onesided.c`. Additional concerns include unchecked flag field access in `alltoallv_onesided.c` line 35, hard-coded magic numbers in exchange loops, and unclear state handling when the dynamic segment exchange completes synchronously.
- Pay close attention to `src/components/tl/ucp/tl_ucp_coll.c` (memory management in error paths), `src/components/tl/ucp/tl_ucp_context.c` (use-after-free at lines 827-830), `src/components/tl/ucp/alltoall/alltoall_onesided.c` (GET operation memory handle logic), and `src/components/tl/ucp/alltoallv/alltoallv_onesided.c` (flags field validation).
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant UCC_Core
    participant TL_UCP_Coll
    participant Dynamic_Segment
    participant Service_Coll
    participant UCP_Worker
    User->>UCC_Core: ucc_collective_init(alltoall_onesided)
    UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_init()
    alt Memory handles NOT provided
        TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_init()
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_map_memh(src)
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_map_memh(dst)
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_map(EXPORT)
        Dynamic_Segment-->>TL_UCP_Coll: Set UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG
    end
    TL_UCP_Coll-->>UCC_Core: Return task handle
    User->>UCC_Core: ucc_collective_post(task)
    UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_start()
    alt Dynamic segments enabled
        TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_exchange_nb()
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_pack_memory_handles()
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_memh_pack()
        loop Exchange memory handle sizes
            Dynamic_Segment->>Service_Coll: dynamic_segment_calculate_sizes_start()
            Service_Coll->>Service_Coll: ucc_service_allgather(sizes)
            Dynamic_Segment->>Service_Coll: dynamic_segment_calculate_sizes_test()
            Service_Coll-->>Dynamic_Segment: Sizes exchanged
        end
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_allocate_buffers()
        loop Exchange packed memory handles
            Dynamic_Segment->>Service_Coll: dynamic_segment_pack_and_exchange_data_start()
            Service_Coll->>Service_Coll: ucc_service_allgather(packed_handles)
            Dynamic_Segment->>Service_Coll: dynamic_segment_pack_and_exchange_data_test()
            Service_Coll-->>Dynamic_Segment: Handles exchanged
        end
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_import_memory_handles()
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_map(IMPORT)
        Dynamic_Segment-->>TL_UCP_Coll: Memory handles ready
    end
    TL_UCP_Coll->>TL_UCP_Coll: Enqueue to progress queue
    loop Progress collective
        User->>UCC_Core: ucc_context_progress()
        UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_get/put_progress()
        alt Dynamic segment check
            TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_test_dynamic_segment()
            Dynamic_Segment-->>TL_UCP_Coll: Exchange complete/in-progress
        end
        loop Post onesided operations
            TL_UCP_Coll->>UCP_Worker: ucc_tl_ucp_get_nb/put_nb()
            UCP_Worker->>UCP_Worker: ucp_get_nbx/ucp_put_nbx()
            TL_UCP_Coll->>TL_UCP_Coll: alltoall_onesided_handle_completion()
        end
        TL_UCP_Coll->>TL_UCP_Coll: alltoall_onesided_wait_completion()
        UCP_Worker-->>TL_UCP_Coll: Operations complete
    end
    TL_UCP_Coll-->>UCC_Core: UCC_OK
    User->>UCC_Core: ucc_collective_finalize(task)
    UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_finalize()
    alt Dynamic segments used
        TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_finalize()
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_unmap(IMPORT, global)
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_unmap(EXPORT, local)
        Dynamic_Segment->>Dynamic_Segment: Free buffers
        Dynamic_Segment-->>TL_UCP_Coll: Cleanup complete
    end
    TL_UCP_Coll-->>UCC_Core: Finalized
```
16 files reviewed, 18 comments
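Both diagrams describe the handle exchange as two service allgathers: one for the packed-handle sizes and one for the packed handles themselves. A minimal sketch of the offset bookkeeping between the two, assuming every rank's packed bytes are laid out back to back in one contiguous receive buffer (if the exchange instead pads each entry to the largest size, the offset for rank `i` becomes `i * max_size`). The `ex_*` helpers are hypothetical and the second allgather is simulated with `memcpy`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given every rank's packed-handle size (the result of the first
 * allgather), compute where each rank's bytes start in the contiguous
 * receive buffer and return the total buffer size. */
static size_t ex_compute_displacements(const size_t *sizes, size_t nranks,
                                       size_t *displs)
{
    size_t i, total = 0;

    for (i = 0; i < nranks; i++) {
        displs[i] = total; /* exclusive prefix sum */
        total    += sizes[i];
    }
    return total;
}

/* Simulate the second exchange: place each rank's packed bytes at its
 * displacement, which is where a reader later finds rank i's handle
 * (the role PTR_OFFSET(global_buffer, offset) plays in the PR). */
static void ex_gather(const char **packed, const size_t *sizes,
                      const size_t *displs, size_t nranks,
                      char *global_buffer)
{
    size_t i;

    for (i = 0; i < nranks; i++) {
        memcpy(global_buffer + displs[i], packed[i], sizes[i]);
    }
}
```

Computing the displacements once, right after the size exchange, lets the buffer allocation and the import loop share the same offsets.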
```cpp
for (auto i = 0; i < ctxs.size(); i++) {
    ctxs[i]->args->mask = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
    ctxs[i]->args->flags = 0; // No special flags for dynamic segments
}
```
logic: overriding the mask and flags set by data_init creates inconsistent state. data_init already sets these from the coll_mask and coll_flags members (lines 38, 60-62). Consider setting coll_mask and coll_flags before calling data_init instead. Should the test initialize coll_mask and coll_flags before calling data_init, or is there a specific reason to override them after initialization?
```cpp
data_init(size, dtype, count, ctxs, team, false);
for (auto i = 0; i < ctxs.size(); i++) {
    ctxs[i]->args->mask = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
    ctxs[i]->args->flags = 0; // No special flags for dynamic segments
```
logic: setting flags = 0 removes the UCC_COLL_ARGS_FLAG_MEM_MAPPED_BUFFERS flag that was set on line 62 for onesided operations. This may break the onesided algorithm's assumptions.
```c
/* Free the TL data structure */
if (data) {
    ucc_free(data);
    memh->tl_data = NULL;
}
```
logic: data is already freed in both branches above (export at line 805, import at line 820), so this block frees it a second time. This double free (a use-after-free of the same allocation) could cause crashes.
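One way to avoid the double free is to keep the mode-specific branches free of any `free` call and release the data from a single point that also clears the owning pointer. The sketch below uses hypothetical, simplified types (`ex_memh_t`, `ex_unmap` are illustrative names, not TL/UCP code) and plain `free` in place of `ucc_free`:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the TL/UCP definitions. */
typedef enum { EX_MODE_EXPORT, EX_MODE_IMPORT } ex_mode_t;

typedef struct {
    void *tl_data; /* per-handle TL data, owned by the unmap path */
} ex_memh_t;

/* Release tl_data from one place so it is freed exactly once,
 * regardless of which mode-specific branch ran. */
static void ex_unmap(ex_memh_t *memh, ex_mode_t mode)
{
    if (mode == EX_MODE_EXPORT) {
        /* export-only teardown (e.g. releasing the local registration)
         * would go here; it does not free tl_data */
    } else {
        /* import-only teardown (e.g. destroying unpacked remote keys)
         * would go here; it does not free tl_data */
    }

    free(memh->tl_data); /* free(NULL) is a no-op, so repeat calls are safe */
    memh->tl_data = NULL;
}
```

Clearing the field right after the free means a second unmap on the same handle degrades to a no-op instead of a double free.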
src/components/tl/ucp/tl_ucp_coll.h (outdated)
```c
    return UCC_OK;
}

if (task->dynamic_segments.exchange_step < 5) {
```
style: magic number 5 is not self-documenting; define a constant like UCC_TL_UCP_DYN_SEG_EXCHANGE_STEPS to clarify the intended number of exchange phases
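A sketch of the suggested constant, assuming the exchange steps follow the labels in the second review's sequence diagram (STEP_INIT through STEP_DATA_TEST, then COMPLETE). The `EX_DYN_SEG_*` names are hypothetical; with five steps preceding completion, the completion value equals the literal 5 in the check above:

```c
#include <assert.h>

/* Hypothetical names modelled on the diagram's step labels; the PR's
 * actual enum may differ. */
typedef enum {
    EX_DYN_SEG_STEP_INIT = 0,   /* pack local memory handles */
    EX_DYN_SEG_STEP_SIZE_TEST,  /* allgather of packed sizes */
    EX_DYN_SEG_STEP_DATA_ALLOC, /* allocate the global buffer */
    EX_DYN_SEG_STEP_DATA_START, /* start allgather of packed handles */
    EX_DYN_SEG_STEP_DATA_TEST,  /* test it, then import handles */
    EX_DYN_SEG_STEP_COMPLETE    /* == 5: exchange finished */
} ex_dyn_seg_step_t;

/* Self-documenting replacement for `exchange_step < 5`. */
static int ex_dyn_seg_in_progress(ex_dyn_seg_step_t step)
{
    return step < EX_DYN_SEG_STEP_COMPLETE;
}
```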
```c
ucc_status_t ucc_tl_ucp_coll_dynamic_segment_init(ucc_coll_args_t *coll_args,
                                                  ucc_tl_ucp_alltoall_onesided_alg_t alg,
                                                  ucc_tl_ucp_task_t *task);
```
style: parameter alg is typed as ucc_tl_ucp_alltoall_onesided_alg_t but dynamic segments may be used by other collective types (alltoallv, etc.); consider a more generic type or documenting the restriction. Is the alltoall-specific alg type intentional, or should this accept a generic algorithm enum?
```c
if (task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG) {
    status = ucc_tl_ucp_test_dynamic_segment(task);
    if (status == UCC_INPROGRESS) {
        return;
    }
    if (UCC_OK != status) {
        task->super.status = status;
        tl_error(UCC_TL_TEAM_LIB(team),
                 "failed to exchange dynamic segments");
        return;
    }
    src_memh = task->dynamic_segments.dst_local;
    dst_memh = (ucc_mem_map_mem_h *)task->dynamic_segments.src_global;
}
```
logic: memory handle swap is missing for GET algorithm when using dynamic segments. The original code (lines 95-98) swaps src_memh and dst_memh for GET operations because remote reads need the destination buffer's rkey. This swap is not applied when dynamic segments are used, which will likely cause incorrect remote address resolution.
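A minimal sketch of per-algorithm handle selection, using hypothetical types (`ex_dyn_seg_t`, `ex_select_memh` are illustrative, not TL/UCP code). In a onesided alltoall, GET reads peers' source buffers into the local destination, while PUT writes the local source into peers' destination buffers, so the local and remote roles of src and dst swap between the two:

```c
#include <assert.h>

/* Hypothetical, simplified types; not the TL/UCP definitions. */
typedef enum { EX_ALG_GET, EX_ALG_PUT } ex_alg_t;

typedef struct {
    void *src_local;  /* this rank's source buffer handle */
    void *dst_local;  /* this rank's destination buffer handle */
    void *src_global; /* peers' source buffer handles (imported) */
    void *dst_global; /* peers' destination buffer handles (imported) */
} ex_dyn_seg_t;

/* local_memh: handle for the buffer this rank reads from or writes to.
 * remote_memh: handles whose remote keys address the peers' buffers. */
static void ex_select_memh(const ex_dyn_seg_t *seg, ex_alg_t alg,
                           void **local_memh, void **remote_memh)
{
    if (alg == EX_ALG_GET) {
        *local_memh  = seg->dst_local;  /* data lands in our dst */
        *remote_memh = seg->src_global; /* read from peers' src */
    } else {
        *local_memh  = seg->src_local;  /* data comes from our src */
        *remote_memh = seg->dst_global; /* write into peers' dst */
    }
}
```

Keeping the choice in one helper makes it harder for the dynamic-segment path and the pre-mapped path to drift apart.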
```c
if (task->super.status != UCC_INPROGRESS &&
    (task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG)) {
    task->super.status = ucc_tl_ucp_coll_dynamic_segment_finalize(task);
}
```
logic: dynamic segment finalization happens after completion or error, but the return value of the finalize call unconditionally overwrites the original task status. If the task completed successfully (UCC_OK) but finalization fails, the error is propagated; however, if the task already failed, its status is replaced by whatever finalize returns: a finalization error loses the original failure reason, and a successful finalize (UCC_OK) masks the failure entirely. Should finalization errors be logged separately while preserving the original task failure status?
```c
if (task->super.status != UCC_INPROGRESS &&
    (task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG)) {
    task->super.status = ucc_tl_ucp_coll_dynamic_segment_finalize(task);
}
```
logic: same issue as GET progress: finalization status overwrites task status
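A minimal sketch of one way to merge the two statuses, assuming a cleanup failure should surface only when the collective itself succeeded. The `ex_*` status codes and helper are hypothetical stand-ins for `ucc_status_t` values and `tl_error` logging:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes standing in for ucc_status_t values. */
typedef enum {
    EX_OK              = 0,
    EX_ERR_NO_MESSAGE  = -1,
    EX_ERR_NO_RESOURCE = -2
} ex_status_t;

/* Keep the collective's own failure if there was one; report a cleanup
 * failure only when the collective succeeded, and always log it. */
static ex_status_t ex_merge_finalize_status(ex_status_t task_status,
                                            ex_status_t fin_status)
{
    if (fin_status != EX_OK) {
        fprintf(stderr, "dynamic segment finalize failed: %d\n",
                (int)fin_status);
        if (task_status == EX_OK) {
            return fin_status;
        }
    }
    return task_status;
}
```

With this shape, the progress functions would assign the merged value rather than the raw finalize result.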
```c
if (task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG) {
    status = ucc_tl_ucp_coll_dynamic_segment_exchange_nb(task);
    if (status == UCC_INPROGRESS) {
        return ucc_progress_queue_enqueue(UCC_TL_CORE_CTX(team)->pq, &task->super);
    }
    if (UCC_OK != status) {
        task->super.status = status;
        tl_error(UCC_TL_TEAM_LIB(team),
                 "failed to exchange dynamic segments");
        return task->super.status;
    }
}
```
logic: when dynamic segment exchange completes synchronously (returns UCC_OK), the function continues to line 223 and enqueues the task. However, the progress functions (GET/PUT) will call ucc_tl_ucp_test_dynamic_segment again at lines 106 and 164, which may not handle the already-completed case correctly. Does ucc_tl_ucp_test_dynamic_segment return UCC_OK immediately when called again after successful completion?
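Whether `ucc_tl_ucp_test_dynamic_segment` tolerates repeat calls depends on its implementation, which is not shown here. A minimal sketch of an idempotent test, assuming completion is recorded in `exchange_step` with 5 meaning complete (matching the `exchange_step < 5` check above); the `ex_*` names are hypothetical and each call advances a simulated state machine by one step:

```c
#include <assert.h>

typedef enum { EX_OK = 0, EX_INPROGRESS = 1 } ex_status_t;

/* Hypothetical: 5 marks completion, matching `exchange_step < 5`. */
#define EX_STEP_COMPLETE 5

typedef struct {
    int exchange_step;
} ex_task_t;

/* Stand-in for one state-machine transition (pack, size allgather, ...). */
static ex_status_t ex_exchange_advance(ex_task_t *t)
{
    t->exchange_step++;
    return (t->exchange_step == EX_STEP_COMPLETE) ? EX_OK : EX_INPROGRESS;
}

/* Once complete, later calls return EX_OK without touching the
 * exchange state again, so a synchronous completion in start() followed
 * by a test in progress() is harmless. */
static ex_status_t ex_test_dynamic_segment(ex_task_t *t)
{
    if (t->exchange_step >= EX_STEP_COMPLETE) {
        return EX_OK;
    }
    return ex_exchange_advance(t);
}
```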
```c
status = ucc_tl_ucp_coll_dynamic_segment_init(&coll_args->args, alg, task);
if (UCC_OK != status) {
    tl_error(UCC_TL_TEAM_LIB(tl_team),
             "failed to initialize dynamic segments");
    goto out;
}
```
logic: if dynamic segment initialization fails, the code jumps to `out`, but the task object has already been allocated (line 296) and is never cleaned up, causing a memory leak.
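A minimal sketch of the usual goto-unwind shape for this init path, using plain `malloc`/`free` in place of whatever task allocator the TL uses. `ex_coll_init`, `ex_dyn_seg_init` and the `ex_live_tasks` counter are hypothetical, the counter existing only to make the leak observable:

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { EX_OK = 0, EX_ERR_NO_MEMORY = -1 } ex_status_t;
typedef struct { int dummy; } ex_task_t;

static int ex_live_tasks; /* outstanding task allocations, for the demo */

/* Hypothetical stand-in for the dynamic segment init; `fail` forces an error. */
static ex_status_t ex_dyn_seg_init(ex_task_t *task, int fail)
{
    (void)task;
    return fail ? EX_ERR_NO_MEMORY : EX_OK;
}

static ex_status_t ex_coll_init(int fail, ex_task_t **task_p)
{
    ex_status_t status;
    ex_task_t  *task = malloc(sizeof(*task));

    *task_p = NULL;
    if (!task) {
        return EX_ERR_NO_MEMORY;
    }
    ex_live_tasks++;

    status = ex_dyn_seg_init(task, fail);
    if (status != EX_OK) {
        goto free_task; /* release everything allocated before the failure */
    }
    *task_p = task;
    return EX_OK;

free_task:
    free(task);
    ex_live_tasks--;
    return status;
}
```

Each label releases exactly what was acquired before the jump, so later failures can add labels above it in reverse order.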
Greptile Overview
Greptile Summary
Adds dynamic segment infrastructure for onesided collectives, enabling implicit memory handle creation and exchange when buffers aren't pre-mapped. Three new interfaces (init, exchange_nb, finalize) manage the lifecycle of memory registration, handle exchange via service collectives, and cleanup.
Key Changes:
- New `tl_ucp_coll.c` functions (771 lines) implement a multi-step exchange protocol as a state machine
- `alltoall_onesided.c` integrates dynamic segments with conditional memory handle usage
- `alltoallv_onesided.c` adds validation but does not support dynamic segments yet (not supported per line 185)
- Task structure extended with a `dynamic_segments` field tracking exchange state and buffers
Critical Issues Found:
- Heap corruption bug: lines 574 and 606 in `tl_ucp_coll.c` incorrectly call `ucc_free()` on pointers that point into a contiguous `global_buffer`, not separately allocated objects
- Memory leak: line 213 fails to free `src_memh->tl_h->tl_data` when dst mapping fails
- Dead code: line 559 checks `i == UCC_RANK_INVALID` inside a loop where this can never be true
Other Concerns:
- Finalization errors overwrite original task status (alltoall lines 141, 198)
- No validation that handles are global when provided by the user in some paths
Confidence Score: 1/5
- Critical heap corruption and memory leak bugs make this PR unsafe to merge without fixes
- Multiple critical memory management bugs will cause crashes or corruption: freeing pointers into a contiguous buffer allocation (lines 574, 606) and leaking allocated data on an error path (line 213). These are deterministic bugs that will trigger in error scenarios.
- `src/components/tl/ucp/tl_ucp_coll.c` requires immediate attention: it contains the heap corruption bug at lines 574 and 606, plus the memory leak at line 213. These must be fixed before merge.
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| src/components/tl/ucp/tl_ucp_coll.c | 1/5 | added 771 lines for dynamic segment infrastructure. Critical bugs: heap corruption from freeing pointers into contiguous buffer (lines 574, 606), memory leak when dst mapping fails (line 213) |
| src/components/tl/ucp/alltoall/alltoall_onesided.c | 3/5 | integrated dynamic segment support for alltoall. Memory handle swap logic preserved for GET but applied conditionally. Finalization errors may overwrite original task status |
| src/components/tl/ucp/alltoallv/alltoallv_onesided.c | 4/5 | minor formatting and validation improvements. Added check for global dst memory handle flag. No dynamic segment integration yet (alltoallv not supported per line 185) |
| src/components/tl/ucp/tl_ucp_coll.h | 4/5 | added enum for exchange steps, three new function declarations for dynamic segment lifecycle. Clean interface design with test helper function |
| src/components/tl/ucp/tl_ucp_task.h | 4/5 | added UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG flag and dynamic_segments struct with exchange state, buffers, and service collective requests. Well-structured additions |
Sequence Diagram
```mermaid
sequenceDiagram
    participant App as Application
    participant Alg as Alltoall Algorithm
    participant Init as dynamic_segment_init
    participant Exch as dynamic_segment_exchange_nb
    participant Fin as dynamic_segment_finalize
    participant Svc as Service Collective
    App->>Alg: ucc_tl_ucp_alltoall_onesided_init
    Alg->>Init: ucc_tl_ucp_coll_dynamic_segment_init
    Note over Init: Check if buffers already mapped
    alt Buffers not mapped
        Init->>Init: map src_memh via ucc_tl_ucp_mem_map
        Init->>Init: map dst_memh via ucc_tl_ucp_mem_map
        Init->>Alg: Set USE_DYN_SEG flag
    else Buffers pre-mapped
        Init->>Alg: Return UCC_OK (no-op)
    end
    App->>Alg: ucc_tl_ucp_alltoall_onesided_start
    Alg->>Exch: ucc_tl_ucp_coll_dynamic_segment_exchange_nb
    Note over Exch: STEP_INIT: Pack memory handles
    Exch->>Exch: dynamic_segment_pack_memory_handles
    Exch->>Alg: Return UCC_INPROGRESS
    App->>Alg: progress (via task queue)
    Alg->>Exch: ucc_tl_ucp_test_dynamic_segment
    Note over Exch: STEP_SIZE_TEST: Exchange sizes
    Exch->>Svc: Start allgather for pack sizes
    Exch->>Alg: Return UCC_INPROGRESS
    App->>Alg: progress
    Alg->>Exch: continue exchange
    Exch->>Svc: Test allgather completion
    Svc-->>Exch: Complete with global sizes
    Note over Exch: STEP_DATA_ALLOC: Allocate buffers
    Exch->>Exch: dynamic_segment_allocate_buffers
    Note over Exch: STEP_DATA_START: Exchange handles
    Exch->>Svc: Start allgather for packed handles
    Exch->>Alg: Return UCC_INPROGRESS
    App->>Alg: progress
    Alg->>Exch: continue exchange
    Exch->>Svc: Test allgather completion
    Svc-->>Exch: Complete with all handles
    Note over Exch: STEP_DATA_TEST: Import handles
    Exch->>Exch: dynamic_segment_import_memory_handles
    loop For each rank
        Exch->>Exch: ucc_tl_ucp_mem_map(IMPORT)
    end
    Exch->>Exch: Set exchange_step = COMPLETE
    Exch->>Alg: Return UCC_OK
    Note over Alg: Execute onesided operations
    Alg->>Alg: ucc_tl_ucp_get_nb / ucc_tl_ucp_put_nb
    App->>Alg: progress until complete
    App->>Alg: Task completion/finalize
    Alg->>Fin: ucc_tl_ucp_coll_dynamic_segment_finalize
    loop For each global handle
        Fin->>Fin: ucc_tl_ucp_mem_unmap(IMPORT)
    end
    Fin->>Fin: unmap src_local and dst_local
    Fin->>Alg: Return UCC_OK
```
9 files reviewed, 4 comments
```c
global[i] =
    (ucc_mem_map_memh_t *)PTR_OFFSET(
        args->task->dynamic_segments.global_buffer, offset);
global[i]->tl_h =
    ucc_calloc(1, sizeof(ucc_mem_map_tl_t), "global tl_h");
if (!global[i]->tl_h) {
    tl_error(UCC_TASK_LIB(args->task),
             "failed to allocate global tl handles");
    status = UCC_ERR_NO_MEMORY;
    ucc_free(global[i]);
    goto out;
}
```
logic: global[i] points to an offset within the contiguous global_buffer allocation (line 567), not a separately allocated object. Calling ucc_free(global[i]) at line 574 will corrupt the heap
Suggested change:
```diff
 global[i]->tl_h =
     ucc_calloc(1, sizeof(ucc_mem_map_tl_t), "global tl_h");
 if (!global[i]->tl_h) {
     tl_error(UCC_TASK_LIB(args->task),
              "failed to allocate global tl handles");
     status = UCC_ERR_NO_MEMORY;
-    ucc_free(global[i]);
     goto out;
 }
```
```c
for (j = 0; j < i; j++) {
    if (global[j]->tl_h) {
        /* we need to unmap */
        ucc_tl_ucp_mem_unmap(&ctx->super.super, UCC_MEM_MAP_MODE_IMPORT,
                             global[j]->tl_h);
    }
    ucc_free(global[j]->tl_h);
    ucc_free(global[j]);
}
```
logic: global[j] points into global_buffer and should not be freed. Remove ucc_free(global[j]) at line 606 - only the array of pointers and individual tl_h allocations need cleanup
Suggested change:
```diff
 for (j = 0; j < i; j++) {
     if (global[j]->tl_h) {
         /* we need to unmap */
         ucc_tl_ucp_mem_unmap(&ctx->super.super, UCC_MEM_MAP_MODE_IMPORT,
                              global[j]->tl_h);
     }
     ucc_free(global[j]->tl_h);
-    ucc_free(global[j]);
 }
```
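A minimal sketch of the ownership rule behind both suggestions: every handle is a view into one contiguous allocation, so only the separately allocated `tl_h` objects are freed per entry, and the backing buffer is freed once. The `ex_*` types and functions are hypothetical and use a fixed stride for simplicity (the PR computes per-rank byte offsets instead); `fail_at` simulates a `tl_h` allocation failure at that index:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ucc_mem_map_tl_t / ucc_mem_map_memh_t. */
typedef struct { void *tl_data; } ex_tl_h_t;
typedef struct { ex_tl_h_t *tl_h; char packed[32]; } ex_memh_t;

typedef struct {
    void       *buffer; /* single contiguous allocation */
    ex_memh_t **global; /* array of views into buffer */
    size_t      n;
} ex_global_t;

/* Undo the first `count` entries, then the two top-level allocations. */
static void ex_release(ex_global_t *g, size_t count)
{
    size_t j;

    for (j = 0; j < count; j++) {
        free(g->global[j]->tl_h); /* separately allocated: free it */
        /* g->global[j] points into g->buffer: never free it directly */
    }
    free(g->global);
    free(g->buffer); /* one free for the whole contiguous buffer */
    g->global = NULL;
    g->buffer = NULL;
}

static int ex_import_all(ex_global_t *g, size_t n, size_t fail_at)
{
    size_t i;

    g->n      = n;
    g->buffer = calloc(n, sizeof(ex_memh_t));
    g->global = calloc(n, sizeof(ex_memh_t *));
    if (!g->buffer || !g->global) {
        free(g->buffer);
        free(g->global);
        g->buffer = NULL;
        g->global = NULL;
        return -1;
    }
    for (i = 0; i < n; i++) {
        g->global[i] = (ex_memh_t *)g->buffer + i;
        g->global[i]->tl_h =
            (i == fail_at) ? NULL : calloc(1, sizeof(ex_tl_h_t));
        if (!g->global[i]->tl_h) {
            ex_release(g, i); /* unwind only the i entries already set up */
            return -1;
        }
    }
    return 0;
}
```

The same `ex_release` serves both the error path and normal teardown, which keeps the "views are not owners" rule in one place.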
```c
if (i == UCC_RANK_INVALID) {
    tl_error(UCC_TASK_LIB(args->task),
             "invalid team index for context rank %d", i);
    status = UCC_ERR_INVALID_PARAM;
    goto out;
}
```
logic: i iterates from 0 to UCC_TL_TEAM_SIZE(tl_team) (line 553), so it can never equal UCC_RANK_INVALID. This check will never trigger. Remove dead code or clarify intent
```c
status = dynamic_segment_map_memh(&src_memh, coll_args, IS_SRC, task);
if (UCC_OK != status) {
    return status;
}
status = dynamic_segment_map_memh(&dst_memh, coll_args, IS_DST, task);
if (UCC_OK != status) {
    status = ucc_tl_ucp_mem_unmap(&ctx->super.super, UCC_MEM_MAP_MODE_EXPORT,
                                  src_memh->tl_h);
    if (status != UCC_OK) {
        tl_error(UCC_TASK_LIB(task), "failed to unmap src memory handle");
    }
    ucc_free(src_memh->tl_h);
    ucc_free(src_memh);
    return status;
```
logic: if dst_memh mapping fails, the allocated src_memh->tl_h->tl_data from line 157 is leaked. Need to free src_memh->tl_h->tl_data before freeing src_memh->tl_h
The memory should be freed in the call to ucc_tl_ucp_mem_unmap.
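A minimal sketch of the ownership model the reply describes, where unmap releases `tl_data` and the caller only frees the handle object itself. The `ex_*` names are hypothetical and `ex_live` just counts outstanding allocations; the unmap result goes into its own variable so the error path returns the original mapping failure:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { EX_OK = 0, EX_ERR_NO_MEMORY = -1 } ex_status_t;
typedef struct { void *tl_data; } ex_tl_h_t;

static int ex_live; /* outstanding allocations, for the demo */

/* Hypothetical map: allocates the handle and its TL data; `fail` forces
 * an error without touching *tl_h. */
static ex_status_t ex_map(ex_tl_h_t **tl_h, int fail)
{
    if (fail) {
        return EX_ERR_NO_MEMORY;
    }
    *tl_h = calloc(1, sizeof(**tl_h));
    if (!*tl_h) {
        return EX_ERR_NO_MEMORY;
    }
    (*tl_h)->tl_data = malloc(64);
    if (!(*tl_h)->tl_data) {
        free(*tl_h);
        *tl_h = NULL;
        return EX_ERR_NO_MEMORY;
    }
    ex_live += 2;
    return EX_OK;
}

/* Unmap owns tl_data: it frees it, so callers never free it separately. */
static ex_status_t ex_unmap(ex_tl_h_t *tl_h)
{
    free(tl_h->tl_data);
    tl_h->tl_data = NULL;
    ex_live--;
    return EX_OK;
}

static ex_status_t ex_map_src_dst(int fail_dst, ex_tl_h_t **src,
                                  ex_tl_h_t **dst)
{
    ex_status_t status, unmap_status;

    status = ex_map(src, 0);
    if (status != EX_OK) {
        return status;
    }
    status = ex_map(dst, fail_dst);
    if (status != EX_OK) {
        unmap_status = ex_unmap(*src); /* releases src tl_data */
        if (unmap_status != EX_OK) {
            fprintf(stderr, "failed to unmap src memory handle\n");
        }
        free(*src); /* the handle object itself */
        ex_live--;
        *src = NULL;
        return status; /* the dst mapping failure, not the unmap result */
    }
    return EX_OK;
}
```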
What
This PR adds support in TL/UCP for implicit memory handle creation for onesided algorithms by introducing three internal interfaces: ucc_tl_ucp_dynamic_segment_{init | exchange | finalize}. These interfaces initialize a dynamic (implicit) segment by registering the source and destination buffers at initialization time to create memory handles, exchange the memory handles when the collective starts, and finalize (unmap) the memory when the collective completes. Existing or future algorithms can either use or ignore these interfaces. If the user allocates and maps memory either through (1) context creation or (2) memory handles, the interfaces perform no action. This is an update to PR #909.
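A minimal sketch of the init/finalize lifecycle described above, showing that both calls become no-ops when the buffers are already mapped. The `ex_*` names are hypothetical, the exchange step is omitted, and registering only the buffers that lack a handle is an assumption of the sketch rather than something the PR states:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { EX_OK = 0 } ex_status_t;

typedef struct {
    bool src_mapped;    /* caller already supplied a src memory handle */
    bool dst_mapped;    /* caller already supplied a dst memory handle */
    bool use_dyn_seg;   /* plays the role of UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG */
    int  registrations; /* buffers registered by the dynamic segment path */
} ex_task_t;

static ex_status_t ex_dyn_seg_init(ex_task_t *t)
{
    if (t->src_mapped && t->dst_mapped) {
        return EX_OK; /* pre-mapped: nothing to do */
    }
    /* assumption: register only the buffers that lack a handle */
    t->registrations += !t->src_mapped;
    t->registrations += !t->dst_mapped;
    t->use_dyn_seg = true;
    return EX_OK;
}

static ex_status_t ex_dyn_seg_finalize(ex_task_t *t)
{
    if (!t->use_dyn_seg) {
        return EX_OK; /* nothing was registered on the user's behalf */
    }
    t->registrations = 0; /* stand-in for unmapping local and imported handles */
    t->use_dyn_seg   = false;
    return EX_OK;
}
```

Gating finalize on the same flag that init sets keeps user-provided handles untouched.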
Why?
Even with PR #1070 included, mapping memory for message-passing programming models such as MPI would require modifying current implementations. This PR is useful in situations where the messages exchanged in the collective are large.