-
Notifications
You must be signed in to change notification settings - Fork 5.3k
[EventPipe] Remove thread lock and cleanup ThreadSessionState lifecycle #118415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EventPipe] Remove thread lock and cleanup ThreadSessionState lifecycle #118415
Conversation
EventPipe buffer_manager code had this complex synchronization scheme where some state was protected by a thread lock and other state was protected by the buffer_manager lock. Removing the lock introduces a little extra lock-free behavior, but overall I think it will be a simplification. 1. Previously the thread list in the buffer manager and the thread->session_state slots were updated under separate locks at different times. It was also possible that a TSS (ThreadSessionState) would never get added to the list. The function ep_session_remove_dangling_session_states was designed as a bandaid to clean that up, but it still didn't handle the case where the thread exits prior to the session being disabled which appears to permanently leak the TSS. Now the session_state slots and buffer_manager thread list are always updated together. ep_buffer_manager_deallocate_buffers will clean up all all thread session state at session disable time at the latest. 2. Previously buffer_lists had different lifetimes than TSS and they were getting leaked when ep_buffer_manager_write_all_buffers_to_file_v4 did cleanup. Now buffer_list is allocated and freed at the same as TSS so it doesn't require separate explicit cleanup. 3. Previously TSS->write_buffer was set under a different lock than buffer_list->tail_buffer which created a window where tail_buffer was the only one updated. Now they are updated atomically and that window no longer exists. This simplifies try_convert_buffer_to_read_only because there is no longer a possibility of failure. 4. We've been interested in cleaning up the TSS objects more eagerly to prevent unbounded leaks on long-running sessions. Doing this under the existing two-lock scheme would have been aggravating because we need to test if the buffers are empty under the buffer_manager lock, then do the session_state slot clear and TSS deletion under the thread lock. The locks didn't support being taken at the same time and as soon as one thread exited the buffer_manager_lock it was possible to race with another thread wanting to do the same cleanup. Now that there is only one lock the cleanup can happen atomically without needing to add yet another synchronization mechanism. 5. I removed buffer_manager_suspend_event rather than adjust its locking scheme. The method didn't do what its name implied, instead it converted writeable buffers to read-only buffers without suspending anything. There is no need to convert those buffers eagerly because the existing event reading code will automatically convert them on-demand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR eliminates EventPipe's complex dual-lock synchronization scheme by removing the per-thread lock and implementing proper ThreadSessionState lifecycle cleanup with optimized SequencePoint tracking for better resource management.
Key changes:
- Removes EventPipe thread locks, replacing dual-lock scheme with single buffer_manager lock
- Implements proper ThreadSessionState cleanup to prevent memory leaks in long-running sessions
- Adds EventPipeThreadSequencePointTrackState enum for optimized SequencePoint tracking
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/eventpipe/ep.c | Adds session type check for process info logging |
| src/native/eventpipe/ep-types.h | Adds thread ID container type definitions |
| src/native/eventpipe/ep-types-forward.h | Defines EventPipeThreadSequencePointTrackState enum for thread lifecycle tracking |
| src/native/eventpipe/ep-thread.h | Updates thread synchronization model and session state management |
| src/native/eventpipe/ep-thread.c | Implements new thread session state lifecycle and removes thread locks |
| src/native/eventpipe/ep-session.h | Adds session_type getter |
| src/native/eventpipe/ep-session.c | Removes dangling session state cleanup logic |
| src/native/eventpipe/ep-event-instance.c | Removes thread reference counting from sequence points |
| src/native/eventpipe/ep-buffer.h | Updates lock requirements documentation |
| src/native/eventpipe/ep-buffer.c | Updates buffer conversion synchronization |
| src/native/eventpipe/ep-buffer-manager.h | Updates buffer manager structure and removes suspend functionality |
| src/native/eventpipe/ep-buffer-manager.c | Major refactoring of buffer management with unified lock model |
| src/native/eventpipe/ep-block.c | Updates sequence point block initialization for new thread ID handling |
Comments suppressed due to low confidence (1)
src/native/eventpipe/ep-buffer-manager.c:577
- The buffer_manager_requires_lock_held assertion is checking the wrong lock. This function is called from buffer_manager_allocate_buffer_for_thread which holds the buffer manager lock, but the assertion appears to be in a context where the lock may not be held.
{
d184ddb to
323d1fe
Compare
|
By moving from two locks to one, did you analyze any potential additional lock contention and its effect on EventPipe performance? Buffer manager lock is a central lock for a session, meaning that a lot of parallel activities will queue up on this lock more frequently due to increase need to lock buffer manager. Just looking at the buffer_manager_advance_to_non_empty_buffer we reach for the buffer manager lock multiple times where the old code only took buffer manager lock when a buffer needed to be deleted after been fully consumed. I know there has been a lot of optimizations that taken place over the years to reduce the contention on buffer manager lock to improve performance in EventPipe. Just wanted to make sure we don't introduce any major performance regression by increasing the use of buffer manager lock in some critical code paths. |
If the writer thread isn't currently writing an event, we should still convert the buffer to read-only
- Set thread writing_event_in_progress flag for rundown ep_buffer_manager_write_event now asserts that the thread's writing_event_in_progress flag is set. - Remove buffer free state assertion When sessions are disabled or abruptly disconnected, not all buffers may have been converted to readonly. log_process_info_event may never be read by the in-proc listener session and ep_buffer_alloc failure are both examples of when a buffer would be freed yet not have been converted to read-only. Now that converting buffers to readonly yields for in-progress event writing to complete, we shouldn't be deleting a buffer that a writer thread is writing into. - Allow thread to set a NULL session state As part of thread_session_state removal logic, a NULL pointer can be written int oa thread's session_state slot. - Allow reader thread to check a thread session state's write_buffer As part of converting buffers to readonly, the reader thread yields for writer threads to finish writing their event. So the assertion that only writing threads will access a thread session state's write buffer no longer holds.
628668f to
aa19e3b
Compare
session_state was switched to volatile in an earlier commit to remove the thread_lock
- Remove unnecessary buffer_manager field from ThreadSessionState The buffer_manager can be obtained through the session field - Remove unnecessary thread_holder field from BufferList The owning ThreadSessionState already holds a reference to the thread. Now that the BufferList's lifetime is tied to the ThreadSessionState, the BufferList's thread_holder is no longer needed. - Reuse ep_buffer_get_volatile_state over manual state load
When the buffer_manager iterates over events across threads, it tracks the current EventPipeEventInstance, EventPipeBuffer, and EventPipeBufferList. In preparation to remove ThreadSessionStates from the buffer_manager's thread_session_state_list, tracking the current ThreadSessionState instead of the current EventPipeBufferList will allow for direct removal, as they are 1:1. ep_buffer_manager_write_all_buffers_to_file_v4 adjusts sequence points tss:sequence_number mappings whenever events have been read from the tss. Instead of having the tss' underlying buffer_list track the last read sequence point, move the counter to the tss itself.
In preparation to cleanup ThreadSessionStates, adjust the SequencePoint map of Thread sequence numbers to map the EventPipeThread directly instead of the ThreadSessionState.
Event processing may be invoked despite no events being available for reading. e.g. event flushing when a session is being disabled Iterating over an empty snapshot list results in the same behavior.
noahfalk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
lateralusX
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly looked at the threading scenarios around the use of the new locking patterns + dependencies around waiting on writer threads to finish up work before terminating sessions, or cleaning up thread state. Made some comments around that, but nothing popped up as a real issue after code reviewing it. It needs a lot of milage over time to build full confidence, but what we have at least appears to be correct.
I didn't look into any depth around the changes around sequence point handling and ordering, but I assume we have test coverage that would at least trigger errors in-case sequence point handling regressed as part of this change.
I made some smaller comments. Great work!
Add session disabled assert as part of ep_session_suspend_write_event's thread yield loop as it assumes no new session inherits the index. Remove buffer_manager arg to convert_to_read_only as the buffer is a standalone entity. Instead, callers should enforce the lock is held.
|
Although the Test_BadEventSource_MismatchedIds_WithEtwListener had a different error exception, it had the same callstack and build analysis handled it (I was originally surprised it matched an issue even though the final error message was different). It turns out the error message was different because another PR updated the TraceEvent version which has this change microsoft/perfview#2177. |
Fixes #111368
This PR removes EventPipe's complex dual-lock synchronization scheme and integrates ThreadSessionState cleanup as part of event reading logic.
Thread Lock Removal
ThreadSessionState Lifecycle Cleanup
Slight Event Reading Optimization
Testing
Using the original repro
Performance
Adapted https://github.com/dotnet/diagnostics/tree/main/src/tests/EventPipeStress to also stress the native in-proc event listener EventPipeSession.
Testing a matrix of Session Type X Thread Count X Exceptions/Thread
Session Type: IPC Streaming, Native In-Proc
Thread Count: 1, 8, 16, 50, 100, 1000
Exceptions/Thread: 1, 2, 100, 1000
Note: The local stress test that I put together primarily tests an immediate burst in events being written to EventPipe, which doesn't reflect the behavior of any steady state behavior.
Native InProc
Baseline vs PR

This PR overall performs better in fewer events dropped, especially considering the high thread x high exceptions/thread scenarios.
IPC Streaming
Baseline VS PR

This PR performs similarly, with slight drop (0.8%) in events read for high threads x high exceptions/thread.