Fix try submit issue in sequential task scheduler#9066
Conversation
|
cc @prathyushpv as you are working on sequential processor as well. |
| case <-lockCh: | ||
| defer s.trySubmitLock.Unlock() | ||
| case <-time.After(trySubmitLockTimeout): | ||
| return false |
There was a problem hiding this comment.
How is trySubmitLock.Unlock() called if case <-time.After(trySubmitLockTimeout): is executed and returned immediately?
There was a problem hiding this comment.
Yes. The time after will use a new goroutine and a race condition is possible. Let me fix it
| select { | ||
| case <-lockCh: | ||
| // Lock acquired, proceed with submission | ||
| case <-time.After(trySubmitLockTimeout): |
There was a problem hiding this comment.
looks like this could be memory leak under load. This timer will not be garbage collected until it fires. It maybe better to define time above and stop it in a defer statement.
timer := time.NewTimer(trySubmitLockTimeout)
defer timer.Stop()
There was a problem hiding this comment.
As of Go 1.23, the garbage collector can recover unreferenced,
unstopped timers. There is no reason to prefer NewTimer when After will do.
As mentioned in the function doc, This try submit implementation is not suitable for high throughput
| // TrySubmit use mu locking to make it thread safe which has higher latency and not suitable for high throughput | ||
| func (s *SequentialScheduler[T]) TrySubmit(task T) bool { | ||
| // Try to acquire lock with timeout to prevent concurrent TrySubmit race condition | ||
| lockCh := make(chan struct{}, 1) |
There was a problem hiding this comment.
Why did we decide to go with this approach instead of calling TryLock() function to serialize calls?
There was a problem hiding this comment.
The current try lock could reject a request even if the channel is not full. here I try to do is a trylock with a timeout.
What changed?
Fix try submit issue in sequential task scheduler
Why?
With concurrent calls to TrySubmit in sequential task scheduler, there is an issue a task can be delay scheduled. This changes limit the concurrency to provide correctness atm.
How did you test it?
Potential risks
No risk as there is no caller atm