
Fix try submit issue in sequential task scheduler #9066

Merged
yux0 merged 4 commits into temporalio:main from yux0:yx/fix-try-submit on Jan 30, 2026


yux0 (Contributor) commented Jan 16, 2026

What changed?

Fix a race condition between concurrent TrySubmit calls in the sequential task scheduler.

Why?

With concurrent calls to TrySubmit in the sequential task scheduler, a task can have its scheduling delayed. This change limits concurrency (serializing TrySubmit calls) to provide correctness for now.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

No risk, as there are currently no callers.

yux0 marked this pull request as ready for review January 16, 2026 20:41
yux0 requested review from a team as code owners January 16, 2026 20:41
yux0 added the teams/cgs label Jan 16, 2026
yycptt (Member) commented Jan 16, 2026

cc @prathyushpv, since you are also working on the sequential processor.

	case <-lockCh:
		defer s.trySubmitLock.Unlock()
	case <-time.After(trySubmitLockTimeout):
		return false
prathyushpv (Contributor) commented Jan 20, 2026

How is trySubmitLock.Unlock() called if the case <-time.After(trySubmitLockTimeout): branch is taken and the function returns immediately?

Contributor Author

Yes, a race condition is possible here, since time.After will use a new goroutine. Let me fix it.

	select {
	case <-lockCh:
		// Lock acquired, proceed with submission
	case <-time.After(trySubmitLockTimeout):
prathyushpv (Contributor) commented Jan 30, 2026

Looks like this could be a memory leak under load: this timer will not be garbage collected until it fires. It may be better to create the timer above and stop it in a defer statement:

timer := time.NewTimer(trySubmitLockTimeout)
defer timer.Stop()

Contributor Author

From the time.After documentation: "As of Go 1.23, the garbage collector can recover unreferenced, unstopped timers. There is no reason to prefer NewTimer when After will do."

As mentioned in the function doc, this TrySubmit implementation is not suitable for high throughput:

// TrySubmit uses mu locking to make it thread safe; this adds latency and is not suitable for high throughput.
func (s *SequentialScheduler[T]) TrySubmit(task T) bool {
	// Try to acquire the lock with a timeout to prevent a race between concurrent TrySubmit calls
	lockCh := make(chan struct{}, 1)
Contributor

Why did we go with this approach instead of calling TryLock() to serialize calls?

Contributor Author

A plain TryLock could reject a request even when the channel is not full. What I'm trying to do here is a TryLock with a timeout.

yux0 merged commit 23be5b2 into temporalio:main Jan 30, 2026 (66 checks passed)
yux0 deleted the yx/fix-try-submit branch January 30, 2026 19:52


3 participants