fix: avoid deadlock by moving connector refresh outside txn and adding per-token mutex #4312
Overview
This patch moves connector refresh calls outside of the storage transaction and introduces a per-refresh-ID mutex to ensure only one concurrent request per token hits the external IdP. Other concurrent requests wait for the mutex and reuse the updated identity.
What this PR does / why we need it
This is an initial attempt to fix #4209. The gist of the problem is that commit 4b5f1d52 moved the call to the external IdP that updates the refresh token into the same DB transaction used to update the token in dex's storage. This causes a deadlock when the "external IdP" is dex itself (e.g. when using PasswordDB) and the underlying storage caps the number of open connections (e.g. SQLite or PgBouncer).
For a detailed overview of what triggers the issue, please see #4209 (comment)
Closes #4209
Special notes for your reviewer
I'm not sure how to properly test this change, or whether the existing test suite already covers it. If you have any suggestions, please let me know.