
Retries should have a built-in maximum number of tries or maximum duration of trying. #39

Open
DevBrent opened this issue Jul 25, 2023 · 1 comment

Comments


DevBrent commented Jul 25, 2023

Per the Redis recommendations in the Redlock documentation, locks should implement a maximum number of tries and use exponential backoff as a best practice. redis-lock instead retries every 50ms regardless of how long it has been trying (even hours). We have documented a failure case in which abandoned user HTTP requests cause redis-lock acquisitions to stack up sequentially, one per failed request, for hours.
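
For reference, a bounded-retry loop with exponential backoff, roughly what the Redlock guidance describes, might look like the sketch below. `tryAcquire`, the try limit, and the delay constants are illustrative assumptions, not part of redis-lock's actual API:

// Sketch only: bounded retries with exponential backoff and full jitter.
// `tryAcquire` stands in for a hypothetical single-attempt acquire that
// resolves to a release function on success or null on contention.
async function acquireWithBackoff(tryAcquire, maxTries = 10, baseDelayMs = 50) {
  for (let attempt = 0; attempt < maxTries; attempt++) {
    const release = await tryAcquire();
    if (release) return release;
    // Double the delay each attempt, cap it at 5 seconds, and add jitter.
    const delay = Math.min(baseDelayMs * 2 ** attempt, 5000) * Math.random();
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  throw new Error(`Gave up acquiring lock after ${maxTries} tries`);
}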

Consider a pattern as follows:

async function retrieveLockOrGiveUp(key, maxWaitMs) {
  let maxWaitTimer;
  const timeout = new Promise((resolve, reject) => {
    maxWaitTimer = setTimeout(
      () => reject(new Error('Could not fetch lock in a reasonable amount of time')),
      maxWaitMs
    );
  });
  // Give up waiting after maxWaitMs. Note this does not stop redis-lock's
  // internal retry loop; it only unblocks the caller.
  const release = await Promise.race([redisLock(key, 5000), timeout]);
  clearTimeout(maxWaitTimer);
  return release;
}
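
An HTTP handler could then bound its wait like this (usage sketch; 15 seconds is an arbitrary budget, and as noted above, giving up here does not cancel redis-lock's background retries):

// Give up after 15 seconds rather than retrying forever.
const release = await retrieveLockOrGiveUp('jobABC', 15000);
try {
  // ...do the guarded work while holding the lock...
} finally {
  release();
}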

Alternatively, consider that a user will typically abandon an HTTP call if it takes longer than 15 seconds, so there is little point in retrying beyond that.

What we directly observed were thousands of queued lock requests, each eventually holding the lock for 5000ms, because none of them could acquire it in time, redis-lock never gives up, and there is no AbortController / AbortSignal style mechanism to cancel the retry loop.

Even without the pattern above, a typical HTTP request guarded by redis-lock is bound to hit the same issue: hundreds or thousands of locks will be acquired sequentially even though the requesting users may have lost their socket connections hours earlier.

After an edge case caused a backlog of lock attempts to pile up and fail, we ended up with several machines fully saturated by infinite retry loops, because there is no upper bound on the retry logic here.

I suggest adding a few new options, or at the very least warning redis-lock users to be wary of high-contention environments and to consider Redlock if guarding HTTP requests on long-lived servers.
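
To make the suggestion concrete, one possible shape for such an option is an AbortSignal-aware acquire, sketched below. Neither the `signal` option nor `attemptLock` exist in redis-lock today; this is purely a hypothetical illustration of how the retry loop could be made cancellable:

// Hypothetical API sketch: an acquire loop that stops retrying when aborted.
async function acquireLock(attemptLock, { retryDelayMs = 50, signal } = {}) {
  while (true) {
    if (signal && signal.aborted) {
      throw new Error('Lock acquisition aborted');
    }
    const release = await attemptLock();
    if (release) return release;
    await new Promise(resolve => setTimeout(resolve, retryDelayMs));
  }
}

// A caller could then tie the retry loop to the HTTP request's lifetime:
//   const controller = new AbortController();
//   req.on('close', () => controller.abort());
//   const release = await acquireLock(tryOnce, { signal: controller.signal });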

@rakeshpai (Member)

Hi @DevBrent, thanks for the details; understood the issue. I'll be happy to merge a PR.
