Skip to content

time_bucketed.py not deadlock safe #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mbaader opened this issue Sep 1, 2021 · 4 comments · May be fixed by #2
Open

time_bucketed.py not deadlock safe #1

mbaader opened this issue Sep 1, 2021 · 4 comments · May be fixed by #2
Assignees

Comments

@mbaader
Copy link

mbaader commented Sep 1, 2021

Hi there,

thank you very much for your time_bucketed.py code. And special thanks to the idea of using some kind of "icons" in the print messages. Didn't know that was possible, but I like it very much.

However, there are some issues that led to two deadlocks within an hour while I tested it:

    bucket_val = r.get(key)
    if bucket_val and int(bucket_val) > 0:
        r.decrby(key, 1)
        return False

If the key expires between GET and DECRBY, you'll generate a new key with value -1 but no associated expire (hence, the key will never expire). And because you use SETNX to create a key, you'll never create a new key with an associated expire, if there already is one without an associated expire :

    if r.setnx(key, limit):
        r.expire(key, int(period.total_seconds()))

Furthermore if other processes already decreased the value of that key to 0 between the GET and DECRBY, you might exceed the rate limit in this timeframe.

So I came up with this solution, that should address both issues:

  #TTL returns -2 if the key does not exist and -1 if the key exists but has no associated expire.
  #Reference: https://redis.io/commands/ttl
  if int(R.ttl(key)) < 0:
      #Creates a key with associated expire and reduces the limit by 1 atomically since we use one token/call the API right after.
      R.setex(key, int(period.total_seconds()), limit-1)
      return False      
  if R.exists(key):
      #DECR creates a key with the value -1 if the key does not yet exist.
      #Checks if this happend or some other process already used a token because there is a race condition.
      #Reference: https://redis.io/commands/decr
      if int(R.decr(key)) >= 0:
          return False
  return True

Regards
Mat

@astagi
Copy link
Owner

astagi commented Sep 27, 2021

Hi @mbaader , sorry for my late response! Your solution looks good, and the problem raised by your tests is something worth analyzing deeper. Just a doubt, could it be possible (and simpler) to delete the key if its value is negative?

    bucket_val = r.get(key)
    if bucket_val and int(bucket_val) > 0:
        val = r.decrby(key, 1)
        if val < 0:
            r.del(key)
        return False

(can't have any chance to test it sorry, not available pc atm)

@astagi astagi self-assigned this Sep 27, 2021
@mbaader
Copy link
Author

mbaader commented Sep 27, 2021

Hi astagi,

yes, that might resolve the deadlock issue. But I'm not sure, if it addresses the race condition correctly. At least in theory, there might be any number of changes between the r.get(key) and the r.decrby(key, 1) so that an unexpected high number of tokens could be granted.

That might have happend in the first part of my above code too, so I modified it to:

  #TTL returns -2 if the key does not exist and -1 if the key has no TTL
  #Reference: https://redis.io/commands/ttl
  if int(R.ttl(key)) < 0:
      #Creates a key and adds an expire atomically.
      R.setex(key, int(period.total_seconds()), limit)
  if R.exists(key):
      #DECR creates a key with the value -1 if the key does not yet exist.
      #Checks if this happend because there is a race condition.
      #Reference: https://redis.io/commands/decr
      if int(R.decr(key)) >= 0:
        return False
  time.sleep(random.random()/limit)
  return True

I've been using this code in production for a few weeks now and haven't had any issues with it yet.
I added time.sleep(random.random()/limit) before the return True statement to avoid an unnecessary high number of database accesses if all tokens are used up.

Regards
Mat

@astagi
Copy link
Owner

astagi commented Oct 26, 2021

That's cool @mbaader! (sorry again for my late response).. could you open a PR for that? I'll update the article accordingly

@mbaader mbaader linked a pull request Oct 27, 2021 that will close this issue
@mbaader
Copy link
Author

mbaader commented Oct 27, 2021

ok @astagi - done.

@astagi astagi linked a pull request Nov 16, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants