Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

known pattern flow #5

Open
royaltm opened this issue Feb 25, 2013 · 30 comments
Open

known pattern flow #5

royaltm opened this issue Feb 25, 2013 · 30 comments

Comments

@royaltm
Copy link

royaltm commented Feb 25, 2013

The pattern from the official SETNX doc used in @kenn's implementation is flawed.
The implementation itself however is really nice and I like it.
Nevertheless my bad habit of testing other people's ideas (especially before implementing them and using on production) pushed me to play a little bit with redis-mutex gem.

Unfortunately the setnx/getset pattern may lead to the "phantom starvation" (what I call it) of the two (or more) concurrent processes (or threads) when both try to enter the critical section at the wrong time.

Here's a simple stress test:

require 'redis-mutex'

REDIS_OPTS = {}

lck = '__STRESSTEST__.lck'
Redis::Classy.db = Redis.new(REDIS_OPTS)
rmutex = Redis::Mutex.new(lck, expire: 5, block:10000, sleep:0.01)
10000.times do |i|
  rmutex.with_lock do
    print "\r %6d" % i
    sleep rand/100
  end
end

Now try to spawn two terminal windows (or create split window in tmux) and run the above script on both of them.
When you start the first one, you'll see counting.
But in a short moment (or even at the beginning) when you run it the second time the counting stops periodically.
The scripts will eventually reach 9999 (if you are patient enough) but mostly sleeping, waiting for lock to expire.

I might suppose @kenn is already aware of that but since there is no BUGS/LIMITATIONS section in README file I wrote this issue for the record.

I also think it is ours (the coders') responsibility to warn others about known issues and limitations especially when you are the keeper of the quite popular rubygem name. Noblesse Oblige (and popularity should too) in the open source crowd. :)

Here are the reconstructed steps behind the vulnerability:

For clarity there are three actors in the example (A, B, C) though C could also be A.

now = 1
A has locked the resource (with expire_at=2)
A tries to unlock, at the same time B tries to lock (with expire_at=3)
A: get == 2 ? YES
B: setnx(3) ? NO
A: del
B: get > now ? NO (get is 0)
C now tries to lock with (expire_at=4)
C: setnx(4) ? YES -> C is the owner now
B: getset(3) <= now ? NO
at the end B is not the owner but has managed to alter the lock

therefore C won't unlock:
C: get == 4 ? NO (get is 3)
so the lock remains staled and all the hungry processes must wait until it expires

@kenn
Copy link
Owner

kenn commented Mar 6, 2013

Excellent catch, @royaltm!

So it seems that if we change this logic:

B: get > now ? NO (get is 0)

to this:

B: get (is nil or) > now ? YES (get is nil)

the problem would be solved?

@nviennot
Copy link
Contributor

nviennot commented Mar 7, 2013

This should solve the issue: #6

@royaltm
Copy link
Author

royaltm commented Mar 7, 2013

Hello!
Your patch solves the issue in this particular example. However it's only a petty trick to workaround the problem. The problem with redis-mutex pattern is that read - check - write operations are not atomic. They are always prone to race conditions.

I've just imagined another example that leads to much more harmfull situation:

now = 2
A has locked the resource but it's key value has already expired (with expire_at=1)
B tries to lock
B: setnx(3) ? NO
B: get > now ? NO (get is 1)
A tries to unlock
A: get == 1 ? YES
B: getset(3) <= now ? YES
A: del

at the end B seems to be the owner of the lock but virtually the key's been deleted by A. In this case B has acquired the lock on premature assumption. It's much worse cause B is now fiddling in the unprotected critical section.

Of course you might still want to go that way adjusting this algorithm forewer hoping that no other race condition pattern emerges.

However if you want redis-mutex to be truly robust IMHO you should either:

  • make sure no condition race is possible using redis optimistic transactions
  • transfer the read - check - write logic to server side scripting (no other command is possible on redis while script is operating)

I would start with changing unlock in the following manner:

WATCH key
GET key == me ?
YES: MULTI
     DEL key
NO:  UNWATCH

Please look here.
This one has well designed locking pattern.

@nviennot
Copy link
Contributor

nviennot commented Mar 7, 2013

The unlock test is indeed racy, it's another issue. The watch solves it yes.
How about a lua script to avoid roundtrips?
https://github.com/crowdtap/promiscuous/blob/master/lib/promiscuous/redis.rb#L132-L151

@nviennot
Copy link
Contributor

nviennot commented Mar 7, 2013

Scrap that, it's racy with the a race on the getset. (@expires_at won't be the same).
I guess I'll go with a lua script for both sides. (I need to take multiple locks at the same time)

@nviennot
Copy link
Contributor

nviennot commented Mar 7, 2013

https://github.com/crowdtap/promiscuous/blob/master/lib/promiscuous/redis.rb#L116-L154
lock/unlock is now a LUA script. Much easier, and probably faster (only one network roundtrip for each operation).

@royaltm
Copy link
Author

royaltm commented Mar 7, 2013

for multiple locks you might take a look at my gem: (EM only) https://github.com/royaltm/redis-em-mutex

@nviennot
Copy link
Contributor

nviennot commented Mar 7, 2013

I don't like EM, I like Celluloid, and I need something thread safe.
Quite honestly, this code is unreadable: https://github.com/royaltm/redis-em-mutex/blob/master/lib/redis/em-mutex/script_handler.rb#L162-L213 way too much complexity for me to verify that this is working as intended.
But thanks :)

@royaltm
Copy link
Author

royaltm commented Mar 8, 2013

probably depends on who's reading it :), but no problem

@kenn
Copy link
Owner

kenn commented Mar 8, 2013

Number of projects are still using Redis 2.4 and Lua scripting is not supported with them. I would go with optimistic locking with WATCH at this time. The Lua-based logic would be a nice addition for Redis 2.6, though.

I'm going to change this code in try_lock

return false  if get.to_f > now

to this

return false  if (old_value = get).nil? or old_value.to_f > now

and fix the first case.

Then for unlock, I'll try implementing the WATCH command. How does that sound?

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

That sounds good, but you are still screwed, because you can getset() a lock that you don't own.
And so the unlock check doesn't work on a lock you've recovered while someone was racing with you.

@kenn
Copy link
Owner

kenn commented Mar 8, 2013

The point of this fix is that you never get to getset, as it returns false immediately. In other words, the get checks two things - if the lock is still effective (get.to_f > now) or anyone released the lock in the middle (get.nil?). Am I still missing something?

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

After that, you need getset to acquire an expired lock. If you raced with someone while doing the getset, unlock is broken, because the lock won't contain @expired_at

@royaltm
Copy link
Author

royaltm commented Mar 8, 2013

proposition of the locking pattern:

until setnx # loop until succeeded, this is the only place where we can be sure locked key belong to us
  watch do
    if (old_value = get).nil? # try again it's probably free now
      unwatch
      next
    end
    if old_value.to_f > now
      unwatch
      return false # someone else keeps valid lock, leave now
    else # get < now - possibly expired
      # now try to delete expired lock
      return false unless multi {|m| m.del } # someone else has messed with this key, leave now
      # or try again
    end
  end
end
return true

together with the unlock above will solve the thing

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

@royaltm 👍

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

A caveat however, how does watch work when multiple threads are using the same redis connection?

@royaltm
Copy link
Author

royaltm commented Mar 8, 2013

RTFS https://github.com/redis/redis-rb/blob/master/lib/redis.rb#L1995 - synchronize to the rescue

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

I did read the fucking sources.

Thread1: watch
Thread2: get XX
Thread1: get YY

Thread1 will have the "get" of thread2 in its watch.
Point is, watch doesn't work well with threads and single redis connection.

@royaltm
Copy link
Author

royaltm commented Mar 8, 2013

F stated for Fine ;)
"doesn't work well" why assume? watch with block is secured with monitor, so only one thread will watch and do stuff inside at the same time

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

The synchronization that you get is this one:

Thread1: lock.synchronize { watch }
Thread2: lock.synchronize { get XX }
Thread1: lock.synchronize { get YY }

Thread1 did not expect to have XX watched. That's a problem.

@royaltm
Copy link
Author

royaltm commented Mar 8, 2013

nope, I see it differently:
the synchronization is this actually, as yield is inside synchronize block in watch
Thread1: lock.synchronize { watch do blablabla end }
Thread2: lock.synchronize { watch do blablabla end }

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

OHhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh ok.
I misread the sources :D

@royaltm
Copy link
Author

royaltm commented Mar 8, 2013

don't misread your sources then :)

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

so 👍 for your solution, I don't see any issues with it.

@royaltm
Copy link
Author

royaltm commented Mar 8, 2013

the thing is, if and how watch/multi is supported in Redis::Classy? @kenn ?

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

It's using method missing, so it should be just fine.
I don't like Classy though :)

@royaltm
Copy link
Author

royaltm commented Mar 8, 2013

Every man to his taste.

@nviennot
Copy link
Contributor

nviennot commented Mar 8, 2013

Yes

@mkristian
Copy link

is this still an open issue or is it solved ?

@kenn
Copy link
Owner

kenn commented Dec 18, 2016

I've created a PR for this at #27 - if anyone can review, I'd appreciate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants