-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fixed sqlserver not actually getting a lock if lock is already set #1186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@dhui Any chance you take a look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@urbim Thanks for the PR and for the detailed write up. Could you also rebase your PR to get the latest supported Go versions?
From the documentation the Update should be used when updating resources, which is not the case here. The documentation is not entirely clear to me, but from what I can tell, this is the same as Exclusive with added mechanism, that prevents deadlocks from read-lock-update pattern. I think the Exclusive is the right choice here, so the PR also changes that.
Thanks for the investigation. I agree.
Since we are acquiring lock outside the transaction, the Session is the correct option to use. I actually tried using the Transaction value as the LockOwner and the result was -999, which "Indicates a parameter validation or other call error".
👍
I also set the timeout of the sp_getapplock to infinite, similar to what is used in the postgres implementation. Since migrations can take a long time (e.g. updating an index on a large database), I think that the only good options for timeout are either infinite or somehow configurable by the user. Since the second option is probably not feasible in the current architecture of this library, I opted for the first one.
I thought about this for a bit and think that changing the lock behavior to wait indefinitely makes sense. e.g. most users will want to block and wait for any pending migrations to finish running. It helps that the postgres driver has already made this decisions.
#1123 is already open addressing this timeout, but it would still incorrectly obtain the lock if another instance was holding the lock for more than 10 seconds.
I agree with you
@@ -33,7 +33,7 @@ var ( | |||
ErrMultipleAuthOptionsPassed = fmt.Errorf("both password and useMsi=true were passed.") | |||
) | |||
|
|||
var lockErrorMap = map[mssql.ReturnStatus]string{ | |||
var lockErrorMap = map[int]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you continue to use mssql.ReturnStatus
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Sorry for the late reply!
I think int
is the correct choice here, since we are selecting an int
result (DECLARE @lockResult int;
).
Note that in the previous version, this map was never used, since status
variable was always 0
(this statement does not actually write anything to status
variable: ss.conn.ExecContext(context.Background(), query, aid, &status)
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, you're right. The docs mention ReturnStatus
being used for PROC
s
94afe47
to
6b9f24b
Compare
@dhui Thanks for the review. I rebased the PR and added a comment to your review. |
This PR fixes the SQL Server not actually getting a lock if lock is already set (e.g. by another instance).
As mentioned in this comment by @dhui, the SQL statement for
sp_getapplock
is not correct.I tested the locking with this script and added some debugging logging as seen in this commit (debug log is not actually part of this PR, I only used it for testing).
The result before applying the fixes was:
The first two lines are from
d1, err := p.Open(connectionString)
. Thend1
acquires the lock (line 3). Then_, err = p.Open(connectionString) // d2
should fail when trying to acquire the second lock, but it does not - the code succeeds but the lock is actually not acquired (line 5). The code actually fails whend2
tries to unlock - it can't because it hasn't acquired the lock in the first place (line 6+).This is exactly right, the SELECT statement is missing and this PR fixes this, along with some other improvements to the locking:
From the documentation the
Update
should be used when updating resources, which is not the case here. The documentation is not entirely clear to me, but from what I can tell, this is the same asExclusive
with added mechanism, that prevents deadlocks from read-lock-update pattern. I think theExclusive
is the right choice here, so the PR also changes that.From the documentation:
Since we are acquiring lock outside the transaction, the
Session
is the correct option to use. I actually tried using theTransaction
value as the LockOwner and the result was-999
, which "Indicates a parameter validation or other call error".I also set the timeout of the
sp_getapplock
to infinite, similar to what is used in the postgres implementation. Since migrations can take a long time (e.g. updating an index on a large database), I think that the only good options for timeout are either infinite or somehow configurable by the user. Since the second option is probably not feasible in the current architecture of this library, I opted for the first one. One PR is already open addressing this timeout, but it would still incorrectly obtain the lock if another instance was holding the lock for more than 10 seconds.This PR should close #1123 and #253.
I didn't manage to get the tests working locally, but I'm hoping the tests will pass in CI.
Lastly, the result of the above script with fixes applied (and non-infinite timeout for testing):
When
d2
tries to acquire the lock, it fails with-1: The lock request timed out
after timeout as expected.