Fix possible out-of-order/inconsistent seqno-to-time mapping #13279

Closed

pdillinger
Contributor

Summary: The crash test with COERCE_CONTEXT_SWITCH=1 is showing a
failure:

```
db_stress: db/seqno_to_time_mapping.cc:480: bool rocksdb::SeqnoToTimeMapping::Append(rocksdb::SequenceNumber, uint64_t): Assertion `false' failed.
```

with `DBImpl::SetOptions()` in the call stack. This assertion and those
around it are mostly there for catching systematic problems with
recording the mappings, as small imprecisions here and there are not a
problem in production. Nevertheless, we need to fix this to maintain the
assertions for catching possible future systematic problems.

Because the seqno and time are acquired before holding the DB mutex,
there could be a race where T1 acquires latest seqno, T2 acquires latest
seqno, T2 acquires unix time, T1 acquires unix time, and entries are
not just saved out-of-order, but would represent an inconsistent (time
traveling) mapping if they were saved.
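
To make the race concrete, here is a small deterministic replay of the problematic interleaving, in which one thread samples its seqno first but its unix time last. It is only an illustration: `SeqnoTimePair`, `Consistent`, and `ReplayRacyInterleaving` are hypothetical names, and the counter values are made up rather than read from RocksDB.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one seqno-to-time entry.
struct SeqnoTimePair {
  uint64_t seqno;
  uint64_t time;
};

// Two entries are mutually consistent if ordering them by seqno also
// orders them by time (ties are allowed in both dimensions).
bool Consistent(const SeqnoTimePair& a, const SeqnoTimePair& b) {
  if (a.seqno < b.seqno) return a.time <= b.time;
  if (a.seqno > b.seqno) return a.time >= b.time;
  return true;
}

// Replays the interleaving with made-up counters: T1 samples its seqno
// first but its time last, so T1 ends up with the older seqno and the
// newer time.
std::vector<SeqnoTimePair> ReplayRacyInterleaving() {
  uint64_t latest_seqno = 100;  // made-up starting sequence number
  uint64_t clock = 1000;        // made-up starting unix time
  SeqnoTimePair t1{};
  SeqnoTimePair t2{};

  t1.seqno = latest_seqno;  // T1 acquires latest seqno
  latest_seqno += 5;        // writes land in between
  t2.seqno = latest_seqno;  // T2 acquires latest seqno
  clock += 1;
  t2.time = clock;          // T2 acquires unix time
  clock += 1;
  t1.time = clock;          // T1 acquires unix time

  return {t1, t2};
}
```

Whichever of the two pairs is saved first, the other one contradicts it: the smaller seqno carries the larger time, which is the "time traveling" case rather than a mere ordering problem.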

We can fix this by getting the seqno and unix times while under the
mutex. (Hopefully this is not caused by non-monotonic clock adjustments.)
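
A minimal sketch of the idea behind the fix, assuming a simplified recorder rather than the actual `DBImpl` code: both values are sampled while the mutex is held, so each pair is taken atomically with respect to other recorders. `MappingRecorder`, its members, and `RunConcurrentRecorders` are hypothetical, and an incrementing counter stands in for the unix clock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical recorder; this is not RocksDB's SeqnoToTimeMapping.
class MappingRecorder {
 public:
  // Samples seqno and time while holding the mutex, then appends the pair.
  void RecordUnderMutex() {
    std::lock_guard<std::mutex> lock(mu_);
    uint64_t seqno = latest_seqno_.load();
    uint64_t now = fake_clock_.fetch_add(1) + 1;  // stand-in for unix time
    entries_.emplace_back(seqno, now);
  }

  // Stand-in for a write advancing the latest sequence number.
  void AdvanceSeqno() { latest_seqno_.fetch_add(1); }

  // True if neither seqno nor time ever goes backwards between entries.
  bool IsNonDecreasing() {
    std::lock_guard<std::mutex> lock(mu_);
    for (std::size_t i = 1; i < entries_.size(); ++i) {
      if (entries_[i].first < entries_[i - 1].first ||
          entries_[i].second < entries_[i - 1].second) {
        return false;
      }
    }
    return true;
  }

  std::size_t Size() {
    std::lock_guard<std::mutex> lock(mu_);
    return entries_.size();
  }

 private:
  std::mutex mu_;
  std::atomic<uint64_t> latest_seqno_{0};
  std::atomic<uint64_t> fake_clock_{0};
  std::vector<std::pair<uint64_t, uint64_t>> entries_;
};

// Runs several threads that advance the seqno and record mappings, then
// checks that every entry was saved and the saved entries are in order.
bool RunConcurrentRecorders(int num_threads, int iterations) {
  MappingRecorder rec;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&rec, iterations] {
      for (int i = 0; i < iterations; ++i) {
        rec.AdvanceSeqno();
        rec.RecordUnderMutex();
      }
    });
  }
  for (auto& th : threads) th.join();
  std::size_t expected = static_cast<std::size_t>(num_threads) *
                         static_cast<std::size_t>(iterations);
  return rec.IsNonDecreasing() && rec.Size() == expected;
}
```

Because the sampling and the append share one critical section, a later recorder can only observe a seqno and a time that are at least as large as those of the previous entry. This sketch does not model a real clock that steps backwards, which the parenthetical above calls out as a separate possibility.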

Test Plan: local run blackbox_crash_test with COERCE_CONTEXT_SWITCH=1.
This is not really a production concern, and the conditions are not
really reproducible in a unit test after the fix.
@facebook-github-bot
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Member

@cbi42 cbi42 left a comment

LGTM

@facebook-github-bot
Contributor

@pdillinger merged this pull request in b341dc8.
