Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CF mark for Lock and Rollback records #102

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sticnarf
Copy link
Contributor

No description provided.

@sticnarf
Copy link
Contributor Author

I think @ekexium has an extensive analysis about different solutions to the problem. This is what I think the most practical one weighing the benefit and the cost.

Are you happy to share the desensitized document and the comparison sheet? @ekexium

@ekexium
Copy link
Contributor

ekexium commented Sep 29, 2022

Are you happy to share the desensitized document and the comparison sheet? @ekexium

Here you go! https://docs.google.com/document/d/1cg_pAVPnAvOz9CwMihDkregvyN-jl20nSSr75NuU0Is/edit# tikv/sig-transaction#111


In `CheckSecondaryKeys`, we need to check both CFs of all the given secondary keys to know whether some keys are already committed or rolled back.

And when prewrite raises an error or we are prewriting non-pessimistic keys in a retry, we also need the precise status of the key to guarantee idempotence. This also requires to read both the write and mark CFs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prewrite request with deferred constraint check may also need the precise status of the key, maybe it could be considered as a happy path for committing this kind of transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it... Could you detail the reason a bit more?

It seems to me that prewrite with deferred constraint check needs (1) write conflict check and (2) the latest effective record (PUT/DELETE). Write conflict check only needs the latest record in the write CF. And we don't cost more for reading the latest effective record.

If there is a mark record for this key, prewrite must fail because of a newer record in the write CF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually the deferred prewrite request processing does not need to read the mark CF every time? Seems I misunderstood I thought it needed to check mark CF each time and there would be more overhead than before for this kind of transaction.


If a new `Lock` record is written to the write CF while the latest version is also a `Lock` or `Rollback`, instead of removing the previous version, just overwrite that record and add a `real_commit_ts` to the record. When checking write conflicts, we should parse the value and check the real commit TS because the timestamp encoded into the key may be not accurate.

It may help reduce tombstones but breaks too many assumptions before.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no typical user scenarios in which this kind of tombstone issue has a significant impact, the priority for this optimization could be lowered as it's a bit complex to prove the correctness.

text/0000-mark-cf.md Show resolved Hide resolved

And when prewrite raises an error or we are prewriting non-pessimistic keys in a retry, we also need the precise status of the key to guarantee idempotence. This also requires to read both the write and mark CFs.

Luckily, all of these don't happen frequently in production. The extra cost is not a big issue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the key format is {user_key}{start_ts} and it's different from keys is write cf, maybe we could describe a bit more details about the conflict check process here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflict checking itself only needs to read the write CF for the maximum version. When we need to read the mark CF, we are confirming whether the transaction we are prewriting has been rolled back. So, it's always a point get according to the start_ts on the mark CF.

text/0000-mark-cf.md Show resolved Hide resolved

- Support generating and ingesting snapshot for the mark CF
- Consider the keys and size of the mark CF in the split checker
- After removing the WAL of KV DB, the memtable needs to be flushed if it blocks the GC of the Raft logs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the separation for kv db instances, there may be some compatible work.

When BR takes a snapshot of TiKV, all the locks before the snapshot should be resolved. In this case, the records in the mark CF really don't matter.

BR can just ignore the mark CF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another benefit that could be mentioned here is pessimistic locking behavior issues like pingcap/tidb#36438 could be resolved completely and previous tricky solutions could be optimized.


#### Format

Key: `{user_key}{start_ts}`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always read the mark CF by key with a specific timestamp, am I right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.


The records in the mark CF don't need to exist after all keys in the transaction are totally resolved. The client resolves all the locks before a certain timestamp before updating this timestamp as the safe point.

So, when TiKV is ready to do GC, all records in the mark CF whose `commit_ts` is less than the safe point can be deleted. It can be done in the compaction filter.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Rollback records, they do not have commit_ts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventionally, the commit_ts of Rollback is exactly its start_ts.


We already have a collapsing mechanism to merge consecutive `Rollback` records. And it was invented back to the days when pessimistic transactions are not supported. Now, it's unlikely to have many `Rollback` records that affect read performance.

The `Lock` record is a bit more complicated. At the very beginning, when there is no pessimistic transaction or async-commit transaction, it is only used to check read-write conflicts, mostly for write-skew prevention. In pessimistic transactions, if a key is locked but not changed in the end, the pessimistic lock will be finally turned into a `Lock` record. In these cases, `Lock` records exist to cause write conflicts. If it happens to be the primary key of the transaction, it also marks the committed status. So, if the `Lock` record is only to cause write conflicts, it doesn't need to exist after any newer record is written. However, it is not true for the primary keys.
Copy link

@TonsnakeLin TonsnakeLin Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Lock record for optimistic transaction is also used to cause write conflicts, right ?
And, how to understand write-skew prevention?

I executed the optimistic transaction like "begin optimistic;select * from t1 where name = "xxxxx" for update ;commit;". It also generated a Lock record for write CF. So, the LOCK record also as primary key of optimistic transaction ans marks the committed status, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Lock record for optimistic transaction is also used to cause write conflicts, right ?
And, how to understand write-skew prevention?

Yes. write-skew prevention is right the reason why people use SELECT FOR UPDATE in optimistic transactions. For example, select where id = 1 and use the result to update the row id = 2. There can be a write-skew if we don't check conflict on id = 1.

I executed the optimistic transaction like "begin optimistic;select * from t1 where name = "xxxxx" for update ;commit;". It also generated a Lock record for write CF. So, the LOCK record also as primary key of optimistic transaction ans marks the committed status, right?

Right. And that's why I say "However, it is not true for the primary keys.". In this case, the Lock cannot be collapsed.

@TonsnakeLin
Copy link

It brings a little extra jobs when committing the transaction such as writing two column families, deleting old Lock record. We should do the performance test for small oltp transactions which need to be committed as quickly as possible.
The reset LGTM.

@sticnarf
Copy link
Contributor Author

To completely solve the issue, it is a must to overwrite the latest Lock or Rollback. Otherwise, the tombstones in the memtable can greatly slow down reading.

However, reusing the key of the latest Lock is too tricky. It is a huge question mark whether it is worth doing this way.

@cfzjywxk
Copy link
Contributor

To completely solve the issue, it is a must to overwrite the latest Lock or Rollback. Otherwise, the tombstones in the memtable can greatly slow down reading.

However, reusing the key of the latest Lock is too tricky. It is a huge question mark whether it is worth doing this way.

It's quite risky to overwrite the existing key, maybe other workarounds are needed to avoid the worst case with too much tombstone next in the memory table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants