Commit bb12397

Rawkv backup and restore (#104)
* Add backup & restore section to API v2 Signed-off-by: haojinming <[email protected]>
1 parent 4aeef16 commit bb12397

1 file changed

+29
-1
lines changed

text/0069-api-v2.md

Lines changed: 29 additions & 1 deletion
@@ -236,7 +236,30 @@ More about `GC` of deleted/expired entries:
236236

237237
### Backup and Restore
238238

239-
*To be supplemented in another PR*
239+
Because the storage format changes significantly, users can enable or disable API v2 smoothly only if the existing TiKV cluster is empty or stores only `TiDB` data. In other scenarios, a tool is needed to migrate the data; this tool is called [TiKV-BR].
240+
241+
[TiKV-BR] is forked from [TiDB BR] and needs the following improvements:
242+
- Storage data conversion from `API V1` (with or without [TTL]) to `API V2`.
243+
- Backup returns a `backup-ts` in an `API V2` TiKV cluster. [RawKV CDC] can use `backup-ts` as the `start-ts` of replication tasks.
244+
245+
#### API Version Conversion
246+
247+
To support API version conversion, [TiKV-BR] introduces a `dst-api-version` parameter and passes it to TiKV stores in [`BackupRequest`].
248+
During the backup process, TiKV stores scan all raw key-value entries, convert them from the current `api-version` to `dst-api-version` if the two differ, and then write the converted data to [SST] files. Restoration therefore needs no rewriting or conversion, which speeds up the restoration process.
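
The scan-and-convert flow described above can be sketched as follows. This is only an illustration, not TiKV's implementation: `backup_scan` and `toy_convert` are hypothetical names, and `toy_convert` is a placeholder that merely tags the key so the conversion is visible. It does not reproduce the real `API V2` encoding.

```python
from typing import Callable, Iterable, Iterator, Tuple

Entry = Tuple[bytes, bytes]  # (raw key, raw value)


def toy_convert(key: bytes, value: bytes) -> Entry:
    """Placeholder conversion (assumption): NOT the real API V2 encoding."""
    return b"v2:" + key, value


def backup_scan(
    entries: Iterable[Entry],
    api_version: int,
    dst_api_version: int,
    convert: Callable[[bytes, bytes], Entry],
) -> Iterator[Entry]:
    """Yield the entries that would be written to the backup files.

    Entries are converted only when the cluster's api-version differs
    from dst-api-version, so restoration can ingest them unchanged.
    """
    need_conversion = api_version != dst_api_version
    for key, value in entries:
        if need_conversion:
            key, value = convert(key, value)
        yield key, value
```

Doing the conversion on the backup side means the backup files already hold data in the destination format, which is why restoration can skip rewriting.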
249+
250+
#### Backup Ts
251+
252+
`backup-ts` takes effect only in an `API V2` cluster. It is defined as a timestamp such that all data written before it has been backed up.
253+
Because RawKV writes with pre-fetched `TSO`s, the latest `TSO` from PD does not satisfy the `backup-ts` requirement: a later write may still carry an earlier, pre-fetched timestamp.
254+
255+
The process to get `backup-ts` is as follows:
256+
1. Get the current `TSO` in [TiKV-BR] at the beginning of backup.
257+
2. Flush the cached `TSO` in every TiKV store before the scanning process during backup, to make sure that all subsequent writes have larger timestamps.
258+
3. Subtract `safe-interval` from the current `TSO`.
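
A toy model may help show why step 2 is needed. Everything below is hypothetical (`ToyPD`, `ToyStore`, the batch size, and integer timestamps are assumptions made for illustration); it only mimics a store that pre-fetches timestamps in batches.

```python
from collections import deque


class ToyPD:
    """Hands out strictly increasing integer timestamps (toy stand-in for PD)."""

    def __init__(self) -> None:
        self._next = 1

    def get_ts_batch(self, n: int) -> list:
        start = self._next
        self._next += n
        return list(range(start, start + n))

    def get_ts(self) -> int:
        return self.get_ts_batch(1)[0]


class ToyStore:
    """Serves write timestamps from a pre-fetched cache (toy stand-in for a TiKV store)."""

    def __init__(self, pd: ToyPD, batch: int = 4) -> None:
        self._pd = pd
        self._batch = batch
        self._cache = deque()

    def next_write_ts(self) -> int:
        if not self._cache:
            self._cache.extend(self._pd.get_ts_batch(self._batch))
        return self._cache.popleft()

    def flush(self) -> None:
        # Drop cached timestamps so the next write fetches a fresh batch.
        self._cache.clear()


pd = ToyPD()
store = ToyStore(pd)
store.next_write_ts()            # the store pre-fetches a batch
current_tso = pd.get_ts()        # step 1: taken at the beginning of backup
stale = store.next_write_ts()    # later write, but served from the old cache
store.flush()                    # step 2
fresh = store.next_write_ts()    # served from a batch fetched after the flush
assert stale < current_tso < fresh
```

In this model, a write issued after `current_tso` was obtained can still receive a smaller timestamp until the cache is flushed.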
259+
260+
The third step is needed because there may be **inflight** RawKV entries with timestamps before the current `TSO`. `safe-interval` is a duration long enough for such **inflight** writes to finish. It is an empirical value that defaults to 1 minute, which is safe enough even in a high-pressure system.
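
The subtraction in step 3 can be sketched as below. The sketch assumes the timestamp layout used by PD, with physical milliseconds in the high bits and an 18-bit logical counter in the low bits; the function names are hypothetical and not part of [TiKV-BR].

```python
from datetime import timedelta

# Assumption: the low 18 bits of a TSO hold the logical counter and the
# remaining high bits hold the physical time in milliseconds.
LOGICAL_BITS = 18


def compose_ts(physical_ms: int, logical: int = 0) -> int:
    return (physical_ms << LOGICAL_BITS) | logical


def physical_ms_of(ts: int) -> int:
    return ts >> LOGICAL_BITS


def compute_backup_ts(
    current_tso: int,
    safe_interval: timedelta = timedelta(minutes=1),
) -> int:
    """Return current_tso moved back by safe_interval.

    The logical part is reset to zero, which only makes the result
    smaller, so it stays on the conservative side.
    """
    interval_ms = safe_interval // timedelta(milliseconds=1)
    physical = physical_ms_of(current_tso) - interval_ms
    if physical < 0:
        raise ValueError("safe_interval is larger than the current TSO")
    return compose_ts(physical)
```

A consumer such as a replication task would then start from the returned value rather than from the raw `TSO`.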
261+
262+
This process can also be optimized once `safe-ts` is implemented for RawKV in the stale read scenario. `safe-ts` is a timestamp such that all RawKV writes before it can be seen **safely**.
240263

241264
### Change Data Capture
242265

@@ -269,4 +292,9 @@ Upgrade to the latest TiKV Go Client and use `V1` mode.
269292
[TiDB BR]: https://docs.pingcap.com/tidb/stable/backup-and-restore-overview
270293
[TiCDC]: https://docs.pingcap.com/tidb/stable/ticdc-overview
271294
[GC Overview]: https://docs.pingcap.com/tidb/dev/garbage-collection-overview#gc-overview
295+
[TiKV-BR]: https://github.com/tikv/migration/tree/main/br
296+
[TTL]: https://docs.pingcap.com/zh/tidb/dev/tikv-configuration-file#enable-ttl
297+
[`BackupRequest`]: https://github.com/pingcap/kvproto/blob/3debb6820e46da7a8f310e3f081222183cdd8030/proto/brpb.proto#L170
298+
[creating change feed]: https://tikv.org/docs/dev/concepts/explore-tikv-features/cdc/cdc/#manage-replication-tasks-changefeed
272299
[RawKV Change Data Capture]: ./0083-rawkv-change-data-capture.md
300+
[RawKV CDC]: ./0083-rawkv-change-data-capture.md