-
Notifications
You must be signed in to change notification settings - Fork 699
restore: update the definition of the parameter --load-stats and the usage of pitr id map #21078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Jianjun Liao <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @Leavrth, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request. This PR updates the documentation for TiDB's Backup & Restore (BR) tool, specifically focusing on changes introduced in version 9.0.0. The updates cover how BR handles the pitr_id_map
for Point-in-Time Recovery (PITR), clarifies the behavior of the --load-stats
parameter during restore, and introduces documentation for a new parameter, --fast-load-sys-tables
, which enables physical restoration of system tables like statistics.
Highlights
- PITR
pitr_id_map
Handling: Documentation is updated to reflect that starting from v9.0.0, thepitr_id_map
data might be stored in the log backup directory (pitr_id_maps/pitr_id_map...
) for compatibility with older clusters, in addition to or instead of themysql.tidb_pitr_id_map
system table. Warnings about deleting this data are also updated. --load-stats
Parameter Clarification: A note is added to clarify that when the--load-stats
parameter is set tofalse
(starting v9.0.0), BR will not updatemysql.stats_meta
for restored tables, requiring manualanalyze table
.- New
--fast-load-sys-tables
Parameter: Documentation is added for the new--fast-load-sys-tables
parameter (defaulting to enabled from v9.0.0). This parameter allows BR to physically restore system tables usingRENAME TABLE
DDL when restoring to a new cluster where table/partition IDs can be reused. This method completely overwrites original data in the system tables, unlike the logicalREPLACE INTO
method.
Changelog
- br/br-checkpoint-restore.md
- Updated description of where
pitr_id_map
is persisted during log restore, mentioning the external storage file path (pitr_id_maps/...
) as an alternative to the system table (mysql.tidb_pitr_id_map
) for v9.0.0+ compatibility (around line 92-96). - Updated the warning about deleting
pitr_id_map
data/files, emphasizing potential inconsistency (around line 92, 158).
- Updated description of where
- br/br-snapshot-guide.md
- Added a bullet point introducing the
--fast-load-sys-tables
parameter and its function for physical system table restore in v9.0.0+ (around line 154).
- Added a bullet point introducing the
- br/br-snapshot-manual.md
- Added a note clarifying the behavior of
--load-stats
when set tofalse
in v9.0.0+ (around line 130-132). - Added a section introducing and explaining the
--fast-load-sys-tables
parameter, including its default state, conditions for physical load, and theRENAME TABLE
mechanism (around line 135-137). - Included example commands demonstrating the use of
--load-stats
and--fast-load-sys-tables
(around line 140-143, 199-206). - Added a note reiterating that physical system table restore overwrites original data (around line 209-211).
- Added a note clarifying the behavior of
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
Thank you for your contribution to the TiDB documentation! This pull request updates the documentation regarding the --load-stats
parameter and the usage of the pitr_id_map
, as well as introducing the new --fast-load-sys-tables
parameter for v9.0. The changes improve the clarity and completeness of the documentation in these areas.
I've reviewed the changes against our documentation style guide and the standard review criteria. Overall, the changes are well-structured and technically accurate. I've identified a few areas for potential improvement, primarily related to avoiding repetition and slightly refining phrasing for clarity, as outlined in the review comments below.
Specifically, I've referenced the style guide's principles on avoiding unnecessary repetition [^1] and maintaining clarity/simplicity [^2].
Summary of Findings
- Duplication of information about
--fast-load-sys-tables
: Information about the new--fast-load-sys-tables
parameter, its mechanism, and the difference between physical and logical restore is duplicated acrossbr-snapshot-guide.md
andbr-snapshot-manual.md
(appearing twice in the latter). This violates the style guide's principle of avoiding unnecessary repetition [^1] and makes the documentation harder to maintain and read. It is recommended to consolidate this information into a single primary location and link to it from other relevant sections. - Informal phrasing: The phrase "at will" is used in two places to describe the risk of deleting
pitr_id_map
data/files. Using a more formal term like "arbitrarily" or "without understanding the implications" would improve clarity [^2]. - Documentation structure and flow: The introduction of the
--fast-load-sys-tables
parameter inbr-snapshot-manual.md
could benefit from a clearer transition or a dedicated sub-heading to improve logical flow [^3]. - Example clarity: An example command in
br-snapshot-manual.md
includes the--load-stats
parameter, which is the default behavior, while the preceding note discusses the behavior when this parameter is set tofalse
. This could potentially confuse readers. Clarifying the example or the note could improve understanding.
Merge Readiness
The pull request introduces valuable updates regarding restore parameters and pitr_id_map
usage. However, the significant duplication of information about the --fast-load-sys-tables
parameter is a high-severity issue that impacts the clarity and maintainability of the documentation. I recommend addressing this duplication and the other minor points before merging. I am unable to approve this pull request directly; please have other reviewers review and approve this code before merging.
Starting from TiDB v9.0.0, BR lets you specify `--fast-load-sys-tables` to restore statistic data physically in a new cluster: | ||
|
||
```shell | ||
tiup br restore full \ | ||
--pd "${PD_IP}:2379" \ | ||
--with-sys-table \ | ||
--fast-load-sys-tables \ | ||
--storage "s3://${backup_collection_addr}/snapshot-${date}?access-key=${access-key}&secret-access-key=${secret-access-key}" \ | ||
--ratelimit 128 \ | ||
--log-file restorefull.log | ||
``` | ||
|
||
> **Note:** | ||
> | ||
> Different from restoring system tables logically by `REPLACE INTO` SQL, restoring system tables physically will completely overwrite the original data in the system tables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section introduces the --fast-load-sys-tables
parameter and includes a note explaining the difference between physical and logical restore. This information appears to be duplicated from the previous diff in this file (lines 136-137 and the following example) and also from br/br-snapshot-guide.md
(lines 154 and the following note). To avoid unnecessary repetition 1 and maintain consistency, could we consolidate this information into a single primary location (perhaps in br-snapshot-guide.md
where system table restore is discussed) and link to it from other relevant sections?
Style Guide References
Footnotes
When the backup and restore feature backs up data, it stores statistics in JSON format within the `backupmeta` file. When restoring data, it loads statistics in JSON format into the cluster. For more information, see [LOAD STATS](/sql-statements/sql-statement-load-stats.md). | ||
|
||
Starting from 9.0.0, BR introduces the parameter `--fast-load-sys-tables`, which is enabled by default. When the br restore data in a new cluster and the IDs of tables and partitions can be reused (otherwise, it will automatically fall back to logically load statistic data), by setting `--fast-load-sys-tables`, br will use the `RENAME TABLE` DDL to atomically swap the system tables in the database `__TiDB_BR_Temporary_mysql` with the system tables in the database `mysql`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph introduces the --fast-load-sys-tables
parameter. It currently starts immediately after a sentence about LOAD STATS
. To improve the logical flow and readability 1, consider adding a sub-heading for this new parameter or rephrasing the introductory sentence to provide a clearer transition.
Style Guide References
Starting from 9.0.0, BR introduces the parameter `--fast-load-sys-tables`, which is enabled by default. When the br restore data in a new cluster and the IDs of tables and partitions can be reused (otherwise, it will automatically fall back to logically load statistic data), by setting `--fast-load-sys-tables`, br will use the `RENAME TABLE` DDL to atomically swap the system tables in the database `__TiDB_BR_Temporary_mysql` with the system tables in the database `mysql`. | |
### Fast Load System Tables | |
Starting from 9.0.0, BR introduces the parameter `--fast-load-sys-tables`, which is enabled by default. When the br restore data in a new cluster and the IDs of tables and partitions can be reused (otherwise, it will automatically fall back to logically load statistic data), by setting `--fast-load-sys-tables`, br will use the `RENAME TABLE` DDL to atomically swap the system tables in the database `__TiDB_BR_Temporary_mysql` with the system tables in the database `mysql`. |
Footnotes
@@ -89,7 +89,11 @@ During the initial restore, `br` first enters the snapshot restore phase. BR rec | |||
|
|||
When entering the log restore phase during the initial restore, `br` creates a `__TiDB_BR_Temporary_Log_Restore_Checkpoint` database in the target cluster. This database records checkpoint data, the upstream cluster ID, and the restore time range (`start-ts` and `restored-ts`). If restore fails during this phase, you need to specify the same `start-ts` and `restored-ts` as recorded in the checkpoint database when retrying. Otherwise, `br` will report an error and prompt that the current specified restore time range or upstream cluster ID is different from the checkpoint record. If the restore cluster has been cleaned, you can manually delete the `__TiDB_BR_Temporary_Log_Restore_Checkpoint` database and retry with a different backup. | |||
|
|||
Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. Deleting data from `mysql.tidb_pitr_id_map` might lead to inconsistent PITR restore data. | |||
Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. **Deleting data from `mysql.tidb_pitr_id_map` at will might lead to inconsistent PITR restore data.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase "at will" is a bit informal. Could we use a more formal or precise term here, such as "arbitrarily" or "without understanding the implications"? This would align with maintaining clarity and simplicity in technical documentation 1.
Style Guide References
Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. **Deleting data from `mysql.tidb_pitr_id_map` at will might lead to inconsistent PITR restore data.** | |
Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. **Deleting data from `mysql.tidb_pitr_id_map` arbitrarily might lead to inconsistent PITR restore data.** |
Footnotes
@@ -151,4 +155,4 @@ During the initial restore, `br` first enters the snapshot restore phase. BR rec | |||
|
|||
When entering the log restore phase during the initial restore, `br` creates a `restore-{downstream-cluster-ID}/log` path in the target cluster. This path records checkpoint data, the upstream cluster ID, and the restore time range (`start-ts` and `restored-ts`). If restore fails during this phase, you need to specify the same `start-ts` and `restored-ts` as recorded in the checkpoint database when retrying. Otherwise, `br` will report an error and prompt that the current specified restore time range or upstream cluster ID is different from the checkpoint record. If the restore cluster has been cleaned, you can manually clean up the checkpoint data in the external storage or specify another external storage path to store checkpoint data, and retry with a different backup. | |||
|
|||
Before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. Deleting data from `mysql.tidb_pitr_id_map` might lead to inconsistent PITR restore data. | |||
Before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the checkpoint storage with the file name `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}` to prevent duplicate allocation of database and table IDs. **Deleting files from the directory `pitr_id_maps` at will might lead to inconsistent PITR restore data.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous comment, the phrase "at will" could be replaced with a more formal term like "arbitrarily" or "without understanding the implications" for better clarity 1.
Style Guide References
Before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the checkpoint storage with the file name `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}` to prevent duplicate allocation of database and table IDs. **Deleting files from the directory `pitr_id_maps` at will might lead to inconsistent PITR restore data.** | |
Before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the checkpoint storage with the file name `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}` to prevent duplicate allocation of database and table IDs. **Deleting files from the directory `pitr_id_maps` arbitrarily might lead to inconsistent PITR restore data.** |
Footnotes
|
||
```shell | ||
tiup br restore full \ | ||
--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --load-stats --fast-load-sys-tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example command includes --load-stats
, which is mentioned as the default behavior earlier in the document (before this diff). While including it is not incorrect, the preceding note discusses the behavior when --load-stats
is false. To avoid potential confusion and focus the example on the new --fast-load-sys-tables
parameter, you could consider removing --load-stats
from this example, or adding a brief note explaining why it's included (e.g., "explicitly enabling the default").
--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --load-stats --fast-load-sys-tables | |
tiup br restore full \ | |
--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --fast-load-sys-tables |
@gemini-code-assist[bot]: adding LGTM is restricted to approvers and reviewers in OWNERS files. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
--load-stats
and the usage of pitr id map docs-cn#20346Do your changes match any of the following descriptions?