restore: update the definition of the parameter --load-stats and the usage of pitr id map #21078

Leavrth · 2025-05-30T04:36:30Z

First-time contributors' checklist

I've signed Contributor License Agreement that's required for repo owners to accept my contribution.

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

What is the related PR or file link(s)?

This PR is translated from: restore: update the definition of the parameter --load-stats and the usage of pitr id map docs-cn#20346
Other reference link(s):

Do your changes match any of the following descriptions?

Delete files
Change aliases
Need modification after applied to another branch
Might cause conflicts after applied to another branch

Signed-off-by: Jianjun Liao <[email protected]>

ti-chi-bot · 2025-05-30T04:36:39Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign breezewish for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist

Hello @Leavrth, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request. This PR updates the documentation for TiDB's Backup & Restore (BR) tool, specifically focusing on changes introduced in version 9.0.0. The updates cover how BR handles the pitr_id_map for Point-in-Time Recovery (PITR), clarifies the behavior of the --load-stats parameter during restore, and introduces documentation for a new parameter, --fast-load-sys-tables, which enables physical restoration of system tables like statistics.

Highlights

PITR pitr_id_map Handling: Documentation is updated to reflect that starting from v9.0.0, the pitr_id_map data might be stored in the log backup directory (pitr_id_maps/pitr_id_map...) for compatibility with older clusters, in addition to or instead of the mysql.tidb_pitr_id_map system table. Warnings about deleting this data are also updated.
--load-stats Parameter Clarification: A note is added to clarify that when the --load-stats parameter is set to false (starting v9.0.0), BR will not update mysql.stats_meta for restored tables, requiring manual analyze table.
New --fast-load-sys-tables Parameter: Documentation is added for the new --fast-load-sys-tables parameter (defaulting to enabled from v9.0.0). This parameter allows BR to physically restore system tables using RENAME TABLE DDL when restoring to a new cluster where table/partition IDs can be reused. This method completely overwrites original data in the system tables, unlike the logical REPLACE INTO method.

Changelog

br/br-checkpoint-restore.md
- Updated description of where pitr_id_map is persisted during log restore, mentioning the external storage file path (pitr_id_maps/...) as an alternative to the system table (mysql.tidb_pitr_id_map) for v9.0.0+ compatibility (around line 92-96).
- Updated the warning about deleting pitr_id_map data/files, emphasizing potential inconsistency (around line 92, 158).
br/br-snapshot-guide.md
- Added a bullet point introducing the --fast-load-sys-tables parameter and its function for physical system table restore in v9.0.0+ (around line 154).
br/br-snapshot-manual.md
- Added a note clarifying the behavior of --load-stats when set to false in v9.0.0+ (around line 130-132).
- Added a section introducing and explaining the --fast-load-sys-tables parameter, including its default state, conditions for physical load, and the RENAME TABLE mechanism (around line 135-137).
- Included example commands demonstrating the use of --load-stats and --fast-load-sys-tables (around line 140-143, 199-206).
- Added a note reiterating that physical system table restore overwrites original data (around line 209-211).

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

Thank you for your contribution to the TiDB documentation! This pull request updates the documentation regarding the --load-stats parameter and the usage of the pitr_id_map, as well as introducing the new --fast-load-sys-tables parameter for v9.0. The changes improve the clarity and completeness of the documentation in these areas.

I've reviewed the changes against our documentation style guide and the standard review criteria. Overall, the changes are well-structured and technically accurate. I've identified a few areas for potential improvement, primarily related to avoiding repetition and slightly refining phrasing for clarity, as outlined in the review comments below.

Specifically, I've referenced the style guide's principles on avoiding unnecessary repetition [^1] and maintaining clarity/simplicity [^2].

Summary of Findings

Duplication of information about --fast-load-sys-tables: Information about the new --fast-load-sys-tables parameter, its mechanism, and the difference between physical and logical restore is duplicated across br-snapshot-guide.md and br-snapshot-manual.md (appearing twice in the latter). This violates the style guide's principle of avoiding unnecessary repetition [^1] and makes the documentation harder to maintain and read. It is recommended to consolidate this information into a single primary location and link to it from other relevant sections.
Informal phrasing: The phrase "at will" is used in two places to describe the risk of deleting pitr_id_map data/files. Using a more formal term like "arbitrarily" or "without understanding the implications" would improve clarity [^2].
Documentation structure and flow: The introduction of the --fast-load-sys-tables parameter in br-snapshot-manual.md could benefit from a clearer transition or a dedicated sub-heading to improve logical flow [^3].
Example clarity: An example command in br-snapshot-manual.md includes the --load-stats parameter, which is the default behavior, while the preceding note discusses the behavior when this parameter is set to false. This could potentially confuse readers. Clarifying the example or the note could improve understanding.

Merge Readiness

The pull request introduces valuable updates regarding restore parameters and pitr_id_map usage. However, the significant duplication of information about the --fast-load-sys-tables parameter is a high-severity issue that impacts the clarity and maintainability of the documentation. I recommend addressing this duplication and the other minor points before merging. I am unable to approve this pull request directly; please have other reviewers review and approve this code before merging.

gemini-code-assist · 2025-05-30T04:37:23Z

br/br-snapshot-manual.md

+Starting from TiDB v9.0.0, BR lets you specify `--fast-load-sys-tables` to restore statistic data physically in a new cluster:
+
+```shell
+tiup br restore full \
+    --pd "${PD_IP}:2379" \
+    --with-sys-table \
+    --fast-load-sys-tables \
+    --storage "s3://${backup_collection_addr}/snapshot-${date}?access-key=${access-key}&secret-access-key=${secret-access-key}" \
+    --ratelimit 128 \
+    --log-file restorefull.log
+```
+
+> **Note:**
+>
+> Different from restoring system tables logically by `REPLACE INTO` SQL, restoring system tables physically will completely overwrite the original data in the system tables.


This section introduces the --fast-load-sys-tables parameter and includes a note explaining the difference between physical and logical restore. This information appears to be duplicated from the previous diff in this file (lines 136-137 and the following example) and also from br/br-snapshot-guide.md (lines 154 and the following note). To avoid unnecessary repetition ¹ and maintain consistency, could we consolidate this information into a single primary location (perhaps in br-snapshot-guide.md where system table restore is discussed) and link to it from other relevant sections?

Style Guide References

Footnotes

Avoid unnecessary words and repetition. (link) ↩

gemini-code-assist · 2025-05-30T04:37:23Z

br/br-snapshot-manual.md

 When the backup and restore feature backs up data, it stores statistics in JSON format within the `backupmeta` file. When restoring data, it loads statistics in JSON format into the cluster. For more information, see [LOAD STATS](/sql-statements/sql-statement-load-stats.md).

+Starting from 9.0.0, BR introduces the parameter `--fast-load-sys-tables`, which is enabled by default. When the br restore data in a new cluster and the IDs of tables and partitions can be reused (otherwise, it will automatically fall back to logically load statistic data), by setting `--fast-load-sys-tables`, br will use the `RENAME TABLE` DDL to atomically swap the system tables in the database `__TiDB_BR_Temporary_mysql` with the system tables in the database `mysql`.


This paragraph introduces the --fast-load-sys-tables parameter. It currently starts immediately after a sentence about LOAD STATS. To improve the logical flow and readability ¹, consider adding a sub-heading for this new parameter or rephrasing the introductory sentence to provide a clearer transition.

Style Guide References

Suggested change

Starting from 9.0.0, BR introduces the parameter `--fast-load-sys-tables`, which is enabled by default. When the br restore data in a new cluster and the IDs of tables and partitions can be reused (otherwise, it will automatically fall back to logically load statistic data), by setting `--fast-load-sys-tables`, br will use the `RENAME TABLE` DDL to atomically swap the system tables in the database `__TiDB_BR_Temporary_mysql` with the system tables in the database `mysql`.

### Fast Load System Tables

Starting from 9.0.0, BR introduces the parameter `--fast-load-sys-tables`, which is enabled by default. When the br restore data in a new cluster and the IDs of tables and partitions can be reused (otherwise, it will automatically fall back to logically load statistic data), by setting `--fast-load-sys-tables`, br will use the `RENAME TABLE` DDL to atomically swap the system tables in the database `__TiDB_BR_Temporary_mysql` with the system tables in the database `mysql`.

Footnotes

Maintain logical flow and sentence structure. (link) ↩

gemini-code-assist · 2025-05-30T04:37:23Z

br/br-checkpoint-restore.md

@@ -89,7 +89,11 @@ During the initial restore, `br` first enters the snapshot restore phase. BR rec

 When entering the log restore phase during the initial restore, `br` creates a `__TiDB_BR_Temporary_Log_Restore_Checkpoint` database in the target cluster. This database records checkpoint data, the upstream cluster ID, and the restore time range (`start-ts` and `restored-ts`). If restore fails during this phase, you need to specify the same `start-ts` and `restored-ts` as recorded in the checkpoint database when retrying. Otherwise, `br` will report an error and prompt that the current specified restore time range or upstream cluster ID is different from the checkpoint record. If the restore cluster has been cleaned, you can manually delete the `__TiDB_BR_Temporary_Log_Restore_Checkpoint` database and retry with a different backup.

-Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. Deleting data from `mysql.tidb_pitr_id_map` might lead to inconsistent PITR restore data.
+Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. **Deleting data from `mysql.tidb_pitr_id_map` at will might lead to inconsistent PITR restore data.**


The phrase "at will" is a bit informal. Could we use a more formal or precise term here, such as "arbitrarily" or "without understanding the implications"? This would align with maintaining clarity and simplicity in technical documentation ¹.

Style Guide References

Suggested change

Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. **Deleting data from `mysql.tidb_pitr_id_map` at will might lead to inconsistent PITR restore data.**

Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. **Deleting data from `mysql.tidb_pitr_id_map` arbitrarily might lead to inconsistent PITR restore data.**

Footnotes

Maintain clarity and simplicity. (link) ↩

gemini-code-assist · 2025-05-30T04:37:23Z

br/br-checkpoint-restore.md

@@ -151,4 +155,4 @@ During the initial restore, `br` first enters the snapshot restore phase. BR rec

 When entering the log restore phase during the initial restore, `br` creates a `restore-{downstream-cluster-ID}/log` path in the target cluster. This path records checkpoint data, the upstream cluster ID, and the restore time range (`start-ts` and `restored-ts`). If restore fails during this phase, you need to specify the same `start-ts` and `restored-ts` as recorded in the checkpoint database when retrying. Otherwise, `br` will report an error and prompt that the current specified restore time range or upstream cluster ID is different from the checkpoint record. If the restore cluster has been cleaned, you can manually clean up the checkpoint data in the external storage or specify another external storage path to store checkpoint data, and retry with a different backup.

-Before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. Deleting data from `mysql.tidb_pitr_id_map` might lead to inconsistent PITR restore data.
+Before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the checkpoint storage with the file name `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}` to prevent duplicate allocation of database and table IDs. **Deleting files from the directory `pitr_id_maps` at will might lead to inconsistent PITR restore data.**


Similar to the previous comment, the phrase "at will" could be replaced with a more formal term like "arbitrarily" or "without understanding the implications" for better clarity ¹.

Style Guide References

Suggested change

Before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the checkpoint storage with the file name `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}` to prevent duplicate allocation of database and table IDs. **Deleting files from the directory `pitr_id_maps` at will might lead to inconsistent PITR restore data.**

Before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the checkpoint storage with the file name `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}` to prevent duplicate allocation of database and table IDs. **Deleting files from the directory `pitr_id_maps` arbitrarily might lead to inconsistent PITR restore data.**

Footnotes

Maintain clarity and simplicity. (link) ↩

gemini-code-assist · 2025-05-30T04:37:23Z

br/br-snapshot-manual.md

+
+```shell
+tiup br restore full \
+--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --load-stats --fast-load-sys-tables


The example command includes --load-stats, which is mentioned as the default behavior earlier in the document (before this diff). While including it is not incorrect, the preceding note discusses the behavior when --load-stats is false. To avoid potential confusion and focus the example on the new --fast-load-sys-tables parameter, you could consider removing --load-stats from this example, or adding a brief note explaining why it's included (e.g., "explicitly enabling the default").

Suggested change

--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --load-stats --fast-load-sys-tables

tiup br restore full \

--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --fast-load-sys-tables

ti-chi-bot · 2025-05-30T04:37:29Z

@gemini-code-assist[bot]: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Code Review

Thank you for your contribution to the TiDB documentation! This pull request updates the documentation regarding the --load-stats parameter and the usage of the pitr_id_map, as well as introducing the new --fast-load-sys-tables parameter for v9.0. The changes improve the clarity and completeness of the documentation in these areas.

I've reviewed the changes against our documentation style guide and the standard review criteria. Overall, the changes are well-structured and technically accurate. I've identified a few areas for potential improvement, primarily related to avoiding repetition and slightly refining phrasing for clarity, as outlined in the review comments below.

Specifically, I've referenced the style guide's principles on avoiding unnecessary repetition [^1] and maintaining clarity/simplicity [^2].

Summary of Findings

Duplication of information about --fast-load-sys-tables: Information about the new --fast-load-sys-tables parameter, its mechanism, and the difference between physical and logical restore is duplicated across br-snapshot-guide.md and br-snapshot-manual.md (appearing twice in the latter). This violates the style guide's principle of avoiding unnecessary repetition [^1] and makes the documentation harder to maintain and read. It is recommended to consolidate this information into a single primary location and link to it from other relevant sections.

Informal phrasing: The phrase "at will" is used in two places to describe the risk of deleting pitr_id_map data/files. Using a more formal term like "arbitrarily" or "without understanding the implications" would improve clarity [^2].

Documentation structure and flow: The introduction of the --fast-load-sys-tables parameter in br-snapshot-manual.md could benefit from a clearer transition or a dedicated sub-heading to improve logical flow [^3].

Example clarity: An example command in br-snapshot-manual.md includes the --load-stats parameter, which is the default behavior, while the preceding note discusses the behavior when this parameter is set to false. This could potentially confuse readers. Clarifying the example or the note could improve understanding.

Merge Readiness

The pull request introduces valuable updates regarding restore parameters and pitr_id_map usage. However, the significant duplication of information about the --fast-load-sys-tables parameter is a high-severity issue that impacts the clarity and maintainability of the documentation. I recommend addressing this duplication and the other minor points before merging. I am unable to approve this pull request directly; please have other reviewers review and approve this code before merging.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

draft

9dd0ed6

Signed-off-by: Jianjun Liao <[email protected]>

ti-chi-bot bot added missing-translation-status This PR does not have translation status info. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 30, 2025

gemini-code-assist bot reviewed May 30, 2025

View reviewed changes

Leavrth mentioned this pull request May 30, 2025

restore: update the definition of the parameter --load-stats and the usage of pitr id map pingcap/docs-cn#20346

Open

16 tasks

gemini-code-assist bot suggested changes May 30, 2025

View reviewed changes

lilin90 added the translation/from-docs-cn This PR is translated from a PR in pingcap/docs-cn. label May 30, 2025

ti-chi-bot bot removed the missing-translation-status This PR does not have translation status info. label May 30, 2025

lilin90 self-assigned this May 30, 2025

lilin90 added v9.0-beta.2 This PR/issue applies to TiDB v9.0-beta.2. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels May 30, 2025

lilin90 requested a review from BornChanger May 30, 2025 08:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

restore: update the definition of the parameter --load-stats and the usage of pitr id map #21078

restore: update the definition of the parameter --load-stats and the usage of pitr id map #21078

Leavrth commented May 30, 2025 •

edited

Loading

Uh oh!

ti-chi-bot bot commented May 30, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot May 30, 2025

Uh oh!

gemini-code-assist bot May 30, 2025

Uh oh!

gemini-code-assist bot May 30, 2025

Uh oh!

gemini-code-assist bot May 30, 2025

Uh oh!

gemini-code-assist bot May 30, 2025

Uh oh!

ti-chi-bot bot commented May 30, 2025

Code Review

Summary of Findings

Merge Readiness

Uh oh!

Uh oh!

		When the backup and restore feature backs up data, it stores statistics in JSON format within the `backupmeta` file. When restoring data, it loads statistics in JSON format into the cluster. For more information, see [LOAD STATS](/sql-statements/sql-statement-load-stats.md).

		Starting from 9.0.0, BR introduces the parameter `--fast-load-sys-tables`, which is enabled by default. When the br restore data in a new cluster and the IDs of tables and partitions can be reused (otherwise, it will automatically fall back to logically load statistic data), by setting `--fast-load-sys-tables`, br will use the `RENAME TABLE` DDL to atomically swap the system tables in the database `__TiDB_BR_Temporary_mysql` with the system tables in the database `mysql`.

	--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --load-stats --fast-load-sys-tables
	tiup br restore full \
	--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --fast-load-sys-tables

restore: update the definition of the parameter --load-stats and the usage of pitr id map #21078

Are you sure you want to change the base?

restore: update the definition of the parameter --load-stats and the usage of pitr id map #21078

Conversation

Leavrth commented May 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

Uh oh!

ti-chi-bot bot commented May 30, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Changelog

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Summary of Findings

Merge Readiness

Uh oh!

gemini-code-assist bot May 30, 2025

Choose a reason for hiding this comment

Style Guide References

Footnotes

Uh oh!

gemini-code-assist bot May 30, 2025

Choose a reason for hiding this comment

Style Guide References

Footnotes

Uh oh!

gemini-code-assist bot May 30, 2025

Choose a reason for hiding this comment

Style Guide References

Footnotes

Uh oh!

gemini-code-assist bot May 30, 2025

Choose a reason for hiding this comment

Style Guide References

Footnotes

Uh oh!

gemini-code-assist bot May 30, 2025

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot bot commented May 30, 2025

Code Review

Summary of Findings

Merge Readiness

Uh oh!

Uh oh!

Leavrth commented May 30, 2025 •

edited

Loading