
Uneven Region distribution in TiDB PD; the corresponding schedulers are in the halt state #21025

@liuzhao900928


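[TiDB environment] Production
[TiDB version] 7.5.1
[Operating system] Anolis OS 7.9
[Deployment method] Deployed on a physical machine (160 GB memory, 1.1 TB disk)
[Cluster data volume] 173.5 GB
[Number of cluster nodes] 1
[Steps to reproduce]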
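This is a single-node TiDB cluster that was originally deployed with three TiKV nodes. Jobs sometimes failed with errors indicating resource pressure, and backups also reported "region is unavailable", so two of the TiKV nodes were forcibly taken offline (this was the mistake). That produced further errors, so the cluster was scaled back out to three TiKV nodes. After the scale-out finished the cluster was healthy again, but Regions are unevenly distributed: the data directory of the one original TiKV node is 200 GB, while the data directories of the two newly added nodes are a little over 10 GB each. Jobs now fail more often than before, with either "region is unavailable" or "please make sure tidb can connect to tikv".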
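[Problem encountered: symptoms and impact]
Same as the reproduction steps above: after the forced removal of two TiKV nodes and the scale-out back to three, Regions are unevenly distributed (200 GB on the original node versus a little over 10 GB on each new node), and jobs fail more often with "region is unavailable" or "please make sure tidb can connect to tikv".
The impact is that Regions are occasionally unreachable.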

[Resource configuration] Go to TiDB Dashboard > Cluster Info > Hosts and take a screenshot of that page
[Copy and paste the ERROR logs]
PD error log:
[2025/10/23 16:16:13.710 +08:00] [INFO] [prepare_checker.go:65] ["not loaded from storage region number is satisfied, finish prepare checker"] [not-from-storage-region=17584] [total-region=17584]
[2025/10/23 16:16:13.710 +08:00] [INFO] [coordinator.go:390] ["coordinator has finished cluster information preparation"]
[2025/10/23 16:16:13.710 +08:00] [INFO] [coordinator.go:400] ["coordinator starts to run schedulers"]
[2025/10/23 16:16:13.711 +08:00] [INFO] [coordinator.go:461] ["create scheduler with independent configuration"] [scheduler-name=balance-hot-region-scheduler]
[2025/10/23 16:16:13.713 +08:00] [INFO] [coordinator.go:461] ["create scheduler with independent configuration"] [scheduler-name=balance-leader-scheduler]
[2025/10/23 16:16:13.715 +08:00] [INFO] [coordinator.go:461] ["create scheduler with independent configuration"] [scheduler-name=balance-region-scheduler]
[2025/10/23 16:16:13.716 +08:00] [INFO] [coordinator.go:461] ["create scheduler with independent configuration"] [scheduler-name=balance-witness-scheduler]
[2025/10/23 16:16:13.720 +08:00] [INFO] [coordinator.go:461] ["create scheduler with independent configuration"] [scheduler-name=evict-leader-scheduler]
[2025/10/23 16:16:13.720 +08:00] [ERROR] [coordinator.go:464] ["can not add scheduler with independent configuration"] [scheduler-name=evict-leader-scheduler] [scheduler-args="[4422294]"] [error="[PD:scheduler:ErrSchedulerExisted]scheduler existed"]
[2025/10/23 16:16:13.720 +08:00] [INFO] [coordinator.go:461] ["create scheduler with independent configuration"] [scheduler-name=transfer-witness-leader-scheduler]
[2025/10/23 16:16:13.721 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=balance-region-scheduler] [scheduler-args=""]
[2025/10/23 16:16:13.721 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=balance-leader-scheduler] [scheduler-args=""]
[2025/10/23 16:16:13.722 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=balance-witness-scheduler] [scheduler-args=""]
[2025/10/23 16:16:13.722 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=balance-hot-region-scheduler] [scheduler-args=""]
[2025/10/23 16:16:13.722 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=transfer-witness-leader-scheduler] [scheduler-args=""]
[2025/10/23 16:16:13.722 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=evict-leader-scheduler] [scheduler-args="[4422294]"]
[2025/10/23 16:16:13.722 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=evict-leader-scheduler] [scheduler-args="[1]"]
[2025/10/23 16:16:13.726 +08:00] [INFO] [coordinator.go:256] ["coordinator begins to check suspect key ranges"]
[2025/10/23 16:16:13.726 +08:00] [INFO] [coordinator.go:320] ["coordinator begins to actively drive push operator"]
[2025/10/23 16:16:13.726 +08:00] [INFO] [coordinator.go:147] ["coordinator starts patrol regions"]
[2025/10/23 16:19:03.542 +08:00] [WARN] [grpclog.go:60] ["grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing""]
[2025/10/23 16:19:22.445 +08:00] [WARN] [heartbeat_streams.go:165] ["send keepalive message fail, store maybe disconnected"] [target-store-id=12250001] [error=EOF]
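
For anyone triaging a longer copy of this log, the scheduler lines can be pulled out mechanically. The following is only a rough sketch, not a PD tool: it assumes each line follows the bracketed layout seen in the excerpt above (time, level, source, message, then key=value fields), and the helper names `split_fields`, `parse_line`, and `scheduler_summary` are made up for illustration. It also normalizes the curly quotes that the forum rendering put into the pasted log. The sample lines are copied from the excerpt.

```python
from collections import defaultdict

# A few lines copied from the PD log above (one kept with curly quotes on purpose).
SAMPLE = [
    '[2025/10/23 16:16:13.720 +08:00] [ERROR] [coordinator.go:464] ["can not add scheduler with independent configuration"] [scheduler-name=evict-leader-scheduler] [scheduler-args="[4422294]"] [error="[PD:scheduler:ErrSchedulerExisted]scheduler existed"]',
    '[2025/10/23 16:16:13.721 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=balance-region-scheduler] [scheduler-args=""]',
    '[2025/10/23 16:16:13.722 +08:00] [INFO] [coordinator.go:487] ["create scheduler"] [scheduler-name=evict-leader-scheduler] [scheduler-args="[4422294]"]',
    '[2025/10/23 16:16:13.722 +08:00] [INFO] [coordinator.go:487] [\u201ccreate scheduler\u201d] [scheduler-name=evict-leader-scheduler] [scheduler-args=\u201c[1]\u201d]',
]


def split_fields(line):
    """Return the top-level [ ... ] fields of a line, ignoring brackets inside quotes."""
    fields, start, in_quotes, escaped = [], None, False, False
    for i, ch in enumerate(line):
        if escaped:                      # character after a backslash is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "[" and not in_quotes and start is None:
            start = i + 1
        elif ch == "]" and not in_quotes and start is not None:
            fields.append(line[start:i])
            start = None
    return fields


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not fit the layout."""
    line = line.replace("\u201c", '"').replace("\u201d", '"')  # undo curly quotes
    fields = split_fields(line)
    if len(fields) < 4:
        return None
    attrs = {}
    for field in fields[4:]:
        key, sep, value = field.partition("=")
        if sep:
            attrs[key] = value.strip('"')
    return {
        "time": fields[0],
        "level": fields[1],
        "source": fields[2],
        "message": fields[3].strip('"'),
        "attrs": attrs,
    }


def scheduler_summary(lines):
    """Collect scheduler-args per created scheduler, plus scheduler-related ERROR lines."""
    created = defaultdict(list)
    errors = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or "scheduler-name" not in rec["attrs"]:
            continue
        name = rec["attrs"]["scheduler-name"]
        if rec["level"] == "ERROR":
            errors.append({"name": name, **rec["attrs"]})
        elif rec["message"] == "create scheduler":
            created[name].append(rec["attrs"].get("scheduler-args", ""))
    return created, errors


created, errors = scheduler_summary(SAMPLE)
for name, args in created.items():
    print(name, args)
for err in errors:
    print("ERROR", err["name"], err.get("error"))
```

Run against the full excerpt, a summary like this makes it easier to see that evict-leader-scheduler is registered for two store IDs (4422294 and 1) and that the only ERROR line is the ErrSchedulerExisted one. Whether either of those relates to the uneven Region distribution is not established by the log alone.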
