Description
Bug Report
1. Minimal reproduce step (Required)
- Run a workload with many concurrent transactions that commit through 2PC.
- Make the commit of secondary keys slow (e.g. an overloaded TiKV, a slow network, or frequent region changes) so that commit RPCs stay in flight for a long time.
- Observe TiDB goroutine count and memory usage.
2. What did you expect to see? (Required)
When TiKV is slow, TiDB should apply backpressure instead of creating an unbounded number of goroutines for background tasks.
3. What did you see instead? (Required)
TiDB can accumulate many background goroutines originating from the 2PC commit path, and memory can grow due to long-lived closures keeping transaction state reachable.
A representative goroutine dump shows:
- Background secondaries commit:
  - txnkv/transaction.(*twoPhaseCommitter).doActionOnGroupMutations.func1
  - txnkv/transaction.(*KVTxn).spawnWithStorePool.func1
  - github.com/tiancaiamao/gp.workerLoop
- Slow/blocked commit RPC:
  - internal/client.sendBatchRequest
  - txnkv/transaction.actionCommit.handleSingleBatch
4. Root cause analysis (Optional)
spawnWithStorePool submits background work via KVStore.Go, which uses the store goroutine pool. The default pool implementation is tikv.NewSpool(...) backed by github.com/tiancaiamao/gp.
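gp.Pool.Go does not provide backpressure: when it cannot immediately hand off the task to an idle worker (p.ch is unbuffered), it falls back to the default branch, go worker(p, f), which spawns an additional goroutine. Nothing caps the number of such goroutines and the caller never blocks, so every task that stays in flight can turn into one more goroutine.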
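To make the missing backpressure concrete, here is a minimal Go model of the hand-off-or-spawn pattern described above. It is a simplified assumption, not the gp source: unboundedPool, its idle timeout, and peakWorkers are hypothetical names. Every submitted task blocks (standing in for a commit RPC that stays in flight), so no worker is ever idle to receive from the unbuffered channel and each Go call takes the spawn branch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// unboundedPool is a hypothetical, simplified model of the pattern described
// in the issue; it is NOT the actual github.com/tiancaiamao/gp implementation.
type unboundedPool struct {
	ch       chan func() // unbuffered: a send succeeds only if a worker is idle
	live     int64       // number of worker goroutines currently alive
	idleWait time.Duration
}

func newUnboundedPool(idle time.Duration) *unboundedPool {
	return &unboundedPool{ch: make(chan func()), idleWait: idle}
}

// Go hands f to an idle worker if one is waiting; otherwise it starts a new
// goroutine. There is no upper bound and the caller never blocks.
func (p *unboundedPool) Go(f func()) {
	select {
	case p.ch <- f:
	default:
		atomic.AddInt64(&p.live, 1)
		go p.worker(f)
	}
}

// worker runs f, then waits for more work until it has been idle for idleWait.
func (p *unboundedPool) worker(f func()) {
	defer atomic.AddInt64(&p.live, -1)
	for {
		f()
		select {
		case f = <-p.ch:
		case <-time.After(p.idleWait):
			return
		}
	}
}

// peakWorkers submits `tasks` tasks that all block, and returns how many
// worker goroutines exist once every task has started.
func peakWorkers(tasks int) int64 {
	p := newUnboundedPool(100 * time.Millisecond)
	release := make(chan struct{})
	defer close(release) // unblock all tasks after measuring

	var started sync.WaitGroup
	started.Add(tasks)
	for i := 0; i < tasks; i++ {
		p.Go(func() {
			started.Done()
			<-release // simulate a slow, in-flight commit RPC
		})
	}
	started.Wait()
	return atomic.LoadInt64(&p.live)
}

func main() {
	fmt.Println("live workers:", peakWorkers(1000))
}
```

In this model, 1000 blocked tasks end up as 1000 live worker goroutines, one per task: the pool only reuses workers when tasks finish quickly.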
When committing secondaries is slow, many tasks remain in flight, and each closure can keep *twoPhaseCommitter / *KVTxn / *memBufferMutations / *unionstore.MemDB reachable, contributing to memory growth.
5. Proposed fix
Use a bounded pool (with backpressure) for store background tasks, configured when TiDB creates the TiKV store (via tikv.WithPool).
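A bounded pool with backpressure can be sketched with a semaphore channel: at most size tasks run at once, and Go blocks the submitter while all slots are busy. This is a minimal illustration under stated assumptions. boundedPool, Wait, and runBounded are hypothetical names, and the interface that tikv.WithPool actually expects is not reproduced here, so method names would need to be adapted to it.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// boundedPool is a hypothetical sketch (not a client-go type): at most `size`
// tasks run concurrently, and Go blocks the caller while all slots are busy.
type boundedPool struct {
	sem chan struct{} // one token per running task
	wg  sync.WaitGroup
}

func newBoundedPool(size int) *boundedPool {
	return &boundedPool{sem: make(chan struct{}, size)}
}

// Go blocks until a slot is free (backpressure), then runs f in a goroutine.
func (p *boundedPool) Go(f func()) {
	p.sem <- struct{}{} // blocks when `size` tasks are already running
	p.wg.Add(1)
	go func() {
		defer func() {
			<-p.sem // release the slot
			p.wg.Done()
		}()
		f()
	}()
}

// Wait blocks until every submitted task has finished.
func (p *boundedPool) Wait() { p.wg.Wait() }

// runBounded submits `tasks` slow tasks and reports the peak number observed
// running at the same time, plus how many tasks completed.
func runBounded(size, tasks int) (peak int64, done int64) {
	p := newBoundedPool(size)
	var running int64
	for i := 0; i < tasks; i++ {
		p.Go(func() {
			n := atomic.AddInt64(&running, 1)
			for { // record the maximum concurrency seen so far
				old := atomic.LoadInt64(&peak)
				if n <= old || atomic.CompareAndSwapInt64(&peak, old, n) {
					break
				}
			}
			time.Sleep(time.Millisecond) // simulate a slow commit RPC
			atomic.AddInt64(&running, -1)
			atomic.AddInt64(&done, 1)
		})
	}
	p.Wait()
	return peak, done
}

func main() {
	peak, done := runBounded(8, 200)
	fmt.Printf("peak concurrency=%d, completed=%d\n", peak, done)
}
```

Here a slow downstream slows down the submitter instead of growing the goroutine count. One design caveat of a blocking pool: a task that submits to the same pool and waits for the result can deadlock when every slot is taken, so callers on the commit path would need to avoid that pattern.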