planner: add slow query QPS metric and show it in TiDB QPS panel by qw4990 · Pull Request #67249 · pingcap/tidb

qw4990 · 2026-03-24T02:24:08Z

What problem does this PR solve?

Issue Number: close #67247

Problem Summary: planner: add slow query QPS metric and show it in TiDB QPS panel

What changed and how does it work?

- Add a new TiDB server counter metric: tidb_server_slow_query_total (label: sql_type).
- Increment this counter when a statement is logged as a slow query (both general and internal SQL paths).
- Expose slow query QPS in METRICS_SCHEMA via tidb_slow_query_qps.
- Update the QPS panel in the Grafana dashboard pkg/metrics/grafana/tidb.json to include a SlowQuery series:
  - sum(rate(tidb_server_slow_query_total{...}[1m]))
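To make the SlowQuery expression concrete, here is a rough, stdlib-only Go sketch of what `sum(rate(counter[1m]))` computes: the per-second increase of each labelled series over the window, added up across series. The `sample` type, the function names, and the sample values are hypothetical, and the real Prometheus `rate()` additionally extrapolates to the window boundaries, so treat this as an approximation rather than the actual implementation.

```go
package main

import "fmt"

// sample is one scrape of a cumulative counter (hypothetical type for illustration).
type sample struct {
	tsSec float64 // scrape time in seconds
	value float64 // cumulative counter value at that time
}

// simpleRate returns the average per-second increase across the samples.
// Unlike Prometheus's rate(), it does not extrapolate to the window edges.
func simpleRate(window []sample) float64 {
	if len(window) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(window); i++ {
		delta := window[i].value - window[i-1].value
		if delta < 0 {
			// Counter reset (for example a process restart): count from zero again.
			delta = window[i].value
		}
		increase += delta
	}
	elapsed := window[len(window)-1].tsSec - window[0].tsSec
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

// slowQueryQPS sums the per-series rates, mirroring sum(rate(...)).
func slowQueryQPS(series map[string][]sample) float64 {
	total := 0.0
	for _, window := range series {
		total += simpleRate(window)
	}
	return total
}

func main() {
	// Made-up scrapes, one series per sql_type label value.
	series := map[string][]sample{
		"general":  {{0, 10}, {15, 13}, {30, 16}, {45, 22}, {60, 40}},
		"internal": {{0, 2}, {15, 2}, {30, 5}, {45, 5}, {60, 8}},
	}
	fmt.Printf("SlowQuery QPS: %.2f\n", slowQueryQPS(series))
}
```

Because the panel sums over all series, the general and internal slow queries end up in a single SlowQuery line; a per-type breakdown would need a `by (sql_type)` grouping instead.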

Check List

Tests

- Unit test
- Integration test
- Manual test (add detailed scripts or steps below)
- No need to test
  - I checked and no code files have been changed.

Side effects

- Performance regression: Consumes more CPU
- Performance regression: Consumes more Memory
- Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

New Features
- Added slow-query metrics with separate counters for general and internal slow queries.
- Added a slow query QPS metric and updated Grafana dashboards to show SlowQuery series.
Tests
- Added test coverage asserting slow-query counter increments alongside existing slow-log assertions.

pantheon-ai · 2026-03-24T02:24:14Z

Review Complete

Findings: 1 issue
Posted: 1
Duplicates/Skipped: 1

ℹ️ Learn more details on Pantheon AI.

tiprow · 2026-03-24T02:24:26Z

Hi @qw4990. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai · 2026-03-24T02:24:31Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 245d8f1f-c0f7-4e3b-9c56-003c0928e6cd

📥 Commits

Reviewing files that changed from the base of the PR and between 64235bb and 84bd6a6.

📒 Files selected for processing (2)

pkg/metrics/nextgengrafana/tidb_with_keyspace_name.json
pkg/metrics/nextgengrafana/tidb_worker.json

📝 Walkthrough

Walkthrough

Adds a slow-query counter metric and wires it through server metrics, executor increments (internal vs. general), metric table definition, Grafana panels, and tests to expose slow-query QPS.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Server metric declaration & registration: `pkg/metrics/server.go`, `pkg/metrics/metrics.go` | Introduce `SlowQueryCounter` (`*prometheus.CounterVec`) and register it in global metrics registration. |
| Executor metric binding & increments: `pkg/executor/adapter.go`, `pkg/executor/metrics/metrics.go` | Add executor-level counters `SlowQueryCounterGeneral` and `SlowQueryCounterInternal`; increment the appropriate counter in `ExecStmt.LogSlowQuery`. |
| Executor tests: `pkg/executor/adapter_test.go` | Extend `TestWriteSlowLog` to assert slow-query counter values before/after execution (expect 0.0 or 1.0). |
| Metric table & Grafana panels: `pkg/infoschema/metric_table_def.go`, `pkg/metrics/grafana/tidb.json`, `pkg/metrics/nextgengrafana/tidb_with_keyspace_name.json`, `pkg/metrics/nextgengrafana/tidb_worker.json` | Add `tidb_slow_query_qps` metric definition and add Grafana query targets that surface `tidb_server_slow_query_total` rate (legend "SlowQuery"). |
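The wiring summarized in the table (one labelled counter, two pre-bound children, and an increment at the slow-log point) can be sketched roughly as follows. This is a stdlib-only stand-in, not the Prometheus client or the PR's code: `labeledCounter`, `withLabel`, and `recordSlowQuery` are hypothetical names, and the label values `"general"` and `"internal"` are assumed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// labeledCounter is a simplified stand-in for a counter vector with one label.
// It is NOT the Prometheus client type.
type labeledCounter struct {
	mu     sync.Mutex
	series map[string]*uint64
}

func newLabeledCounter() *labeledCounter {
	return &labeledCounter{series: make(map[string]*uint64)}
}

// withLabel returns the child counter for a label value, creating it on first use.
func (c *labeledCounter) withLabel(value string) *uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	child, ok := c.series[value]
	if !ok {
		child = new(uint64)
		c.series[value] = child
	}
	return child
}

var (
	slowQueryTotal = newLabeledCounter()
	// Children are resolved once so the hot path avoids the map lookup and lock.
	slowQueryGeneral  = slowQueryTotal.withLabel("general")  // assumed label value
	slowQueryInternal = slowQueryTotal.withLabel("internal") // assumed label value
)

// recordSlowQuery would be called where a statement is written to the slow log.
func recordSlowQuery(isInternal bool) {
	if isInternal {
		atomic.AddUint64(slowQueryInternal, 1)
		return
	}
	atomic.AddUint64(slowQueryGeneral, 1)
}

func main() {
	recordSlowQuery(false)
	recordSlowQuery(false)
	recordSlowQuery(true)
	fmt.Println("general:", atomic.LoadUint64(slowQueryGeneral))
	fmt.Println("internal:", atomic.LoadUint64(slowQueryInternal))
}
```

Binding the children up front is a common pattern for counters on hot paths: the per-statement cost is a single atomic add rather than a label lookup.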

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

- *: add tidb_slow_log_max_per_sec variable to control the number of slow logs written per second (#63996) #66709: Also modifies ExecStmt.LogSlowQuery (adds slow-log rate-limiting); directly touches the same method.
- *: parse and match slow log trigger rules for multi-dimensional triggering (#63132) #66582: Modifies LogSlowQuery in pkg/executor/adapter.go (changes to ExecDetail passing / retry-count typing); code-level overlap.
- ingestor, metrics: add nextgen write/ingest API latency metrics #66943: Adds/registers Prometheus metrics in the metrics registration path; related to metric registration changes.

Suggested labels

size/L

Suggested reviewers

yibin87
XuHuaiyu
nolouch

Poem

🐰
I hopped through code with counters bright,
Counting slow queries in day and night,
Internal, general — each gets a cheer,
Grafana paints what I hold dear,
Tiny paws, big metrics delight.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a slow query QPS metric and integrating it into the TiDB QPS Grafana panel. |
| Description check | ✅ Passed | The PR description includes the required Issue Number (close `#67247`), problem summary, detailed explanation of what changed, and a completed checklist with unit test marked as done. |
| Linked Issues check | ✅ Passed | The code changes fully implement the requirements from `#67247`: adding a Prometheus counter metric for slow queries, incrementing it on slow-query paths, exposing it in METRICS_SCHEMA, and integrating it into the Grafana dashboard. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing the slow-query QPS metric feature; no out-of-scope modifications were introduced. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


hawkingrei · 2026-03-24T02:25:14Z

/ok-to-test

coderabbitai

🧹 Nitpick comments (1)

pkg/executor/adapter_test.go (1)

417-433: Extend this test to cover the internal SQL counter path too.

The new assertions validate only sql_type=general. Since production code now increments both general and internal counters, please add an internal-SQL assertion in this test (or companion test) to cover the restricted branch as well.

✅ Minimal extension sketch

- readSlowQueryCounter := func() float64 {
-   counter := metrics.SlowQueryCounter.WithLabelValues(metrics.LblGeneral)
+ readSlowQueryCounter := func(sqlType string) float64 {
+   counter := metrics.SlowQueryCounter.WithLabelValues(sqlType)
    pb := &dto.Metric{}
    require.NoError(t, counter.Write(pb))
    return pb.GetCounter().GetValue()
  }

+ // existing general-path checks
- before := readSlowQueryCounter()
+ before := readSlowQueryCounter(metrics.LblGeneral)
  tk.MustExec(sql)
- after := readSlowQueryCounter()
+ after := readSlowQueryCounter(metrics.LblGeneral)

+ // add internal-path check (example)
+ beforeInternal := readSlowQueryCounter(metrics.LblInternal)
+ rs, err := tk.Session().ExecuteInternal(context.Background(), sql)
+ require.NoError(t, err)
+ _, err = session.ResultSetToStringSlice(context.Background(), tk.Session(), rs)
+ require.NoError(t, err)
+ afterInternal := readSlowQueryCounter(metrics.LblInternal)
+ require.Equal(t, 1.0, afterInternal-beforeInternal)

Based on learnings: "For SQL behavior changes in executor, perform targeted unit test plus relevant integration test."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/adapter_test.go` around lines 417 - 433, The test currently only
reads the general slow-query metric; extend it to also assert the internal-SQL
counter path by adding a reader for the internal label (e.g., add a function
similar to readSlowQueryCounter that accepts or uses metrics.LblInternal and
calls metrics.SlowQueryCounter.WithLabelValues(metrics.LblInternal)), then
update or add assertions in checkWriteSlowLog (or create
checkWriteSlowLogInternal) to capture the before/after delta for the internal
counter and validate expected increments when expectWrite is true and no change
when false; reference readSlowQueryCounter, checkWriteSlowLog,
metrics.SlowQueryCounter, metrics.LblInternal to find where to add the new
reader and assertions.
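The before/after pattern suggested in this review can be reduced to a small helper. The sketch below is a simplified illustration using only the standard library; `counterDelta` and the stand-in counter are hypothetical and not part of the TiDB test suite. Asserting on the delta rather than the absolute value matters because metric counters are typically process-wide, so other tests may already have incremented them.

```go
package main

import "fmt"

// counterDelta reads a counter before and after running action and returns
// the difference, so the caller never depends on the absolute value.
func counterDelta(read func() float64, action func()) float64 {
	before := read()
	action()
	return read() - before
}

func main() {
	// slowQueries stands in for a process-wide counter that earlier tests
	// may already have incremented.
	slowQueries := 41.0
	read := func() float64 { return slowQueries }

	// A statement that is written to the slow log bumps the counter once.
	logged := counterDelta(read, func() { slowQueries++ })
	// A statement that is not written leaves the counter untouched.
	skipped := counterDelta(read, func() {})

	fmt.Println("delta when logged:", logged)
	fmt.Println("delta when skipped:", skipped)
}
```

The same helper could be called once per label value to cover both the general and the internal path.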


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cc8ae555-a8f3-4b25-bf5d-7da4b44bfffb

📥 Commits

Reviewing files that changed from the base of the PR and between dffbd3e and 64235bb.

📒 Files selected for processing (7)

pkg/executor/adapter.go
pkg/executor/adapter_test.go
pkg/executor/metrics/metrics.go
pkg/infoschema/metric_table_def.go
pkg/metrics/grafana/tidb.json
pkg/metrics/metrics.go
pkg/metrics/server.go

codecov · 2026-03-24T02:48:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.2846%. Comparing base (8c83d3f) to head (84bd6a6).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #67249        +/-   ##
================================================
+ Coverage   77.7724%   79.2846%   +1.5122%     
================================================
  Files          2022       1968        -54     
  Lines        554420     545905      -8515     
================================================
+ Hits         431186     432819      +1633     
+ Misses       121492     111753      -9739     
+ Partials       1742       1333       -409

| Flag | Coverage Δ | |
| --- | --- | --- |
| integration | `46.2751% <100.0000%> (-1.8513%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ | |
| --- | --- | --- |
| dumpling | `61.5065% <ø> (ø)` | |
| parser | `∅ <ø> (∅)` | |
| br | `62.2266% <ø> (+1.3639%)` | ⬆️ |


ti-chi-bot · 2026-03-24T03:14:17Z

[LGTM Timeline notifier]

Timeline:

2026-03-24 03:10:53 UTC: ☑️ agreed by AilinKid.
2026-03-24 03:14:16 UTC: ☑️ agreed by guo-shaoge.

pkg/metrics/grafana/tidb.json

ti-chi-bot · 2026-03-24T03:52:33Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AilinKid, GMHDBJD, guo-shaoge
Once this PR has been reviewed and has the lgtm label, please assign zimulala for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [AilinKid,GMHDBJD,guo-shaoge]
~~pkg/infoschema/OWNERS~~ [GMHDBJD]
pkg/metrics/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

qw4990 · 2026-03-24T03:52:34Z

/retest

zimulala · 2026-03-24T03:59:10Z

pkg/metrics/grafana/tidb.json

              "refId": "B"
+            },
+            {
+              "expr": "sum(rate(tidb_server_slow_query_total{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", instance=~\"$instance\"}[1m]))",


Do we need to add it to tidb_with_keyspace_name.json?
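A question like this one can also be answered mechanically by scanning each dashboard file for query expressions that mention the metric. The following is a minimal sketch under stated assumptions: `collectExprs` and `dashboardUsesMetric` are hypothetical helpers, the inline JSON is a trimmed, made-up stand-in for real dashboard files, and the only structural assumption is that queries live under an `"expr"` key, as in the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// collectExprs walks decoded JSON and gathers every string stored under an "expr" key.
func collectExprs(node any, out *[]string) {
	switch v := node.(type) {
	case map[string]any:
		for key, child := range v {
			if s, ok := child.(string); ok && key == "expr" {
				*out = append(*out, s)
				continue
			}
			collectExprs(child, out)
		}
	case []any:
		for _, child := range v {
			collectExprs(child, out)
		}
	}
}

// dashboardUsesMetric reports whether any query expression in the dashboard
// JSON mentions the given metric name.
func dashboardUsesMetric(dashboard []byte, metric string) (bool, error) {
	var root any
	if err := json.Unmarshal(dashboard, &root); err != nil {
		return false, err
	}
	var exprs []string
	collectExprs(root, &exprs)
	for _, expr := range exprs {
		if strings.Contains(expr, metric) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Trimmed, hypothetical dashboards; real files are far larger.
	dashboards := map[string]string{
		"with_series.json":    `{"panels":[{"title":"QPS","targets":[{"expr":"sum(rate(tidb_server_slow_query_total[1m]))","legendFormat":"SlowQuery"}]}]}`,
		"without_series.json": `{"panels":[{"title":"QPS","targets":[{"expr":"sum(rate(some_other_metric_total[1m]))"}]}]}`,
	}
	for name, body := range dashboards {
		found, err := dashboardUsesMetric([]byte(body), "tidb_server_slow_query_total")
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Println(name, "has SlowQuery series:", found)
	}
}
```

A check of this kind only confirms that the metric is referenced somewhere in the file; it does not verify that the series sits in the intended panel.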

tiprow · 2026-03-24T06:21:50Z

@qw4990: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| tidb_parser_test | `84bd6a6` | link | true | `/test tidb_parser_test` |


qw4990 added 2 commits March 24, 2026 09:35

fixup

be70244

fixup

64235bb

ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Mar 24, 2026

ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 24, 2026

ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Mar 24, 2026

coderabbitai bot reviewed Mar 24, 2026

AilinKid approved these changes Mar 24, 2026

ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 24, 2026

guo-shaoge approved these changes Mar 24, 2026

ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 24, 2026

pantheon-ai bot reviewed Mar 24, 2026

pkg/metrics/grafana/tidb.json (resolved)

GMHDBJD approved these changes Mar 24, 2026

zimulala reviewed Mar 24, 2026

metrics: add SlowQuery series to nextgen QPS panels

84bd6a6
