

@wfxr
Member

@wfxr wfxr commented Jan 29, 2026

What problem does this PR solve?

Issue Number: close #64198

Problem Summary:

When autocommit=0 and tidb_read_staleness is set, each SELECT statement recalculates its own read_ts instead of reusing the same snapshot. This causes inconsistent reads across multiple statements within what users expect to be a single transaction.

What changed and how does it work?

Changed the enter type in updateStateFromStaleReadProcessor() from EnterNewTxnWithReplaceProvider to EnterNewTxnWithBeginStmt when !IsAutocommit(). This triggers activateStaleTxn() in StalenessTxnContextProvider. This ensures:

  • InTxn() becomes true after the first SELECT
  • Subsequent reads reuse the same snapshot (StartTS)
  • Write statements are rejected (existing behavior)
  • COMMIT/ROLLBACK ends the transaction normally

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Fix an issue that stale read with `tidb_read_staleness` does not provide a consistent snapshot when `autocommit=0`, causing each SELECT to use a different `read_ts` instead of reusing the same snapshot within the transaction.

@ti-chi-bot

ti-chi-bot bot commented Jan 29, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added release-note, do-not-merge/work-in-progress, and sig/planner labels Jan 29, 2026

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 29, 2026
@wfxr wfxr force-pushed the stale-read/64198 branch 3 times, most recently from 2857898 to da39e55 January 29, 2026 08:21
@wfxr
Member Author

wfxr commented Jan 29, 2026

/test all

@wfxr
Member Author

wfxr commented Jan 29, 2026

/retest

@wfxr wfxr force-pushed the stale-read/64198 branch from da39e55 to ea4473c January 29, 2026 10:13
@wfxr
Member Author

wfxr commented Jan 29, 2026

/test all

@codecov

codecov bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.4591%. Comparing base (060097f) to head (897af91).
⚠️ Report is 38 commits behind head on master.

```diff
@@               Coverage Diff                @@
##             master     #65908        +/-   ##
================================================
- Coverage   77.7756%   77.4591%   -0.3166%
================================================
  Files          2001       1923        -78
  Lines        545524     538914      -6610
================================================
- Hits         424285     417438      -6847
- Misses       119577     121461      +1884
+ Partials       1662         15      -1647
```
| Flag | Coverage Δ |
|---|---|
| integration | 41.4981% <57.1428%> (-6.6667%) ⬇️ |
| unit | 76.7096% <71.4285%> (+0.2969%) ⬆️ |


| Components | Coverage Δ |
|---|---|
| dumpling | 56.7974% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 48.4687% <ø> (-12.5171%) ⬇️ |

@wfxr wfxr force-pushed the stale-read/64198 branch from ea4473c to a487a74 January 29, 2026 12:46
@wfxr
Member Author

wfxr commented Jan 29, 2026

/test all

@wfxr
Member Author

wfxr commented Jan 29, 2026

/retest

@wfxr
Member Author

wfxr commented Jan 30, 2026

/retest

@wfxr wfxr marked this pull request as ready for review January 30, 2026 03:05
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress label Jan 30, 2026
@wfxr
Member Author

wfxr commented Jan 30, 2026

/retest

When `autocommit=0` and stale read is enabled, the first SELECT now
activates a proper stale-read transaction instead of just replacing
the provider. This ensures:

1. `InTxn()` becomes true after the first SELECT
2. Subsequent reads reuse the same snapshot (StartTS)
3. Write statements are rejected (existing behavior)
4. COMMIT/ROLLBACK ends the transaction normally
```go
tk.MustExec("create table t (id int primary key, v int)")
tk.MustExec("insert into t values (1, 10)")

time.Sleep(2 * time.Second)
```
Contributor


Is it possible to use a mock/injected TS allocator or a failpoint so that the Sleep here can be avoided?

Sleeping in CI test cases may slow down CI.

Member Author


I tried rewriting this using a mock or failpoint to eliminate the Sleep, but neither approach worked out.

I also looked at other similar tests, and they also seem to rely on a sleep to ensure stale read returns the expected data (please let me know if I missed a better pattern).

As a small improvement, we could shorten the sleep to 1.2s; it passes reliably in repeated runs on my machine. Would you be OK with this change?

```go
	require.Equal(t, uint64(99), count)
}

func TestStaleReadTxnWithAutocommitOff(t *testing.T) {
```
Contributor


We also need to cover the plan cache path (and therefore prepared statements), since key stale-read information is processed during the planning stage.

A critical issue was previously encountered on this path: #54652.
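A toy illustration of why the plan cache path needs its own coverage: if a hook that must see every stale read only runs while building a plan, a plan-cache hit skips it. This is a hypothetical model (`engine`, `getPlan`, `execute`, and `onStaleRead` are invented names) and makes no claim about TiDB's planner or the root cause of #54652.

```go
package main

import "fmt"

// plan is a placeholder for a compiled query plan.
type plan struct{ sql string }

// engine is a toy planner plus plan cache. Names are hypothetical.
type engine struct {
	cache         map[string]*plan
	hookInPlanner bool // true: the hook runs only while building a new plan
	hookCalls     int  // how many times onStaleRead ran
}

// onStaleRead stands in for per-statement stale-read handling,
// e.g. activating the stale-read txn on first use.
func (e *engine) onStaleRead() { e.hookCalls++ }

func (e *engine) getPlan(sql string) *plan {
	if p, ok := e.cache[sql]; ok {
		return p // cache hit: planning-time hooks are skipped
	}
	if e.hookInPlanner {
		e.onStaleRead()
	}
	p := &plan{sql: sql}
	e.cache[sql] = p
	return p
}

func (e *engine) execute(sql string) {
	_ = e.getPlan(sql)
	if !e.hookInPlanner {
		e.onStaleRead() // runs on every execution, cached plan or not
	}
}

func main() {
	for _, inPlanner := range []bool{true, false} {
		e := &engine{cache: map[string]*plan{}, hookInPlanner: inPlanner}
		e.execute("select v from t where id = ?")
		e.execute("select v from t where id = ?") // second run is a cache hit
		fmt.Println("hookInPlanner:", inPlanner, "hook calls:", e.hookCalls)
	}
}
```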

Member Author


Added test cases for this.

```go
	tk.MustExec("create table t2 (id int)")
	require.False(t, sessVars.InTxn(), "DDL should implicitly end the stale-read txn")
	tk.MustExec("drop table t2")
}
```
Contributor


Should select as of timestamp be verified here too?

Member Author


Currently, using SELECT ... AS OF TIMESTAMP together with autocommit = 0 yields unpredictable results.

According to the Stale Read Product Design, SELECT ... AS OF TIMESTAMP:

  • Cannot be used together with other Stale Read features or variables.
    • If the variable tidb_read_staleness is set at the same time, it will be ignored and the TSO in SELECT ... AS OF TIMESTAMP will take effect.
    • If the variable tidb_external_ts is set at the same time,
  • Is only used in an implicit transaction with autocommit = 1.
    Statement-level stale read cannot be used in an explicit transaction; otherwise, there may be different staleness within one transaction.

There is a separate pr #65960 to resolve this, making the behavior conform to the intended design.

@cfzjywxk cfzjywxk added the sig/transaction SIG:Transaction label Feb 2, 2026
```go
	enterType := sessiontxn.EnterNewTxnWithReplaceProvider
	if !p.sctx.GetSessionVars().IsAutocommit() {
		// start a stale-read transaction so that subsequent reads reuse the same snapshot.
		enterType = sessiontxn.EnterNewTxnWithBeginStmt
	}
```
Collaborator

@lcwangchao lcwangchao Feb 2, 2026


EnterNewTxnWithBeginStmt seems to be used only by BEGIN / START TRANSACTION. How about still using EnterNewTxnWithReplaceProvider and checking IsAutocommit in its handler, enterNewStaleTxnWithReplaceProvider? If IsAutocommit returns false (autocommit=0), you can call activateStaleTxn there.

```go
func (p *StalenessTxnContextProvider) OnInitialize(ctx context.Context, tp sessiontxn.EnterNewTxnType) error {
	p.ctx = ctx
	switch tp {
	case sessiontxn.EnterNewTxnDefault, sessiontxn.EnterNewTxnWithBeginStmt:
		return p.activateStaleTxn()
	case sessiontxn.EnterNewTxnWithReplaceProvider:
		return p.enterNewStaleTxnWithReplaceProvider()
	default:
		return errors.Errorf("Unsupported type: %v", tp)
	}
}

func (p *StalenessTxnContextProvider) enterNewStaleTxnWithReplaceProvider() error {
	if p.is == nil {
		is, err := GetSessionSnapshotInfoSchema(p.sctx, p.ts)
		if err != nil {
			return err
		}
		p.is = is
	}
	txnCtx := p.sctx.GetSessionVars().TxnCtx
	txnCtx.TxnScope = kv.GlobalTxnScope
	txnCtx.IsStaleness = true
	txnCtx.InfoSchema = p.is
	return nil
}
```

Collaborator


Oh, maybe we should call ActivateTxn and SetInTxn(true) in GetStmtReadTS when autocommit=0, to keep the autocommit behavior consistent with normal transactions.
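A minimal sketch of this suggestion, assuming a simplified session where the provider is replaced for each statement unless a transaction is active. `getStmtReadTS`, `activateTxn`, and `setInTxn` are hypothetical stand-ins for `GetStmtReadTS`, `ActivateTxn`, and `SetInTxn(true)`, not their real signatures. The first read-TS request under autocommit=0 activates the transaction, so the next statement keeps the same provider and snapshot.

```go
package main

import "fmt"

// sessionVars is a toy stand-in for session variables.
type sessionVars struct {
	autocommit bool
	inTxn      bool
}

func (v *sessionVars) setInTxn(b bool) { v.inTxn = b }

// stalenessProvider is a toy stand-in for StalenessTxnContextProvider.
type stalenessProvider struct {
	vars      *sessionVars
	ts        uint64 // read TS resolved from tidb_read_staleness
	activated bool
}

func (p *stalenessProvider) activateTxn() { p.activated = true }

// getStmtReadTS lazily activates the txn on the first call under autocommit=0.
func (p *stalenessProvider) getStmtReadTS() uint64 {
	if !p.vars.autocommit && !p.activated {
		p.activateTxn()
		p.vars.setInTxn(true) // like a normal read that opens an implicit txn
	}
	return p.ts
}

// session replaces the provider for each statement unless a txn is active.
type session struct {
	vars     sessionVars
	provider *stalenessProvider
}

func (s *session) providerForStmt(ts uint64) *stalenessProvider {
	if s.vars.inTxn && s.provider != nil {
		return s.provider // keep the txn's provider and its snapshot
	}
	s.provider = &stalenessProvider{vars: &s.vars, ts: ts}
	return s.provider
}

func main() {
	s := &session{vars: sessionVars{autocommit: false}}
	first := s.providerForStmt(100).getStmtReadTS()
	second := s.providerForStmt(130).getStmtReadTS() // 130 is ignored: txn active
	fmt.Println(first, second, s.vars.inTxn)         // 100 100 true
}
```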

Collaborator


There also seems to be a bug (not introduced by this PR):

> start transaction read only as of timestamp now()- interval 1 second;
> select * from information_schema.tables limit 1; -- the transaction will be closed after this statement

Member Author


> Oh, maybe we should call ActivateTxn and SetInTxn(true) in GetStmtReadTS when autocommit=0, to keep the autocommit behavior consistent with normal transactions.

👍 Good suggestion! This approach is indeed more elegant, and I've verified that it works.

I have one question: I noticed another method, GetSnapshotWithStmtReadTS, which is similar to GetStmtReadTS. However, it seems to be used only in PointGet / BatchPointGet during the execution phase and some tests. I assume we don't need to apply the same handling there, right?

Member Author


> select * from information_schema.tables limit 1;

@lcwangchao I can confirm the bug. Considering the current PR's scope, would you prefer me to open a new issue to document this?

@wfxr
Member Author

wfxr commented Feb 3, 2026

/retest

@ti-chi-bot

ti-chi-bot bot commented Feb 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign qiuyesuifeng for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wfxr
Member Author

wfxr commented Feb 4, 2026

/retest


Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner sig/transaction SIG:Transaction size/L Denotes a PR that changes 100-499 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Support activating transaction for stale read used with autocommit = 0

3 participants