Commit 5f91990

craig[bot], arulajmani, DrewKimball, and yuzefovich committed
143751: kvclient: flush the write buffer if it gets too large r=arulajmani a=arulajmani

This patch introduces a new cluster setting, kv.transaction.write_buffering.max_buffer_size, which dictates how large a transaction's write buffer can get before we decide to flush all buffered writes to KV. It defaults to 4MB, for now.

Once a transaction's buffer is flushed, subsequent writes will no longer be buffered on the client. Instead, the transaction will write intents, as it would have in a pre-buffered writes world.

I briefly considered other schemes where we didn't disable buffered writes completely once a transaction goes over budget -- either by only flushing the buffer partly or flushing the buffer in its entirety but allowing subsequent writes to be buffered as long as the transaction has budget. However, I decided against either of these, as many of the benefits of having buffered writes (e.g. 1PC) are no longer possible after the first flush. Moreover, other benefits (e.g. batching, cheaper read-your-own-writes) don't generalize either. For now, we do the simple thing.

Resolves #139056

Release note: None

143820: plpgsql: add support for set-returning functions r=yuzefovich a=DrewKimball

#### plpgsql/parser: add parser support for RETURN NEXT

This commit adds support for `RETURN NEXT` statements to the PL/pgSQL parser. A following commit will add execution support.

Informs #105240

Release note: None

#### plpgsql/parser: add parser support for RETURN QUERY

This commit adds support for `RETURN QUERY` statements to the PL/pgSQL parser. A following commit will add execution support.

Informs #105240

Release note: None

#### plpgsql: pass options struct to plpgsqlBuilder

This commit moves the bool arguments used when constructing a `plpgsqlBuilder` instance to an options struct. This will simplify adding new options in the future.

Informs #105240

Release note: None

#### optbuilder: refactor routine output stmt finalization

This commit is a mechanical refactor to the logic that finalizes a routine's result type and last body statement. This change will make it easier for PL/pgSQL `RETURN NEXT` and `RETURN QUERY` statements to perform their own validation.

Informs #105240

Release note: None

#### sql: add ability to redirect first statement to srf result set

Similar to the existing method to direct the result of the first body statement of a routine to a cursor, this commit adds the ability to redirect to the result buffer of an SRF. This will be used in a later commit to implement PL/pgSQL `RETURN NEXT` and `RETURN QUERY` statements.

This commit also adds an option to prevent a routine from adding the result of its *last* body statement to its result set - this will be used by set-returning PL/pgSQL functions that rely on sub-routines to fill in the result set during execution.

Informs #105240

Release note: None

#### plpgsql: add support for set-returning functions

This commit adds support for set-returning PL/pgSQL functions. Unlike SQL SRFs, PL/pgSQL SRFs add to the result set at arbitrary points during execution using `RETURN NEXT` (scalar) and `RETURN QUERY` (relational) statements. We support this feature by allocating a `RoutineResultBufferID` that allows the sub-routines that implement the PL/pgSQL routine body to access the SRF's result buffer directly during execution.

Fixes #105240

Release note (sql change): Set-returning PL/pgSQL functions are now supported. A PL/pgSQL SRF can be created by declaring the return type as `SETOF <type>` or `TABLE`.

143966: row: harden inconsistent scan machinery r=yuzefovich a=yuzefovich

Table statistics collection uses inconsistent scan machinery which means that we paginate over the table via separate transactions with constantly advancing read timestamps. In particular, once we determine that the current read timestamp is at least 5 minutes old (controlled via `sql.stats.max_timestamp_age` cluster setting), we commit the current txn and open a new one in which we advance the timestamp by the amount of time that has passed since the start of the previous one (i.e. by the duration of the previous txn).

This process requires an "initial timestamp" for the very first txn that we use. That value comes from evaluating AS OF SYSTEM TIME clause of CREATE STATISTICS stmt which happens when the job record is created. Previously, if there was a long delay between the job record creation and the job being actually executed, we would return "cannot specify timestamp older than ..." error before creating the first txn. It's unclear to me why this check was put in place since it's not really necessary - we could simply remove it, which would make the very first txn of the inconsistent scan to be committed right away, and things would proceed easily.

This commit fixes this problem (by removing the check) but also improves things a bit further by explicitly advancing the "initial timestamp" before the first txn is open. If the initial timestamp is too old, it is advanced to make its age to be 1/10 of the max timestamp age. An example of the txn cycle:

```
current time: 100
initial timestamp: 50
max timestamp age: 10

time
100: advance initial timestamp to 99
100: start scan, timestamp=99
100-109: continue scanning at timestamp=99
109: bump timestamp to 108
109-118: continue scanning at timestamp=108
118: bump timestamp to 117
118-127: continue scanning at timestamp=117
```

Fixes: #100304.

Release note (bug fix): CockroachDB could previously encounter "cannot specify timestamp older than ..." error during the table statistics collection in some cases (like when the cluster is overloaded), and this is now fixed. The bug has been present since 19.1 version.

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Drew Kimball <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
4 parents e9db41f + 202d7ad + 06a1863 + 3e9e83d commit 5f91990
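
The timestamp cycle described for 143966 can be illustrated with a small Go sketch. This is not the code from the commit: the function names `advanceInitial` and `nextReadTimestamp`, the use of plain integer ticks for time, and the `>=` threshold for "too old" are all assumptions made for illustration. Only the 1/10 rule and the "advance by the duration of the previous txn" rule come from the commit message.

```go
package main

import "fmt"

// advanceInitial is a hypothetical helper. If the initial timestamp is at
// least maxAge old (assumed threshold), it is advanced so that its age
// becomes 1/10 of maxAge; otherwise it is returned unchanged.
func advanceInitial(now, initial, maxAge int64) int64 {
	if now-initial >= maxAge {
		return now - maxAge/10
	}
	return initial
}

// nextReadTimestamp is a hypothetical helper. Once the read timestamp is at
// least maxAge old, the current txn would be committed and a new one opened
// with the timestamp advanced by the duration of the previous txn.
func nextReadTimestamp(now, txnStart, readTS, maxAge int64) (int64, bool) {
	if now-readTS >= maxAge {
		return readTS + (now - txnStart), true
	}
	return readTS, false
}

func main() {
	const maxAge = 10
	now := int64(100)
	readTS := advanceInitial(now, 50, maxAge)
	txnStart := now
	fmt.Printf("%d: start scan, timestamp=%d\n", now, readTS)
	// Walk the clock forward one tick at a time, as in the example above.
	for ; now < 127; now++ {
		if ts, bumped := nextReadTimestamp(now, txnStart, readTS, maxAge); bumped {
			readTS, txnStart = ts, now
			fmt.Printf("%d: bump timestamp to %d\n", now, readTS)
		}
	}
}
```

With the values from the commit message (current time 100, initial timestamp 50, max timestamp age 10), the sketch starts the scan at timestamp 99 and bumps to 108 at time 109 and to 117 at time 118, matching the cycle in the example.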

55 files changed, +2334 -521 lines changed

docs/generated/settings/settings-for-tenants.txt

+1
@@ -98,6 +98,7 @@ kv.transaction.max_refresh_spans_bytes integer 4194304 maximum number of bytes u
 kv.transaction.randomized_anchor_key.enabled boolean false dictates whether a transactions anchor key is randomized or not application
 kv.transaction.reject_over_max_intents_budget.enabled boolean false if set, transactions that exceed their lock tracking budget (kv.transaction.max_intents_bytes) are rejected instead of having their lock spans imprecisely compressed application
 kv.transaction.write_buffering.enabled boolean false if enabled, transactional writes are buffered on the client application
+kv.transaction.write_buffering.max_buffer_size integer 4194304 if non-zero, defines that maximum size of the buffer that will be used to buffer transactional writes per-transaction application
 kv.transaction.write_pipelining.locking_reads.enabled boolean true if enabled, transactional locking reads are pipelined through Raft consensus application
 kv.transaction.write_pipelining.ranged_writes.enabled boolean true if enabled, transactional ranged writes are pipelined through Raft consensus application
 kv.transaction.write_pipelining.enabled (alias: kv.transaction.write_pipelining_enabled) boolean true if enabled, transactional writes are pipelined through Raft consensus application
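
The behavior this setting controls, as described in the commit message for 143751, can be sketched roughly in Go. This is a toy model, not the kvclient implementation: the `txnWriteBuffer` type, its fields, the size accounting (key length plus value length), and the choice to flush only after the limit is exceeded are all assumptions for illustration. What comes from the source is that a zero limit disables the check, that the whole buffer is flushed when it grows too large, and that writes after the first flush are no longer buffered.

```go
package main

import "fmt"

// write is a simplified key/value write.
type write struct{ key, value string }

// txnWriteBuffer is a hypothetical stand-in for a transaction's write buffer.
type txnWriteBuffer struct {
	maxSize  int64   // assumed to mirror max_buffer_size; 0 means no limit
	size     int64   // assumed accounting: sum of key and value lengths
	buffered []write // writes held on the client
	sent     []write // stands in for writes sent to KV as intents
	flushed  bool    // set after the first flush; buffering stays off afterwards
}

// put buffers the write unless the buffer has already been flushed, in which
// case the write goes straight to KV.
func (b *txnWriteBuffer) put(key, value string) {
	w := write{key, value}
	if b.flushed {
		b.sent = append(b.sent, w)
		return
	}
	b.buffered = append(b.buffered, w)
	b.size += int64(len(key) + len(value))
	if b.maxSize != 0 && b.size > b.maxSize {
		b.flush()
	}
}

// flush sends every buffered write to KV and disables further buffering.
func (b *txnWriteBuffer) flush() {
	b.sent = append(b.sent, b.buffered...)
	b.buffered, b.size, b.flushed = nil, 0, true
}

func main() {
	b := &txnWriteBuffer{maxSize: 8}
	b.put("a", "123")  // stays buffered
	b.put("b", "1234") // pushes the buffer over the limit and triggers a flush
	b.put("c", "x")    // no longer buffered
	fmt.Println(len(b.buffered), len(b.sent), b.flushed)
}
```

The sketch keeps the "simple thing" the commit message describes: there is no partial flush and no return to buffering once the transaction has gone over budget.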

docs/generated/settings/settings.html

+1
@@ -127,6 +127,7 @@
 <tr><td><div id="setting-kv-transaction-randomized-anchor-key-enabled" class="anchored"><code>kv.transaction.randomized_anchor_key.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>dictates whether a transactions anchor key is randomized or not</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-kv-transaction-reject-over-max-intents-budget-enabled" class="anchored"><code>kv.transaction.reject_over_max_intents_budget.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>if set, transactions that exceed their lock tracking budget (kv.transaction.max_intents_bytes) are rejected instead of having their lock spans imprecisely compressed</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-kv-transaction-write-buffering-enabled" class="anchored"><code>kv.transaction.write_buffering.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>if enabled, transactional writes are buffered on the client</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
+<tr><td><div id="setting-kv-transaction-write-buffering-max-buffer-size" class="anchored"><code>kv.transaction.write_buffering.max_buffer_size</code></div></td><td>integer</td><td><code>4194304</code></td><td>if non-zero, defines that maximum size of the buffer that will be used to buffer transactional writes per-transaction</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-kv-transaction-write-pipelining-locking-reads-enabled" class="anchored"><code>kv.transaction.write_pipelining.locking_reads.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if enabled, transactional locking reads are pipelined through Raft consensus</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-kv-transaction-write-pipelining-ranged-writes-enabled" class="anchored"><code>kv.transaction.write_pipelining.ranged_writes.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if enabled, transactional ranged writes are pipelined through Raft consensus</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
 <tr><td><div id="setting-kv-transaction-write-pipelining-enabled" class="anchored"><code>kv.transaction.write_pipelining.enabled<br />(alias: kv.transaction.write_pipelining_enabled)</code></div></td><td>boolean</td><td><code>true</code></td><td>if enabled, transactional writes are pipelined through Raft consensus</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
