feat: optimize migrations #1019
base: release/v2.3
Conversation
Walkthrough
This change updates several migration scripts to create persistent staging tables (moves_view / txs_view), index them on the transaction sequence, add a foreign key back to transactions(seq), batch-update the affected rows, and drop the staging table once no rows remain.
Sequence Diagram(s)
sequenceDiagram
    participant MigrationScript
    participant Database
    MigrationScript->>Database: CREATE TABLE moves_view / txs_view AS ...
    MigrationScript->>Database: CREATE INDEX on transactions_seq / seq
    MigrationScript->>Database: ALTER TABLE ADD FOREIGN KEY (transactions_seq / seq) REFERENCES transactions(seq)
    MigrationScript->>Database: LOOP batch update
    alt No rows found
        MigrationScript->>Database: DROP TABLE moves_view / txs_view
        MigrationScript->>Database: EXIT loop
    end
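Spelled out, the pattern in the diagram looks roughly like the sketch below, using migration 31 (fix-transaction-updated-at) and the narrowed column projection suggested later in this review as the example. The delete-and-return batch bookkeeping and setting updated_at from inserted_at are illustrative assumptions, not the migration's actual body.

-- Illustrative sketch of the staging-table pattern, not the migration source.
create table txs_view as
select seq, inserted_at
from transactions
where updated_at is null;

create index on txs_view(seq);

-- speed up hash join when updating rows later
alter table txs_view add foreign key(seq) references transactions(seq);

do $$
	declare
		_batch_size integer := 1000;
	begin
		loop
			-- consume one batch from the staging table and apply it to transactions
			with batch as (
				delete from txs_view
				where seq in (select seq from txs_view limit _batch_size)
				returning seq, inserted_at
			)
			update transactions
			set updated_at = batch.inserted_at
			from batch
			where transactions.seq = batch.seq;

			-- when the staging table is drained, the update touches no rows
			exit when not found;
		end loop;
	end
$$;

drop table txs_view;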
Force-pushed from b8e5ffb to 54cab5c
Force-pushed from 54cab5c to fa52187
Actionable comments posted: 2
♻️ Duplicate comments (2)
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1)
Lines 30-31: Same FK concern as migration 28 – heavy lock, short-lived table. See the comment on migration 28: the FK gives negligible value but can lock transactions. Prefer adding an index or using NOT VALID.
internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (1)
Lines 30-31: Same FK concern as migration 28 – heavy lock, short-lived table. See the comment on migration 28: the FK gives negligible value but can lock transactions. Prefer adding an index or using NOT VALID.
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (1)
Lines 30-31: Foreign key brings little benefit but may lock & scan – consider a lightweight alternative

ALTER TABLE … ADD FOREIGN KEY forces an immediate validation scan and can take an ACCESS EXCLUSIVE lock on transactions, which is exactly the table being batch-updated just below. Given that moves_view lives only for the duration of the migration and is dropped at the end, the referential-integrity guarantee is not valuable here. An index on transactions_seq is already present and is sufficient for a fast hash join.

Suggested lighter approach:

--- speed up hash join when updating rows later
-alter table moves_view add foreign key(transactions_seq) references transactions(seq);
+-- keep the optimisation but avoid the heavyweight FK validation/lock
+-- create the same b-tree index Postgres would create for the FK
+create index moves_view_tx_seq_idx on moves_view(transactions_seq);

If you still want the FK for documentation, add NOT VALID to skip validation and lock, then (optionally) VALIDATE CONSTRAINT after the update.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (2 hunks)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (2 hunks)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (2 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: gfyrag (PR: formancehq/ledger#935, File: internal/controller/system/state_tracker.go:0-0, Timestamp: 2025-05-20T13:48:07.455Z)
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
The same learning is also attached to 19-transactions-fill-pcv/up.sql, 27-fix-invalid-pcv/up.sql, 28-fix-pcv-missing-asset/up.sql and 31-fix-transaction-updated-at/up.sql.
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (additional learnings)
Learnt from: gfyrag (PR: formancehq/ledger#935, File: internal/controller/system/state_tracker.go:50-55, Timestamp: 2025-05-20T13:07:54.504Z)
Learning: In the ledger codebase's `handleState` method, when updating ledger state from `StateInitializing` to `StateInUse`, it's intentional to proceed silently when `rowsAffected == 0`. This indicates another parallel transaction has already updated the ledger state and configured the sequences, so no error needs to be returned and no sequence updating is required.
Learnt from: gfyrag (PR: formancehq/ledger#892, File: internal/controller/ledger/controller_default.go:196-196, Timestamp: 2025-04-29T11:24:28.923Z)
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Tests
- GitHub Check: Dirty
- GitHub Check: Cursor BugBot
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (outdated review thread, resolved)
create table txs_view as
select *
from transactions
where updated_at is null;
🛠️ Refactor suggestion

Selecting * inflates the staging table – project only needed columns

create table txs_view as select * from transactions where updated_at is null;

txs_view is used solely to obtain seq and inserted_at. Copying every column (including large JSONB blobs) can blow up disk / temp-file usage and I/O for large ledgers.
-create table txs_view as
-select *
-from transactions
-where updated_at is null;
+create table txs_view as
+select seq, inserted_at
+from transactions
+where updated_at is null;
Smaller table ⇒ less I/O, faster scan & hash join.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
create table txs_view as
select seq, inserted_at
from transactions
where updated_at is null;
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql
around lines 10 to 14, the create table statement uses select * which copies all
columns including large JSONB blobs, causing unnecessary disk and I/O usage.
Modify the select statement to include only the needed columns seq and
inserted_at to reduce table size and improve performance.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           release/v2.3    #1019   +/-  ##
================================================
- Coverage        80.55%   80.52%   -0.03%
================================================
  Files              186      186
  Lines            10199    10199
================================================
- Hits              8216     8213       -3
- Misses            1563     1565       +2
- Partials           420      421       +1

☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🔭 Outside diff range comments (3)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (3)
Lines 14-24: Range pagination + _offset += _batch_size can skip gaps

The loop relies on seq >= _offset AND seq < _offset + _batch_size. When a hole ≥ _batch_size exists (e.g. seq=1-5000 then seq=20000), the iteration for 10000 ≤ seq < 20000 updates 0 rows; FOUND becomes false and the loop exits, leaving seq=20000… untouched.

A safer approach is to keep the upper bound but advance _offset to the MIN unprocessed seq instead of blindly adding _batch_size, e.g.:

- exit when not found;
- _offset = _offset + _batch_size;
+ if NOT FOUND then
+     SELECT min(seq) INTO _offset FROM logs WHERE seq > _offset;
+     EXIT WHEN _offset IS NULL; -- nothing left
+     CONTINUE;
+ end if;
+ _offset = _offset + _batch_size;

This prevents silent data loss while preserving the no-OFFSET scan pattern.
Lines 84-88: exit when not found; prematurely terminates the loop

Because of the gap issue above, hitting a 0-row UPDATE does not necessarily mean the table is fully processed. Replace the unconditional EXIT with logic that confirms no remaining rows (select 1 from logs where seq >= _offset LIMIT 1).
Lines 88-90: COMMIT inside a DO block is illegal in PostgreSQL functions

DO $$ … $$ runs inside an implicit transaction; issuing COMMIT triggers ERROR: cannot COMMIT while a subtransaction is active. If you need autonomous commits, convert the block to a CREATE PROCEDURE … LANGUAGE plpgsql and invoke it with CALL, or run the loop at the script level outside a function.
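A procedure-based variant of the same loop might look as follows, reusing the hypothetical logs_demo table from the sketch above; fix_logs_batched is a placeholder name, and CALL must itself run outside an explicit transaction block for the intermediate COMMITs to be legal:

-- Sketch only: fix_logs_batched and logs_demo are placeholders, not migration code.
create or replace procedure fix_logs_batched(_batch_size integer default 10000)
language plpgsql
as $$
declare
	_offset bigint := 0;
begin
	loop
		-- stand-in for the real per-row fix
		update logs_demo
		set payload = payload || '{"fixed": true}'::jsonb
		where seq >= _offset and seq < _offset + _batch_size;

		if not found then
			select min(seq) into _offset from logs_demo where seq > _offset;
			exit when _offset is null;  -- nothing left to process
			continue;
		end if;

		_offset = _offset + _batch_size;
		commit;  -- transaction control is allowed in a procedure invoked via CALL
	end loop;
end;
$$;

call fix_logs_batched();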
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1)
Line 4: Batch size ×10 – double-check RAM/lock impact

Jumping from 1 000 to 10 000 rows per chunk multiplies the in-memory JSON re-encoding workload and the number of rows locked per UPDATE. On large ledgers this can hold the logs PK lock for noticeably longer and bloat the WAL.

If the original hotspot was the OFFSET scan, consider a smaller step first (e.g. 2 000–5 000) and benchmark before settling on 10 000.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (3 hunks)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (3 hunks)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/34-fix-memento-format/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
- internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1)
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Cursor BugBot
- GitHub Check: Tests
- GitHub Check: Dirty
Bug: Foreign Key Constraint Violation During Migration

The migration introduces a foreign key constraint on moves_view.transactions_seq referencing transactions.seq. This can cause the migration to fail if moves_view contains transactions_seq values that do not exist in the transactions table, leading to runtime errors.

internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql#L30-L31

-- speed up hash join when updating rows later
alter table moves_view add foreign key(transactions_seq) references transactions(seq);

internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql#L30-L31 (lines 30 to 31 in 160d737)

-- speed up hash join when updating rows later
alter table moves_view add foreign key(transactions_seq) references transactions(seq);

internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql#L30-L31

-- speed up hash join when updating rows later
alter table moves_view add foreign key(transactions_seq) references transactions(seq);
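If that scenario needs to be ruled out up front, a quick pre-flight query along these lines would confirm the constraint is safe to add before the ALTER TABLE runs (a sketch, not part of the PR):

-- Count staging rows whose transactions_seq has no matching transactions row;
-- a non-zero result means the ADD FOREIGN KEY would fail.
select count(*)
from moves_view mv
left join transactions t on t.seq = mv.transactions_seq
where t.seq is null;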
No description provided.