fix: speed up migration 19 #1017
Walkthrough: The migration scripts were updated to create permanent tables.
|
Actionable comments posted: 0
🔭 Outside diff range comments (1)
internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (1)

56-58: `COMMIT` inside a `DO $$ … $$` block is illegal – the migration will fail at runtime

`COMMIT`/`ROLLBACK` are only allowed in top-level scripts and `CREATE PROCEDURE` bodies (PostgreSQL ≥ 11). A `DO` block executes as an anonymous function and cannot manage transactions. When this script runs, PostgreSQL will raise `ERROR: cannot commit while a subtransaction is active`, aborting the whole migration.

Suggested fix – simplest: keep everything in one transaction and just drop the `commit;`:

```diff
-	perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);
-
-	commit;
+	perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);
```

If you really need batch commits, wrap the logic in a `CREATE PROCEDURE` + `CALL` instead of a `DO` block, or split the migration into multiple statements executed from psql/goose outside an explicit transaction.
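The `CREATE PROCEDURE` + `CALL` route could look roughly like the sketch below. This is illustrative only – the procedure name, the `processed` driver column, and the batching predicate are invented for the example, not taken from the actual migration:

```sql
-- Hypothetical batch-commit pattern: a procedure (unlike a DO block) may
-- manage transactions on PostgreSQL >= 11.
create procedure backfill_in_batches(_batch_size integer)
language plpgsql
as $$
declare
    _count integer;
begin
    loop
        -- Claim and process one batch of driver rows.
        with batch as (
            select seq
            from moves_view
            where processed is false
            limit _batch_size
            for update skip locked
        )
        update moves_view mv
        set processed = true
        from batch b
        where mv.seq = b.seq;

        get diagnostics _count = row_count;
        exit when _count = 0;

        commit;  -- legal in a procedure body, illegal in a DO block
    end loop;
end;
$$;

-- CALL itself must run outside an explicit transaction block:
call backfill_in_batches(1000);
```

Note that migration runners often wrap each migration in a transaction, which would defeat the intra-procedure `COMMIT`; the runner has to support no-transaction migrations for this pattern to work.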
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (1)

30-31: Foreign-key adds validation overhead but brings no join-time benefit

The FK from `moves_view.transactions_seq` → `transactions(seq)`:

- Forces an immediate full-table validation scan (could be large) and adds triggers on `moves_view`, yet
- Provides zero planning benefit for the upcoming `UPDATE … JOIN` – the optimiser only needs the b-tree you already created (`moves_view_idx`).

Since `moves_view` is dropped a few lines later, the FK's cost outweighs any integrity value. Consider dropping the FK altogether:

```diff
--- speed up hash join when updating rows later
-alter table moves_view add foreign key(transactions_seq) references transactions(seq);
```

If you want extra safety without the upfront cost, add the constraint `NOT VALID` then `VALIDATE CONSTRAINT` after the update completes.
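A sketch of the `NOT VALID` variant (the constraint name here is invented for illustration):

```sql
-- Add the FK without scanning existing rows; only new/updated rows are checked.
alter table moves_view
    add constraint moves_view_transactions_seq_fkey
    foreign key (transactions_seq) references transactions (seq)
    not valid;

-- ... run the batched UPDATE here ...

-- Validate afterwards; this scans the table but takes a weaker lock
-- (SHARE UPDATE EXCLUSIVE) than a validating ADD CONSTRAINT would.
alter table moves_view
    validate constraint moves_view_transactions_seq_fkey;
```

This splits the cost: the `ADD` is near-instant, and the validation scan happens once, after the bulk update, without blocking concurrent reads and writes for its duration.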
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (1)
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor BugBot
- GitHub Check: Tests
Actionable comments posted: 2
♻️ Duplicate comments (2)
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (2)

10-27: Same optimisation as migration 27 – make `moves_view` UNLOGGED

The reasoning and benefits are identical to those described for migration 27.

30-31: Defer FK validation to avoid full-table scans

See detailed rationale in the migration 27 review.
🧹 Nitpick comments (2)
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1)

10-27: Consider making `moves_view` UNLOGGED for faster bulk processing

`CREATE TABLE moves_view AS …` materialises ~1 K rows per batch into a regular, WAL-logged table. Because the table is dropped at the end of the block and never survives the migration, logging its changes is wasted I/O.

```diff
-	create table moves_view as
+	create unlogged table moves_view as
```

Using `UNLOGGED` speeds up the migration and still preserves crash-safety for the main data because the source tables are fully logged. (If logical-replication support is required, keep it regular.)

internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (1)

10-13: Copying all columns is unnecessary and bloats the working set

`txs_view` is only used as a driver table for `seq`. Copying the entire `transactions` row (`SELECT *`) doubles storage and I/O.

```diff
-create table txs_view as
-select *
-from transactions
-where updated_at is null;
+create unlogged table txs_view as
+select seq
+from transactions
+where updated_at is null;
```

Lean materialisation speeds up both the `CREATE TABLE` and the subsequent `JOIN`.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (2 hunks)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (2 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2)
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:50-55
Timestamp: 2025-05-20T13:07:54.504Z
Learning: In the ledger codebase's `handleState` method, when updating ledger state from `StateInitializing` to `StateInUse`, it's intentional to proceed silently when `rowsAffected == 0`. This indicates another parallel transaction has already updated the ledger state and configured the sequences, so no error needs to be returned and no sequence updating is required.
Learnt from: gfyrag
PR: formancehq/ledger#892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Tests
- GitHub Check: Dirty
- GitHub Check: Cursor BugBot
Force-pushed from 17ffbef to fa628d0.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1)

90-91: `COMMIT` inside a `DO $$ … $$` block will raise `cannot COMMIT inside a function`

`DO` executes in a single implicit transaction; an explicit `COMMIT` is prohibited (unless this is a procedure, which it isn't). This statement will abort the migration at runtime.

Drop the `commit;` or convert the block into a proper `CREATE PROCEDURE … LANGUAGE plpgsql` and `CALL` it.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (3 hunks)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (3 hunks)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/34-fix-memento-format/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
- internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor BugBot
- GitHub Check: Tests
🔇 Additional comments (1)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1)
4-4: 10 k batch-size is fine, but confirm RAM/lock impact on large ledgers
Jumping from 1 000 to 10 000 multiplies in-memory row materialisation and lock duration by ×10. On busy instances this can noticeably increase lock contention and shared-buffers churn. Please benchmark on a production-sized dataset to make sure the gain outweighs the extra pressure.
No description provided.