
feat: optimize migrations #1019


Open — gfyrag wants to merge 4 commits into release/v2.3 from hotfix/v2.3/optimize-migrations

Conversation

gfyrag
Collaborator

@gfyrag gfyrag commented Jul 19, 2025

No description provided.

@gfyrag gfyrag requested a review from a team as a code owner July 19, 2025 13:28

coderabbitai bot commented Jul 19, 2025

Walkthrough

This change updates several migration scripts to create persistent tables (moves_view and txs_view) instead of temporary ones, removes explicit numeric casting on the transactions_seq column, adds foreign key constraints referencing transactions(seq) to optimize hash join performance, and modifies control flow to explicitly drop these tables when empty during batch updates. Additionally, one migration increases batch size and changes pagination logic for updating the logs table.

Changes

File(s) Change Summary
internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql, 27-fix-invalid-pcv/up.sql, 28-fix-pcv-missing-asset/up.sql Changed moves_view from temporary to permanent, removed numeric cast on transactions_seq, added foreign key constraint, and refined control flow to drop the table if empty.
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql Changed txs_view to permanent, added foreign key on seq, simplified update join condition, and refined control flow to drop table if empty.
internal/storage/bucket/migrations/34-fix-memento-format/up.sql Increased batch size for logs update from 1000 to 10000 and changed pagination from OFFSET/LIMIT to filtering by seq range.
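
The seq-range pagination described for migration 34 could be sketched roughly as follows — the batch body and the `convert_memento` function are placeholders for illustration, not the actual migration code:

```sql
-- Sketch, under assumptions: the real migration rewrites the memento
-- payload; here the SET clause is a stand-in.
do $$
    declare
        _batch_size integer := 10000;
        _offset bigint := 0;
    begin
        loop
            update logs
            set data = data  -- placeholder for the actual memento rewrite
            where seq >= _offset
              and seq < _offset + _batch_size;

            -- FOUND reflects the UPDATE above: stop once a batch touches no rows
            exit when not found;
            _offset = _offset + _batch_size;
        end loop;
    end
$$;
```

Unlike OFFSET/LIMIT pagination, each iteration is an index range scan on seq, so the cost per batch stays constant instead of growing with the offset.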

Sequence Diagram(s)

sequenceDiagram
    participant MigrationScript
    participant Database

    MigrationScript->>Database: CREATE TABLE moves_view / txs_view AS ...
    MigrationScript->>Database: CREATE INDEX on transactions_seq / seq
    MigrationScript->>Database: ALTER TABLE ADD FOREIGN KEY (transactions_seq / seq) REFERENCES transactions(seq)
    MigrationScript->>Database: LOOP batch update
    alt No rows found
        MigrationScript->>Database: DROP TABLE moves_view / txs_view
        MigrationScript->>Database: EXIT loop
    end

Possibly related PRs

  • fix: invalid pcv #831: Modifies the same migration scripts related to moves_view table creation and constraints, sharing code-level changes on table persistence and foreign key additions.

Suggested reviewers

  • paul-nicolas

Poem

In the warren where data hops and flows,
Tables once fleeting now firmly repose.
Foreign keys link, like tunnels below,
For hash joins swift and queries that glow.
With every migration, the schema grows—
A rabbit’s delight as the database shows! 🐇


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4e350d and 160d737.

📒 Files selected for processing (2)
  • internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
  • internal/storage/bucket/migrations/34-fix-memento-format/up.sql (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/storage/bucket/migrations/34-fix-memento-format/up.sql
  • internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor BugBot


@gfyrag gfyrag force-pushed the hotfix/v2.3/optimize-migrations branch from b8e5ffb to 54cab5c Compare July 19, 2025 13:33
@gfyrag gfyrag closed this Jul 19, 2025
@gfyrag gfyrag reopened this Jul 19, 2025
@gfyrag gfyrag force-pushed the hotfix/v2.3/optimize-migrations branch from 54cab5c to fa52187 Compare July 19, 2025 13:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1)

30-31: Same FK concern as migration 28 – heavy lock, short-lived table

See the comment on migration 28: the FK gives negligible value but can lock transactions. Prefer adding an index or using NOT VALID.

internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (1)

30-31: Same FK concern as migration 28 – heavy lock, short-lived table

See the comment on migration 28: the FK gives negligible value but can lock transactions. Prefer adding an index or using NOT VALID.

🧹 Nitpick comments (1)
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (1)

30-31: Foreign key brings little benefit but may lock & scan – consider a lightweight alternative

ALTER TABLE … ADD FOREIGN KEY forces an immediate validation scan and can take an ACCESS EXCLUSIVE lock on transactions, which is exactly the table being batch-updated just below.
Given that moves_view lives only for the duration of the migration and is dropped at the end, the referential-integrity guarantee is not valuable here. An index on transactions_seq is already present and is sufficient for a fast hash join.

Suggested lighter approach:

--- speed up hash join when updating rows later
-alter table moves_view add foreign key(transactions_seq) references transactions(seq);
+-- keep the optimisation but avoid the heavyweight FK validation/lock
+-- create the same b-tree index Postgres would create for the FK
+create index moves_view_tx_seq_idx on moves_view(transactions_seq);

If you still want the FK for documentation, add NOT VALID to skip validation and lock, then (optionally) VALIDATE CONSTRAINT after the update.
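
For reference, the NOT VALID route mentioned above might look like this; the constraint name is assumed for illustration:

```sql
-- Add the FK without the immediate validation scan or heavy lock
alter table moves_view
    add constraint moves_view_transactions_seq_fk
    foreign key (transactions_seq) references transactions(seq)
    not valid;

-- Optionally, after the batch update completes, validate with a weaker lock
alter table moves_view
    validate constraint moves_view_transactions_seq_fk;
```

NOT VALID skips checking existing rows at ADD time; VALIDATE CONSTRAINT later performs the scan under a SHARE UPDATE EXCLUSIVE lock rather than blocking writes to the referenced table.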

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 460f186 and fa52187.

📒 Files selected for processing (4)
  • internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (2 hunks)
  • internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (2 hunks)
  • internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (2 hunks)
  • internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1)
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (1)
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (3)
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:50-55
Timestamp: 2025-05-20T13:07:54.504Z
Learning: In the ledger codebase's `handleState` method, when updating ledger state from `StateInitializing` to `StateInUse`, it's intentional to proceed silently when `rowsAffected == 0`. This indicates another parallel transaction has already updated the ledger state and configured the sequences, so no error needs to be returned and no sequence updating is required.
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Learnt from: gfyrag
PR: formancehq/ledger#892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (1)
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Tests
  • GitHub Check: Dirty
  • GitHub Check: Cursor BugBot

Comment on lines +10 to 14
create table txs_view as
select *
from transactions
where updated_at is null;


🛠️ Refactor suggestion

Selecting * inflates the staging table – project only needed columns

create table txs_view as select * from transactions where updated_at is null;

txs_view is used solely to obtain seq and inserted_at. Copying every column (including large JSONB blobs) can blow up disk / temp-file usage and I/O for large ledgers.

-create table txs_view as
-select *
-from transactions
-where updated_at is null;
+create table txs_view as
+select seq, inserted_at
+from transactions
+where updated_at is null;

Smaller table ⇒ less I/O, faster scan & hash join.


cursor[bot]

This comment was marked as outdated.


codecov bot commented Jul 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.52%. Comparing base (460f186) to head (160d737).

Additional details and impacted files
@@               Coverage Diff                @@
##           release/v2.3    #1019      +/-   ##
================================================
- Coverage         80.55%   80.52%   -0.03%     
================================================
  Files               186      186              
  Lines             10199    10199              
================================================
- Hits               8216     8213       -3     
- Misses             1563     1565       +2     
- Partials            420      421       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


cursor[bot]

This comment was marked as outdated.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (3)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (3)

14-24: Range pagination + _offset += _batch_size can skip gaps

The loop relies on seq >= _offset AND seq < _offset + _batch_size.
When a hole ≥ _batch_size exists (e.g. seq=1-5000 then seq=20000), the iteration for 10000 ≤ seq<20000 updates 0 rows; FOUND becomes false and the loop exits, leaving seq=20000… untouched.

A safer approach is to keep the upper bound but advance _offset to the MIN unprocessed seq instead of blindly adding _batch_size, e.g.:

- exit when not found;
- _offset = _offset + _batch_size;
+ if NOT FOUND then
+   SELECT min(seq) INTO _offset FROM logs WHERE seq > _offset;
+   EXIT WHEN _offset IS NULL;     -- nothing left
+   CONTINUE;
+ end if;
+ _offset = _offset + _batch_size;

This prevents silent data loss while preserving the no-OFFSET scan pattern.


84-88: exit when not found; prematurely terminates the loop

Because of the gap issue above, hitting a 0-row UPDATE does not necessarily mean the table is fully processed. Replace the unconditional EXIT with logic that confirms no remaining rows (select 1 from logs where seq >= _offset LIMIT 1).


88-90: COMMIT inside a DO block fails when the script runs in a transaction

DO $$ … $$ supports transaction control only when invoked outside an explicit transaction block; migration runners typically wrap the whole script in one, so the COMMIT raises ERROR: invalid transaction termination.
If you need intermediate commits, convert the block to a CREATE PROCEDURE … LANGUAGE plpgsql and invoke it with CALL outside a transaction block, or run the loop at the script level outside a function.
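
A hedged sketch of the procedure-based alternative this comment suggests — the procedure name and batch body are illustrative, not taken from the migration:

```sql
-- Procedures (PostgreSQL 11+) permit COMMIT when CALLed outside a
-- transaction block, releasing locks and WAL between batches.
create procedure fix_memento_batches()
language plpgsql
as $$
declare
    _batch_size integer := 10000;
    _offset bigint := 0;
begin
    loop
        update logs
        set data = data  -- placeholder for the real memento rewrite
        where seq >= _offset
          and seq < _offset + _batch_size;

        exit when not found;
        _offset = _offset + _batch_size;
        commit;  -- legal here; would error inside a plain DO block in a transaction
    end loop;
end
$$;

call fix_memento_batches();
drop procedure fix_memento_batches();
```

The CALL must itself run outside an explicit transaction block, so the migration runner would need to execute this script in autocommit mode.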

🧹 Nitpick comments (1)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1)

4-4: Batch size ×10 – double-check RAM/lock impact

Jumping from 1 000 to 10 000 rows per chunk multiplies the in-memory JSON re-encoding workload and the number of rows locked per UPDATE. On large ledgers this can hold the logs PK lock for noticeably longer and bloat the WAL.

If the original hotspot was the OFFSET scan, consider a smaller step first (e.g. 2 000–5 000) and benchmark before settling on 10 000.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa52187 and b4e350d.

📒 Files selected for processing (5)
  • internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (3 hunks)
  • internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (3 hunks)
  • internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
  • internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
  • internal/storage/bucket/migrations/34-fix-memento-format/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql
  • internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql
  • internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
  • internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1)
Learnt from: gfyrag
PR: formancehq/ledger#935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Cursor BugBot
  • GitHub Check: Tests
  • GitHub Check: Dirty


@cursor cursor bot left a comment


Bug: Foreign Key Constraint Violation During Migration

The migration introduces a foreign key constraint on moves_view.transactions_seq referencing transactions.seq. This can cause the migration to fail if moves_view contains transactions_seq values that do not exist in the transactions table, leading to runtime errors.

internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql#L30-L31

-- speed up hash join when updating rows later
alter table moves_view add foreign key(transactions_seq) references transactions(seq);

internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql#L30-L31

-- speed up hash join when updating rows later
alter table moves_view add foreign key(transactions_seq) references transactions(seq);

internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql#L30-L31

-- speed up hash join when updating rows later
alter table moves_view add foreign key(transactions_seq) references transactions(seq);



