Conversation

@TDemeco
Contributor

@TDemeco TDemeco commented Dec 5, 2025

This PR fixes a bug where the is_in_bucket field of file records in the indexer DB wasn't populated correctly when a new file record was created for a file that the MSP had already stored.

We allow new storage requests for file keys that have already been fulfilled, so that the user can add redundancy to them (in the form of more BSPs). Since the MSP already has the file from the previous storage request, it can accept the new one with an inclusion proof instead of a non-inclusion proof. As a result, the runtime doesn't update its bucket root (and it shouldn't, as the bucket already had the file from before), so the MutationsApplied event is never emitted, and this is the event the indexer relies on to update the is_in_bucket field of a file record.
Consequently, the new file records created in the indexer DB by these subsequent storage requests permanently had is_in_bucket set to false. This inconsistency could lead the indexer to try to delete a file record that still has an MSP association, fail to import the block, and stall.

The fix is twofold:

  • First, correct the existing records in the DB by running a migration. It looks for file records that have is_in_bucket set to false and a sibling record (i.e. another file record for the same file key) with is_in_bucket set to true, and sets is_in_bucket to true for those records. (I believe it would have been enough to check the oldest sibling record, but just in case we check all of them.)
  • Then, as the more permanent fix, when the indexer creates a new file record it no longer defaults is_in_bucket to false. Instead, it checks whether any sibling record has is_in_bucket set to true and, if so, creates the new record with the field set to true as well.
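
The two steps above can be sketched with a simplified in-memory model. This is only an illustration: the real fix is a DB migration plus a change in the indexer's record-creation code, neither of which is shown in this conversation. The `FileRecord` struct and the `backfill_is_in_bucket` and `create_file_record` functions are hypothetical names, not the indexer's actual API.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a row of the indexer's file table.
#[derive(Debug, Clone, PartialEq)]
struct FileRecord {
    id: u64,
    file_key: Vec<u8>,
    is_in_bucket: bool,
}

/// Migration-style backfill: if ANY record for a file key has
/// `is_in_bucket == true`, set the flag on ALL records for that key.
/// Returns how many records were corrected.
fn backfill_is_in_bucket(records: &mut [FileRecord]) -> usize {
    // Collect every file key that has at least one record in the bucket.
    let keys_in_bucket: HashSet<Vec<u8>> = records
        .iter()
        .filter(|r| r.is_in_bucket)
        .map(|r| r.file_key.clone())
        .collect();

    let mut corrected = 0;
    for record in records.iter_mut() {
        if !record.is_in_bucket && keys_in_bucket.contains(&record.file_key) {
            record.is_in_bucket = true;
            corrected += 1;
        }
    }
    corrected
}

/// Creation-time fix: a new record inherits `is_in_bucket = true` when any
/// sibling record (same file key) already has the flag set.
fn create_file_record(records: &mut Vec<FileRecord>, file_key: &[u8]) -> FileRecord {
    let is_in_bucket = records
        .iter()
        .any(|r| r.file_key.as_slice() == file_key && r.is_in_bucket);
    // Illustrative ID allocation only; a real DB would assign the key.
    let id = records.iter().map(|r| r.id).max().map_or(1, |max| max + 1);

    let record = FileRecord {
        id,
        file_key: file_key.to_vec(),
        is_in_bucket,
    };
    records.push(record.clone());
    record
}
```

In this model, a record for a brand-new file key still starts with the flag set to false and is only flipped later, which matches the behaviour described above for first-time storage requests.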

Note

As a side note, I left in a function that checks the MSP associations of file records in the indexer. It is not currently used, but I believe we should add that check before every deletion once we are up and running in an environment where an indexer stall would be critical and we would rather keep it working while we fix whatever caused the inconsistency. Fixing inconsistent information could prove hard, though, so we might end up deciding not to use this function at all and keep the indexer's current behaviour.
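
A minimal sketch of how such a pre-deletion check could be wired in, assuming a simplified record shape. The names `RecordState`, `DeletionDecision` and `decide_deletion` are hypothetical and do not correspond to the function left in the PR, whose signature is not shown here.

```rust
/// Hypothetical summary of what the indexer knows about one file record.
#[derive(Debug, Clone)]
struct RecordState {
    is_in_bucket: bool,
    bsp_associations: u32,
    /// Result of the extra (currently unused) MSP association check.
    has_msp_association: bool,
}

#[derive(Debug, PartialEq)]
enum DeletionDecision {
    /// The record still has a known association, so it must stay.
    Keep,
    /// No associations remain, so the record can be deleted safely.
    Delete,
    /// The flag says "not in bucket" but an MSP association exists:
    /// skip the deletion instead of letting the DB reject it.
    SkipInconsistent,
}

/// Decide what to do with a record after the fisherman processed a deletion.
fn decide_deletion(state: &RecordState) -> DeletionDecision {
    if state.is_in_bucket || state.bsp_associations > 0 {
        return DeletionDecision::Keep;
    }
    if state.has_msp_association {
        // Inconsistent data: the caller would log this and carry on.
        return DeletionDecision::SkipInconsistent;
    }
    DeletionDecision::Delete
}
```

The trade-off is the one described in the note: skipping keeps the indexer running, but leaves an inconsistent record behind that still has to be repaired separately.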

⚠️ Breaking Changes ⚠️

  • Short description

    There's a new migration for the indexer DB that must be executed for all existing DBs.

  • Who is affected

    • Indexer node runners since they'll have to run the new migration.
  • Suggested code changes

    No code changes required.

@TDemeco TDemeco requested a review from snowmead December 5, 2025 20:45
@TDemeco TDemeco added the following labels on Dec 5, 2025: B5-clientnoteworthy ("Changes should be mentioned client-related release notes"), breaking ("Needs to be mentioned in breaking changes"), D4-nicetohaveaudit⚠️ ("PR contains trivial changes to logic that should be properly reviewed."), indexer-db ("Changes include migrations for the Indexer DB")
@TDemeco TDemeco requested a review from HermanObst December 5, 2025 20:54
@HermanObst
Contributor

Starting 👀

Comment on lines +3 to +4
-- If ANY file record with a given file_key has is_in_bucket=true, then ALL
-- records for that file_key should have is_in_bucket=true. This is because
Contributor

So the only way this can happen is if the SR was already accepted by the MSP and the user sends another one to increase redundancy?

Contributor Author

Yes, as that's the only way to have more than one file record for the same file key.

let block_hash_bytes = block_hash.as_bytes().to_vec();
let tx_hash_bytes = evm_tx_hash.map(|h| h.as_bytes().to_vec());

// Check if this file key is already present in the bucket of the MSP
Contributor

So when the user asks for more redundancy, the MSP needs to "re-accept" it, right?

Contributor Author

Yes, otherwise the new storage request will expire without an MSP response, so it will be considered rejected.

Contributor

@HermanObst HermanObst left a comment

LGTM.
For tracking reasons, it would be cool to be a bit more specific about the current bug,
e.g. exactly what was happening with the files that had the flag set to false, and why these were triggering a failure in the block import and thus the stall.

Not for this PR, but it should be a high priority to add a test that replicates this specific case, so that we can't reintroduce the bug later.
@santikaplan

@TDemeco
Contributor Author

TDemeco commented Dec 5, 2025

> LGTM. For tracking reasons, it would be cool to be a bit more specific about the current bug, e.g. exactly what was happening with the files that had the flag set to false, and why these were triggering a failure in the block import and thus the stall.

It's a bit convoluted and hard to explain in detail, so the full story wouldn't add much context, but simplifying a bit, the issue we faced was:
After each deletion done by the fisherman, the indexer checks whether the file record for that file has any remaining association (BSP or MSP) and, if there are none, tries to delete the file record from the DB. Since is_in_bucket was incorrectly set to false for a file that did have an MSP association, the indexer thought it was safe to delete the record and tried to. The DB constraint that forbids deleting a file record with remaining associations made the delete return an error, and the indexer reacted by erroring out of the block indexing function and retrying the same block indefinitely.
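
The failure mode described above can be reproduced in a small in-memory model. This is a sketch under stated assumptions, not the indexer's code: `StoredFile`, `DbError`, `delete_file_record`, `index_block` and `failed_attempts` are hypothetical names, and the retry loop is bounded here so the example terminates, whereas the real indexer retried without limit.

```rust
#[derive(Debug, PartialEq)]
enum DbError {
    /// The DB refuses to delete a record that still has associations.
    AssociationConstraintViolation,
}

/// Hypothetical view of a file record plus its association counts.
struct StoredFile {
    is_in_bucket: bool,
    bsp_associations: u32,
    msp_associations: u32,
}

/// Mimics the DB constraint: deletion fails while any association remains.
fn delete_file_record(file: &StoredFile) -> Result<(), DbError> {
    if file.bsp_associations > 0 || file.msp_associations > 0 {
        return Err(DbError::AssociationConstraintViolation);
    }
    Ok(())
}

/// Mimics the indexer's post-deletion cleanup, which trusts `is_in_bucket`
/// as its signal for "the MSP still holds this file".
fn index_block(file: &StoredFile) -> Result<(), DbError> {
    let looks_orphaned = !file.is_in_bucket && file.bsp_associations == 0;
    if looks_orphaned {
        // With a stale flag this hits the constraint and aborts the block.
        delete_file_record(file)?;
    }
    Ok(())
}

/// Retries the same block `max_attempts` times and counts the failures.
fn failed_attempts(file: &StoredFile, max_attempts: u32) -> u32 {
    (0..max_attempts)
        .filter(|_| index_block(file).is_err())
        .count() as u32
}
```

Because nothing changes between attempts, a record with a stale flag fails on every retry, while the same record with the flag set correctly is never considered for deletion in the first place.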

@TDemeco TDemeco merged commit 0a076dc into main Dec 5, 2025
43 checks passed
@TDemeco TDemeco deleted the fix/is-in-bucket-consistency branch December 5, 2025 22:12
4 participants