-
Notifications
You must be signed in to change notification settings - Fork 11
feat(msp): MSP retries responding storage requests #571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
snowmead
wants to merge
58
commits into
main
Choose a base branch
from
feat/msp-batch-respond-storage-requests
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
- Refactored the storage request submission process in `move-bucket.test.ts` to utilize a batch processing helper
…Postgres DB (#563) * fix: 🩹 Remove and update old comments * feat: 🚧 Add `blockchain-service-db` crate and postgres schema * feat: 🚧 Wire pending tx DB updating to `send_extrinsic` * feat: 🚧 Add CLI param for db URL, initialise DB on BS startup and update status with watcher * fix: 🐛 Use i64 for nonce * feat: ✨ Add pending tx postgres to integration test suite * fix: 🐛 Wire CLI pending db param to blockchain service initialisation * test: ✅ Fix passing CLI pending db param in test suites * feat: ✨ Clear pending txs from DB in finality * docs: 📝 Document functions in `store.rs` for pending DB * feat: ✨ Log when a pending tx has a state update but for a different tx hash * fix: 🐛 Initialise Blockchain Service last processed blocks with genesis * test: ✅ Fix tests using old indexer db container name * test: ✅ Add back backend container initialisation * fix: 🐛 Remove duplicate indexer nodes * test: ✅ Fix name change mistakenly * test: ✨ Add new pending DB testing utilities * fix: 🗑️ Remove deprecated `createApiObject` * test: ✅ Add persistent pending Tx DB integration tests * feat: 🚧 Add `load_account_with_states` query to enable re-subscription at startup * feat: ✨ Re-watch transactions pending after restarting MSP * test: ✅ Add test for not re-watching extrinsic with nonce below on-chain nonce * feat: ✨ Add `watched` boolean field to pending db * feat: ✨ Persist gap filling remark transactions * fix: ✅ Fix race condition where container wasn't fully paused * feat: ✨ Add pendingDbUrl option to `addBspContainer` as well * refactor: 🚚 Rename `insert_sent` to `upsert_sent` * feat: 🔥 Remove unused `load_active` function from pending db interface * refactor: ♻️ Use `Vec<String>` directly in `load_resubscribe_rows` params * feat: 🩹 Remove usage of `sent` state in pending DB * refactor: ♻️ Use `TransactionStatus` in db interface functions * fix: 🐛 Track transaciton in `transaction_manager` even with no `call_scale` * refactor: ♻️ Use constants for container names of DBs and backend * test: ✅ Add check after sealing block of pending tx not updating * feat: ✨ Add message in remark fake tx * fix: ✅ Use new constant instead of hardcoded postgres container name * fix: 🐛 Resubscribe to pending txs in initial sync handling istead of startup * fix: 🐛 Set all txs pending to `watched=false` then only those we re-watch back to `true` * Revert "fix: 🐛 Resubscribe to pending txs in initial sync handling istead of startup" This reverts commit df6af95. * fix: 🐛 Try to watch in_block pending transacitons too * fix: 🐛 Not filter by on-chain nonce when re-subscribing to pending txs * test: ✅ Improve test error logging * feat: ✨ Add custom error to submit_and_watch_extrinsic * fix: 🩹 Log and skip when error in re-subscribing is old nonce * fix: ✅ Consider node race condition in test
- Consolidate capacity management with single increase per batch - Add batch trimming to fit within capacity limits - Implement batch rejection with single extrinsic for efficiency - Extract helper methods for file metadata construction - Improve logging for batch processing visibility - Clean up imports and remove unused file_key_cleanup field
- retry block sealing for checking msp acceptance
…rage requests api
…iles that have already been accepted
Add `msp: Option<(ProviderIdFor<T>, bool)>` field to the NewStorageRequest event to propagate MSP assignment information through events. This allows MSP clients to determine if a storage request was created for them and whether they have already accepted it, without needing to query storage request metadata from the chain. Prevents the MSP from reaccepting storage requests
… api, let msp retry
This reverts commit fc7404c.
The MSP could queue the same file key multiple times for acceptance, causing MspAlreadyConfirmed errors when batches processed duplicate entries. This occurred when multiple code paths (BatchProcessStorageRequests and RemoteUploadRequest handlers) both called on_file_complete for the same file. Add a persistent HashSet (CFHashSetAPI) alongside the existing deque to track pending file keys. Before queueing, check if the file key exists in the set - skip if present, insert and queue if not. When popping from the deque for batch processing, remove the file key from the set.
…arc hashmap of file key statuses
This file contains local MCP server configuration and should not be tracked in the repository. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…up after fisherman resume The `waitForIndexing` helper was using `system.number()` which returns the best block (e.g., #28), but the indexer only indexes finalized blocks (e.g., #22). This caused a 30-second timeout waiting for "Indexing block #28" log that wouldn't appear until finalization caught up.
* refactor: 🚚 Move release docs to `resources` * refactor: 🚚 Move config files to dedicated directory * fix: 📝 Minor fix in release process doc * refactor: 🚚 Move `backend_config` as well * fix: 🩹 Remove extra lines in config file
* fix: 🚑 refactor upload code to avoid write lock deadlocks * fix: 🔊 update logs to print file key and fingerprint as hexadecimal strings * fix: 🐛 correctly error out when a file inconsistency gets detected * revert: ⏪ revert changes to file storage
* fix: 🐛 avoid concurrent uploads for the same file key in the backend * fix: ⏪ undo change in file diesel table
…d-storage-requests # Conflicts: # client/src/tasks/msp_upload_file.rs
Resolved conflict in client/src/tasks/msp_upload_file.rs: - Kept HEAD's handle_rejected_storage_request logic for proper on-chain rejection - Updated return type to anyhow::Result<String> per EventHandler trait change from main - Updated BatchProcessStorageRequests handler to return Result<String>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
B0-silent
Changes should not be mentioned in any release notes
D3-trivial👶
PR contains trivial changes that do not require an audit
not-breaking
Does not need to be mentioned in breaking changes
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
MSPs retry storage requests in the event when they fail due to proof invalidations i.e.
ForestProofVerificationFailed,KeyProofVerificationFailed,FailedToApplyDelta. Under these circumstances, the MSP will re-queue the storage requests to be responded to again.Design
BatchProcessStorageRequestsis a new event handler which queries the pending storage requests which are waiting for the MSP's response. It will cause; aNewStorageRequestevent to be emitted via a new command (PreprocessStorageRequest) to initiate the natural storage request process which receives the file from the user and adds the file to storage and queue the storage request for response.The file keys that are processed and responded to are tracked via a shared
HashMapin a singleMspUploadFileTaskinstance. This single instance is critical for the retry mechanism to work correctly, as all event handlers (BatchProcessStorageRequests,NewStorageRequest,RemoteUploadRequest,ProcessMspRespondStoringRequest) must share the samefile_key_statusesHashMap to see status updates from each other.A new
FileKeyStatusenum tracks each file key's processing state across concurrent event handlers:Processing- File key is in the pipelineAccepted- Successfully accepted on-chainRejected- Explicitly rejected on-chainAbandoned- Failed with non-proof dispatch errors (permanent failures, won't be retried)Retry Mechanism: When proof errors occur (
ForestProofVerificationFailed,FailedToApplyDelta), file keys are removed from thefile_key_statusesHashMap rather than being marked with aFailedstatus. This signals to the nextBatchProcessStorageRequestscycle that the file key should be re-processed. The file key will be re-inserted withProcessingstatus, triggering a newNewStorageRequestevent and regenerating proofs with the updated forest root.The batch processing cycle is executed atomically controlled via a semaphore (size 1). The blockchain service will emit this event at every block if the semaphore isn't held. This allows more linear control over the process of when the MSP will preprocess storage requests and queue them up for responding.
The
MspHandlertracks storage requests which are queued in an in-memory FIFO queue (pending_respond_storage_requests) with O(1) deduplication via aHashSet(pending_respond_storage_request_file_keys) to prevent duplicate queueing of the same file key.Storage Deprecation
Deprecates Column Families from RocksDb storage:
pending_msp_respond_storage_requestpending_msp_respond_storage_request_left_indexpending_msp_respond_storage_request_right_indexWill be removed once we implement proper migration automations for RockDb.
Notable changes
ModuleErrorassociated type to theStorageEnableRuntimeto enable type safety when reading decoding extrinsic errors into specific pallet errorsbatchStorageRequestshelper api for integration tests that create multiple storage requestsbatchStorageRequestshelper api and adds logging for observability sake.mcp.jsonto .gitignoreSimplified Dataflow
flowchart TD A[BatchProcessStorageRequests<br/><i>periodic, from BlockchainService</i>] --> B[NewStorageRequest<br/><i>file 1</i>] A --> C[NewStorageRequest<br/><i>file 2</i>] A --> D[NewStorageRequest<br/><i>file N</i>] B --> E{File in storage?} C --> E D --> E E -->|No| F[RemoteUploadRequest<br/><i>chunk uploads from user</i>] E -->|Yes| G[on_file_complete] F -->|file complete| G G --> H[queue_msp_respond_storage_request<br/><i>in-memory FIFO queue</i>] H --> I[ProcessMspRespondStoringRequest<br/><i>batched extrinsic submission</i>] I -->|Proof Error| J[Remove from file_key_statuses<br/><i>triggers retry on next cycle</i>] I -->|Success| K[Accepted/Rejected status] I -->|Non-proof Error| L[Abandoned status<br/><i>permanent skip</i>]