feat: RocksDB Column Family Migration Framework #597
Open

snowmead wants to merge 121 commits into `main` from `feat/rocksdb-migrations`
Conversation
- Refactored the storage request submission process in `move-bucket.test.ts` to use a batch processing helper
…Postgres DB (#563)

* fix: 🩹 Remove and update old comments
* feat: 🚧 Add `blockchain-service-db` crate and postgres schema
* feat: 🚧 Wire pending tx DB updating to `send_extrinsic`
* feat: 🚧 Add CLI param for db URL, initialise DB on BS startup and update status with watcher
* fix: 🐛 Use i64 for nonce
* feat: ✨ Add pending tx postgres to integration test suite
* fix: 🐛 Wire CLI pending db param to blockchain service initialisation
* test: ✅ Fix passing CLI pending db param in test suites
* feat: ✨ Clear pending txs from DB in finality
* docs: 📝 Document functions in `store.rs` for pending DB
* feat: ✨ Log when a pending tx has a state update but for a different tx hash
* fix: 🐛 Initialise Blockchain Service last processed blocks with genesis
* test: ✅ Fix tests using old indexer db container name
* test: ✅ Add back backend container initialisation
* fix: 🐛 Remove duplicate indexer nodes
* test: ✅ Fix mistaken name change
* test: ✨ Add new pending DB testing utilities
* fix: 🗑️ Remove deprecated `createApiObject`
* test: ✅ Add persistent pending Tx DB integration tests
* feat: 🚧 Add `load_account_with_states` query to enable re-subscription at startup
* feat: ✨ Re-watch transactions pending after restarting MSP
* test: ✅ Add test for not re-watching extrinsic with nonce below on-chain nonce
* feat: ✨ Add `watched` boolean field to pending db
* feat: ✨ Persist gap filling remark transactions
* fix: ✅ Fix race condition where container wasn't fully paused
* feat: ✨ Add pendingDbUrl option to `addBspContainer` as well
* refactor: 🚚 Rename `insert_sent` to `upsert_sent`
* feat: 🔥 Remove unused `load_active` function from pending db interface
* refactor: ♻️ Use `Vec<String>` directly in `load_resubscribe_rows` params
* feat: 🩹 Remove usage of `sent` state in pending DB
* refactor: ♻️ Use `TransactionStatus` in db interface functions
* fix: 🐛 Track transaction in `transaction_manager` even with no `call_scale`
* refactor: ♻️ Use constants for container names of DBs and backend
* test: ✅ Add check after sealing block of pending tx not updating
* feat: ✨ Add message in remark fake tx
* fix: ✅ Use new constant instead of hardcoded postgres container name
* fix: 🐛 Resubscribe to pending txs in initial sync handling instead of startup
* fix: 🐛 Set all txs pending to `watched=false` then only those we re-watch back to `true`
* Revert "fix: 🐛 Resubscribe to pending txs in initial sync handling instead of startup" (this reverts commit df6af95)
* fix: 🐛 Try to watch in_block pending transactions too
* fix: 🐛 Don't filter by on-chain nonce when re-subscribing to pending txs
* test: ✅ Improve test error logging
* feat: ✨ Add custom error to submit_and_watch_extrinsic
* fix: 🩹 Log and skip when error in re-subscribing is old nonce
* fix: ✅ Consider node race condition in test
- Consolidate capacity management with a single increase per batch
- Add batch trimming to fit within capacity limits
- Implement batch rejection with a single extrinsic for efficiency
- Extract helper methods for file metadata construction
- Improve logging for batch processing visibility
- Clean up imports and remove unused `file_key_cleanup` field
- Retry block sealing when checking MSP acceptance
…rage requests api
…iles that have already been accepted
Add an `msp: Option<(ProviderIdFor<T>, bool)>` field to the NewStorageRequest event to propagate MSP assignment information through events. This allows MSP clients to determine if a storage request was created for them and whether they have already accepted it, without needing to query storage request metadata from the chain. Prevents the MSP from re-accepting storage requests.
… api, let msp retry
This reverts commit fc7404c.
The MSP could queue the same file key multiple times for acceptance, causing MspAlreadyConfirmed errors when batches processed duplicate entries. This occurred when multiple code paths (BatchProcessStorageRequests and RemoteUploadRequest handlers) both called on_file_complete for the same file. Add a persistent HashSet (CFHashSetAPI) alongside the existing deque to track pending file keys. Before queueing, check if the file key exists in the set - skip if present, insert and queue if not. When popping from the deque for batch processing, remove the file key from the set.
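The deque-plus-set pattern described above can be sketched with std collections. This is a hypothetical in-memory stand-in, not the actual `CFHashSetAPI`, which is RocksDB-backed and persistent; the type and method names here are illustrative only.

```rust
use std::collections::{HashSet, VecDeque};

/// In-memory sketch of the pending file-key queue: a deque preserving
/// processing order plus a set used to reject duplicate enqueues.
struct PendingFileKeys {
    queue: VecDeque<String>,
    pending: HashSet<String>,
}

impl PendingFileKeys {
    fn new() -> Self {
        Self { queue: VecDeque::new(), pending: HashSet::new() }
    }

    /// Queue a file key only if it is not already pending.
    /// Returns false when the key was skipped as a duplicate.
    fn push(&mut self, key: &str) -> bool {
        if !self.pending.insert(key.to_string()) {
            return false; // already queued: avoids double on_file_complete
        }
        self.queue.push_back(key.to_string());
        true
    }

    /// Pop for batch processing, removing the key from the pending set
    /// so it may legitimately be queued again later.
    fn pop(&mut self) -> Option<String> {
        let key = self.queue.pop_front()?;
        self.pending.remove(&key);
        Some(key)
    }
}

fn main() {
    let mut q = PendingFileKeys::new();
    assert!(q.push("file_a"));
    assert!(!q.push("file_a")); // duplicate skipped
    assert_eq!(q.pop().as_deref(), Some("file_a"));
    assert!(q.push("file_a")); // re-queue allowed after pop
    println!("dedup ok");
}
```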
…status tracking

Replace the batch-capacity-centric approach with an event-driven per-file processing model:

- Add `FileKeyStatus` enum (Processing, Accepted, Rejected, Failed, Abandoned) to track file key processing state across concurrent event handlers using `Arc<RwLock<HashMap>>`
- `BatchProcessStorageRequests` now emits NewStorageRequest events for each pending request via the new PreprocessStorageRequest command, skipping already-processed/accepted/rejected keys and automatically retrying Failed ones
- The NewStorageRequest handler performs per-file capacity management, storage creation, and P2P upload registration. If the file is already complete, it immediately queues an accept response
- ProcessMspRespondStoringRequest uses type-safe `pallet_proofs_dealer::Error` decoding to distinguish proof errors (mark Failed for retry) from non-proof errors (mark Abandoned)
- Move the `pending_respond_storage_requests` queue from RocksDB to the in-memory MspHandler struct, removing 4 column families (14 -> 10) since this state doesn't need persistence
- Remove the `batch_reject_storage_requests`, `ensure_batch_capacity`, and `trim_batch_to_fit_capacity` methods, as capacity is now managed per-file in the NewStorageRequest handler
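The shared status map and the skip/retry decision described here can be sketched as follows. The `should_process` helper is hypothetical (the real handlers are more involved), but it illustrates the retry-Failed / skip-everything-else rule:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Per-file processing state, mirroring the FileKeyStatus enum above.
#[derive(Clone, Copy, PartialEq, Debug)]
enum FileKeyStatus {
    Processing,
    Accepted,
    Rejected,
    Failed,    // proof error: eligible for automatic retry
    Abandoned, // non-proof error: never retried
}

/// Shared across concurrent event handlers.
type StatusMap = Arc<RwLock<HashMap<String, FileKeyStatus>>>;

/// Decide whether a pending request should be (re)processed:
/// unknown or Failed keys are claimed as Processing; all others skip.
fn should_process(statuses: &StatusMap, key: &str) -> bool {
    let mut map = statuses.write().unwrap();
    match map.get(key) {
        None | Some(FileKeyStatus::Failed) => {
            map.insert(key.to_string(), FileKeyStatus::Processing);
            true
        }
        _ => false,
    }
}

fn main() {
    let statuses: StatusMap = Arc::new(RwLock::new(HashMap::new()));
    assert!(should_process(&statuses, "file_a"));  // new key: process
    assert!(!should_process(&statuses, "file_a")); // Processing: skip
    statuses.write().unwrap().insert("file_a".into(), FileKeyStatus::Failed);
    assert!(should_process(&statuses, "file_a"));  // Failed: retried
    statuses.write().unwrap().insert("file_a".into(), FileKeyStatus::Accepted);
    assert!(!should_process(&statuses, "file_a")); // Accepted: skipped
    println!("status tracking ok");
}
```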
…grations

- Add downgrade prevention in `MigrationRunner::run_pending()` to reject databases created with newer schema versions than the current code supports
- Add `MigrationRunner::validate_migration_order()` to check for duplicate versions, proper sequencing starting from 1, and gaps
- Add `Clone` derive to `MigrationDescriptor` for flexibility
- Add TypedRocksDB migration integration tests covering fresh database creation, deprecated CF handling, downgrade prevention, and context usage
- Move migration tests from mod.rs to a dedicated tests.rs file
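The ordering checks described above (unique versions, starting from 1, no gaps) can be sketched in a few lines. This is a simplified stand-in operating on bare version numbers; the real `validate_migration_order()` works over registered migration objects and its exact signature may differ:

```rust
/// Sketch of migration-order validation: versions must be unique,
/// start at 1, and increase without gaps.
fn validate_migration_order(versions: &[u32]) -> Result<(), String> {
    for (i, &v) in versions.iter().enumerate() {
        let expected = (i as u32) + 1;
        if v < expected {
            // A version we've already passed: duplicate or out of order.
            return Err(format!("duplicate or out-of-order version {v}"));
        }
        if v > expected {
            // Skipped a version number entirely.
            return Err(format!("gap before version {v}: expected {expected}"));
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_migration_order(&[1, 2, 3]).is_ok());
    assert!(validate_migration_order(&[1, 1, 2]).is_err()); // duplicate
    assert!(validate_migration_order(&[1, 3]).is_err());    // gap
    assert!(validate_migration_order(&[2, 3]).is_err());    // must start at 1
    println!("validation ok");
}
```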
Make the Migration trait object-safe by using instance methods instead of static methods and a const VERSION. This eliminates the need for the MigrationDescriptor type-erasing wrapper, simplifying the codebase.

Changes:
- The Migration trait now uses `fn version(&self)`, `fn deprecated_column_families(&self)`, and `fn description(&self)` instead of const/static methods
- `MigrationRunner::all_migrations()` returns `Vec<Box<dyn Migration>>`
- Remove the MigrationDescriptor struct entirely
- Update V1Migration and all tests to use the new trait signature
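The object-safe trait shape described above can be sketched as follows. The CF name used in the sample implementation is taken from the PR description; the trait body matches the method names listed in the commit, but the real definitions may carry additional bounds:

```rust
/// Object-safe migration interface: instance methods instead of
/// `const VERSION` / static methods, so migrations can be boxed as
/// `Box<dyn Migration>` with no type-erasing wrapper.
trait Migration {
    fn version(&self) -> u32;
    fn deprecated_column_families(&self) -> Vec<&'static str>;
    fn description(&self) -> &'static str;
}

struct V1Migration;

impl Migration for V1Migration {
    fn version(&self) -> u32 { 1 }
    fn deprecated_column_families(&self) -> Vec<&'static str> {
        vec!["pending_msp_respond_storage_request"]
    }
    fn description(&self) -> &'static str {
        "Drop deprecated MSP respond-storage-request column families"
    }
}

/// Heterogeneous list of migrations, possible only because the trait
/// is object-safe.
fn all_migrations() -> Vec<Box<dyn Migration>> {
    vec![Box::new(V1Migration)]
}

fn main() {
    let migrations = all_migrations();
    assert_eq!(migrations[0].version(), 1);
    assert_eq!(
        migrations[0].deprecated_column_families(),
        vec!["pending_msp_respond_storage_request"]
    );
    println!("{}", migrations[0].description());
}
```

A `const VERSION: u32` or a static `fn version() -> u32` would have made the trait non-object-safe, forcing the descriptor wrapper this change removes.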
- Remove unused `all_deprecated_cfs` and `deprecated_cfs_up_to_version` functions that were only used in tests
- Add a `validate_migration_order()` call at the start of `run_pending()` to ensure migrations are validated before execution
- Fix `open_db_with_migrations` to use CURRENT file detection instead of swallowing all list_cf errors, properly distinguishing between new databases and corrupted existing databases
- Add a test for RocksDB error propagation when the CURRENT file exists but the database is corrupted
… migrations

Add safety guardrails and improve the robustness of the RocksDB column family migration system.

Guardrails:
- Reject `current_schema_cfs` containing deprecated CF names (permanently reserved after deprecation to prevent data confusion)
- Reject `current_schema_cfs` containing the reserved `__schema_version__` CF
- Add an `InvalidColumnFamilyConfig` error variant with actionable messages

Resilient cleanup:
- Refactor the cleanup pass to only process migrations <= the current schema version
- Always drop straggler deprecated CFs from already-applied migrations on startup (handles partial failures from crashes)
- Document why idempotent cleanup is used instead of transactional drops (RocksDB does not support batching multiple `drop_cf()` calls atomically)

New helper:
- Add `MigrationRunner::all_deprecated_column_families()` to aggregate deprecated CF names across all registered migrations

Tests:
- Add `cf_guardrail_tests` module for validation behavior
- Add `cleanup_resilience_tests` module for straggler cleanup scenarios
Remove 11 redundant tests that duplicated coverage already provided by other tests in the migration test suite:

- 2 downgrade prevention tests (duplicated `prevents_downgrade`)
- 3 store simulation tests (covered by other migration tests)
- 2 cleanup resilience tests (covered by existing tests)
- 4 TypedRocksDB migration tests (duplicated tests.rs tests)

Kept the `used_with_context` test as it uniquely validates TypedRocksDB with TypedDbContext integration.
- Move the V1 migration from shc_common to the blockchain-service crate
- Change MigrationRunner from static methods to instance-based
- Add `TypedRocksDB::open()` for stores without migrations
- Add `TypedRocksDB::open_with_migrations()` for stores with migrations
- Auto-sort migrations by version in MigrationRunner constructors
- Write version 0 explicitly in `open_db()` for consistency
- Use distinct test-only CF names to decouple tests from production
- Keep the `&'static str` lifetime in `all_deprecated_column_families()`

Each store now defines and owns its own migrations, following the principle of locality. Stores without migrations use `open()`, which writes version 0, ensuring all databases have consistent schema version tracking and clean upgrade paths.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Rename `typed_rocks_db_migration_tests` to `typed_rocks_db_open_tests` to accurately reflect test scope (tests open methods, not migrations)
- Add an `open_with_migrations_drops_deprecated_cfs_and_works_with_context` test that verifies `TypedRocksDB::open_with_migrations()` correctly drops deprecated column families and works with TypedDbContext
- Extract the shared TestDataCf definition to module level

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ementations

- Remove the `validate_custom_migrations` helper and update validation_tests to use `MigrationRunner::validate_order()` directly
- Remove the `run_migrations_with_list` helper and update multi_version_tests to use `MigrationRunner::run_pending()` directly

This ensures tests verify the actual implementation behavior rather than duplicated logic that could diverge from the real code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
# Conflicts:
#   client/blockchain-service/src/state.rs
…ard compatibility

Include the deprecated `last_processed_block_number` column family in CURRENT_COLUMN_FAMILIES to maintain backward compatibility with existing RocksDB databases until the V2 migration is added in a separate PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Split the migrations module into a more logical rocksdb module structure:

- `rocksdb/database.rs`: database opening functions and DatabaseError
- `rocksdb/migrations.rs`: Migration trait, runner, and error types
- `rocksdb/mod.rs`: public exports

This separates concerns by domain: database opening logic is now in database.rs while migration-specific code stays in migrations.rs.
Labels
B5-clientnoteworthy
Changes should be mentioned in client-related release notes
breaking
Needs to be mentioned in breaking changes
D5-needsaudit👮
PR contains changes to logic that should be properly reviewed and externally audited
rocksdb-migrations
Changes include migrations for RocksDB
Summary
Adds a versioned migration framework for managing RocksDB column family schema changes. This enables safe removal of deprecated column families when upgrading existing databases, ensuring backward compatibility with older node installations.
Design
RocksDB requires all existing column families to be specified when opening a database. The migration system discovers existing CFs, opens the database, then drops deprecated ones via versioned migrations.
How it works:
1. `DB::list_cf()` discovers the column families that exist on disk
2. The database is opened with all existing CFs, then versioned migrations drop the deprecated ones
3. The current schema version is tracked in a reserved `__schema_version__` CF

Safety features:

- Downgrade prevention: databases created with newer schema versions than the code supports are rejected
- Migration order validation: duplicate versions, gaps, and sequences not starting at 1 are rejected
- `current_schema_cfs` may not contain deprecated CF names or the reserved `__schema_version__` CF
- Deprecated-CF cleanup is idempotent, so partial failures from crashes are recovered on the next startup
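The startup flow can be sketched against an in-memory mock. This is a hypothetical stand-in (the real `MigrationRunner::run_pending()` operates on a `rocksdb::DB` and persists the version in the `__schema_version__` CF), but the downgrade check and ordered, idempotent CF dropping follow the design above:

```rust
use std::collections::{HashMap, HashSet};

/// Minimal in-memory stand-in for a RocksDB instance, used only to
/// illustrate the migration flow; the real code uses rocksdb::DB.
struct MockDb {
    cfs: HashSet<String>,
    schema_version: u32, // stored in the reserved __schema_version__ CF
}

/// Run all pending migrations: refuse downgrades, then drop each
/// migration's deprecated CFs in version order.
fn run_pending(
    db: &mut MockDb,
    current_version: u32,
    deprecated: &HashMap<u32, Vec<&str>>,
) -> Result<(), String> {
    // Downgrade prevention: the database was written by newer code.
    if db.schema_version > current_version {
        return Err(format!(
            "database schema v{} is newer than supported v{current_version}",
            db.schema_version
        ));
    }
    for v in (db.schema_version + 1)..=current_version {
        if let Some(cfs) = deprecated.get(&v) {
            for cf in cfs {
                db.cfs.remove(*cf); // idempotent: missing CF is a no-op
            }
        }
        db.schema_version = v; // record progress after each migration
    }
    Ok(())
}

fn main() {
    let mut deprecated = HashMap::new();
    deprecated.insert(1, vec!["pending_msp_respond_storage_request"]);
    let mut db = MockDb {
        cfs: HashSet::from([
            "default".to_string(),
            "pending_msp_respond_storage_request".to_string(),
        ]),
        schema_version: 0,
    };
    run_pending(&mut db, 1, &deprecated).unwrap();
    assert_eq!(db.schema_version, 1);
    assert!(!db.cfs.contains("pending_msp_respond_storage_request"));
    // Code supporting only v0 must refuse this v1 database.
    assert!(run_pending(&mut db, 0, &deprecated).is_err());
    println!("migration flow ok");
}
```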
New Module Structure

- `rocksdb/database.rs`: database opening functions and DatabaseError
- `rocksdb/migrations.rs`: Migration trait, runner, and error types
- `rocksdb/mod.rs`: public exports
Notable Changes
- `Migration` trait and `MigrationRunner` in `client/common/src/rocksdb/`
- `TypedRocksDB::open_with_migrations()` for stores needing migrations
- `TypedRocksDB::open()` for stores without migrations
- Updated stores: `BlockchainServiceStateStore`, `DownloadStateStore`, `BspPeerManagerStore`

Test Coverage
Tests use temporary RocksDB instances to verify migration functionality including schema version tracking, CF dropping, and error handling.
Adding New Migrations
A new v1 migration will be applied to the blockchain service state store to drop deprecated MSP respond storage request column families:
- `pending_msp_respond_storage_request`
- `pending_msp_respond_storage_request_left_index`
- `pending_msp_respond_storage_request_right_index`

No code changes required. The migration runs automatically on startup.