Issue Description
The current Kuzu migration functionality (`cognee/infrastructure/databases/graph/kuzu/kuzu_migrate.py`) only supports migrating Kuzu databases stored in local storage. However, the system already supports S3 storage for Kuzu databases through the `KuzuAdapter` class, which can handle S3 paths and automatically sync with S3 storage.
When Kuzu databases are stored in S3 and a version migration is required, the current migration process fails because it relies on local filesystem operations.
Current Behavior
The current migration code in `kuzu_migrate.py`:
- Uses `os.path.exists()` to check database existence (line 135)
- Assumes local file paths for database operations
- Does not handle S3 path patterns (`s3://bucket/path`)
- Does not handle temporary file operations for S3 databases
This causes failures when:
- `STORAGE_BACKEND=s3` is configured
- Database paths contain the `s3://` prefix
- Automatic migration is triggered in `KuzuAdapter._initialize_connection()` (lines 89-98)
- The Alembic migration runs for S3-stored databases
Expected Behavior
The migration system should seamlessly handle both local and S3 storage:
- Detect S3 database paths automatically
- Download S3 databases to temporary files for migration processing
- Perform migration on temporary local files
- Upload migrated databases back to S3
- Clean up temporary files after migration
- Support both automatic migration in `KuzuAdapter` and manual migration via `kuzu_migrate.py`
Technical Requirements
1. S3 Path Detection and Handling
- Modify the `kuzu_migration()` function to detect S3 paths using existing patterns:
  - Check for the `s3://` prefix in database paths
  - Check for the `STORAGE_BACKEND=s3` environment variable
- Add S3 path validation instead of `os.path.exists()` checks
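The detection and validation rules above can be sketched as follows. This is a minimal sketch, not existing Cognee code: `is_s3_path` and `database_exists` are hypothetical names, treating either rule (prefix or environment variable) as sufficient is an assumption, and `fs` is assumed to be an fsspec-style filesystem object (for example an `s3fs.S3FileSystem`, which provides `exists()`).

```python
import os
from typing import Any, Mapping, Optional


def is_s3_path(path: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the database at `path` should be treated as S3-backed.

    Assumption: an explicit s3:// prefix OR STORAGE_BACKEND=s3 is enough.
    """
    env = os.environ if env is None else env
    if path.startswith("s3://"):
        return True
    return env.get("STORAGE_BACKEND", "").lower() == "s3"


def database_exists(
    path: str, fs: Any = None, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Existence check that works for both local and S3 database paths.

    `fs` is assumed to be an fsspec-style filesystem (e.g. s3fs.S3FileSystem)
    exposing `exists(path)`; it is only consulted for S3 paths.
    """
    if is_s3_path(path, env):
        if fs is None:
            raise ValueError(f"An S3 filesystem is required to check {path!r}")
        return bool(fs.exists(path))
    # Local databases keep the original os.path.exists() behaviour.
    return os.path.exists(path)
```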
2. S3 Database Download/Upload
- Implement S3 database download before migration starts
- Create temporary local files for migration processing
- Upload migrated database back to S3 after successful migration
- Reuse existing S3 infrastructure from the `S3FileStorage` class
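One way to combine the download, temporary-file, and upload requirements is a context manager that stages the database locally and writes it back only when the migration block finishes without raising. A minimal sketch, assuming: `staged_s3_database` is a hypothetical helper; `fs` is an fsspec-style filesystem such as `s3fs.S3FileSystem` (whose `get`/`put` accept `recursive=`); and the local migration step leaves the migrated database at the yielded path. How `S3FileStorage` exposes its underlying filesystem is not shown here.

```python
import contextlib
import os
import tempfile
from typing import Any, Iterator


@contextlib.contextmanager
def staged_s3_database(fs: Any, s3_path: str) -> Iterator[str]:
    """Download an S3 database to a temp dir, yield its local path, upload on success.

    `fs` is assumed to be an fsspec-style filesystem (e.g. s3fs.S3FileSystem)
    with get(rpath, lpath, recursive=...) and put(lpath, rpath, recursive=...).
    """
    with tempfile.TemporaryDirectory(prefix="kuzu_migration_") as tmp_dir:
        # Keep the original database name inside the private temp directory.
        local_path = os.path.join(tmp_dir, os.path.basename(s3_path.rstrip("/")))
        # recursive=True also covers databases stored as a directory.
        fs.get(s3_path, local_path, recursive=True)
        yield local_path
        # Only reached if the with-block body raised nothing,
        # so a failed migration never overwrites the S3 copy.
        fs.put(local_path, s3_path, recursive=True)
    # TemporaryDirectory has removed tmp_dir here, on success or failure.
```

Usage would be `with staged_s3_database(fs, db_path) as local_db:` followed by the existing local migration run against `local_db`. Because the upload sits after the `yield`, an exception raised inside the block (including `KeyboardInterrupt`) skips the upload, and `TemporaryDirectory` still removes the staged copy. If the database is a directory rather than a single file, check fsspec's copy semantics for existing targets, since copying a directory onto an existing remote prefix may nest it rather than replace it.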
3. Integration Points
Update the following areas to support S3 migration:
A. kuzu_migrate.py
- Add S3 path detection in the `kuzu_migration()` function
- Add S3 download/upload workflow
- Modify file existence checks to work with S3
- Add proper error handling for S3 operations
B. KuzuAdapter
- Ensure automatic migration works with S3 paths
- Update migration trigger logic in `_initialize_connection()`
C. Alembic Migration
- Update `b9274c27a25a_kuzu_11_migration.py` to handle S3 databases
- Add S3 path detection in both multi-user and single-user modes
4. Temporary File Management
- Use the `tempfile` module for secure temporary file creation
- Ensure proper cleanup of temporary files in all scenarios
- Handle edge cases (migration failures, interruptions)
5. Configuration
- Leverage existing S3 configuration from `s3_config.py`
- Use existing S3 credentials and settings
- No additional configuration required
Implementation Approach
Phase 1: Core Migration Function Enhancement
- Modify the `kuzu_migration()` function to accept S3 paths
- Add S3 detection logic similar to `KuzuAdapter._initialize_connection()`
- Implement the S3 download/upload workflow using the existing `S3FileStorage` class
- Add comprehensive error handling and cleanup
Phase 2: Integration Updates
- Update `KuzuAdapter` automatic migration to pass S3 context
- Update the Alembic migration to handle S3 databases
- Add proper logging for S3 migration operations
Phase 3: Testing and Validation
- Add unit tests for S3 migration scenarios
- Add integration tests with actual S3 storage
- Test automatic migration triggers with S3 databases
- Test the Alembic migration with S3 configuration
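For the unit tests, S3 behaviour can be approximated without network access by injecting a test double that maps `s3://` URIs onto a local directory and implements only the part of the fsspec filesystem API (`exists`, `get`, `put`) that S3-aware migration code would call. A minimal sketch: `LocalDirFakeS3` is a hypothetical class, not part of Cognee, s3fs, or fsspec. It approximates fsspec only loosely (for example, directory uploads replace the target instead of following fsspec's copy rules), so the planned integration tests against real S3 remain necessary.

```python
import os
import shutil


class LocalDirFakeS3:
    """Hypothetical test double for the fsspec calls a migration would make.

    URIs like s3://bucket/key are mapped to <root>/bucket/key, so S3 code
    paths can be exercised in unit tests without network access.
    """

    def __init__(self, root: str) -> None:
        self.root = root

    def _local(self, path: str) -> str:
        # "s3://bucket/key" -> "<root>/bucket/key"
        return os.path.join(self.root, path.removeprefix("s3://"))

    def exists(self, path: str) -> bool:
        return os.path.exists(self._local(path))

    def get(self, rpath: str, lpath: str, recursive: bool = False) -> None:
        self._copy(self._local(rpath), lpath, recursive)

    def put(self, lpath: str, rpath: str, recursive: bool = False) -> None:
        target = self._local(rpath)
        # Create the "bucket" directory on first upload.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        self._copy(lpath, target, recursive)

    @staticmethod
    def _copy(src: str, dst: str, recursive: bool) -> None:
        if os.path.isdir(src):
            if not recursive:
                raise IsADirectoryError(src)
            # Remove any existing destination directory so repeated
            # uploads overwrite it instead of nesting inside it.
            shutil.rmtree(dst, ignore_errors=True)
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
```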
Files to Modify
- `cognee/infrastructure/databases/graph/kuzu/kuzu_migrate.py`
  - Main migration logic enhancement
  - S3 path detection and handling
  - Temporary file management
- `cognee/infrastructure/databases/graph/kuzu/adapter.py`
  - Update automatic migration trigger
  - Pass S3 context to migration function
- `cognee/alembic/versions/b9274c27a25a_kuzu_11_migration.py`
  - Add S3 database detection
  - Handle S3 paths in both migration modes
Acceptance Criteria
- Migration works with S3 database paths (`s3://bucket/path`)
- Migration works when `STORAGE_BACKEND=s3` is configured
- Automatic migration in `KuzuAdapter` works with S3 databases
- Alembic migration handles S3 databases correctly
- Temporary files are properly cleaned up in all scenarios
- Error handling provides clear messages for S3-related issues
- No breaking changes to existing local storage migration
- Migration preserves all S3 permissions and metadata
Dependencies
- Existing `S3FileStorage` infrastructure
- Current S3 configuration system (`s3_config.py`)
- Existing Kuzu migration framework
- `s3fs` library (already in use)
Priority
Medium-High - This issue blocks users who:
- Have configured S3 as their storage backend
- Need to migrate Kuzu databases when upgrading Kuzu versions
- Use multi-user Cognee deployments with S3 storage