Enable Kuzu Migration to Work with S3 Storage #1179

@Vasilije1990

Issue Description

The current Kuzu migration functionality (cognee/infrastructure/databases/graph/kuzu/kuzu_migrate.py) only supports migrating Kuzu databases stored in local storage. However, the system already supports S3 storage for Kuzu databases through the KuzuAdapter class, which can handle S3 paths and automatically sync with S3 storage.

When Kuzu databases are stored in S3 and a version migration is required, the current migration process fails because it relies on local filesystem operations.

Current Behavior

The current migration code in kuzu_migrate.py:

  • Uses os.path.exists() to check database existence (line 135)
  • Assumes local file paths for database operations
  • Does not handle S3 path patterns (s3://bucket/path)
  • Has no step for staging S3 databases in temporary local files

This causes failures when:

  • STORAGE_BACKEND=s3 is configured
  • Database paths contain s3:// prefix
  • Automatic migration is triggered in KuzuAdapter._initialize_connection() (lines 89-98)
  • Alembic migration runs for S3-stored databases
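
The existence check is the easiest of these failures to reproduce. `os.path.exists()` treats an `s3://` URI as a relative path on the local disk, so it reports the database as missing no matter what is in the bucket. A minimal illustration, with a made-up bucket and key:

```python
import os

# Hypothetical S3 location; os.path knows nothing about the s3:// scheme.
db_path = "s3://example-bucket/databases/cognee_graph_kuzu"

# The URI is interpreted as a relative local path, so the check fails
# even when the object exists in the bucket.
exists_locally = os.path.exists(db_path)
print(exists_locally)
```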

Expected Behavior

The migration system should seamlessly handle both local and S3 storage:

  • Detect S3 database paths automatically
  • Download S3 databases to temporary files for migration processing
  • Perform migration on temporary local files
  • Upload migrated databases back to S3
  • Clean up temporary files after migration
  • Support both automatic migration in KuzuAdapter and manual migration via kuzu_migrate.py
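
The steps above can be sketched as a single function. This is only an outline of the intended flow, not the existing code: `migrate_s3_database` and its three callable parameters are hypothetical names, and the real download, migration, and upload steps would come from the S3 storage layer and `kuzu_migrate.py`. The stand-ins at the bottom exist so the sketch runs without S3 access.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def migrate_s3_database(
    s3_path: str,
    download: Callable[[str, Path], None],
    migrate: Callable[[Path], None],
    upload: Callable[[Path, str], None],
) -> None:
    """Download, migrate locally, then upload. The temp dir is always removed."""
    with tempfile.TemporaryDirectory(prefix="kuzu_migration_") as tmp:
        local_db = Path(tmp) / "db"
        download(s3_path, local_db)  # S3 -> temporary local copy
        migrate(local_db)            # existing local migration logic
        upload(local_db, s3_path)    # only reached if migration succeeded


# Stand-ins (assumptions for illustration) that record the order of steps.
steps: List[str] = []


def fake_download(src: str, dst: Path) -> None:
    dst.write_text("old-format")
    steps.append("download")


def fake_migrate(db: Path) -> None:
    db.write_text("new-format")
    steps.append("migrate")


def fake_upload(src: Path, dst: str) -> None:
    steps.append(f"upload:{src.read_text()}")


migrate_s3_database(
    "s3://example-bucket/graph_db", fake_download, fake_migrate, fake_upload
)
print(steps)
```

Because the upload is the last statement inside the `with` block, a failed migration leaves the copy in S3 untouched while the temporary directory is still cleaned up.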

Technical Requirements

1. S3 Path Detection and Handling

  • Modify kuzu_migration() function to detect S3 paths using existing patterns:
    • Check for s3:// prefix in database paths
    • Check for STORAGE_BACKEND=s3 environment variable
  • Add S3 path validation instead of os.path.exists() checks
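
A minimal sketch of the two detection checks, plus a purely syntactic validation of the URI. The helper names `is_s3_database` and `split_s3_path` are hypothetical, and treating `STORAGE_BACKEND=s3` alone as sufficient is an assumption taken from the bullets above; the actual logic in `KuzuAdapter` may differ. Confirming that the database really exists in the bucket would still need a call to the storage layer.

```python
import os
from typing import Tuple

S3_PREFIX = "s3://"


def is_s3_database(db_path: str) -> bool:
    """Return True if the path or the environment points at S3 storage."""
    if db_path.startswith(S3_PREFIX):
        return True
    # Assumption: the backend variable alone marks the database as S3-backed.
    return os.getenv("STORAGE_BACKEND", "").lower() == "s3"


def split_s3_path(db_path: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key), rejecting malformed input."""
    if not db_path.startswith(S3_PREFIX):
        raise ValueError(f"Not an S3 path: {db_path!r}")
    bucket, _, key = db_path[len(S3_PREFIX):].partition("/")
    if not bucket:
        raise ValueError(f"S3 path has no bucket: {db_path!r}")
    return bucket, key


print(split_s3_path("s3://example-bucket/databases/graph_db"))
```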

2. S3 Database Download/Upload

  • Implement S3 database download before migration starts
  • Create temporary local files for migration processing
  • Upload migrated database back to S3 after successful migration
  • Reuse existing S3 infrastructure from S3FileStorage class
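
The method names on `S3FileStorage` are not shown in this issue, so the sketch below targets the fsspec-style `get` and `put` calls that `s3fs.S3FileSystem` provides. How these helpers would be wired to `S3FileStorage` is left open. `RemoteFS`, `download_database`, `upload_database`, and `RecordingFS` are hypothetical names; the recording stand-in only exists so the example runs without credentials.

```python
from typing import List, Protocol, Tuple


class RemoteFS(Protocol):
    """The subset of the fsspec filesystem interface this sketch relies on."""

    def get(self, rpath: str, lpath: str, recursive: bool = False) -> None: ...

    def put(self, lpath: str, rpath: str, recursive: bool = False) -> None: ...


def download_database(fs: RemoteFS, s3_path: str, local_path: str) -> None:
    # Copy recursively so a directory-style database is fetched as a whole.
    fs.get(s3_path, local_path, recursive=True)


def upload_database(fs: RemoteFS, local_path: str, s3_path: str) -> None:
    fs.put(local_path, s3_path, recursive=True)


class RecordingFS:
    """Stand-in filesystem that records calls instead of touching S3."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str, bool]] = []

    def get(self, rpath: str, lpath: str, recursive: bool = False) -> None:
        self.calls.append(("get", rpath, lpath, recursive))

    def put(self, lpath: str, rpath: str, recursive: bool = False) -> None:
        self.calls.append(("put", lpath, rpath, recursive))


fs = RecordingFS()
download_database(fs, "s3://example-bucket/graph_db", "/tmp/kuzu_migration/db")
upload_database(fs, "/tmp/kuzu_migration/db", "s3://example-bucket/graph_db")
print(fs.calls)
```

With real storage, an `s3fs.S3FileSystem()` instance could be passed in place of `RecordingFS`. fsspec's recursive copy behaviour depends on trailing slashes and on whether the target already exists, so the resulting layout in the bucket should be verified before relying on this.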

3. Integration Points

Update the following areas to support S3 migration:

A. kuzu_migrate.py

  • Add S3 path detection in kuzu_migration() function
  • Add S3 download/upload workflow
  • Modify file existence checks to work with S3
  • Add proper error handling for S3 operations
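
One possible shape for the error handling, assuming a dedicated exception type is acceptable. `S3MigrationError` and `run_s3_step` are hypothetical names, not part of the current code. The wrapper names the failing step and the S3 path in the message and keeps the original exception as the cause.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class S3MigrationError(RuntimeError):
    """Hypothetical error raised when an S3 step of the migration fails."""


def run_s3_step(step: str, s3_path: str, action: Callable[[], T]) -> T:
    """Run one S3 step and re-raise any failure with step and path attached."""
    try:
        return action()
    except Exception as exc:
        raise S3MigrationError(
            f"Kuzu migration failed during {step} for {s3_path}: {exc}"
        ) from exc


def failing_download() -> None:
    # Simulates the storage layer rejecting the request.
    raise PermissionError("access denied")


try:
    run_s3_step("download", "s3://example-bucket/graph_db", failing_download)
except S3MigrationError as err:
    message = str(err)
    cause = err.__cause__

print(message)
```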

B. KuzuAdapter

  • Ensure automatic migration works with S3 paths
  • Update migration trigger logic in _initialize_connection()

C. Alembic Migration

  • Update b9274c27a25a_kuzu_11_migration.py to handle S3 databases
  • Add S3 path detection in both multi-user and single-user modes

4. Temporary File Management

  • Use tempfile module for secure temporary file creation
  • Ensure proper cleanup of temporary files in all scenarios
  • Handle edge cases (migration failures, interruptions)
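
A small sketch of the cleanup guarantee, assuming a context manager is an acceptable way to scope the temporary directory. `migration_workdir` is a hypothetical name. `tempfile.mkdtemp()` creates a directory accessible only to the current user, and the `finally` clause removes it on success, on exceptions, and on `KeyboardInterrupt`.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional


@contextmanager
def migration_workdir() -> Iterator[Path]:
    """Yield a private temporary directory and remove it afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix="kuzu_migration_"))
    try:
        yield workdir
    finally:
        # Runs whether the body finished, raised, or was interrupted.
        shutil.rmtree(workdir, ignore_errors=True)


seen: Optional[Path] = None
try:
    with migration_workdir() as workdir:
        seen = workdir
        (workdir / "db").write_text("partially migrated")
        raise RuntimeError("simulated migration failure")
except RuntimeError:
    pass

print(seen is not None and seen.exists())
```

A hard kill of the process (for example SIGKILL or a power loss) would still leave the directory behind, so a recognisable prefix such as `kuzu_migration_` makes stale directories easier to find later.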

5. Configuration

  • Leverage existing S3 configuration from s3_config.py
  • Use existing S3 credentials and settings
  • No additional configuration required

Implementation Approach

Phase 1: Core Migration Function Enhancement

  1. Modify kuzu_migration() function to accept S3 paths
  2. Add S3 detection logic similar to KuzuAdapter._initialize_connection()
  3. Implement S3 download/upload workflow using existing S3FileStorage class
  4. Add comprehensive error handling and cleanup

Phase 2: Integration Updates

  1. Update KuzuAdapter automatic migration to pass S3 context
  2. Update alembic migration to handle S3 databases
  3. Add proper logging for S3 migration operations

Phase 3: Testing and Validation

  1. Add unit tests for S3 migration scenarios
  2. Add integration tests with actual S3 storage
  3. Test automatic migration triggers with S3 databases
  4. Test alembic migration with S3 configuration

Files to Modify

  1. cognee/infrastructure/databases/graph/kuzu/kuzu_migrate.py
    • Main migration logic enhancement
    • S3 path detection and handling
    • Temporary file management

  2. cognee/infrastructure/databases/graph/kuzu/adapter.py
    • Update automatic migration trigger
    • Pass S3 context to migration function

  3. cognee/alembic/versions/b9274c27a25a_kuzu_11_migration.py
    • Add S3 database detection
    • Handle S3 paths in both migration modes

Acceptance Criteria

  • Migration works with S3 database paths (s3://bucket/path)
  • Migration works when STORAGE_BACKEND=s3 is configured
  • Automatic migration in KuzuAdapter works with S3 databases
  • Alembic migration handles S3 databases correctly
  • Temporary files are properly cleaned up in all scenarios
  • Error handling provides clear messages for S3-related issues
  • No breaking changes to existing local storage migration
  • Migration preserves all S3 permissions and metadata

Dependencies

  • Existing S3FileStorage infrastructure
  • Current S3 configuration system (s3_config.py)
  • Existing Kuzu migration framework
  • s3fs library (already in use)

Priority

Medium-High - This issue blocks users who:

  • Have configured S3 as their storage backend
  • Need to migrate Kuzu databases when upgrading Kuzu versions
  • Use multi-user Cognee deployments with S3 storage
