Multi-Cloud Storage Support #1180

@Vasilije1990

Description

Expand Cognee's storage capabilities beyond AWS S3 to support Azure Blob Storage and Google Cloud Storage, while refactoring the existing storage architecture to be more extensible, maintainable, and cloud-agnostic. This enhancement will enable users to deploy Cognee on any major cloud platform and provide better storage flexibility.

Current State Analysis

Existing Architecture Limitations

There are currently 20 hardcoded s3:// references throughout the codebase, and get_file_storage() only handles a binary choice between local storage and S3. Storage configuration is scattered and cloud-specific, and similar S3 sync logic is duplicated in KuzuAdapter and SQLAlchemyAdapter.

Current Implementation Files

  • cognee/infrastructure/files/storage/S3FileStorage.py - S3-specific implementation
  • cognee/infrastructure/files/storage/get_file_storage.py - Basic binary factory
  • cognee/infrastructure/files/storage/s3_config.py - S3-only configuration
  • Multiple database adapters with hardcoded S3 logic

Technical Goals

1. Cloud Storage Provider Support

Add first-class support for:

  • Azure Blob Storage (azure://, abs:// schemes)
  • Google Cloud Storage (gs://, gcs:// schemes)
  • Maintain AWS S3 compatibility with enhanced implementation
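
The scheme list above implies a lookup from URL scheme to backend. A minimal sketch of that lookup follows, assuming a hypothetical `resolve_backend` helper and backend names (`s3`, `azure`, `gcs`, `local`) borrowed from the proposed `STORAGE_BACKEND` values; none of this is existing Cognee code.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to backend name. The scheme aliases
# come from the list above; the "file" entry is an added assumption.
SCHEME_TO_BACKEND = {
    "s3": "s3",
    "azure": "azure",
    "abs": "azure",
    "gs": "gcs",
    "gcs": "gcs",
    "file": "local",
}


def resolve_backend(path: str) -> str:
    """Return the backend name for a storage path, defaulting to local."""
    scheme = urlparse(path).scheme.lower()
    # No scheme means a plain path; a single letter is a Windows drive ("C:\\data").
    if len(scheme) <= 1:
        return "local"
    if scheme not in SCHEME_TO_BACKEND:
        raise ValueError(f"Unsupported storage scheme: {scheme!r}")
    return SCHEME_TO_BACKEND[scheme]
```

Keeping the aliases in one table means a hardcoded `startswith("s3://")` check could be replaced by a single call, which is one possible way to retire the scattered references.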

2. Architecture Refactoring

  • Extensible Factory Pattern: Plugin-based storage provider registration
  • Unified Configuration: Cloud-agnostic configuration management
  • Provider Abstraction: Standardized cloud storage interface
  • Dependency Injection: Configurable storage backend selection
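
One way the factory and abstraction goals above could fit together is sketched below. The class names `CloudStorageProvider` and `StorageProviderRegistry` come from the implementation plan; the method names (`store`, `read`, `exists`) and the `InMemoryProvider` stand-in are assumptions made for illustration, not a settled interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class CloudStorageProvider(ABC):
    """Hypothetical provider interface; method names are assumptions."""

    @abstractmethod
    def store(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class InMemoryProvider(CloudStorageProvider):
    """Stand-in backend that keeps objects in a dict (illustration only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def read(self, key: str) -> bytes:
        return self._objects[key]

    def exists(self, key: str) -> bool:
        return key in self._objects


class StorageProviderRegistry:
    """Maps backend names to factories so new providers plug in by name."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], CloudStorageProvider]] = {}

    def register(self, name: str, factory: Callable[[], CloudStorageProvider]) -> None:
        self._factories[name.lower()] = factory

    def create(self, name: str) -> CloudStorageProvider:
        try:
            factory = self._factories[name.lower()]
        except KeyError:
            raise ValueError(f"No storage provider registered for {name!r}") from None
        return factory()


registry = StorageProviderRegistry()
registry.register("memory", InMemoryProvider)
```

With this shape, `get_file_storage()` would only need to ask the registry for the configured backend name, and an Azure or GCS provider could be added without touching the factory itself.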

3. Enhanced Kuzu Migration

  • Support migration across all cloud storage providers
  • Cloud-to-cloud migration capabilities
  • Unified temporary file management
  • Cross-cloud database portability
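
A cloud-to-cloud migration with unified temporary file handling could look roughly like the sketch below. It assumes each provider exposes hypothetical `download` and `upload` methods and treats the database as a single file for simplicity; `DirectoryStore` is a local-directory stand-in for a real bucket or container.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Protocol


class FileTransfer(Protocol):
    """Hypothetical minimal contract a provider needs for migration."""

    def download(self, key: str, local_path: Path) -> None: ...

    def upload(self, local_path: Path, key: str) -> None: ...


class DirectoryStore:
    """Stand-in for a cloud bucket, backed by a local directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def download(self, key: str, local_path: Path) -> None:
        shutil.copyfile(self.root / key, local_path)

    def upload(self, local_path: Path, key: str) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)


def migrate_file(source: FileTransfer, destination: FileTransfer, key: str) -> None:
    """Copy one object between providers through a temporary staging directory."""
    with tempfile.TemporaryDirectory() as staging:
        local_copy = Path(staging) / Path(key).name
        source.download(key, local_copy)
        # A real migration step (for example a Kuzu version upgrade) would run here.
        destination.upload(local_copy, key)
    # The staging directory is removed on exit, even if a step above raised.
```

Because `migrate_file` depends only on the two-method contract, the source and destination can be different clouds without any provider-specific branching.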

Implementation Plan

Sprint 1

  • Design and implement CloudStorageProvider interface
  • Create StorageProviderRegistry system
  • Implement unified CloudStorageConfig
  • Refactor get_file_storage() to use registry pattern
  • Update existing S3 provider to use new interface

Sprint 2: Azure Support

  • Implement AzureBlobStorageProvider
  • Add Azure configuration and authentication
  • Create Azure-specific tests
  • Update documentation for Azure setup

Sprint 3: Google Cloud Support (2 weeks)

  • Implement GoogleCloudStorageProvider
  • Add GCS configuration and authentication
  • Create GCS-specific tests
  • Update documentation for GCS setup

Sprint 4: Database Integration (2 weeks)

  • Create CloudDatabaseMixin
  • Refactor KuzuAdapter to use mixin
  • Refactor SQLAlchemyAdapter to use mixin
  • Update database initialization logic
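
The mixin in Sprint 4 is meant to absorb the sync logic currently duplicated across adapters. A rough sketch of the idea follows; only the name `CloudDatabaseMixin` comes from the plan, while the method names, the attributes the mixin expects, `InMemoryRemote`, and `ExampleAdapter` are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


class InMemoryRemote:
    """Stand-in for a cloud provider; stores objects in a dict."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def download(self, key: str, local_path: Path) -> bool:
        if key not in self.objects:
            return False
        local_path.write_bytes(self.objects[key])
        return True

    def upload(self, local_path: Path, key: str) -> None:
        self.objects[key] = local_path.read_bytes()


class CloudDatabaseMixin:
    """Shared pull/push logic. Expects the host class to define
    self.remote, self.remote_key, and self.local_path (assumed names)."""

    def pull_from_cloud(self) -> bool:
        """Fetch the remote copy into the local path; False if none exists."""
        return self.remote.download(self.remote_key, self.local_path)

    def push_to_cloud(self) -> None:
        """Upload the local database file to the remote location."""
        self.remote.upload(self.local_path, self.remote_key)


class ExampleAdapter(CloudDatabaseMixin):
    """Toy adapter standing in for KuzuAdapter or SQLAlchemyAdapter."""

    def __init__(self, remote: InMemoryRemote, remote_key: str, local_path: Path) -> None:
        self.remote = remote
        self.remote_key = remote_key
        self.local_path = local_path

    def write(self, payload: bytes) -> None:
        self.local_path.write_bytes(payload)
        self.push_to_cloud()


# Usage: two adapters sharing one remote see the same data.
remote = InMemoryRemote()
workdir = Path(tempfile.mkdtemp())
writer = ExampleAdapter(remote, "db/graph.bin", workdir / "writer.bin")
writer.write(b"nodes")
reader = ExampleAdapter(remote, "db/graph.bin", workdir / "reader.bin")
restored = reader.pull_from_cloud()
```

The adapters keep their own database logic and inherit only the synchronization calls, so a fix to the sync path would land in one place.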

Modified Files

  • cognee/infrastructure/files/storage/get_file_storage.py - Registry integration
  • cognee/infrastructure/files/storage/S3FileStorage.py - Refactor to new interface
  • cognee/infrastructure/databases/graph/kuzu/adapter.py - Use cloud mixin
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py - Use cloud mixin
  • cognee/infrastructure/databases/graph/kuzu/kuzu_migrate.py - Multi-cloud support
  • cognee/alembic/versions/b9274c27a25a_kuzu_11_migration.py - Cloud migration support
  • All files with hardcoded s3:// references (20+ files)

Configuration Migration

Environment Variables

# Current S3-only
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx

# New unified approach
STORAGE_BACKEND=azure  # local, s3, azure, gcs

# Provider-specific configs
AZURE_ACCOUNT_NAME=mystorageaccount
AZURE_ACCOUNT_KEY=xxx
AZURE_CONTAINER_NAME=cognee-data

GCS_PROJECT_ID=my-project
GCS_CREDENTIALS_PATH=/path/to/credentials.json
GCS_BUCKET=cognee-storage
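
The variables above could be loaded into a single `CloudStorageConfig` along the lines of the sketch below. The class name comes from the implementation plan; the `load_storage_config` helper, the choice of which variables are required per backend, and the `local` default are assumptions. Real deployments may obtain credentials differently (for example from an instance role), which this sketch does not model.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional

# Assumed required variables per backend, taken from the listing above.
REQUIRED_VARS = {
    "local": (),
    "s3": ("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "azure": ("AZURE_ACCOUNT_NAME", "AZURE_ACCOUNT_KEY", "AZURE_CONTAINER_NAME"),
    "gcs": ("GCS_PROJECT_ID", "GCS_CREDENTIALS_PATH", "GCS_BUCKET"),
}


@dataclass(frozen=True)
class CloudStorageConfig:
    """Backend name plus the provider-specific settings it needs."""

    backend: str
    settings: Dict[str, str] = field(default_factory=dict)


def load_storage_config(env: Optional[Mapping[str, str]] = None) -> CloudStorageConfig:
    """Build a config from environment variables, failing fast on gaps."""
    env = os.environ if env is None else env
    # Defaulting to "local" when STORAGE_BACKEND is unset is an assumption.
    backend = env.get("STORAGE_BACKEND", "local").strip().lower()
    if backend not in REQUIRED_VARS:
        raise ValueError(f"Unknown STORAGE_BACKEND: {backend!r}")
    missing = [name for name in REQUIRED_VARS[backend] if not env.get(name)]
    if missing:
        raise ValueError(f"Missing settings for {backend}: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED_VARS[backend]}
    return CloudStorageConfig(backend=backend, settings=settings)
```

Passing a plain mapping instead of reading `os.environ` directly keeps the loader easy to test and lets existing S3-only deployments be checked against the new scheme before switching.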

Labels

help wanted (Extra attention is needed)