Open
Labels: help wanted
Description
Expand Cognee's storage capabilities beyond AWS S3 to support Azure Blob Storage and Google Cloud Storage, and refactor the existing storage architecture to be more extensible, maintainable, and cloud-agnostic. This enhancement will let users deploy Cognee on any major cloud platform and give them more flexibility in choosing a storage backend.
Current State Analysis
Existing Architecture Limitations
There are currently 20 hardcoded s3:// references throughout the codebase, and get_file_storage() only handles a binary choice between local storage and S3. Storage configuration is scattered and cloud-specific. Similar S3 sync logic is duplicated in KuzuAdapter and SQLAlchemyAdapter.
Current Implementation Files
- cognee/infrastructure/files/storage/S3FileStorage.py - S3-specific implementation
- cognee/infrastructure/files/storage/get_file_storage.py - Basic binary factory
- cognee/infrastructure/files/storage/s3_config.py - S3-only configuration
- Multiple database adapters with hardcoded S3 logic
Technical Goals
1. Cloud Storage Provider Support
Add first-class support for:
- Azure Blob Storage (azure://, abs:// schemes)
- Google Cloud Storage (gs://, gcs:// schemes)
- Maintain AWS S3 compatibility with enhanced implementation
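One way the scheme list above could drive backend selection is sketched below. The scheme names come from this issue; the helper names (resolve_backend, split_bucket_and_key) and the fallback to "local" for unrecognized schemes are assumptions made for illustration, not existing Cognee code.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to backend name. The schemes are the ones
# listed in this issue; the backend names mirror the proposed STORAGE_BACKEND values.
SCHEME_TO_BACKEND = {
    "s3": "s3",
    "azure": "azure",
    "abs": "azure",
    "gs": "gcs",
    "gcs": "gcs",
}


def resolve_backend(path: str) -> str:
    """Return the backend name for a storage path (assumed default: local)."""
    scheme = urlparse(path).scheme.lower()
    return SCHEME_TO_BACKEND.get(scheme, "local")


def split_bucket_and_key(path: str) -> tuple[str, str]:
    """Split a cloud URL into (bucket or container name, object key)."""
    parsed = urlparse(path)
    return parsed.netloc, parsed.path.lstrip("/")
```

With a helper like this, the hardcoded s3:// prefix checks could be replaced by a single call such as resolve_backend(data_root), so adding a scheme becomes a one-line change to the mapping.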
2. Architecture Refactoring
- Extensible Factory Pattern: Plugin-based storage provider registration
- Unified Configuration: Cloud-agnostic configuration management
- Provider Abstraction: Standardized cloud storage interface
- Dependency Injection: Configurable storage backend selection
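A minimal sketch of how the provider abstraction and plugin-based registration might fit together follows. CloudStorageProvider and StorageProviderRegistry are the names proposed in this issue, but the method names (store, load, register, create) and the in-memory provider are assumptions, included only so the example runs without cloud credentials.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class CloudStorageProvider(ABC):
    """Interface each backend would implement (method names are assumed)."""

    def __init__(self, root: str):
        self.root = root

    @abstractmethod
    def store(self, key: str, data: bytes) -> str:
        """Persist data under key and return its full location."""

    @abstractmethod
    def load(self, key: str) -> bytes:
        """Return the bytes stored under key."""


class StorageProviderRegistry:
    """Maps backend names to provider classes so new backends plug in by name."""

    _providers: Dict[str, Type[CloudStorageProvider]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type[CloudStorageProvider]], Type[CloudStorageProvider]]:
        def decorator(provider_cls: Type[CloudStorageProvider]) -> Type[CloudStorageProvider]:
            cls._providers[name] = provider_cls
            return provider_cls

        return decorator

    @classmethod
    def create(cls, name: str, root: str) -> CloudStorageProvider:
        try:
            provider_cls = cls._providers[name]
        except KeyError:
            raise ValueError(f"Unknown storage backend: {name!r}") from None
        return provider_cls(root)


@StorageProviderRegistry.register("memory")
class InMemoryProvider(CloudStorageProvider):
    """Hypothetical stand-in backend that keeps objects in a dict."""

    def __init__(self, root: str):
        super().__init__(root)
        self._objects: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> str:
        self._objects[key] = data
        return f"{self.root}/{key}"

    def load(self, key: str) -> bytes:
        return self._objects[key]
```

Under this design, get_file_storage() would reduce to a lookup such as StorageProviderRegistry.create(backend_name, root), and the Azure and GCS providers would register themselves the same way the stand-in does.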
3. Enhanced Kuzu Migration
- Support migration across all cloud storage providers
- Cloud-to-cloud migration capabilities
- Unified temporary file management
- Cross-cloud database portability
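The cloud-to-cloud migration and unified temporary file handling described above could look roughly like the sketch below. It assumes each backend exposes download and upload methods (hypothetical names) and that the database is a single file; DirectoryBackend is a local stand-in for a real bucket so the example is runnable.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Protocol


class FileTransfer(Protocol):
    """Assumed minimal transfer interface shared by all backends."""

    def download(self, key: str, local_path: Path) -> None: ...

    def upload(self, local_path: Path, key: str) -> None: ...


class DirectoryBackend:
    """Stand-in for a cloud bucket, backed by a local directory."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def download(self, key: str, local_path: Path) -> None:
        shutil.copyfile(self.root / key, local_path)

    def upload(self, local_path: Path, key: str) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)


def copy_between_backends(source: FileTransfer, destination: FileTransfer, key: str) -> None:
    """Stage the object in a temporary directory that is removed on exit."""
    with tempfile.TemporaryDirectory() as staging_dir:
        staged = Path(staging_dir) / Path(key).name
        source.download(key, staged)
        destination.upload(staged, key)
```

Keeping the staging directory inside a single context manager means the temporary copy is cleaned up even if the upload fails, which is the main point of centralizing temporary file management.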
Implementation Plan
- Design and implement CloudStorageProvider interface
- Create StorageProviderRegistry system
- Implement unified CloudStorageConfig
- Refactor get_file_storage() to use registry pattern
- Update existing S3 provider to use new interface
Sprint 2: Azure Support
- Implement AzureBlobStorageProvider
- Add Azure configuration and authentication
- Create Azure-specific tests
- Update documentation for Azure setup
Sprint 3: Google Cloud Support (2 weeks)
- Implement GoogleCloudStorageProvider
- Add GCS configuration and authentication
- Create GCS-specific tests
- Update documentation for GCS setup
Sprint 4: Database Integration (2 weeks)
- Create CloudDatabaseMixin
- Refactor KuzuAdapter to use mixin
- Refactor SQLAlchemyAdapter to use mixin
- Update database initialization logic
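To show how a shared mixin might replace the sync logic duplicated across the adapters, here is a rough sketch. CloudDatabaseMixin is the name proposed in this issue; everything else (the db_path and storage attributes, the method names, FakeStorage, ExampleAdapter) is hypothetical and does not reflect the current KuzuAdapter or SQLAlchemyAdapter code.

```python
import tempfile
from pathlib import Path


class CloudDatabaseMixin:
    """Shared sync logic; assumes the adapter sets self.db_path and self.storage."""

    CLOUD_PREFIXES = ("s3://", "azure://", "abs://", "gs://", "gcs://")

    def is_remote(self) -> bool:
        return self.db_path.startswith(self.CLOUD_PREFIXES)

    def prepare_local_path(self) -> str:
        """Return a local path the database engine can open."""
        if not self.is_remote():
            return self.db_path
        # Keep a reference so the temporary directory outlives this call.
        self._temp_dir = tempfile.TemporaryDirectory()
        file_name = self.db_path.rsplit("/", 1)[-1]
        self.local_path = str(Path(self._temp_dir.name) / file_name)
        if self.storage.exists(self.db_path):
            self.storage.download(self.db_path, self.local_path)
        return self.local_path

    def push_to_remote(self) -> None:
        """Upload the local working copy back to cloud storage, if remote."""
        if self.is_remote():
            self.storage.upload(self.local_path, self.db_path)


class FakeStorage:
    """In-memory stand-in for a cloud storage provider."""

    def __init__(self):
        self.objects = {}

    def exists(self, url: str) -> bool:
        return url in self.objects

    def download(self, url: str, local_path: str) -> None:
        Path(local_path).write_bytes(self.objects[url])

    def upload(self, local_path: str, url: str) -> None:
        self.objects[url] = Path(local_path).read_bytes()


class ExampleAdapter(CloudDatabaseMixin):
    """Hypothetical adapter showing how the mixin would be attached."""

    def __init__(self, db_path: str, storage: FakeStorage):
        self.db_path = db_path
        self.storage = storage
```

An adapter would call prepare_local_path() before opening the database and push_to_remote() after writes, so neither adapter needs to know which cloud sits behind the storage object.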
Modified Files
- cognee/infrastructure/files/storage/get_file_storage.py - Registry integration
- cognee/infrastructure/files/storage/S3FileStorage.py - Refactor to new interface
- cognee/infrastructure/databases/graph/kuzu/adapter.py - Use cloud mixin
- cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py - Use cloud mixin
- cognee/infrastructure/databases/graph/kuzu/kuzu_migrate.py - Multi-cloud support
- cognee/alembic/versions/b9274c27a25a_kuzu_11_migration.py - Cloud migration support
- All files with hardcoded s3:// references (20+ files)
Configuration Migration
Environment Variables
# Current S3-only
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
# New unified approach
STORAGE_BACKEND=azure # local, s3, azure, gcs
# Provider-specific configs
AZURE_ACCOUNT_NAME=mystorageaccount
AZURE_ACCOUNT_KEY=xxx
AZURE_CONTAINER_NAME=cognee-data
GCS_PROJECT_ID=my-project
GCS_CREDENTIALS_PATH=/path/to/credentials.json
GCS_BUCKET=cognee-storage
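A unified loader for the variables above might look like the following sketch. The variable names are taken from this issue; CloudStorageConfig is the proposed class name, but its fields, the load_storage_config helper, the default of "local" when STORAGE_BACKEND is unset, and the choice to treat every listed variable as required are all assumptions.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping

# Variables each backend is assumed to require (names taken from this issue).
REQUIRED_VARS = {
    "local": (),
    "s3": ("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "azure": ("AZURE_ACCOUNT_NAME", "AZURE_ACCOUNT_KEY", "AZURE_CONTAINER_NAME"),
    "gcs": ("GCS_PROJECT_ID", "GCS_CREDENTIALS_PATH", "GCS_BUCKET"),
}


@dataclass(frozen=True)
class CloudStorageConfig:
    backend: str
    settings: Dict[str, str] = field(default_factory=dict)


def load_storage_config(env: Mapping[str, str] = os.environ) -> CloudStorageConfig:
    """Build a config from environment variables, failing fast on gaps."""
    backend = env.get("STORAGE_BACKEND", "local").lower()
    if backend not in REQUIRED_VARS:
        raise ValueError(f"Unsupported STORAGE_BACKEND: {backend!r}")
    missing = [name for name in REQUIRED_VARS[backend] if not env.get(name)]
    if missing:
        raise ValueError(f"Missing settings for {backend}: {', '.join(missing)}")
    return CloudStorageConfig(backend, {name: env[name] for name in REQUIRED_VARS[backend]})
```

Validating at load time surfaces a missing credential at startup rather than on the first storage call. A real implementation would likely relax the S3 requirements for deployments that rely on instance roles instead of static keys.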