@Swiftyos Swiftyos commented Nov 20, 2025

Summary

Adds SQLAlchemy infrastructure to the backend as a foundation for incrementally replacing Prisma in runtime database operations, while keeping Prisma for migration generation.

Changes

Core Infrastructure

  • backend/data/sqlalchemy.py: Async SQLAlchemy engine with connection pooling

    • Engine creation with QueuePool (10 persistent + 5 overflow connections)
    • Session factory for dependency injection
    • get_session() FastAPI dependency
    • Lifecycle management (initialize(), dispose())
  • backend/data/sqlalchemy_test.py: Comprehensive test suite

    • URL conversion, schema extraction, engine creation
    • Session factory and dependency injection tests
    • All tests passing ✅

Configuration

  • backend/util/settings.py: SQLAlchemy settings

    • Pool size, overflow, timeouts
    • Echo mode for debugging
  • backend/.env.default: Default environment variables

Service Integration

  • backend/executor/database.py: DatabaseManager lifespan
  • backend/server/rest_api.py: AgentServer lifespan

Both services now initialize SQLAlchemy on startup and dispose on shutdown.
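
The startup/shutdown pairing described above can be sketched with a standard-library async context manager. This is only an illustration: `FakeEngine` and the `SimpleNamespace` app are hypothetical stand-ins, and the real services presumably call the helpers in backend/data/sqlalchemy.py instead.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakeEngine:
    """Hypothetical stand-in for an async engine; it only tracks disposal."""

    def __init__(self) -> None:
        self.disposed = False

    async def dispose(self) -> None:
        self.disposed = True


@asynccontextmanager
async def lifespan(app):
    # Startup: create the engine once and keep it on application state.
    app.state.db_engine = FakeEngine()
    try:
        yield
    finally:
        # Shutdown: release pooled connections even if serving failed.
        await app.state.db_engine.dispose()


async def run_service() -> FakeEngine:
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        assert not app.state.db_engine.disposed  # engine is live while serving
    return app.state.db_engine


engine = asyncio.run(run_service())
```

Putting the dispose call in `finally` means the pool is released on both clean shutdown and a crash inside the serving block.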

Dependencies

  • pyproject.toml: Added sqlalchemy[asyncio] and asyncpg

Technical Details

Connection Pool:

  • 10 persistent connections per service
  • 5 overflow connections
  • 30s pool timeout
  • Pre-ping enabled
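
As a rough capacity check, a SQLAlchemy pool can open at most `pool_size + max_overflow` connections, and each process holds its own pool. The sketch below assumes one process per service (DatabaseManager and AgentServer), which the PR does not state.

```python
POOL_SIZE = 10     # persistent connections per service (from the PR)
MAX_OVERFLOW = 5   # overflow connections per service (from the PR)
PROCESSES = 2      # assumption: one process each for DatabaseManager and AgentServer


def max_connections(pool_size: int, max_overflow: int, processes: int) -> int:
    """Upper bound on simultaneously open connections across all processes."""
    return (pool_size + max_overflow) * processes


peak = max_connections(POOL_SIZE, MAX_OVERFLOW, PROCESSES)
print(peak)
```

The result is worth comparing against the database server's connection limit, since the Prisma client keeps its own connections during the coexistence period.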

Schema Handling:

  • Extracts from existing DATABASE_URL
  • Sets via search_path parameter
  • Compatible with Prisma configuration
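
One way the schema handling above could look, as a minimal sketch: the helper names are hypothetical, and passing `search_path` through asyncpg's `server_settings` connect argument is an assumption about how the PR applies it.

```python
from urllib.parse import parse_qs, urlsplit


def extract_schema(database_url: str, default: str = "public") -> str:
    """Return the value of Prisma's ?schema= parameter, or a default."""
    query = parse_qs(urlsplit(database_url).query)
    return query.get("schema", [default])[0]


def search_path_connect_args(database_url: str) -> dict:
    """Build connect_args that set search_path for every new connection."""
    schema = extract_schema(database_url)
    return {"server_settings": {"search_path": schema}}


url = "postgresql://user:pass@host:5432/db?schema=platform&connect_timeout=60"
connect_args = search_path_connect_args(url)
```

The resulting dictionary would be handed to the engine as `connect_args`, so the same DATABASE_URL keeps working for Prisma unchanged.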

Session Lifecycle:

  • Automatic transaction management
  • Commit on success, rollback on error
  • Connection returned to pool after use
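
The lifecycle above can be illustrated without a database. In this sketch `FakeSession` is a hypothetical stand-in that records calls, and `session_scope` is an illustrative name, not a function from the PR.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in that records the calls made on it."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def commit(self) -> None:
        self.calls.append("commit")

    async def rollback(self) -> None:
        self.calls.append("rollback")

    async def close(self) -> None:
        self.calls.append("close")


@asynccontextmanager
async def session_scope(session: FakeSession):
    try:
        yield session
        await session.commit()    # body finished without raising
    except Exception:
        await session.rollback()  # undo partial work, then re-raise
        raise
    finally:
        await session.close()     # hand the connection back either way


async def demo() -> tuple[list[str], list[str]]:
    ok, failed = FakeSession(), FakeSession()
    async with session_scope(ok):
        pass
    try:
        async with session_scope(failed):
            raise ValueError("simulated failure")
    except ValueError:
        pass
    return ok.calls, failed.calls


ok_calls, failed_calls = asyncio.run(demo())
```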

Migration Approach

This PR establishes infrastructure only. Both Prisma and SQLAlchemy will coexist during incremental migration:

  1. ✅ Infrastructure (this PR)
  2. Next: Proof of concept with new features
  3. Then: Systematic table migration
  4. Finally: Remove Prisma runtime usage

Testing

poetry run pytest backend/backend/data/sqlalchemy_test.py -xvs

All tests passing with coverage of:

  • URL conversion and schema extraction
  • Engine and session factory creation
  • Dependency injection lifecycle
  • Error handling and rollback

Breaking Changes

None - purely additive. Prisma continues to work unchanged.

Checklist 📋

For code changes:

  • I have clearly listed my changes in the PR description
  • I have made a test plan
  • I have tested my changes according to the test plan:
    • I have added tests for the new functionality

@Swiftyos Swiftyos requested a review from a team as a code owner November 20, 2025 09:36
@Swiftyos Swiftyos requested review from Bentlybro and ntindle and removed request for a team November 20, 2025 09:36
@github-project-automation github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Nov 20, 2025

netlify bot commented Nov 20, 2025

Deploy Preview for auto-gpt-docs-dev canceled.

Name Link
🔨 Latest commit 39839a5
🔍 Latest deploy log https://app.netlify.com/projects/auto-gpt-docs-dev/deploys/6920c304a7314d00087de5df


coderabbitai bot commented Nov 20, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@github-actions github-actions bot added the platform/backend AutoGPT Platform - Back end label Nov 20, 2025

netlify bot commented Nov 20, 2025

Deploy Preview for auto-gpt-docs canceled.

Name Link
🔨 Latest commit 39839a5
🔍 Latest deploy log https://app.netlify.com/projects/auto-gpt-docs/deploys/6920c30410b6290008fdeabf

@Swiftyos Swiftyos marked this pull request as draft November 20, 2025 09:37
@qodo-merge-pro

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Misuse

Using QueuePool with create_async_engine may be unnecessary or problematic since async engines manage pooling internally via the underlying driver; verify that specifying QueuePool is supported and won’t lead to unexpected behavior with asyncpg.

# Connection pool configuration
poolclass=QueuePool,  # Standard connection pool
pool_size=config.sqlalchemy_pool_size,  # Persistent connections
max_overflow=config.sqlalchemy_max_overflow,  # Burst capacity
pool_timeout=config.sqlalchemy_pool_timeout,  # Wait time for connection
pool_pre_ping=True,  # Validate connections before use
# Async configuration
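
On the pooling concern: SQLAlchemy's pooling documentation describes `QueuePool` as not compatible with asyncio and says `create_async_engine` uses `AsyncAdaptedQueuePool` by default. Assuming that holds for the installed version, a minimal sketch is to leave `poolclass` out and pass only the sizing options. The helper name is hypothetical.

```python
def async_pool_kwargs(pool_size: int, max_overflow: int, pool_timeout: float) -> dict:
    """Keyword arguments intended for create_async_engine(url, **kwargs).

    poolclass is deliberately left out so the async engine picks its own
    asyncio-compatible default pool.
    """
    return {
        "pool_size": pool_size,        # persistent connections
        "max_overflow": max_overflow,  # burst capacity
        "pool_timeout": pool_timeout,  # seconds to wait for a free connection
        "pool_pre_ping": True,         # validate connections before use
    }


kwargs = async_pool_kwargs(pool_size=10, max_overflow=5, pool_timeout=30)
```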

Transaction Semantics

The FastAPI dependency commits after yielding regardless of whether any writes occurred; this can surprise callers that expect explicit commit control. Consider scoping transactions explicitly or documenting that each request auto-commits and ensure read-only routes don’t incur unnecessary commits.

# Create session (borrows connection from pool)
async with _session_factory() as session:
    try:
        yield session  # Inject into route handler or context manager
        # If we get here, route succeeded - commit any pending changes
        await session.commit()
    except Exception:
        # Error occurred - rollback transaction
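
One possible answer to the auto-commit concern is to offer two dependencies, one for read-only routes that never commits and one for writes. The sketch below uses a hypothetical `FakeSession` and a hypothetical `make_session_dependency` factory. The `drive` helper only imitates how a framework steps through a generator dependency.

```python
import asyncio
from typing import AsyncGenerator, Callable


class FakeSession:
    """Hypothetical stand-in that records the calls made on it."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def commit(self) -> None:
        self.calls.append("commit")

    async def rollback(self) -> None:
        self.calls.append("rollback")

    async def close(self) -> None:
        self.calls.append("close")


def make_session_dependency(
    factory: Callable[[], FakeSession], *, commit: bool
) -> Callable[[], AsyncGenerator[FakeSession, None]]:
    """Build a generator dependency; only the write variant commits."""

    async def dependency() -> AsyncGenerator[FakeSession, None]:
        session = factory()
        try:
            yield session
            if commit:
                await session.commit()
        except Exception:
            await session.rollback()
            raise
        finally:
            await session.close()

    return dependency


async def drive(dep) -> list[str]:
    """Step through a dependency: enter it, then let it finish."""
    gen = dep()
    session = await gen.__anext__()
    try:
        await gen.__anext__()
    except StopAsyncIteration:
        pass
    return session.calls


read_calls = asyncio.run(drive(make_session_dependency(FakeSession, commit=False)))
write_calls = asyncio.run(drive(make_session_dependency(FakeSession, commit=True)))
```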

URL Sanitization

Regex-based stripping of schema query params may miss edge cases (ordering, URL encoding, additional params). Consider parsing via urllib.parse to robustly remove only schema while preserving other parameters.

async_url = prisma_url.replace("postgresql://", "postgresql+asyncpg://")

# Remove schema parameter (we'll handle via MetaData)
async_url = re.sub(r"\?schema=\w+", "", async_url)

# Remove any remaining query parameters that might conflict
async_url = re.sub(r"&schema=\w+", "", async_url)

return async_url
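
A sketch of the `urllib.parse` approach suggested above, removing only `schema` and keeping every other parameter regardless of order. The function name is hypothetical, and whether the remaining parameters are accepted by the asyncpg driver is a separate question this sketch does not address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_async_url(prisma_url: str) -> str:
    """Switch to the asyncpg driver and drop only the `schema` parameter."""
    parts = urlsplit(prisma_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "schema"
    ]
    scheme = parts.scheme
    if scheme in ("postgresql", "postgres"):
        scheme = "postgresql+asyncpg"
    return urlunsplit((scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


converted = to_async_url(
    "postgresql://user:pass@host:5432/db?connect_timeout=60&schema=my-schema"
)
```

The schema name `my-schema` is chosen because the `\w+` pattern would stop at the hyphen and leave `-schema` behind in the URL.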


deepsource-io bot commented Nov 20, 2025

Here's the code health analysis summary for commits 0edc669..39839a5. View details on DeepSource ↗.

Analysis Summary

Analyzer | Status | Summary | Link
JavaScript | ✅ Success | | View Check ↗
Python | ✅ Success | ❗ 20 occurrences introduced, 🎯 2 occurrences resolved | View Check ↗

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

@Swiftyos Swiftyos force-pushed the swiftyos/sqlalchemy-plumbing branch from 682f01f to 8ae5cbe Compare November 20, 2025 10:10
@AutoGPT-Agent

Thank you for this well-structured PR that adds SQLAlchemy infrastructure to the backend. The code looks well-designed with comprehensive test coverage and clear documentation.

A few items to address before merging:

  1. Missing checklist: Your PR is missing the required checklist. Even though this is primarily infrastructure code, we still need the checklist filled out. You can mark the testing sections as completed with your test plan since you've clearly tested the SQLAlchemy integration.

  2. Configuration design: The SQLAlchemy configuration in settings.py looks good, but should we add some comments about reasonable values for these settings in different environments (dev/test/prod)?

  3. Documentation: While you mentioned SQLAlchemy_INTEGRATION.md, I don't see it in the diff. Make sure this documentation is included to help other developers understand the migration plan.

  4. Error handling: The error handling in the lifespan hooks looks good, but consider adding more specific error types in your exception handlers where possible for better debugging.

Overall, this is a well-structured foundation for the gradual migration from Prisma to SQLAlchemy. Once you address the checklist issue, this should be ready for merging.

@Swiftyos Swiftyos requested a review from majdyz November 20, 2025 10:41
@Swiftyos Swiftyos marked this pull request as ready for review November 20, 2025 10:58
@qodo-merge-pro

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 Security concerns

Sensitive information exposure:
Initialization error logs may reveal parts of the database connection string (host, port, db name) in both executor/database.py and server/rest_api.py. While credentials are not logged, internal endpoints can be sensitive; recommend redacting or removing URL details from logs. No other obvious injection or secret exposure found.

⚡ Recommended focus areas for review

Transaction Semantics

get_session() unconditionally commits on successful exit. This may lead to unintended commits for pure read-only operations or when callers expect to manage transactions explicitly. Consider making commit opt-in, supporting read-only sessions, or documenting clearly to avoid accidental writes.

async def get_session() -> AsyncGenerator[AsyncSession, None]:
    """
    FastAPI dependency that provides database session.

    Usage in routes:
        @router.get("/users/{user_id}")
        async def get_user(
            user_id: int,
            session: AsyncSession = Depends(get_session)
        ):
            result = await session.execute(select(User).where(User.id == user_id))
            return result.scalar_one_or_none()

    Usage in DatabaseManager RPC methods:
        @expose
        async def get_user(user_id: int):
            async with get_session() as session:
                result = await session.execute(select(User).where(User.id == user_id))
                return result.scalar_one_or_none()

    Lifecycle:
    1. Request arrives
    2. FastAPI calls this function (or used as context manager)
    3. Session is created (borrows connection from pool)
    4. Session is injected into route handler
    5. Route executes (may commit/rollback)
    6. Route returns
    7. Session is closed (returns connection to pool)

    Error handling:
    - If exception occurs, session is rolled back
    - Connection is always returned to pool (even on error)
    """
    if _session_factory is None:
        raise RuntimeError(
            "SQLAlchemy not initialized. Call initialize() in lifespan context."
        )

    # Create session (borrows connection from pool)
    async with _session_factory() as session:
        try:
            yield session  # Inject into route handler or context manager
            # If we get here, route succeeded - commit any pending changes
            await session.commit()
        except Exception:
            # Error occurred - rollback transaction
            await session.rollback()
            raise
        finally:
            # Always close session (returns connection to pool)
            await session.close()
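
A related point about the docstring's second usage example: an async generator function is the shape FastAPI's `Depends()` accepts, but the generator object it returns does not implement the async context manager protocol, so `async with get_session()` would fail as written. A minimal sketch of the difference, using a hypothetical `get_resource` in place of the real dependency:

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncGenerator


async def get_resource() -> AsyncGenerator[str, None]:
    """Generator-style dependency, the shape Depends() accepts."""
    yield "session"


# A wrapped copy for callers that want to write `async with`.
resource_scope = asynccontextmanager(get_resource)


async def demo() -> tuple[bool, str]:
    gen = get_resource()
    supports_async_with = hasattr(gen, "__aenter__")
    await gen.aclose()  # tidy up the unused generator
    async with resource_scope() as value:
        return supports_async_with, value


supports_async_with, value = asyncio.run(demo())
```

Keeping the plain generator for routes and exporting a wrapped variant for RPC methods would let both usages in the docstring work.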

URL Sanitization

get_database_url() strips all query parameters indiscriminately; if future required parameters (e.g., sslmode) are added to DATABASE_URL, they will be lost. Consider preserving safe/whitelisted params or explicitly migrating supported ones via connect_args.

def get_database_url() -> str:
    """
    Extract database URL from environment and convert to async format.

    Prisma URL: postgresql://user:pass@host:port/db?schema=platform&connect_timeout=60
    Async URL:  postgresql+asyncpg://user:pass@host:port/db

    Returns the async-compatible URL without query parameters (handled via connect_args).
    """
    prisma_url = Config().database_url

    # Replace postgresql:// with postgresql+asyncpg://
    async_url = prisma_url.replace("postgresql://", "postgresql+asyncpg://")

    # Remove ALL query parameters (schema, connect_timeout, etc.)
    # We'll handle these through connect_args instead
    async_url = re.sub(r"\?.*$", "", async_url)

    return async_url
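
To avoid silently losing parameters such as `sslmode`, the query string could be split into known parameters that are translated to `connect_args` and unknown ones that are reported. This is a sketch under assumptions: the mapping to asyncpg's `timeout` and `ssl` keywords and all names below are hypothetical and should be checked against the driver documentation.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical mapping from DATABASE_URL parameters to asyncpg.connect() keywords.
PARAM_TO_CONNECT_ARG = {
    "connect_timeout": ("timeout", float),
    "sslmode": ("ssl", str),
}
HANDLED_ELSEWHERE = {"schema"}  # applied through search_path instead


def split_query_params(database_url: str) -> tuple[dict, list[str]]:
    """Return (connect_args, names of unrecognised parameters)."""
    connect_args: dict = {}
    unknown: list[str] = []
    for key, value in parse_qsl(urlsplit(database_url).query):
        if key in PARAM_TO_CONNECT_ARG:
            name, cast = PARAM_TO_CONNECT_ARG[key]
            connect_args[name] = cast(value)
        elif key not in HANDLED_ELSEWHERE:
            unknown.append(key)  # surface it instead of dropping it silently
    return connect_args, unknown


args, unknown = split_query_params(
    "postgresql://user:pass@host:5432/db"
    "?schema=platform&connect_timeout=60&sslmode=require&pgbouncer=true"
)
```

Logging or raising on the `unknown` list at startup makes a newly added DATABASE_URL parameter visible instead of quietly ignored.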

Error Logging Detail

Initialization logs include the tail of database_url (host:port/db) which could leak sensitive info in some deployments. Ensure no credentials or internal hostnames are exposed in logs; consider redaction or omitting the URL altogether.

if config.enable_sqlalchemy:
    from sqlalchemy.exc import DatabaseError, OperationalError
    from sqlalchemy.exc import TimeoutError as SQLAlchemyTimeoutError

    from backend.data import sqlalchemy as sa

    try:
        engine = sa.create_engine()
        sa.initialize(engine)
        app.state.db_engine = engine
        logger.info(
            f"[{self.service_name}] ✓ SQLAlchemy initialized "
            f"(pool_size={config.sqlalchemy_pool_size}, "
            f"max_overflow={config.sqlalchemy_max_overflow})"
        )
    except OperationalError as e:
        logger.error(
            f"[{self.service_name}] Failed to connect to database during SQLAlchemy initialization. "
            f"Check database connection settings (host, port, credentials). "
            f"Database URL: {config.database_url.split('@')[-1] if '@' in config.database_url else 'N/A'}. "
            f"Error: {e}"
        )
        raise
    except SQLAlchemyTimeoutError as e:
        logger.error(
            f"[{self.service_name}] Database connection timeout during SQLAlchemy initialization. "
            f"Timeout setting: {config.sqlalchemy_connect_timeout}s. "
            f"Check if database is accessible and increase timeout if needed. "
            f"Error: {e}"
        )
        raise
    except DatabaseError as e:
        logger.error(
            f"[{self.service_name}] Database error during SQLAlchemy initialization. "
            f"Check database permissions and configuration. "
            f"Error: {e}"
        )
        raise
    except Exception as e:
        logger.error(
            f"[{self.service_name}] Unexpected error during SQLAlchemy initialization. "
            f"Configuration: pool_size={config.sqlalchemy_pool_size}, "
            f"max_overflow={config.sqlalchemy_max_overflow}, "
            f"pool_timeout={config.sqlalchemy_pool_timeout}s. "
            f"Error: {e}",
            exc_info=True,
        )
        raise
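
For the logging concern, one option is to parse the URL and log only the parts that are safe in the target deployment. A minimal sketch with a hypothetical helper name, which hides the host by default:

```python
from urllib.parse import urlsplit


def describe_db_target(database_url: str, *, include_host: bool = False) -> str:
    """Summarise a database URL for logs without credentials or query string."""
    parts = urlsplit(database_url)
    if not parts.scheme or not parts.hostname:
        return "<unset or unparseable>"
    database = parts.path.lstrip("/") or "<none>"
    host = parts.hostname if include_host else "<redacted-host>"
    return f"{parts.scheme}://{host}/{database}"


summary = describe_db_target(
    "postgresql://user:s3cret@db.internal:5432/platform?schema=platform"
)
```

Unlike splitting on `@`, this never echoes raw input, so a malformed URL cannot leak a credential fragment into the log line.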

Comment on lines +39 to +49
"""
prisma_url = Config().database_url

# Replace postgresql:// with postgresql+asyncpg://
async_url = prisma_url.replace("postgresql://", "postgresql+asyncpg://")

# Remove ALL query parameters (schema, connect_timeout, etc.)
# We'll handle these through connect_args instead
async_url = re.sub(r"\?.*$", "", async_url)

return async_url

Bug: Empty DATABASE_URL with enable_sqlalchemy=true causes unhandled ArgumentError during startup.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

When enable_sqlalchemy=true is set and the DATABASE_URL environment variable is not provided, Config().database_url defaults to an empty string. get_database_url() passes that empty string through, and engine creation then fails with a SQLAlchemy ArgumentError. No handler in database.py or rest_api.py catches ArgumentError specifically, so the exception propagates and the application crashes during startup.

💡 Suggested Fix

Validate that database_url is not empty before calling create_async_engine(), or add ArgumentError to the list of caught exceptions, or set a sensible fallback for database_url.
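
The first option, validating before engine creation, might look like the following sketch. `DatabaseConfigError` and `require_database_url` are hypothetical names, and the scheme check is an added assumption rather than something the review asks for.

```python
from urllib.parse import urlsplit


class DatabaseConfigError(ValueError):
    """Hypothetical error type for an unusable DATABASE_URL."""


def require_database_url(database_url: str) -> str:
    """Fail fast with a readable message before any engine is created."""
    if not database_url or not database_url.strip():
        raise DatabaseConfigError(
            "DATABASE_URL is empty but enable_sqlalchemy is true; "
            "set DATABASE_URL or disable SQLAlchemy."
        )
    scheme = urlsplit(database_url).scheme
    if not scheme.startswith("postgres"):
        raise DatabaseConfigError(f"Unsupported database scheme: {scheme!r}")
    return database_url


try:
    require_database_url("")
    error_message = None
except DatabaseConfigError as exc:
    error_message = str(exc)
```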

Location: autogpt_platform/backend/backend/data/sqlalchemy.py#L31-L49
