Skip to content

RFC: Better migration architecture #366

@merlinz01

Description

@merlinz01

From my experience using aerich in a production system, the current migration approach leaves much to be desired.

Here are my complaints with the current architecture:

  • Using raw SQL is not database-independent, so you are tied to using the same database type for testing and production.
  • Sometimes the SQL commands are out of order.
  • You can't do custom migration logic. All you can do is tweak a generated SQL script.

This is what I've come up with as a better approach:

I think a migration file should consist of the following items:

  • The current and previous migration ID
  • The old schema as a dict literal
  • The new schema as a dict literal
  • A list of migration operations such as field renames, table creation, etc. with database-specific information only where absolutely unavoidable.
  • A function which can be used to do manual data migrations in Python or SQL, and its complement for downgrades

Benefits of this approach:

  • Database-agnostic
  • A clear way of describing what is going to happen to the database without having to mess with SQL.
  • Custom upgrade and downgrade logic
  • The aerich tool can lint the operations so that the schema stays consistent even if a user edits the migration file.
  • You could ask aerich to generate an null migration file and then implement some data-only (non-schema) updates in the custom migration logic.

Conceptual example:

# This is a migration generated by the aerich tool.
# You may edit this file, but be warned that your changes will be overwritten
# if you run aerich migrate again without upgrading the database to this migration.

from tortoise import BaseDBAsyncClient

# This is the ID of the migration previous to this one, or None if this is the first.
PREVIOUS_MIGRATION_ID = '20241121101234'

# This is the ID of the migration, used to determine if the migration has been run.
MIGRATION_ID = '20241121142100'

# This is the schema of the database as of the last migration (maybe could be imported directly from previous migration?)
OLD_SCHEMA = {
    'MyModel': {
        'table_name': 'my_models',
        'fields': [
            'id': {
                'type': 'IntField',
                'pk': True,
                'null': False,
            },
            'name': {
                'type': 'CharField',
                'pk': False,
                'null': False,
                'max_len': 255,
            },
            ...
        ],
    },
    ....
}

# This is the schema of the database after this migration has been run.
NEW_SCHEMA = {
    'MyModel': {
        'table_name': 'my_models',
        'fields': [
            'id': {
                'type': 'IntField',
                'pk': True,
                'null': False,
            },
            'display_name': {
                'type': 'CharField',
                'pk': False,
                'null': False,
                'max_len': 255,
            },
            ...
        ],
    },
    ....
}

# These are directives to the migration executor about what the migration consists of.
MIGRATION_OPERATIONS = {
    {'type': 'rename_field', 'table': 'my_models', 'old_name': 'name', 'new_name': 'display_name'},
}

# This is where you can implement fancy migration logic if the simple cases don't meet your needs.
async def CUSTOM_MIGRATION_SCRIPT_UPGRADE(dbconn: BaseDBAsyncClient):
    # Example (pseudo-code):
    names = [(id, f"Hi, I'm {name}") for id, name in dbconn.execute('SELECT id, name FROM my_models').all()]

    # Wait for the migrations to be executed
    yield

    dbconn.executemany('UPDATE my_models SET display_name=? WHERE id=?', names)

# This is where you can implement the reverse logic in the same manner if you need to downgrade.
async def CUSTOM_MIGRATION_SCRIPT_DOWNGRADE(dbconn: BaseDBAsyncClient):
    yield

Migration pseudo-code:

# Evaluate the migration file
import migration_file

# Connect to the database
dbconn = get_db_conn()

# Check the previous migration ID against the database
if migration_file.PREVIOUS_MIGRATION_ID is not None and get_last_migration(dbconn) != migration_file.PREVIOUS_MIGRATION_ID:
    raise ValueError('You are running migrations out of order!')

# Make sure the operations update the schema correctly in case the user edited them
schema = migration_file.OLD_SCHEMA
for operation in migration_file.MIGRATION_OPERATIONS:
    modify_schema(schema, operation)
if schema != migration_file.NEW_SCHEMA:
    raise ValueError('The operations do not modify the schema correctly!')

# Run the first part of the user's upgrade logic
asyncgen = await migration_file.CUSTOM_MIGRATION_SCRIPT_UPGRADE(dbconn)
await anext(asyncgen)

# Run the migration operations
for operation in migration_file.MIGRATION_OPERATIONS:
    execute_migration_operation(operation, dbconn, downgrade=False)

# Run the second part of the user's upgrade logic
try:
    await anext(asyncgen)
except StopIteration:
    pass
else:
    raise RuntimeError('Only one yield statement is allowed!')

# Update the migrations
add_migration_to_migrations_table(dbconn, migration_file.MIGRATION_ID)

# Done

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestquestionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions