RFC: Better migration architecture

From my experience using aerich in a production system, the current migration approach leaves much to be desired.

Here are my complaints with the current architecture:
- Using raw SQL is not database-independent, so you are tied to using the same database type for testing and production.
- Sometimes the SQL commands are out of order.
- You can't do custom migration logic. All you can do is tweak a generated SQL script.

This is what I've come up with as a better approach:

I think a migration file should consist of the following items:
- The current and previous migration ID
- The old schema as a dict literal
- The new schema as a dict literal
- A list of migration operations such as field renames, table creation, etc. with database-specific information only where absolutely unavoidable.
- A function which can be used to do manual data migrations in Python or SQL, and its complement for downgrades

Benefits of this approach:
- Database-agnostic
- A clear way of describing what is going to happen to the database without having to mess with SQL.
- Custom upgrade and downgrade logic
- The aerich tool can lint the operations so that the schema stays consistent even if a user edits the migration file.
- You could ask aerich to generate an null migration file and then implement some data-only (non-schema) updates in the custom migration logic.

Conceptual example:
```python
# This is a migration generated by the aerich tool.
# You may edit this file, but be warned that your changes will be overwritten
# if you run aerich migrate again without upgrading the database to this migration.

from tortoise import BaseDBAsyncClient

# This is the ID of the migration previous to this one, or None if this is the first.
PREVIOUS_MIGRATION_ID = '20241121101234'

# This is the ID of the migration, used to determine if the migration has been run.
MIGRATION_ID = '20241121142100'

# This is the schema of the database as of the last migration (maybe could be imported directly from previous migration?)
OLD_SCHEMA = {
    'MyModel': {
        'table_name': 'my_models',
        'fields': [
            'id': {
                'type': 'IntField',
                'pk': True,
                'null': False,
            },
            'name': {
                'type': 'CharField',
                'pk': False,
                'null': False,
                'max_len': 255,
            },
            ...
        ],
    },
    ....
}

# This is the schema of the database after this migration has been run.
NEW_SCHEMA = {
    'MyModel': {
        'table_name': 'my_models',
        'fields': [
            'id': {
                'type': 'IntField',
                'pk': True,
                'null': False,
            },
            'display_name': {
                'type': 'CharField',
                'pk': False,
                'null': False,
                'max_len': 255,
            },
            ...
        ],
    },
    ....
}

# These are directives to the migration executor about what the migration consists of.
MIGRATION_OPERATIONS = {
    {'type': 'rename_field', 'table': 'my_models', 'old_name': 'name', 'new_name': 'display_name'},
}

# This is where you can implement fancy migration logic if the simple cases don't meet your needs.
async def CUSTOM_MIGRATION_SCRIPT_UPGRADE(dbconn: BaseDBAsyncClient):
    # Example (pseudo-code):
    names = [(id, f"Hi, I'm {name}") for id, name in dbconn.execute('SELECT id, name FROM my_models').all()]

    # Wait for the migrations to be executed
    yield

    dbconn.executemany('UPDATE my_models SET display_name=? WHERE id=?', names)

# This is where you can implement the reverse logic in the same manner if you need to downgrade.
async def CUSTOM_MIGRATION_SCRIPT_DOWNGRADE(dbconn: BaseDBAsyncClient):
    yield

```

Migration pseudo-code:
```python
# Evaluate the migration file
import migration_file

# Connect to the database
dbconn = get_db_conn()

# Check the previous migration ID against the database
if migration_file.PREVIOUS_MIGRATION_ID is not None and get_last_migration(dbconn) != migration_file.PREVIOUS_MIGRATION_ID:
    raise ValueError('You are running migrations out of order!')

# Make sure the operations update the schema correctly in case the user edited them
schema = migration_file.OLD_SCHEMA
for operation in migration_file.MIGRATION_OPERATIONS:
    modify_schema(schema, operation)
if schema != migration_file.NEW_SCHEMA:
    raise ValueError('The operations do not modify the schema correctly!')

# Run the first part of the user's upgrade logic
asyncgen = await migration_file.CUSTOM_MIGRATION_SCRIPT_UPGRADE(dbconn)
await anext(asyncgen)

# Run the migration operations
for operation in migration_file.MIGRATION_OPERATIONS:
    execute_migration_operation(operation, dbconn, downgrade=False)

# Run the second part of the user's upgrade logic
try:
    await anext(asyncgen)
except StopIteration:
    pass
else:
    raise RuntimeError('Only one yield statement is allowed!')

# Update the migrations
add_migration_to_migrations_table(dbconn, migration_file.MIGRATION_ID)

# Done
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

RFC: Better migration architecture #366

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

RFC: Better migration architecture #366

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions