Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/move fixtures to migrations #73

Merged

Conversation

mwestphall
Copy link
Collaborator

  • Create a database migration for importing the baseline schema from the most recent prod dump
  • Move all "fixture" database migrations into the migrations cli subsystem system
  • Give the ability to specify whether a migration depends on one or more other migrations having completed
  • Pull up a lot of the logic from individual subclasses of Migration into the main Migration class to allow for more concise definition of new migrations
  • Add a docker-compose for spinning up a local postgis instance with appropriate extensions installed

Copy link
Member

@davenquinn davenquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! A few changes to take a look at, but it looks nearly ready to merge.

CAN_APPLY = "can_apply"

# The postconditions for this migration are met, so it doesn't need to be applied
APPLIED = "applied"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any value in having a state UNNECESSARY which would mean that the postconditions are met but the preconditions are not? This could be valuable for migrations that are to recover from invalid/legacy database states of some sort.

Put another way, is there value to having the difference between CANT_APPLY and not APPLIED (needs running but can't be right now) and CANT_APPLY and APPLIED (does not need running since database is already in a valid state)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the current implementation of checking for whether a migration is applied is sophisticated enough to handle that use case, since it just checks the current state of tables in the database and assigns one of the 3 labels based on that state. We might be able to get at this by looking further down the dependency graph, eg. checking whether any dependents of a CANT_APPLY migration are in an APPLIED state.

MacrostratCoreMigration,
]
# Find all subclasses of Migration among imported modules
migrations = Migration.__subclasses__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am into this but maybe it's a bit too implicit? Should there be a register_migration method instead?

  • this could be wrapped into a MigrationsManager class
  • this might allow us to break up migrations by subsystem, if desired

Copy link
Member

@davenquinn davenquinn Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the DROP SCHEMA macrostrat command attempted in the MacrostratCoreMigration, I think it'll be necessary to disable that migration for now at least on the main branch. Don't want to have data-destroying migrations quite yet until we're satisfied that the MariaDB -> Postgres migration into the Macrostrat schema is functional. Then I think we can rewrite that baseline migration quite substantially.

for cls in migrations:
# Initialize migration
_migration = cls()
for _migration in instances:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One pattern we could eventually add here, which I have successfully used in the past, is to have a "dry run" migration automatically run first, based on a schema-only clone of the database.

if db is not None:
db.session.flush()
db.session.close()
db = Database(PG_DATABASE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add this method to the macrostrat.database module.

depends_on = ['map-source-slug']

# Confirm that the tables created by the API v3 migrations are present
postconditions = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postcondition checks are a fantastic addition. Looks like you went for higher-order functions simply to enhance composability, yeah?


# Confirm that the tables created by the API v3 migrations are present
postconditions = [
exists("storage","object_group","object"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want to use a pattern like tables_exist("object_group", "object", schema="storage") for enhanced readability.


from psycopg2.sql import Identifier

MATCHES_SLUG_SQL = """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't look used here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this could be merged with local-root/docker-compose.yaml? This is what I use for a local copy of Macrostrat currently. Could perhaps rename this directory for clarity.

@@ -0,0 +1,5 @@
FROM postgis/postgis:15-3.4
RUN apt-get update && apt-get install -y --no-install-recommends \
postgresql-$PG_MAJOR-pgaudit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I've been meaning to get pgaudit working in my local environment.

I've used imresamu/postgis:15-3.4 while postgis/docker-postgis#216 is open, to help those of use on ARM machines.

MacrostratCoreMigration,
]
# Find all subclasses of Migration among imported modules
migrations = Migration.__subclasses__()
Copy link
Member

@davenquinn davenquinn Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the DROP SCHEMA macrostrat command attempted in the MacrostratCoreMigration, I think it'll be necessary to disable that migration for now at least on the main branch. Don't want to have data-destroying migrations quite yet until we're satisfied that the MariaDB -> Postgres migration into the Macrostrat schema is functional. Then I think we can rewrite that baseline migration quite substantially.

@mwestphall mwestphall force-pushed the feature/move-fixtures-to-migrations branch from 8dcccc2 to 2af13e5 Compare July 29, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants