Data Migration from v1 to v2 #14
Description
Summary
Data migration from the existing DynamoDB database to the v2 RDS instance.
Background
The EasyCLA v2 system will need to migrate existing data from the old v1 system into the new database model. This requires a migration script that can read from the existing DynamoDB database in each environment (DEV, STAGING, and PROD) and write to the corresponding Aurora RDS PostgreSQL database tables. We will initially test by exporting data from DynamoDB DEV to a local PostgreSQL instance on the developer's machine. We will migrate the CLA-specific data, including signatures, permissions, and other metadata. We will not transfer data that is currently duplicated in our 'system of record' database: Salesforce.
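The per-environment wiring described above could be modeled as a small stage-to-connection mapping. A minimal sketch; the host and database names are placeholders, not the real deployment values:

```python
# Hypothetical per-environment RDS connection settings for the migration tool.
# Hosts and database names below are illustrative placeholders only.
STAGES = ("LOCAL", "DEV", "STAGING", "PROD")

def rds_config(stage: str) -> dict:
    """Return illustrative RDS connection details for a given stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if stage == "LOCAL":
        # Local PostgreSQL instance used for the initial DEV-export test.
        return {"host": "localhost", "port": 5432, "database": "easycla"}
    return {
        "host": f"rds.{stage.lower()}.example.org",  # placeholder hostname
        "port": 5432,
        "database": "easycla",
    }
```

Credentials are deliberately absent here; they would come from the environment or a secrets store rather than being hard-coded.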
User Stories
- As a developer, I want to leverage a script to export data from DynamoDB and import the data into AWS RDS PostgreSQL.
- As a developer, I want to re-run the export multiple times without duplicating records in the RDS database.
- As a developer, I want to specify the STAGE to export/import - e.g. LOCAL, DEV, STAGING, PROD environments. This will allow us to run a migration in each environment.
- As a developer, I want to specify the RDS host, port, user, password, and database details. Deployment of the RDS system will require separate connection information for each environment.
- As a developer, I want to see a report of what was exported/imported. A summary should be provided describing how many records were processed and any errors that occurred. The report should also include how long the process took.
- As a developer, I want to run the migration in `--dry-run` mode, which exercises the code but does not import the data.
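The user stories above translate into a command-line surface roughly like the following. The actual script is planned to use the `click` library; this sketch uses stdlib `argparse` to show the same option set, and all flag names are assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the migration tool's options; flag names are assumptions.
    parser = argparse.ArgumentParser(
        description="Migrate EasyCLA v1 DynamoDB data to the v2 RDS PostgreSQL schema"
    )
    parser.add_argument("--stage", choices=["LOCAL", "DEV", "STAGING", "PROD"],
                        required=True, help="environment to export from / import into")
    parser.add_argument("--host", default="localhost", help="RDS PostgreSQL host")
    parser.add_argument("--port", type=int, default=5432, help="RDS PostgreSQL port")
    parser.add_argument("--user", default="postgres", help="database user")
    parser.add_argument("--database", default="easycla", help="database name")
    # Password intentionally not a flag; it would be read from an
    # environment variable so credentials never appear in shell history.
    parser.add_argument("--dry-run", action="store_true",
                        help="exercise the code path without importing data")
    return parser

args = build_parser().parse_args(["--stage", "DEV", "--dry-run"])
```

The same options map directly onto `click.option` decorators in the real tool.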
Tasks
- Extend the existing tools folder scripts to include a python migration script
- Include any additional libraries in the requirements.txt file (e.g. dynamodb and postgresql drivers)
- Provide the main routine with command-line options via the click library
- Allow the user to specify the AWS region/credentials
- Allow the user to specify the PostgreSQL database connection details
- Provide a README with documentation on how to set up and run the tool with working examples (don't show credentials)
- Extract the data from the signatures table
- Extract the data from the companies table
- Extract the data from the projects table
- Extract the data from the users table
- Extract the data from the user-permissions table
- Extract the data from the repositories table
- Extract the data from the company-invitations table
- Extract the data from the github-orgs table
- Extract the data from the gerrit-instances table
- Import data into the relevant RDS tables (schema is TODO for some of the tables)
- Provide a migration report of what was exported/imported
- Provide a report indicating how long the process took (this will give us a gauge on how long it will take for other environments).
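The re-run-without-duplicates story and the report tasks above could be covered by PostgreSQL's `INSERT ... ON CONFLICT` upsert plus a simple timing summary. A minimal sketch, with hypothetical table and column names, and the actual database execution stubbed out so the example stands alone:

```python
import time

def upsert_sql(table: str, columns: list, key: str) -> str:
    # Idempotent insert: re-running the migration updates existing rows
    # instead of duplicating them (ON CONFLICT on the primary key).
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    return (f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
            f"ON CONFLICT ({key}) DO UPDATE SET {updates}")

def migrate(records, table, columns, key, dry_run=False):
    # Process DynamoDB items and return the migration report:
    # record counts, error counts, and elapsed time.
    start = time.monotonic()
    report = {"table": table, "processed": 0, "errors": 0, "dry_run": dry_run}
    sql = upsert_sql(table, columns, key)
    for record in records:
        try:
            row = tuple(record[c] for c in columns)  # validate required fields
            if not dry_run:
                pass  # cursor.execute(sql, row) against the RDS connection
            report["processed"] += 1
        except KeyError:
            report["errors"] += 1
    report["elapsed_seconds"] = round(time.monotonic() - start, 3)
    return report
```

In `--dry-run` mode the same code path runs, record validation included, but the execute call is skipped, which matches the user story above.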
Acceptance Criteria
The "done" criteria:
- DEV data is migrated from v1 to v2.
- Demonstrate the set of capabilities to the product team while the code is running in the DEV environment.
References
See @dealako for script setup examples and usage of existing v1 python models.