If using Docker:
docker build -t pipeline_app .
docker run -v "$(pwd)/data:/app/data" -it pipeline_app
Otherwise:
python ./etl/run.py
- Pass configuration in as a command-line argument instead of hard-coding it, and update the code to read it from a YAML file
- Investigate performance improvements (SQLAlchemy querying)
- Update configuration paths for the deployment environment
- Add unit and integration tests
- Replace SQLite with a PostgreSQL production database
- Add data quality checks (e.g. capitalisation differences or spelling errors that could result in duplicate keys)
- CI/CD
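
As a starting point for the data quality item above, here is a minimal sketch of how keys that differ only in capitalisation or whitespace might be detected. The function names `normalise_key` and `find_near_duplicate_keys` are hypothetical and not part of the existing pipeline, and the sample values are illustrative only. This approach does not catch spelling errors; those would need fuzzy matching (for example with the standard library's `difflib`).

```python
import unicodedata
from collections import defaultdict


def normalise_key(value: str) -> str:
    """Return a canonical form: Unicode-normalised, trimmed, single-spaced, case-folded."""
    text = unicodedata.normalize("NFKC", value)
    # split() with no arguments drops leading/trailing whitespace and
    # collapses internal runs of whitespace.
    return " ".join(text.split()).casefold()


def find_near_duplicate_keys(values):
    """Group raw values that collapse to the same normalised key.

    Returns a dict mapping each normalised key to the sorted list of its
    distinct raw spellings, keeping only keys with more than one spelling.
    """
    groups = defaultdict(set)
    for value in values:
        groups[normalise_key(value)].add(value)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


if __name__ == "__main__":
    # Illustrative values only, not taken from the real data.
    sample = ["London", "london ", "LONDON", "Paris", "New  York", "new york"]
    for key, spellings in find_near_duplicate_keys(sample).items():
        print(f"{key!r}: {spellings}")
```

A check like this could run before the load step and either log a warning or fail the run when any group is returned, depending on how strict the pipeline should be.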