370 add the data quality report to the csv as a separate sheet #372

Merged

reecehill
Contributor

@reecehill reecehill commented Nov 13, 2024

This pull request refactors and extends the CSV handling in project/npda/general_functions. It moves the CSV-related functions into a dedicated subdirectory, adds XLSX download and error-report functions, and fixes a serialization bug.

Key changes include:

Refactoring and Reorganization:

  • Moved CSV-related functions to a new subdirectory project/npda/general_functions/csv and updated import paths accordingly.

New Functionality:

  • Added a new function download_file to handle file downloads and refactored download_csv to use this function. Introduced a new download_xlsx function for downloading XLSX files.
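
To illustrate the shared-helper pattern described above, here is a minimal, framework-neutral sketch. The `Download` dataclass and all function bodies are assumptions made for this example; the PR's own functions most likely return a web framework response object, which is not shown in this thread.

```python
from dataclasses import dataclass
from pathlib import Path

CSV_TYPE = "text/csv"
XLSX_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


@dataclass
class Download:
    """Hypothetical stand-in for an HTTP file response."""
    body: bytes
    headers: dict


def download_file(path, content_type):
    """Generic helper: read the file and attach download headers."""
    path = Path(path)
    return Download(
        body=path.read_bytes(),
        headers={
            "Content-Type": content_type,
            "Content-Disposition": f'attachment; filename="{path.name}"',
        },
    )


def download_csv(path):
    # Thin wrapper: only the content type differs.
    return download_file(path, CSV_TYPE)


def download_xlsx(path):
    # Thin wrapper: only the content type differs.
    return download_file(path, XLSX_TYPE)
```

With this shape, supporting another file format means adding one more thin wrapper rather than duplicating the header logic.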

Enhancements:

  • Introduced the csv_parse function (renamed and moved from read_csv) to parse CSV files into pandas DataFrames, handling various edge cases and data type conversions.
  • Added the write_errors_to_xlsx function to generate an XLSX file highlighting validation errors in the CSV data.
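
As a rough idea of what parsing a CSV into a DataFrame with type conversion can involve, here is a small sketch. It is not the PR's csv_parse: the function name `csv_parse_sketch`, the column names, and the `DD/MM/YYYY` date format are all assumptions chosen for the example.

```python
import io

import pandas as pd


def csv_parse_sketch(csv_text, date_columns=()):
    """Parse CSV text into a DataFrame (illustrative only)."""
    # Read every column as text first, so values such as "0123" keep
    # their leading zeros instead of being coerced to numbers.
    df = pd.read_csv(io.StringIO(csv_text), dtype=str, skipinitialspace=True)

    # Edge case: headers with stray whitespace.
    df.columns = [column.strip() for column in df.columns]

    # Edge case: rows where every field is empty.
    df = df.dropna(how="all").reset_index(drop=True)

    # Convert the named columns to dates; unparseable values become NaT
    # so a later validation step can report them.
    for column in date_columns:
        df[column] = pd.to_datetime(df[column], format="%d/%m/%Y", errors="coerce")
    return df
```

Reading as text and converting column by column keeps the original values available for error reporting when a conversion fails.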

Bug Fixes:

  • Fixed a bug in the serialize_error function to ensure keys are correctly serialized as integers.
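
The fix itself is not shown in this thread. One common way integer keys get lost is a JSON round trip, because JSON object keys are always strings. The sketch below demonstrates that behaviour and one way to restore the keys; the error structure and the `int_keys` helper are hypothetical.

```python
import json

# Hypothetical shape: row index -> {field name -> [messages]}
errors_by_row = {
    0: {"date_of_birth": ["Invalid date"]},
    3: {"nhs_number": ["Missing value"]},
}

# JSON object keys are strings, so integer keys do not survive a round trip.
round_tripped = json.loads(json.dumps(errors_by_row))


def int_keys(mapping):
    """Return a copy of `mapping` with its top-level keys converted to int."""
    return {int(key): value for key, value in mapping.items()}


restored = int_keys(round_tripped)
```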

Additional Improvements:

  • Ensured that errors are written to an XLSX file if the CSV file is created during the upload process.
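
A minimal sketch of writing the uploaded data plus a separate report sheet, with failing cells highlighted, could look like the following. It assumes openpyxl is available; the function name `write_errors_sketch`, the sheet names, the fill colour, and the shape of `errors` are assumptions, not the PR's implementation.

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# Light red background for cells that failed validation (arbitrary choice).
ERROR_FILL = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")


def write_errors_sketch(header, rows, errors, path):
    """Write data and errors to an XLSX file.

    `errors` maps a zero-based data row index to
    {column name: [messages]} (hypothetical shape).
    """
    workbook = Workbook()
    data = workbook.active
    data.title = "Uploaded data"
    data.append(header)
    for row in rows:
        data.append(row)

    report = workbook.create_sheet("Data quality report")
    report.append(["Row", "Column", "Error"])

    for row_index, by_column in sorted(errors.items()):
        # +2: one for the header row, one because sheet rows are 1-based.
        sheet_row = row_index + 2
        for column, messages in by_column.items():
            cell = data.cell(row=sheet_row, column=header.index(column) + 1)
            cell.fill = ERROR_FILL
            for message in messages:
                report.append([sheet_row, column, message])

    workbook.save(path)
```

Keeping the report on its own sheet leaves the data sheet in the same layout as the uploaded CSV, so it can be corrected and re-exported.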

@reecehill reecehill linked an issue Nov 13, 2024 that may be closed by this pull request
…s-a-separate-sheet

Additional changes:
- Refactor CSV handling (necessary due to circular imports): reorganize csv-related functions into a dedicated csv module and update imports
@reecehill
Contributor Author

Note to self:

Due to a circular import when importing project.npda.general_functions.csv_upload, I have:

  1. Moved csv-related functions into a submodule project.npda.general_functions.csv

  2. To make the distinction between read_csv() and pd.read_csv() clearer,

    project.npda.general_functions.read_csv.read_csv()

    has been renamed (and moved) to

    project.npda.general_functions.csv.csv_parse.csv_parse()
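
For readers unfamiliar with the problem, the sketch below reproduces a circular import in miniature and shows how moving the shared function into its own module removes it. The `demo_*` module names are hypothetical and are not the project's files; the actual cycle involving csv_upload is not shown in this thread.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(files, entry):
    """Write `files` (name -> source) to a temp dir and try to import `entry`."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in files.items():
            Path(tmp, name).write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(entry)
            return "ok"
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            # Clean up so the next call starts from a fresh state.
            sys.path.remove(tmp)
            for name in files:
                sys.modules.pop(Path(name).stem, None)


# Before: each module imports a name from the other at import time.
cyclic = {
    "demo_upload.py": "from demo_validate import validate\n"
                      "def parse(text): return text.split(',')\n",
    "demo_validate.py": "from demo_upload import parse\n"
                        "def validate(text): return bool(parse(text))\n",
}

# After: the shared function lives in its own module, which imports neither.
acyclic = {
    "demo_parse.py": "def parse(text): return text.split(',')\n",
    "demo_validate.py": "from demo_parse import parse\n"
                        "def validate(text): return bool(parse(text))\n",
    "demo_upload.py": "from demo_validate import validate\n"
                      "from demo_parse import parse\n",
}

print(try_import(cyclic, "demo_upload"))
print(try_import(acyclic, "demo_upload"))
```

The first call reports an ImportError because `demo_upload` is only partially initialised when `demo_validate` asks it for `parse`; the second succeeds because the dependency graph no longer contains a cycle.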

@reecehill reecehill self-assigned this Nov 13, 2024
@reecehill reecehill marked this pull request as ready for review November 13, 2024 16:10
Member

@eatyourpeas eatyourpeas left a comment

looking very nice @reecehill

@eatyourpeas eatyourpeas merged commit bbe7dbf into live Nov 15, 2024
1 check passed
@eatyourpeas eatyourpeas deleted the 370-add-the-data-quality-report-to-the-csv-as-a-separate-sheet branch November 15, 2024 11:32
@mbarton
Member

mbarton commented Nov 15, 2024

Seen on STAGING (created by @reecehill and merged by @eatyourpeas 7 minutes and 41 seconds ago) Please check your changes!

Development

Successfully merging this pull request may close these issues.

add the data quality report to the csv as a separate sheet
3 participants