CU-869574kvp update snomed preprocessing naming #469

Merged: 13 commits, Sep 30, 2024

mart-r (Collaborator) commented Jul 25, 2024

As per #467

Uses a regular expression to capture the release from the folder name.

At init, this check is non-strict (i.e. no exception is raised if the release cannot be found), since the path could refer to a parent folder.

When parsing the subfolders, the check is strict (i.e. an exception is raised if no release is found), since those are expected to be Snomed release folders (with SnomedCT in their names).

Added tests for working and failing folder names, covering both bare base names and longer paths, in both strict and non-strict mode.
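
For illustration, a minimal sketch of the strict / non-strict check described above. The function name `extract_release`, the pattern, and the assumption that the "release" is the 8-digit date in the folder name are all hypothetical and not taken from the PR diff.

```python
import os
import re
from typing import Optional

# Hypothetical pattern: a 'SnomedCT_' prefix and an 8-digit date that is
# followed by a 'T<hhmmss>Z' suffix, as in
# 'SnomedCT_Release_AU1000036_20240630T120000Z'.
_RELEASE_PATTERN = re.compile(r"^SnomedCT_.*_(\d{8})T\d{6}Z$")


def extract_release(path: str, strict: bool = True) -> Optional[str]:
    """Return the release captured from the last folder name in `path`.

    In strict mode a ValueError is raised when nothing matches;
    in non-strict mode None is returned instead.
    """
    # normpath drops a trailing separator so basename is the folder name
    folder = os.path.basename(os.path.normpath(path))
    match = _RELEASE_PATTERN.match(folder)
    if match is None:
        if strict:
            raise ValueError(f"No Snomed release found in folder name: {folder!r}")
        return None
    return match.group(1)
```

With this sketch, `extract_release("SnomedCT_Release_AU1000036_20240630T120000Z")` returns `"20240630"`, while `extract_release("/data/snomed", strict=False)` returns `None` rather than raising.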

mart-r commented Jul 25, 2024

This might need some additions:

  • We may want to do the release calculations (_check_path_and_release) at init to allow failing early in case it's an incorrect / unknown Snomed release
  • It might be useful to automatically detect the UK / UK_drug / AU extension types per sub-folder so as to allow preprocessing all at once
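
A rough sketch of the first point (failing early at init), assuming the given path is either a release folder itself or a parent of one or more release folders. `SnomedPaths` and `_find_release_paths` are hypothetical names used only for illustration; they are not the `_check_path_and_release` implementation.

```python
import os
import re
from typing import List

# Hypothetical pattern for a Snomed release folder name
_RELEASE_FOLDER = re.compile(r"^SnomedCT_.*_\d{8}T\d{6}Z$")


class SnomedPaths:
    """Hypothetical helper that resolves release folders when constructed."""

    def __init__(self, data_path: str) -> None:
        self.data_path = data_path
        # Resolve here so a bad path fails now, not midway through preprocessing
        self.release_paths = self._find_release_paths(data_path)

    @staticmethod
    def _find_release_paths(data_path: str) -> List[str]:
        if not os.path.isdir(data_path):
            raise FileNotFoundError(f"Not a folder: {data_path!r}")
        # Case 1: the path itself is a release folder
        if _RELEASE_FOLDER.match(os.path.basename(os.path.normpath(data_path))):
            return [data_path]
        # Case 2: the path is a parent folder; collect matching subfolders
        found = sorted(
            entry.path
            for entry in os.scandir(data_path)
            if entry.is_dir() and _RELEASE_FOLDER.match(entry.name)
        )
        if not found:
            raise ValueError(f"No Snomed release folders under {data_path!r}")
        return found
```

The trade-off is a small amount of filesystem work in the constructor in exchange for an error that surfaces before any files are read.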

tomolopolis (Member) left a comment

lgtm

Separate extensions into an Enum.
Do the release/paths check at init to allow for early failures in case of issues.
Move common avoids to a common location.
Fix UK Drug relationship name.
Remove some clutter by separating common prefixes for release types and file names.
Remove some clutter by separating common suffixes for release types.
New abstraction. Use supported extensions, which describe their file formats, along with bundles, which give some further insight and control.
For AU models, the folder name seems to be 'SnomedCT_Release_AU1000036_20240630T120000Z', so the 1st part is just 'Release' and the 2nd part is indicative of AU.
Add usage of this where relevant.
Add patch for files/folders where applicable.
Change the paths of attributes where applicable.
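
To make the Enum and folder-name commits above more concrete, here is a hedged sketch of detecting the extension from the parts of a release folder name. Only the AU folder name comes from this PR; the `Extension` members, the `detect_extension` function, and the International / UK prefixes are assumptions for illustration and may not match the merged code.

```python
import re
from enum import Enum


class Extension(Enum):
    """Hypothetical set of supported extensions."""
    INTERNATIONAL = "INT"
    UK_CLINICAL = "UK"
    UK_DRUG = "UK_DRUG"
    AU = "AU"


# 'SnomedCT_<1st part>_<2nd part>_<yyyymmdd>T<hhmmss>Z'
_FOLDER_PARTS = re.compile(
    r"^SnomedCT_([A-Za-z0-9]+)_([A-Za-z0-9]+)_(\d{8})T\d{6}Z$"
)


def detect_extension(folder_name: str) -> Extension:
    """Guess the extension from the 1st and 2nd parts of the folder name."""
    match = _FOLDER_PARTS.match(folder_name)
    if match is None:
        raise ValueError(f"Not a Snomed release folder name: {folder_name!r}")
    first, second, _release = match.groups()
    # From the PR: for AU the 1st part is just 'Release' and the
    # 2nd part (e.g. 'AU1000036') is indicative of AU.
    if first == "Release" and second.startswith("AU"):
        return Extension.AU
    # Assumed prefixes for the remaining extensions (not confirmed by the PR)
    if first.startswith("UKDrug"):
        return Extension.UK_DRUG
    if first.startswith("UKClinical"):
        return Extension.UK_CLINICAL
    if first.startswith("International"):
        return Extension.INTERNATIONAL
    raise ValueError(f"Unknown Snomed extension in folder name: {folder_name!r}")
```

For example, `detect_extension("SnomedCT_Release_AU1000036_20240630T120000Z")` returns `Extension.AU`.
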

mart-r commented Sep 11, 2024

PS:
Reworked the PR. Would need a new review.

tomolopolis (Member) left a comment

lgtm

mart-r merged commit a9544f7 into master on Sep 30, 2024
8 checks passed
mart-r deleted the CU-869574kvp-update-snomed-preprocessing-naming branch on November 18, 2024 16:23