Rename loaders & formats #85

Open
mikix opened this issue Nov 22, 2022 · 0 comments
Labels
good first issue Good for newcomers

Comments

mikix (Contributor) commented Nov 22, 2022

There is some light refactoring that could be done to make the code easier to read.

Currently

  • Loader: a class that imports input data from a variety of places (local i2b2, bulk FHIR server export, etc.) into a standard format of FHIR ndjson, sitting in a local temporary directory.
  • Format: a class that exports output data into the final location, in a few different formats (json file tree, ndjson files, parquet files, and soon a delta lake).
  • Root: a class that abstracts filesystem access across cloud and local disk (basically a light wrapper around fsspec, used by both Loaders and Formats).
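
To make the three roles concrete, here is a minimal sketch of how they relate. The method names (`joinpath`, `load_all`, `write_rows`) are hypothetical placeholders rather than the actual signatures in the code base, and this `Root` uses plain `os.path` where the real one wraps fsspec.

```python
from __future__ import annotations

import abc
import os


class Root:
    """Filesystem location (hypothetical stand-in; the real class wraps fsspec)."""

    def __init__(self, path: str):
        self.path = path

    def joinpath(self, *parts: str) -> str:
        # Build a child path under this root.
        return os.path.join(self.path, *parts)


class Loader(abc.ABC):
    """Input side: brings source data into a local temp dir of FHIR ndjson."""

    def __init__(self, root: Root):
        self.root = root

    @abc.abstractmethod
    def load_all(self) -> str:
        """Return the path of the directory holding the ndjson files."""


class Format(abc.ABC):
    """Output side: writes rows to the final location in one output format."""

    def __init__(self, root: Root):
        self.root = root

    @abc.abstractmethod
    def write_rows(self, name: str, rows: list[dict]) -> None:
        """Persist rows under the given table name."""
```

Both the input and output classes take a `Root`, which is why its name matters for whatever we pick for the other two.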

The Problem

Cumulus ETL is an ETL, which stands for extract, transform, load. So it's arguably confusing that we use Loaders to do the "extraction." (Though I think it's crazy that the output step of ETL is called load in the first place. So... maybe it's better to just avoid the word "load"?)

Format and Root I don't hate. But they could maybe be clearer, especially in contrast to whatever we call the input step. And Root is defined in a file called store.py, which is another "output" word we are throwing into the mix.

One Solution

  • Loader -> Reader
  • Format -> Writer
  • I dunno on Root, maybe it stays as is?

Read/Write are very overloaded terms though. Even though that's what's happening here, there's plenty of other reading and writing happening in the ETL. It might be nice to have a more specific term of art? An insane suggestion would be something very specific, like Ingester and Disgorger. Not ideal words, but they'd become terms of art instead of generic words... Dunno.
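
If we went with a rename like this, one way to stage it would be to keep the old name around as a deprecated alias for a while. This is only a sketch: the `Reader` constructor and its `source` argument are made up for illustration, and nothing here is taken from the current code.

```python
import warnings


class Reader:
    """Hypothetical new name for the input-side class (today's Loader)."""

    def __init__(self, source: str):
        self.source = source


class Loader(Reader):
    """Deprecated alias so existing imports keep working during the rename."""

    def __init__(self, *args, **kwargs):
        # Warn at the caller's line, then behave exactly like Reader.
        warnings.warn(
            "Loader is deprecated; use Reader instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Note this trick only works when the old name is retired. It would not work for any option that reuses "Loader" for the output step.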

ETL Solution

We can lean into the text of ETL:

  • Loader -> Extractor
  • Format -> Loader

I personally think it's best to avoid the use of "loader". But I could be convinced otherwise probably.

Better Ideas

Anyone got better naming ideas? Naming is hard. This might be a good thinker for Matt when he starts, and a good way to poke around the code base safely.

@mikix mikix added the good first issue Good for newcomers label Nov 22, 2022