Open
Labels
- data-cleaning: Tasks related to cleaning & regularizing data during ETL.
- docs: Documentation for users and contributors.
- ferc1: Anything having to do with FERC Form 1
- xbrl: Related to the FERC XBRL transition
Description
The transform process uses classes, such as `AbstractTableTransformer` and `Ferc1AbstractTableTransformer`, coupled with a standard set of method calls to modify tables. Whether the methods alter the table in question depends on some default parameter settings and the parameters specified in `params.py`.
Right now, the generic transformation functions defined in `classes.py` look like this:
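    def func_name(self, params=None):
        # Fall back to the table's configured parameters.
        if params is None:
            params = self.params.drop_invalid_rows
        # Logged unconditionally, even if there is nothing to do.
        logger.info(f"{self.table_id.value}: Dropping remaining invalid rows.")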
Whereas the ones in the `transform/ferc1.py` module look like this:
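    def func_name(self, params=None):
        # Fall back to the table's configured parameters.
        if params is None:
            params = self.params.drop_duplicate_rows_dbf
        # Only log when the parameters are actually set for this table.
        if params.table_name:
            logger.info(
                f"{self.table_id.value}: Dropping rows where primary key and data "
                "columns are duplicated."
            )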
The extra conditional before the logging call ensures that the message is only logged if there are indeed valid parameters for that function. This is good because it avoids confusion as to whether the transformation was applied to a given table or not.
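To make the difference concrete, here is a minimal, self-contained sketch of the guarded-logging pattern. It is not the actual PUDL code: `ToyTransformer`, `TableParams`, `DropDuplicateRows`, and the `columns` field are hypothetical stand-ins, and rows are plain dicts rather than a dataframe. It assumes that "empty" parameters mean the transform should be a no-op.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("transform_sketch")


@dataclass
class DropDuplicateRows:
    """Hypothetical parameter object; the empty default means 'do nothing'."""

    table_name: str = ""
    columns: list[str] = field(default_factory=list)


@dataclass
class TableParams:
    """Hypothetical container for one table's transform parameters."""

    drop_duplicate_rows: DropDuplicateRows = field(default_factory=DropDuplicateRows)


class ToyTransformer:
    """Stand-in for a table transformer class (not the real one)."""

    def __init__(self, table_id: str, params: TableParams):
        self.table_id = table_id
        self.params = params

    def drop_duplicate_rows(self, rows, params=None):
        # Fall back to the table-level parameters only if none were passed.
        if params is None:
            params = self.params.drop_duplicate_rows
        # Nothing configured for this table: skip silently, no log message.
        if not params.table_name:
            return rows
        logger.info("%s: Dropping duplicated rows.", self.table_id)
        seen = set()
        kept = []
        for row in rows:
            key = tuple(row[col] for col in params.columns)
            if key not in seen:
                seen.add(key)
                kept.append(row)
        return kept
```

With this shape, a table whose parameters are left at the defaults produces no log line at all, so every message in the log corresponds to a transformation that really ran.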
Other things to consider:
- Ideally the logging outputs should also contain specific information like the number and percent of rows dropped or modified. I say number and percent, because sometimes the percent is rounded to 0 but the number is not actually 0!
- Should we standardize what the conditional is that comes after `if params is None`, like `if != None`??
- This issue could also serve to clarify where the `if params` conditional is called, whether that's in the top-level function definition or in the class-level function definition.
- This issue can also fix instances of `if not params` that should be `if params is None` for consistency.
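Two of the points above can be sketched in a few lines of Python. The helper names `summarize_dropped` and `resolve_params` are hypothetical and do not exist in the codebase. The first shows why a log message might report both the count and the percentage. The second shows how `if not params` and `if params is None` can differ when an empty but explicitly passed value is falsy.

```python
def summarize_dropped(table_id: str, n_before: int, n_after: int) -> str:
    """Build a log message with both the number and percent of rows dropped."""
    n_dropped = n_before - n_after
    # Guard against empty tables to avoid dividing by zero.
    pct = 100 * n_dropped / n_before if n_before else 0.0
    return f"{table_id}: Dropped {n_dropped} of {n_before} rows ({pct:.2f}%)."


def resolve_params(explicit, default):
    """Use the default only when no parameters were passed at all."""
    return default if explicit is None else explicit


# The percentage rounds to 0.00 even though three rows were dropped.
print(summarize_dropped("toy_table", 100000, 99997))
# → toy_table: Dropped 3 of 100000 rows (0.00%).

# An explicitly passed empty (falsy) value survives the `is None` check...
assert resolve_params({}, {"threshold": 0.5}) == {}
# ...but a truthiness check would silently replace it with the default.
assert ({} or {"threshold": 0.5}) == {"threshold": 0.5}
```

Whether a given parameter object is falsy when empty depends on how its class is defined, so the second half is only an illustration of why the two checks are worth standardizing.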
Status
Icebox