Conversation

@ssorou1 ssorou1 commented Jun 6, 2025

Characterize code coverage, identify functions that need more unit tests, and create a dependency mapping graph

Notes

This Excel spreadsheet lists the names of the functions in fs_algo_train_eval.py, the priority for creating/updating their unit tests (based on simplicity criteria: the higher the number, the lower the priority), and some remarks. The spreadsheet is at:

pkg/fs_algo/fs_algo/Unit test priority.xlsx

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

Soroush Sorourian added 23 commits May 7, 2025 09:11
Collaborator

This is an analysis, so not something that should be part of the codebase.

Collaborator Author

True! This file has been moved to Google Drive for Regionalization.

G.add_edges_from(edges)
return G

# def draw_graph(G, title="Function Dependency Graph"):
Collaborator

Do you feel comfortable removing this commented out code yet?

Collaborator Author

Removed. Thanks

G.add_edges_from(edges)
return G

# def draw_graph(G, title="Function Dependency Graph"):
Collaborator

Is this commented out code something we can discard, or something worth keeping? If you're on the fence, another option could be to save commented out code in a file outside of the repo/not tracked by git as a just-in-case.

Collaborator Author

It was initially developed to create the dependency graph for fs_prep. The code has since been refactored to create a dependency graph for any given .py file, so this file is no longer needed and has been removed. Thanks.
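As a rough illustration of what these graphers do, here is a minimal, stdlib-only sketch of extracting intra-file call edges with ast. The helper name `call_edges` is hypothetical, not the actual multi_dependency_grapher.py API; the real script builds a networkx graph from edges like these via `G.add_edges_from(edges)`:

```python
import ast

def call_edges(source: str) -> set:
    """Return (caller, callee) pairs for calls between functions defined in `source`."""
    tree = ast.parse(source)
    # Collect every function definition in the file, keyed by name
    funcs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    edges = set()
    for name, node in funcs.items():
        for sub in ast.walk(node):
            # Record direct calls to other functions defined in this same file
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name) and sub.func.id in funcs:
                edges.add((name, sub.func.id))
    return edges
```

A networkx DiGraph built from these pairs can then be rendered with pyvis or matplotlib, as the scripts in this PR do.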

@@ -0,0 +1,122 @@
import ast
Collaborator

Provide documentation about what this does, who made it, when, and how you run it in comments at the top of this function.

Collaborator Author

Done! BTW, dependency_grapher.py has been replaced by multi_dependency_grapher.py, following the external dependency analysis requested in the comments below.

net.show(html_output, notebook=False)
print(f"Interactive graph saved as: {html_output}")

if __name__ == "__main__":
@glitt13 glitt13 Jun 9, 2025

Refactor this using argparse and allow a user to run the dependency grapher on multiple files, so that functions in file A that call functions in file B may be graphed.

EDIT: allow a user to specify the save directory for the plot files so that they preferably do not write into the repo. Create the directory if it doesn't exist, and issue a warning in that case. Report the file save location as a normal terminal message when saving occurs.

Collaborator Author

Done! Please note that dependency_grapher.py is replaced by multi_dependency_grapher.py. Thanks

Collaborator

At the top, provide the description of this file and how it can be used as comments at a bare minimum.

Collaborator Author

Done! BTW, dependency_grapher.py has been replaced by multi_dependency_grapher.py, following the external dependency analysis requested in the comments below.

@robertbartel robertbartel left a comment

@ssorou1, Guy made some suggestions that I would have otherwise made, and I think there is a duplicated file. So at minimum I think those changes are needed.

Also, there are several changes in the PR that don't appear (on the surface at least) to be related to detecting code coverage, assessing testing, and dependency graphing. I commented on existing files with changes, but there also seem to be some new added files related to schemas and validation that I didn't comment on in-line.

Being less familiar with the project, it is possible I don't understand some of the ways things need to be connected, so please correct if those changes are within scope here. But otherwise, I suggest those be moved out of this PR to a separate branch or branches.

@@ -0,0 +1,122 @@
import ast


Is this entire file just a duplicate of pkg/fs_algo/fs_algo/dependency_graph/dependency_grapher.py?

Collaborator Author

It was initially developed to create the dependency graph for fs_prep. The code has since been refactored to create a dependency graph for any given .py file, so this file is no longer needed and has been removed. Thanks.

args = parser.parse_args()

path_pred_config = Path(args.path_pred_config) #Path(f'~/git/formulation-selector/scripts/eval_ingest/xssa/xssa_pred_config.yaml')
config_dir = path_pred_config.parent


At least at first glance, this doesn't look like it belongs in the scope of this PR. I think that may be true of all the changes in this file. Let me know if I've misunderstood their purpose.

Collaborator Author

You are right, Bobby. These belong to PR#63 that are under review.

import fs_algo.fs_algo_train_eval as fsate
import ast
import numpy as np
import pandera as pa


As with the previous file, it appears these changes may relate to functionality separate from code coverage, testing, and dependency mapping.

Collaborator Author

You are right, Bobby. These belong to PR#63 that are under review.

Collaborator

I discussed w/ Soroush and we're going to roll with the pandera component for convenience. This comment is relevant to this line. Soroush, please address the following warning to help with future pandera compatibility:

FutureWarning: Importing pandas-specific classes and functions from the
top-level pandera module will be **removed in a future version of pandera**.
If you're using pandera to validate pandas objects, we highly recommend updating
your import:

    # old import
    import pandera as pa

    # new import
    import pandera.pandas as pa

If you're using pandera to validate objects from other compatible libraries
like pyspark or polars, see the supported libraries section of the documentation
for more information on how to import pandera:

https://pandera.readthedocs.io/en/stable/supported_libraries.html

Collaborator Author

Addressed. Thanks.

if __name__ == "__main__":
parser = argparse.ArgumentParser(description = 'process the algorithm config file')
parser.add_argument('path_tfrm_cfig', type=str, help='Path to the YAML configuration file specific for algorithm training')
parser.add_argument('--validate', action='store_true', help='Enable schema loading for data validation')


Similarly to the last two files, this seems to be adding functionality beyond the scope of this PR.

Collaborator Author

You are right, Bobby. These belong to PR#63 that are under review.

Collaborator

The scripts/eval_ingest/{dataset_shortname}/ directory is reserved for config files or scripts that are very specific to a single dataset.

This schema is very general, so it should live somewhere inside the package of interest. I suggest creating a new folder within a package directory for storing the pydantic schemas, e.g. pkg/fs_algo/fs_algo/file_schemas/

Collaborator Author

Done!

Collaborator

Also move this file to schema-specific subdirectory.

Collaborator Author

Implemented. Thanks!


# Validating DataFrame object
if arg_val:
try:
Collaborator

When testing this out on a different set of data, I came across some problems that may be related to internal errors in pandera. Does the following code suggestion work for you instead? It prevents a fatal error from happening on my end:

try:
    schemas.schema_df_attr.validate(df_attr)
    print("✅ DataFrame validated successfully.")
except (pa.errors.SchemaError, pa.errors.SchemaErrors) as e:
    print(f"❌ Validation failed: {e}")
    print(e.args)

Collaborator Author

Yes, it worked for me. Updated the code accordingly. Thanks!

@ssorou1 ssorou1 requested review from glitt13 and robertbartel June 16, 2025 21:17