Conversation

@ssorou1 ssorou1 commented Jun 6, 2025

Characterize code coverage, identify functions that need more unit tests, and create a dependency mapping graph

Notes

This Excel spreadsheet lists the names of the functions in fs_algo_train_eval.py, the priority for creating/updating their unit tests (based on simplicity criteria: the higher the number, the lower the priority), and some remarks. The spreadsheet is at:

pkg/fs_algo/fs_algo/Unit test priority.xlsx

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

Soroush Sorourian added 23 commits May 7, 2025 09:11
Collaborator

This is an analysis, so not something that should be part of the codebase.

Collaborator Author

True! This file has been moved to Google Drive for Regionalization.

G.add_edges_from(edges)
return G

# def draw_graph(G, title="Function Dependency Graph"):
Collaborator

Do you feel comfortable removing this commented out code yet?

Collaborator Author

Removed. Thanks

G.add_edges_from(edges)
return G

# def draw_graph(G, title="Function Dependency Graph"):
Collaborator

Is this commented out code something we can discard, or something worth keeping? If you're on the fence, another option could be to save commented out code in a file outside of the repo/not tracked by git as a just-in-case.

Collaborator Author

It was initially developed to create the dependency graph for fs_prep. The code has since been refactored to create a dependency graph for any given .py file, so this file is no longer needed and has been removed. Thanks.
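As a rough illustration of what these graphers do, here is a minimal, stdlib-only sketch of extracting intra-file call edges with ast. The helper name `call_edges` is hypothetical, not the actual multi_dependency_grapher.py API; the real script builds a networkx graph from edges like these via `G.add_edges_from(edges)`:

```python
import ast

def call_edges(source: str) -> set:
    """Return (caller, callee) pairs for calls between functions defined in `source`."""
    tree = ast.parse(source)
    # Collect every function definition in the file, keyed by name
    funcs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    edges = set()
    for name, node in funcs.items():
        for sub in ast.walk(node):
            # Record direct calls to other functions defined in this same file
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name) and sub.func.id in funcs:
                edges.add((name, sub.func.id))
    return edges
```

A networkx DiGraph built from these pairs can then be rendered with pyvis or matplotlib, as the scripts in this PR do.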

@@ -0,0 +1,122 @@
import ast
Collaborator

Provide documentation about what this does, who made it, when, and how you run it in comments at the top of this function.

Collaborator Author

Done! BTW, dependency_grapher.py has been replaced by multi_dependency_grapher.py, following the external dependency analysis requested in the comments below.

net.show(html_output, notebook=False)
print(f"Interactive graph saved as: {html_output}")

if __name__ == "__main__":
@glitt13 glitt13 Jun 9, 2025

Refactor this using argparse and allow a user to run the dependency grapher on multiple files, so that functions in file A that call functions in file B may be graphed.

EDIT: allow a user to specify the save directory for the plot files so that they preferably do not write into the repo. Create the directory if it doesn't exist, and issue a warning in that case. Report the file save location as a normal terminal message when saving occurs.

Collaborator Author

Done! Please note that dependency_grapher.py is replaced by multi_dependency_grapher.py. Thanks

Collaborator

At the top, provide the description of this file and how it can be used as comments at a bare minimum.

Collaborator Author

Done! BTW, dependency_grapher.py has been replaced by multi_dependency_grapher.py, following the external dependency analysis requested in the comments below.

@robertbartel robertbartel left a comment

@ssorou1, Guy made some suggestions that I would have otherwise made, and I think there is a duplicated file. So at minimum I think those changes are needed.

Also, there are several changes in the PR that don't appear (on the surface at least) to be related to detecting code coverage, assessing testing, and dependency graphing. I commented on existing files with changes, but there also seem to be some new added files related to schemas and validation that I didn't comment on in-line.

Being less familiar with the project, it is possible I don't understand some of the ways things need to be connected, so please correct if those changes are within scope here. But otherwise, I suggest those be moved out of this PR to a separate branch or branches.

@@ -0,0 +1,122 @@
import ast


Is this entire file just a duplicate of pkg/fs_algo/fs_algo/dependency_graph/dependency_grapher.py?

Collaborator Author

It was initially developed to create the dependency graph for fs_prep. The code has since been refactored to create a dependency graph for any given .py file, so this file is no longer needed and has been removed. Thanks.

args = parser.parse_args()

path_pred_config = Path(args.path_pred_config) #Path(f'~/git/formulation-selector/scripts/eval_ingest/xssa/xssa_pred_config.yaml')
config_dir = path_pred_config.parent


At least at first glance, this doesn't look like it belongs in the scope of this PR. I think that may be true of all the changes in this file. Let me know if I've misunderstood their purpose.

Collaborator Author

You are right, Bobby. These belong to PR#63 that are under review.

import fs_algo.fs_algo_train_eval as fsate
import ast
import numpy as np
import pandera as pa


As with the previous file, it appears these changes may relate to functionality separate from code coverage, testing, and dependency mapping.

Collaborator Author

You are right, Bobby. These belong to PR#63 that are under review.

Collaborator

I discussed w/ Soroush and we're going to roll with the pandera component for convenience. This comment is relevant to this line. Soroush, please address the following warning to help with future pandera compatibility:

FutureWarning: Importing pandas-specific classes and functions from the
top-level pandera module will be **removed in a future version of pandera**.
If you're using pandera to validate pandas objects, we highly recommend updating
your import:

    # old import
    import pandera as pa

    # new import
    import pandera.pandas as pa

If you're using pandera to validate objects from other compatible libraries
like pyspark or polars, see the supported libraries section of the documentation
for more information on how to import pandera:

https://pandera.readthedocs.io/en/stable/supported_libraries.html

Collaborator Author

Addressed. Thanks.

if __name__ == "__main__":
parser = argparse.ArgumentParser(description = 'process the algorithm config file')
parser.add_argument('path_tfrm_cfig', type=str, help='Path to the YAML configuration file specific for algorithm training')
parser.add_argument('--validate', action='store_true', help='Enable schema loading for data validation')


Similarly to the last two files, this seems to be adding functionality beyond the scope of this PR.

Collaborator Author

You are right, Bobby. These belong to PR#63 that are under review.

Collaborator

The scripts/eval_ingest/{dataset_shortname}/ directory is reserved for config files or scripts that are very specific to a single dataset.

This schema is very general, so it should live somewhere inside the package of interest. I suggest creating a new folder within a package directory for storing the pydantic schemas, e.g. pkg/fs_algo/fs_algo/file_schemas/

Collaborator Author

Done!

Collaborator

Also move this file to schema-specific subdirectory.

Collaborator Author

Implemented. Thanks!


# Validating DataFrame object
if arg_val:
try:
Collaborator

When testing this out on a different set of data, I came across some problems that may be related to internal errors in pandera. Does the following code suggestion work for you instead? It prevents a fatal error from happening on my end:

try:
    schemas.schema_df_attr.validate(df_attr)
    print("✅ DataFrame validated successfully.")
except (pa.errors.SchemaError, pa.errors.SchemaErrors) as e:
    print(f"❌ Validation failed: {e}")
    print(e.args)

Collaborator Author

Yes, it worked for me. Updated the code accordingly. Thanks!

@ssorou1 ssorou1 requested review from glitt13 and robertbartel June 16, 2025 21:17