Ss clean dependency grapher #66
base: dev
Conversation
…coverage and dependency grapher
…pher to a subfolder
This is an analysis, so not something that should be part of the codebase.
True! This file has been moved to Google Drive for Regionalization.
    G.add_edges_from(edges)
    return G

# def draw_graph(G, title="Function Dependency Graph"):
Do you feel comfortable removing this commented-out code yet?
Removed. Thanks
    G.add_edges_from(edges)
    return G

# def draw_graph(G, title="Function Dependency Graph"):
Is this commented-out code something we can discard, or something worth keeping? If you're on the fence, another option is to save the commented-out code in a file outside the repo (not tracked by git) as a just-in-case.
It was initially developed to create the dependency graph for fs_prep. The code has since been refactored to build a graph for any given .py file, so this file is no longer needed and has been removed. Thanks.
@@ -0,0 +1,122 @@
import ast
Provide documentation about what this does, who made it, when, and how you run it in comments at the top of this function.
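For illustration, a header along these lines would cover those points (the author, date, and usage text below are placeholders, not taken from the PR):

"""Function dependency grapher.

Parses a Python source file with `ast`, builds a function-call dependency
graph, and writes an interactive HTML visualization of it.

Author:  <name>          (placeholder)
Created: <YYYY-MM-DD>    (placeholder)

Usage (illustrative):
    python dependency_grapher.py path/to/module.py
"""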
Done! BTW, dependency_grapher.py has been replaced by multi_dependency_grapher.py following the external dependency analysis requested in the comments below.
    net.show(html_output, notebook=False)
    print(f"Interactive graph saved as: {html_output}")

if __name__ == "__main__":
Refactor this using argparse and allow a user to run the dependency grapher on multiple files, so that functions in file A that call functions in file B may be graphed.
EDIT: also allow a user to specify the save directory for the plot outputs so that they preferably do not write into the repo. Create the directory if it doesn't exist, and issue a warning in that case. Report the file save location as a normal terminal message when saving occurs.
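For reference, a minimal sketch of what such a CLI could look like; the function and flag names below are illustrative, not necessarily those used in multi_dependency_grapher.py:

import argparse
import warnings
from pathlib import Path


def parse_cli_args():
    parser = argparse.ArgumentParser(
        description="Graph function dependencies across one or more .py files")
    parser.add_argument("py_files", nargs="+",
                        help="Python source file(s) to analyze together")
    parser.add_argument("--out-dir", default=".",
                        help="Directory where the HTML graph is saved")
    return parser.parse_args()


def resolve_out_dir(out_dir: str) -> Path:
    # Create the save directory if needed, warning the user when it is missing
    out_path = Path(out_dir).expanduser().resolve()
    if not out_path.exists():
        warnings.warn(f"Output directory {out_path} does not exist; creating it.")
        out_path.mkdir(parents=True, exist_ok=True)
    return out_path


if __name__ == "__main__":
    args = parse_cli_args()
    out_path = resolve_out_dir(args.out_dir)
    html_output = out_path / "dependency_graph.html"
    # ... build the combined graph from args.py_files and render it to html_output ...
    print(f"Interactive graph saved as: {html_output}")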
Done! Please note that dependency_grapher.py has been replaced by multi_dependency_grapher.py. Thanks!
At a bare minimum, provide a description of this file and how it can be used, as comments at the top.
Done! BTW, dependency_grapher.py has been replaced by multi_dependency_grapher.py following the external dependency analysis requested in the comments below.
robertbartel left a comment
@ssorou1, Guy made some suggestions that I would have otherwise made, and I think there is a duplicated file. So at minimum I think those changes are needed.
Also, there are several changes in the PR that don't appear (on the surface at least) to be related to detecting code coverage, assessing testing, and dependency graphing. I commented on existing files with changes, but there also seem to be some new added files related to schemas and validation that I didn't comment on in-line.
Being less familiar with the project, it is possible I don't understand some of the ways things need to be connected, so please correct me if those changes are within scope here. But otherwise, I suggest those be moved out of this PR to a separate branch or branches.
@@ -0,0 +1,122 @@
import ast
Is this entire file just a duplicate of pkg/fs_algo/fs_algo/dependency_graph/dependency_grapher.py?
It was initially developed to create the dependency graph for fs_prep. The code has since been refactored to build a graph for any given .py file, so this file is no longer needed and has been removed. Thanks.
pkg/fs_algo/fs_algo/fs_pred_algo.py (outdated)
args = parser.parse_args()

path_pred_config = Path(args.path_pred_config) #Path(f'~/git/formulation-selector/scripts/eval_ingest/xssa/xssa_pred_config.yaml')
config_dir = path_pred_config.parent
At least at first glance, this doesn't look like it belongs in the scope of this PR. I think that may be true of all the changes in this file. Let me know if I've misunderstood their purpose.
You are right, Bobby. These belong to PR #63, which is under review.
pkg/fs_algo/fs_algo/fs_proc_algo.py (outdated)
import fs_algo.fs_algo_train_eval as fsate
import ast
import numpy as np
import pandera as pa
As with the previous file, it appears these changes may relate to functionality separate from code coverage, testing, and dependency mapping.
You are right, Bobby. These belong to PR #63, which is under review.
I discussed w/ Soroush and we're going to roll with the pandera component for convenience. Comment relevant to this line. Soroush, please address the following warning to help out with future pandera compatibility:
FutureWarning: Importing pandas-specific classes and functions from the
top-level pandera module will be **removed in a future version of pandera**.
If you're using pandera to validate pandas objects, we highly recommend updating
your import:

# old import
import pandera as pa

# new import
import pandera.pandas as pa

If you're using pandera to validate objects from other compatible libraries
like pyspark or polars, see the supported libraries section of the documentation
for more information on how to import pandera:
https://pandera.readthedocs.io/en/stable/supported_libraries.html
Addressed. Thanks.
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description = 'process the algorithm config file')
    parser.add_argument('path_tfrm_cfig', type=str, help='Path to the YAML configuration file specific for algorithm training')
    parser.add_argument('--validate', action='store_true', help='Enable schema loading for data validation')
Similarly to the last two files, this seems to be adding functionality beyond the scope of this PR.
You are right, Bobby. These belong to PR #63, which is under review.
…that case. Report the file save location as a normal terminal message when saving occurs.
…he file (external dependencies)
The scripts/eval_ingest/{dataset_shortname}/ directory is reserved for config files or scripts that are very specific to a single dataset.
This schema is very general, so it should live somewhere inside the package of interest. I suggest creating a new folder within a package directory for storing the pydantic schemas, e.g. pkg/fs_algo/fs_algo/file_schemas/
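For illustration only, a minimal schema module of the kind that could live in such a folder, written with pandera (which the validation code elsewhere in this PR uses); the module path and column definitions are placeholders, not the project's actual schema:

# pkg/fs_algo/fs_algo/file_schemas/df_schemas.py  (hypothetical location)
import pandera.pandas as pa

# Placeholder columns; the real schema_df_attr would declare the attribute
# DataFrame's actual columns and dtypes.
schema_df_attr = pa.DataFrameSchema(
    {
        "featureID": pa.Column(str, nullable=False),
        "attribute": pa.Column(str, nullable=False),
        "value": pa.Column(float, nullable=True),
    },
    strict=False,  # tolerate extra columns not declared here
)

# Usage elsewhere in the package (illustrative):
#     from fs_algo.file_schemas import df_schemas as schemas
#     schemas.schema_df_attr.validate(df_attr)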
Done!
Also move this file to the schema-specific subdirectory.
Implemented. Thanks!
# Validating DataFrame object
if arg_val:
    try:
When testing this out on a different set of data, I came across some problems that may have been related to internal errors in pandera. Does the following code suggestion work for you instead? It prevents a fatal error from happening on my end:

try:
    schemas.schema_df_attr.validate(df_attr)
    print("✅ DataFrame validated successfully.")
except (pa.errors.SchemaError, pa.errors.SchemaErrors) as e:
    print(f"❌ Validation failed: {e}")
    print(e.args)
Yes, it worked for me. Updated the code accordingly. Thanks!
…fs_pred accordingly
… and update the script
Characterize code coverage, identify functions that need more unit tests, and create a dependency mapping graph
Additions
Removals
Changes
Testing
Screenshots
Notes
An Excel spreadsheet is included that lists the functions in fs_algo_train_eval.py, the priority for creating or updating their unit tests (based on simplicity criteria: the higher the number, the lower the priority), and some remarks. The spreadsheet is at: pkg/fs_algo/fs_algo/Unit test priority.xlsx
Todos
Checklist
Testing checklist
Target Environment support
Accessibility
Other