@bethac07
Member

@bethac07 bethac07 commented Jun 29, 2023

Description

Collate handles the standard cases pretty well at this point, but there are a number of edge cases that I've been handling with extra code in my own fork; it would be nice to add first-class support for them.

What is the nature of your change?

  • Enhancement (adds functionality).
  • This change requires a documentation update.

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@codecov-commenter

codecov-commenter commented Jun 29, 2023

Codecov Report

Attention: Patch coverage is 7.69231% with 36 lines in your changes missing coverage. Please review.

Project coverage is 93.48%. Comparing base (b5d4219) to head (3c80e99).

Files with missing lines                 Patch %   Lines
pycytominer/cyto_utils/collate.py        8.57%     32 missing ⚠️
pycytominer/cyto_utils/collate_cmd.py    0.00%     4 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #302      +/-   ##
==========================================
- Coverage   94.41%   93.48%   -0.94%     
==========================================
  Files          57       57              
  Lines        3188     3223      +35     
==========================================
+ Hits         3010     3013       +3     
- Misses        178      210      +32     
Flag        Coverage Δ
unittests   93.48% <7.69%> (-0.94%) ⬇️

Flags with carried forward coverage won't be shown.

@d33bs
Member

d33bs commented Aug 18, 2023

Hello, this is a courtesy notice that Pycytominer's master branch has been renamed to main as part of the changes merged in #303. While this change should be seamless, GitHub cannot update your local git environment for you; it offers the following guidance for doing so. Thank you for your understanding, and please don't hesitate to reach out with any questions or concerns.

[image: GitHub's guidance for updating a local clone after the branch rename]

@d33bs
Member

d33bs commented May 5, 2025

Hi @bethac07 - thanks for your efforts on this! I wanted to check in to see if you're still planning to move forward with this PR. If there's any way we may assist or if you have any questions just let us know. Thanks for taking a moment to make a contribution here!

@bethac07
Member Author

Hi @d33bs,

Just for funsies, I've rebased it and cleaned up some docstrings, but if collate is being deprecated, spending a bunch of time writing tests (which is all this needed aside from those docstrings) is probably not worth your time or mine. In a perfect world, despite the deprecation and the ruff comments, I'd still want to merge it so that it can be accessed if needed from a pre-2.0 commit on main, but if you don't want to do that, we'll figure something else out (like moving it to a fork). Thanks!

@d33bs
Member

d33bs commented May 26, 2025

Thanks @bethac07 ! I understand what you mean about this and feel similarly. Just the same, I'd like to keep our CI tests passing to help avoid confusion with future changes. Would it be alright with you for me to modify the code so it passes ruff checks (while retaining the existing operational results where possible)?

@bethac07
Member Author

Sure, fine with me :)

@d33bs
Member

d33bs commented May 30, 2025

Hi @bethac07 - I made some changes to help us pass the linting checks. Double-checking: is this ready for review and to move forward? The PR is still in draft state and its title includes [WIP].

@bethac07
Member Author

That was just the tests; if we can merge without them, it's all set! Thanks much for the help.

@bethac07 changed the title from "Add extra args to collate to handle edge cases [WIP]" to "Add extra args to collate to handle edge cases" on May 30, 2025
@bethac07 marked this pull request as ready for review on May 30, 2025 at 13:23
Member

@d33bs d33bs left a comment

Hi @bethac07 - nice job, thanks for the additions. I wanted to make sure this code is ready with #301 in mind, so I've left some feedback that I'd like your thoughts on before proceeding with a merge. Thanks in advance for your attention to these.

shutil.rmtree(backend_dir)


def find_and_fix_metadata(path_to_plate_folder, overwrite=False):
Member

Consider adding a docstring and type hints to this function to improve its documentation and make it easier to understand.
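A minimal sketch of what a documented, type-hinted find_and_fix_metadata might look like. It assumes the plate folder holds one subfolder per site, each containing an Image.csv (matching the `*/Image.csv` include pattern in the sync command), and it swaps the real append_metadata for a hypothetical stand-in that only records its arguments so the example runs on its own.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical stand-in: records calls instead of editing the CSV.
calls: list[tuple[str, bool]] = []


def append_metadata(path_to_csv: str, overwrite: bool = False) -> None:
    """Stand-in for the PR's append_metadata; only records its arguments."""
    calls.append((path_to_csv, overwrite))


def find_and_fix_metadata(path_to_plate_folder: str | Path, overwrite: bool = False) -> None:
    """Add plate/well/site metadata to every per-site Image.csv in a plate folder.

    Assumed layout (one subfolder per site)::

        <path_to_plate_folder>/
            <plate>-<well>-<site>/Image.csv

    Args:
        path_to_plate_folder: Local folder holding the downloaded per-site CSVs.
        overwrite: Passed through to ``append_metadata``; when True, existing
            metadata columns are replaced.
    """
    # Sorted so that files are processed in a deterministic order.
    for image_csv in sorted(Path(path_to_plate_folder).glob("*/Image.csv")):
        append_metadata(str(image_csv), overwrite)
```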

append_metadata(image_csv, overwrite)


def append_metadata(path_to_csv, overwrite=False):
Member

Consider adding a docstring and type hints to this function to improve its documentation and make it easier to understand.
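A sketch of append_metadata with a docstring and type hints. The behavior is inferred from the excerpts in this review (folder-name parsing, and what appears to be an `errors="ignore"` drop when overwriting) and is an assumption, as are the `Metadata_Plate`/`Metadata_Well`/`Metadata_Site` column names. For brevity it inserts the new columns at the front rather than before `ModuleError_01LoadData`.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd

# Hypothetical column names; the PR may use different ones.
METADATA_COLUMNS = ("Metadata_Plate", "Metadata_Well", "Metadata_Site")


def append_metadata(path_to_csv: str | Path, overwrite: bool = False) -> bool:
    """Add plate, well and site columns to a per-site Image.csv, in place.

    Values are parsed from the CSV's parent folder name, expected to be
    ``<plate>-<well>-<site>``. The plate may contain hyphens; well and site
    may not. For example, ``.../BR01-P1-A01-3/Image.csv`` gives plate
    ``BR01-P1``, well ``A01`` and site ``3``.

    Args:
        path_to_csv: Path to the Image.csv to edit.
        overwrite: If True, replace existing metadata columns; otherwise
            only add the ones that are missing.

    Returns:
        True if the file was rewritten, False if nothing needed to change.

    Raises:
        ValueError: If the folder name does not match the expected pattern.
    """
    path = Path(path_to_csv)
    parts = path.parent.name.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Unexpected folder name: {path.parent.name!r}")
    values = dict(zip(METADATA_COLUMNS, parts))

    df = pd.read_csv(path)
    if overwrite:
        # Drop any existing metadata columns; missing ones are ignored.
        df = df.drop(columns=list(METADATA_COLUMNS), errors="ignore")
    missing = [col for col in METADATA_COLUMNS if col not in df.columns]
    if not missing:
        return False
    # Inserted at the front for brevity (not before ModuleError_01LoadData).
    for offset, col in enumerate(missing):
        df.insert(offset, col, values[col])
    df.to_csv(path, index=False)
    return True
```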

Comment on lines +39 to +40
download_flags=[],
upload_flags=[],
Member

Using an empty list as a default argument is somewhat dangerous in Python: the default is evaluated once, when the function is defined, so every call that relies on it shares the same list object (which can produce surprising results if that list is ever mutated). Consider using None as the default here and creating a new list within the function instead, e.g.:

if download_flags is None:
    download_flags = []
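A small illustration of the shared-default pitfall and the None pattern; the function names and the `--dryrun` flag are illustrative only. In the excerpt shown, the PR only unpacks the flags (`*download_flags`) and never mutates them, so the shared default is latent rather than an active bug; the None pattern keeps it from becoming one.

```python
from __future__ import annotations


def add_flag_shared(flags: list[str] = []) -> list[str]:  # noqa: B006 (deliberate)
    """Buggy on purpose: the default list is created once and reused."""
    flags.append("--dryrun")
    return flags


def add_flag_fresh(flags: list[str] | None = None) -> list[str]:
    """Safe pattern: a new list is created on every call that omits flags."""
    if flags is None:
        flags = []
    flags.append("--dryrun")
    return flags


print(add_flag_shared())  # ['--dryrun']
print(add_flag_shared())  # ['--dryrun', '--dryrun']  (same list object reused)
print(add_flag_fresh())   # ['--dryrun']
print(add_flag_fresh())   # ['--dryrun']
```

Ruff's B006 rule (mutable argument default) flags the first form, which is one more reason to prefer the second.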

Comment on lines +289 to +292
all_meta = path_to_csv.split("/")[-2]
plate = "-".join(all_meta.split("-")[:-2])
well = all_meta.split("-")[-2]
site = all_meta.split("-")[-1]
Member

These lines expect a very specific folder-name format that could easily go awry. Consider documenting an example of the expected input in a code comment or in the docstring.
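One way to make the expected format explicit is a small parser whose docstring carries an example. `parse_site_folder` is a hypothetical helper: for folder names with at least two hyphens it returns the same plate, well and site as the split-based lines above, but it raises ValueError instead of silently returning an empty plate name when the folder has fewer parts.

```python
from __future__ import annotations

from pathlib import Path


def parse_site_folder(path_to_csv: str) -> tuple[str, str, str]:
    """Return (plate, well, site) from ".../<plate>-<well>-<site>/Image.csv".

    The plate name may contain hyphens; the well and site may not.
    Example: "analysis/BR01-P1-A01-3/Image.csv" -> ("BR01-P1", "A01", "3").
    """
    folder = Path(path_to_csv).parent.name
    # Split from the right at most twice, so hyphens stay in the plate name.
    parts = folder.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected '<plate>-<well>-<site>', got {folder!r}")
    plate, well, site = parts
    return plate, well, site
```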


remote_aggregated_file = f"{aws_remote}/backend/{batch}/{plate}/{plate}.csv"

sync_cmd = f"aws s3 sync --exclude * --include */Cells.csv --include */Nuclei.csv --include */Cytoplasm.csv --include */Image.csv {remote_input_dir} {input_dir}"
Member

Is there still a need for run_check_errors to accept str arguments after these changes? If not, consider revising run_check_errors to accept only a list, which reduces the complexity of that function (and the potential for edge cases).
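If run_check_errors no longer needs string input, a list-only version could be as small as this sketch. It is hypothetical: the PR's run_check_errors may report errors differently. Running an argument list without a shell also means patterns such as `*` reach `aws` unexpanded.

```python
from __future__ import annotations

import subprocess


def run_check_errors(cmd: list[str]) -> str:
    """Run cmd as an argument list (no shell) and return its stdout.

    Raises:
        TypeError: If cmd is not a list of strings.
        RuntimeError: If the command exits with a non-zero status.
    """
    if not isinstance(cmd, list) or not all(isinstance(arg, str) for arg in cmd):
        raise TypeError("cmd must be a list of str arguments")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}: {' '.join(cmd)}\n"
            f"{result.stderr}"
        )
    return result.stdout
```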

"*/Image.csv",
remote_input_dir,
input_dir,
*download_flags,
Member

Is there a simple test we could add to help describe and confirm the new download behavior? I bring this up because this change may affect #301 (or vice versa), and it could be important to understand the distinction.
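One way to test the download behavior without touching S3 is to build the `aws s3 sync` argument list in a pure function and assert on it. `build_sync_cmd` is a hypothetical helper, not part of the PR: the include patterns are taken from the old string command above, and the trailing `remote_input_dir, input_dir, *download_flags` order follows the list excerpt.

```python
from __future__ import annotations


def build_sync_cmd(
    remote_input_dir: str,
    input_dir: str,
    download_flags: list[str] | None = None,
) -> list[str]:
    """Hypothetical helper that builds the `aws s3 sync` argument list."""
    if download_flags is None:
        download_flags = []
    return [
        "aws", "s3", "sync",
        "--exclude", "*",
        "--include", "*/Cells.csv",
        "--include", "*/Nuclei.csv",
        "--include", "*/Cytoplasm.csv",
        "--include", "*/Image.csv",
        remote_input_dir,
        input_dir,
        *download_flags,  # extra user-supplied flags go last
    ]


def test_download_flags_are_appended() -> None:
    cmd = build_sync_cmd("s3://bucket/plate", "/tmp/plate", ["--profile", "jump"])
    assert cmd[:3] == ["aws", "s3", "sync"]
    assert cmd[-4:] == ["s3://bucket/plate", "/tmp/plate", "--profile", "jump"]


def test_no_download_flags_by_default() -> None:
    cmd = build_sync_cmd("s3://bucket/plate", "/tmp/plate")
    assert cmd[-2:] == ["s3://bucket/plate", "/tmp/plate"]
    assert cmd.count("--include") == 4
```

To exercise collate itself, the same assertions could be made against the argument list passed to a mocked run_check_errors (for example via `unittest.mock.patch("pycytominer.cyto_utils.collate.run_check_errors")`), assuming collate looks that name up in its own module.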

print(f"Downloading CSVs from {remote_input_dir} to {input_dir}")
run_check_errors(sync_cmd)

if overwrite_metadata or append_metadata:
Member

Is there a simple test we could add to help validate append_metadata? Similar to the above, I am concerned that if we don't validate the functionality, the code might become more difficult to troubleshoot (or less usable than it is currently).
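A minimal, framework-agnostic check for append_metadata, written as a sketch: it assumes the function takes `(path_to_csv, overwrite=...)`, edits the CSV in place, and adds columns named `Metadata_Plate`, `Metadata_Well` and `Metadata_Site` (hypothetical names). The folder name `BR01-P1-A01-3` is made-up test data.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path
from typing import Callable


def check_append_metadata(append_fn: Callable[..., object]) -> None:
    """Run append_fn on a throwaway Image.csv and check the added columns.

    Builds ``<tmp>/BR01-P1-A01-3/Image.csv`` and expects the hypothetical
    Metadata_Plate/Metadata_Well/Metadata_Site columns afterwards, with the
    original data left intact.
    """
    with tempfile.TemporaryDirectory() as tmp:
        site_dir = Path(tmp) / "BR01-P1-A01-3"
        site_dir.mkdir()
        image_csv = site_dir / "Image.csv"
        image_csv.write_text("ImageNumber,Count_Cells\n1,10\n")

        append_fn(str(image_csv), overwrite=False)

        with image_csv.open(newline="") as handle:
            rows = list(csv.DictReader(handle))

    assert len(rows) == 1, "row count should not change"
    row = rows[0]
    assert row.get("Metadata_Plate") == "BR01-P1"
    assert row.get("Metadata_Well") == "A01"
    assert row.get("Metadata_Site") == "3"
    assert row.get("Count_Cells") == "10", "existing columns should be kept"
```

In a pytest suite this could be wired up as `def test_append_metadata(): check_append_metadata(append_metadata)`.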

errors="ignore",
)
edited = True
insertion_index = list(df.columns).index("ModuleError_01LoadData")
Member

Hard-coding this column name makes the function inflexible. Consider making the insertion point configurable, with a sensible default, and using df.columns.get_loc() to avoid casting df.columns to a list.
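A sketch of a configurable insertion point: the anchor column defaults to `ModuleError_01LoadData`, as in the PR, and `df.columns.get_loc()` replaces the list cast. `metadata_insertion_index` and its fallback behavior are hypothetical.

```python
from __future__ import annotations

import numbers

import pandas as pd


def metadata_insertion_index(
    df: pd.DataFrame,
    anchor: str = "ModuleError_01LoadData",
    default: int | None = None,
) -> int:
    """Return the position at which new metadata columns should be inserted.

    Uses the position of ``anchor`` when that column exists; otherwise falls
    back to ``default`` (or the end of the frame when ``default`` is None).
    """
    if anchor in df.columns:
        loc = df.columns.get_loc(anchor)
        # get_loc returns an integer for a unique label, but a slice or a
        # boolean mask when the label is duplicated.
        if isinstance(loc, numbers.Integral):
            return int(loc)
        raise ValueError(f"Column {anchor!r} appears more than once")
    return len(df.columns) if default is None else default


df = pd.DataFrame({"ImageNumber": [1], "ModuleError_01LoadData": [0]})
df.insert(metadata_insertion_index(df), "Metadata_Well", "A01")
print(list(df.columns))  # ['ImageNumber', 'Metadata_Well', 'ModuleError_01LoadData']
```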
