Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Documentation for Non-CellProfiler Datasets in Pycytominer #430

Merged
merged 33 commits into from
Sep 26, 2024

Conversation

axiomcura
Copy link
Member

Description

Thank you for your contribution to pycytominer!
Please succinctly summarize your proposed change.
What motivated you to make this change?

This PR updates the README with instructions on how to handle morphological features extracted from software other than CellProfiler. These changes were conducted due to a reviewer’s comment/suggestion.

"When I used the function pycytominer.normalize() on a dataset originating from my own work, I encountered an error message saying
“No CP features found. Are you sure this dataframe is from CellProfiler?”.
I could easily fix the problem because I know the properties of a CellProfiler dataset, and the error message was hinting at some “Metadata_” missing.
The fix was very easy, I just needed to add the prefix “Metadata_” to my metadata columns. My dataset comes from another commonly used proprietary software called Harmony. I understand that the authors are in favor of using open-source resources, but I think that the pycytominer tool could be useful in a much larger context, and take as input different types of datasets. In the readme file, that I followed to perform the normalization, I couldn’t find anywhere that the dataset has to come from CellProfiler. The author should state clearly, that only cellprofiler datasets can be used. Instead, currently, the sentence in the readme files of the pycytominer github repository says “Image data flow from a microscope to cell segmentation and feature extraction tools (e.g. CellProfiler or DeepProfiler)”. And in the code chunck the sentence says “Each function takes as input a pandas DataFrame or file path”, but in reality, it has to be a specifically formatted pandas dataframe (at least regarding the column names). I think the authors should fix this misunderstanding in the documentation. Alternatively, the better option would be to support the use of different datasets, and explain how the column names should be formatted to be easily processed by the different functions. I can’t exclude that this is described somewhere in the tutorials, but I didn’t encounter it when following the instructions on the readmefile. I would also make the error message more self-explanatory because for now, it implies that I know what CellProfiler is and how its output is formatted. If I don’t have this knowledge, as a user, I might give up on using this tool. This would be unfortunate, as I believe that this tool is a fundamental step in image-based profiling. Considering how easy was to fix this issue for my own dataset, which allowed me to immediately see the outputs of your tool, I recommend reconsidering how to best document the formatting steps necessary to be able to use the different functions of the pycytominer too"

Please also link to any relevant issues that your code is associated with.

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change requires a documentation update.

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@axiomcura axiomcura changed the title Update docs WIP: Improve Documentation for Non-CellProfiler Datasets in Pycytominer Sep 11, 2024
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
axiomcura and others added 5 commits September 21, 2024 18:42
Co-authored-by: Gregory Way <[email protected]>
Co-authored-by: Gregory Way <[email protected]>
Co-authored-by: Gregory Way <[email protected]>
Co-authored-by: Gregory Way <[email protected]>
Co-authored-by: Gregory Way <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Sep 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.72%. Comparing base (409d2aa) to head (6caab11).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##           main     #430       +/-   ##
=========================================
+ Coverage      0   94.72%   +94.72%     
=========================================
  Files         0       57       +57     
  Lines         0     3145     +3145     
=========================================
+ Hits          0     2979     +2979     
- Misses        0      166      +166     
Flag Coverage Δ
unittests 94.72% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@axiomcura axiomcura marked this pull request as ready for review September 25, 2024 17:04
@axiomcura axiomcura changed the title WIP: Improve Documentation for Non-CellProfiler Datasets in Pycytominer Improve Documentation for Non-CellProfiler Datasets in Pycytominer Sep 25, 2024
@gwaybio
Copy link
Member

gwaybio commented Sep 26, 2024

@axiomcura - please take a look at my changes. They aim to simplify and clarify. I also removed the non-CellProfiler example that actually used CellProfiler data. This is too confusing and I think the updated function docstrings and error messages in #448 are sufficient. Lastly, I removed the table of contents which was in a strange place and I felt added more confusion than help

@gwaybio gwaybio requested review from d33bs and kenibrewer September 26, 2024 12:24
Copy link
Member

@kenibrewer kenibrewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good set of changes! I like adding a description for using these other feature sets front and center.

@axiomcura
Copy link
Member Author

I like the changes @gwaybio! I agree that having the table of contents in the middle of the README was a bit confusin. It's much cleaner and more readable now.

@d33bs, waiting on your approval if the updated documentation is ready to be merged! c:

Copy link
Member

@d33bs d33bs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I left a few comments + suggestions and overall thought this LGTM when ready. Additionally, had these thoughts:

  • I noticed that the Citing pycytominer header might need to be capitalized to stay consistent with other uses in the document.
  • I noticed the image of the workflows could use maybe another block alongside CellProfiler (or some other method) to indicate the freedom to use other tools as desired (at the moment it visually communicates you can only use two tools to reach the workflows). Maybe there could be a mention of "Single-cell Profiling Software (e.g. CellProfiler, DeepProfiler, etc.)" (this is a bit too long but I think you can see what I mean!). See below:

image

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Co-authored-by: Dave Bunten <[email protected]>
Co-authored-by: Dave Bunten <[email protected]>
@gwaybio
Copy link
Member

gwaybio commented Sep 26, 2024

It looks like this is a duplicate of #448 ? @axiomcura - is this true? How are these different?

Taking a look at what's changed here, it looks like #448 is the correct PR and we should close this one.

@gwaybio
Copy link
Member

gwaybio commented Sep 26, 2024

(I'll plan on going through @d33bs comments and adding them to 448)

@gwaybio
Copy link
Member

gwaybio commented Sep 26, 2024

oh wait! I am the one confused. One sec

axiomcura and others added 4 commits September 26, 2024 13:17
@gwaybio
Copy link
Member

gwaybio commented Sep 26, 2024

Thanks for the review @d33bs !

One thing I left for @axiomcura to address is this comment:

I noticed the image of the workflows could use maybe another block alongside CellProfiler (or some other method) to indicate the freedom to use other tools as desired (at the moment it visually communicates you can only use two tools to reach the workflows). Maybe there could be a mention of "Single-cell Profiling Software (e.g. CellProfiler, DeepProfiler, etc.)" (this is a bit too long but I think you can see what I mean!). See below:

@axiomcura - what do you think about dropping this figure in favor of your new figure 1? I propose that you rename the existing file (maybe to something like pipeline_legacy.png) and add a high res version of figure one (naming it pipeline.png)

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
axiomcura and others added 4 commits September 26, 2024 15:11
Co-authored-by: Gregory Way <[email protected]>
Co-authored-by: Gregory Way <[email protected]>
Co-authored-by: Gregory Way <[email protected]>
Copy link
Member

@gwaybio gwaybio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉 - I'll merge this in once tests pass

@gwaybio
Copy link
Member

gwaybio commented Sep 26, 2024

update, looks like macos runners aren't picking up the jobs. Since ubuntu and other integration tests passed and this PR doesn't touch code, I will merge to reduce delays

@gwaybio gwaybio merged commit e0ab9aa into cytomining:main Sep 26, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants