In this repository, we perform image analysis to extract single-cell morphological profiles, then apply image-based profiling to format those profiles for machine learning and other downstream analyses.
We performed a traditional Cell Painting assay on multiple pediatric cancer cell lines. Images were acquired using an Opera Phenix 1 High Content Screening System.
In this assay, we have six channels (five Cell Painting + Brightfield). The channels are listed below in the same numerical order used for the TIFF images, with the channel name in the XML file bolded:
- Brightfield high
- Concanavalin A Alexa 488 - Endoplasmic reticulum (ER)
- Phalloidin and WGA Alexa 568 - Actin, Golgi, and plasma membrane (AGP)
- MitoTracker Alexa 647 - Mitochondria
- Hoechst 33342 - DNA/nucleus
- SYTO14 Alexa 488 long (CP) - Cytoplasmic RNA and nucleoli
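The channel order above can be kept as a small lookup table when organizing images. The following is a minimal sketch, not code from this repository: it assumes the list position matches a 1-based `ch<N>` index embedded in the TIFF file name, and both the example file name and the helper `channel_for_image` are hypothetical.

```python
import re

# Channel order as listed above (1-based). Labels are copied from the list;
# mapping list position to a "ch<N>" index is an assumption.
CHANNELS = {
    1: "Brightfield high",
    2: "Concanavalin A Alexa 488 - Endoplasmic reticulum (ER)",
    3: "Phalloidin and WGA Alexa 568 - Actin, Golgi, and plasma membrane (AGP)",
    4: "MitoTracker Alexa 647 - Mitochondria",
    5: "Hoechst 33342 - DNA/nucleus",
    6: "SYTO14 Alexa 488 long (CP) - Cytoplasmic RNA and nucleoli",
}

# Assumed file name pattern: the channel number follows "-ch".
CHANNEL_PATTERN = re.compile(r"-ch(\d+)")


def channel_for_image(filename: str) -> str:
    """Return the channel label for a TIFF file name (hypothetical helper)."""
    match = CHANNEL_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"No channel index found in {filename!r}")
    return CHANNELS[int(match.group(1))]


# Hypothetical file name used only for illustration.
print(channel_for_image("r01c01f01p01-ch5sk1fk1fl1.tiff"))
```

If the actual file names encode the channel differently, only `CHANNEL_PATTERN` would need to change.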
To assess the optimal seeding density, time point, and plating conditions across all cell lines, we acquired multiple rounds of preliminary data. Each round has two or three platemap layouts. There are three plates per layout, one for each time point (24, 48, and 72 hours). Per plate, there are five different seeding densities (1000, 2000, 4000, 8000, and 12000) for each cell line, with two replicate wells per density. Each plate layout includes specific plate conditions, such as:
- Standard
- Synthemax coated
- Synthemax coated and double PFA fixation
- Laminin coated
Platemap files and visualizations can be found in the metadata folder inside the download data module.
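To make the design described above concrete, the sketch below enumerates the conditions for one cell line within a single layout. It is an illustration only: the column names and the `enumerate_conditions` helper are hypothetical and do not reflect the format of the actual platemap files.

```python
from itertools import product

# Values taken from the experimental design described above.
TIME_POINTS_HOURS = (24, 48, 72)  # one plate per time point
SEEDING_DENSITIES = (1000, 2000, 4000, 8000, 12000)
REPLICATES_PER_DENSITY = 2


def enumerate_conditions(cell_line: str) -> list[dict]:
    """List every (time point, density, replicate) well for one cell line
    in one layout. Key names are hypothetical."""
    replicates = range(1, REPLICATES_PER_DENSITY + 1)
    return [
        {
            "cell_line": cell_line,
            "time_point_hours": time_point,
            "seeding_density": density,
            "replicate": replicate,
        }
        for time_point, density, replicate in product(
            TIME_POINTS_HOURS, SEEDING_DENSITIES, replicates
        )
    ]


conditions = enumerate_conditions("U2-OS")
# 3 time points x 5 densities x 2 replicates = 30 wells per cell line per layout
print(len(conditions))
```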
We will use various methods to determine the best conditions for each cell line. One method performed in this repository is single-cell quality control (QC), which outputs a QC report showing which seeding densities and time points yielded the worst-quality segmentations. Poor segmentations can result from high confluence or poor staining. Another method is pairwise Pearson correlation, which measures how similar wells are within the same cell line.
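As a rough illustration of the pairwise Pearson correlation idea, the sketch below correlates every pair of well-level profiles and summarizes the result. It assumes each well is represented by one row of numeric features; the data are synthetic and the function names are hypothetical, so this is not the analysis code used in the optimization module.

```python
import numpy as np


def pairwise_pearson(profiles) -> np.ndarray:
    """Return the well-by-well Pearson correlation matrix.

    `profiles` is a 2D array with one row per well and one column per feature.
    """
    profiles = np.asarray(profiles, dtype=float)
    # np.corrcoef treats each row as a variable, so rows (wells) are compared.
    return np.corrcoef(profiles)


def mean_offdiagonal(corr: np.ndarray) -> float:
    """Average correlation between distinct wells (ignores self-correlation)."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(corr[mask].mean())


# Synthetic example: four wells sharing one underlying profile plus small noise.
rng = np.random.default_rng(0)
base_profile = rng.normal(size=50)
wells = np.stack([base_profile + rng.normal(scale=0.1, size=50) for _ in range(4)])

corr = pairwise_pearson(wells)
print(corr.shape)
print(mean_offdiagonal(corr))
```

A high mean off-diagonal correlation suggests that wells of the same cell line and condition yield consistent profiles.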
NOTE: All plate layouts contain the U2-OS cell line, which is used as an "anchor" to compare profiles across plates and to develop segmentation parameters. Also, any empty portions of the layout contain media and no cells, so they are not included in the figures or in the platemap files.
Module | Purpose | Description |
---|---|---|
0.download_data | Download plates and platemaps | Download all relevant data (images, XML files, platemap files) to process. All metadata information will be found in this module. |
1.illumination_correction | Save illumination correction function | Perform illumination correction on the raw images and save the function as a `.npy` file to apply during feature extraction. |
2.feature_extraction | Extract morphology features | Using CellProfiler, images are corrected and segmented into cell compartments, and features are extracted and output as SQLite files. |
3.preprocessing_features | Preprocess morphology profiles | Format the SQLite output into single-cell profiles, perform single-cell QC, and run pycytominer to get normalized and feature-selected profiles. |
4.preliminary_results | Generate exploratory data analysis | Generate plots to explore the data for any interesting patterns or phenotypes (e.g., UMAP). |
5.optimization | Determine optimal conditions | Perform analyses to determine the best conditions for each cell line in the pilot datasets. |
In this repository, we include four different environments, found in the environments folder:
- CellProfiler environment: This environment is used for the illumination correction and feature extraction modules as we will be using CellProfiler v4.2.8 to perform these tasks.
- R environment: This environment is used for any notebook that requires visualization of results and figure generation in R.
- Image profiling environment: This environment is used during the preprocessing module after we extract morphology features using CellProfiler, which includes installing relevant formatting software such as pycytominer, CytoTable, and coSMicQC.
- Optimization environment: This environment is used for the optimization analyses that determine the best conditions (seeding density, time point, media) per cell line.
These environments can be installed either via conda or mamba. Below is an example of how to install via the terminal.
```bash
# Make sure to be in the environments folder
mamba env create -f ...
```
If you use or reference this work in your own projects, please cite us.
You can find citation information in the 'Cite this repository' link at the top right, under the About section on GitHub.
This information can also be found in the CITATION.cff file.