Description
This repository may contain references to JUMP profile data that need to be updated to reflect the new directory structure.
Context
The JUMP Cell Painting profiles have been reorganized to a new, cleaner structure. See jump-cellpainting/datasets#155 for details.
Required Changes
Your repository may contain references to the old profile paths that need to be updated:
Old → New Path Mappings
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
  → /workspace/profiles_assembled/ORF/v1.0a/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier.parquet
  → /workspace/profiles_assembled/ORF/v1.0a/profiles_wellpos_cc_var_mad_outlier.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/CRISPR/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected.parquet
  → /workspace/profiles_assembled/CRISPR/v1.0a/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/CRISPR/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony_PCA_corrected/profiles_wellpos_cc_var_mad_outlier.parquet
  → /workspace/profiles_assembled/CRISPR/v1.0a/profiles_wellpos_cc_var_mad_outlier.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/COMPOUND/profiles_var_mad_int_featselect_harmony/profiles_var_mad_int_featselect_harmony.parquet
  → /workspace/profiles_assembled/COMPOUND/v1.0/profiles_var_mad_int_featselect_harmony.parquet
- /workspace/profiles/jump-profiling-recipe_2024_a917fa7/COMPOUND/profiles_var_mad_int_featselect_harmony/profiles_var_mad_int.parquet
  → /workspace/profiles_assembled/COMPOUND/v1.0/profiles_var_mad_int.parquet
- /workspace/profiles/jump-profiling-recipe_2024_0224e0f/ALL/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
  → /workspace/profiles_assembled/ALL/v1.0b/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
- /workspace/profiles/jump-profiling-recipe_2024_0224e0f/ALL/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect.parquet
  → /workspace/profiles_assembled/ALL/v1.0b/profiles_wellpos_cc_var_mad_outlier_featselect.parquet
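Judging from the mappings above, each old path becomes /workspace/profiles_assembled/<subset>/<version>/<file name>, where ORF and CRISPR use v1.0a, ALL uses v1.0b, COMPOUND uses v1.0, and the recipe and intermediate directories are dropped. The bash function below is a minimal sketch of that rule for converting a single path by hand. new_profile_path is a hypothetical helper, and it assumes exactly one directory between the subset and the file name, as in every mapping listed here.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the new location for one old JUMP profile path.
# Assumes one intermediate directory between <subset> and the .parquet file.
new_profile_path() {
  local old="$1"
  # Groups: 1 = any prefix before /workspace, 2 = subset (ORF, CRISPR, ...), 3 = file name
  local re='^(.*)/workspace/profiles/jump-profiling-recipe_2024_[a-z0-9]{7}/([A-Z]+)/[^/]+/([^/]+[.]parquet)$'
  if [[ $old =~ $re ]]; then
    local prefix="${BASH_REMATCH[1]}" subset="${BASH_REMATCH[2]}" file="${BASH_REMATCH[3]}"
    local version
    case "$subset" in
      ORF|CRISPR) version="v1.0a" ;;
      ALL)        version="v1.0b" ;;
      *)          version="v1.0" ;;   # e.g. COMPOUND
    esac
    printf '%s\n' "${prefix}/workspace/profiles_assembled/${subset}/${version}/${file}"
  else
    # Not an old-style profile path: print it unchanged.
    printf '%s\n' "$old"
  fi
}

# Example:
#   new_profile_path /workspace/profiles/jump-profiling-recipe_2024_a917fa7/COMPOUND/profiles_var_mad_int_featselect_harmony/profiles_var_mad_int.parquet
#   prints /workspace/profiles_assembled/COMPOUND/v1.0/profiles_var_mad_int.parquet
```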
Update Script
The following AWK script by @afermg is more general than the list above: it rewrites any profile path that follows the old layout, not only the files listed.
Create a file named update_cpg_location.awk with the following contents:
# Update the paths of cpg files, e.g.
# /workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
# is converted to
# /workspace/profiles_assembled/ORF/v1.0a/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
# Requires GNU awk (gawk): the three-argument form of match() is a gawk extension.
BEGIN {
    pattern = "/workspace/profiles/jump-profiling-recipe_2024_[a-z0-9]{7}/([A-Z]+)/.+/(.+[.]parquet)"
}
{
    if (match($0, pattern, captures)) {
        # captures[1] is the subset (ORF, CRISPR, COMPOUND, ALL); captures[2] is the file name.
        # ORF and CRISPR map to v1.0a, ALL to v1.0b, anything else (COMPOUND) to v1.0.
        version_name = "v1.0"
        if (captures[1] == "ORF" || captures[1] == "CRISPR") {
            version_name = version_name "a"
        }
        if (captures[1] == "ALL") {
            version_name = version_name "b"
        }
        replacement = "/workspace/profiles_assembled/" captures[1] "/" version_name "/" captures[2]
        gsub(pattern, replacement)
    }
    print $0
}
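If GNU awk is not available, a rough equivalent can be written with sed -E. This is a swapped-in technique, not @afermg's script: a minimal sketch that names the three subset groups explicitly (ORF/CRISPR → v1.0a, ALL → v1.0b, COMPOUND → v1.0) and assumes one directory between the subset and the file name, as in the mappings listed above. update_cpg_paths is a hypothetical helper. It is written as a stream filter because the in-place flag differs between GNU sed (-i) and macOS/BSD sed (-i '').

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): sed -E stand-in for update_cpg_location.awk.
# Reads text on stdin and writes it to stdout with old profile paths rewritten.
update_cpg_paths() {
  local old='/workspace/profiles/jump-profiling-recipe_2024_[a-z0-9]{7}'
  local new='/workspace/profiles_assembled'
  # \1 = subset, \2 = file name; each expression handles one version group.
  sed -E \
    -e "s#${old}/(ORF|CRISPR)/[^/]+/([^/]+[.]parquet)#${new}/\\1/v1.0a/\\2#g" \
    -e "s#${old}/(ALL)/[^/]+/([^/]+[.]parquet)#${new}/\\1/v1.0b/\\2#g" \
    -e "s#${old}/(COMPOUND)/[^/]+/([^/]+[.]parquet)#${new}/\\1/v1.0/\\2#g"
}
```

For example, update_cpg_paths < analysis.py > analysis.py.new (hypothetical file names) writes a rewritten copy you can diff before moving it into place.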
To update all relevant files in your codebase:
# Find and update all files containing old profile paths
rg "workspace/profiles/jump-profiling-recipe_2024" -t py -t json -t md -t sh -t org -t csv -t nix -l | xargs awk -i inplace -f update_cpg_location.awk
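Note for macOS users: the script relies on GNU awk features (the three-argument match() and the -i inplace option), and the awk that ships with macOS is not GNU awk. Install GNU awk with brew install gawk and use gawk instead of awk in the command above.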
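To avoid running the script with a non-GNU awk by accident, the command can first look for a GNU awk binary. A minimal sketch, assuming GNU awk's --version banner begins with "GNU Awk" (as it does in current releases); pick_gnu_awk is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the name of a GNU awk binary, or fail.
# Tries gawk first (e.g. installed via Homebrew), then plain awk.
pick_gnu_awk() {
  local cand
  for cand in gawk awk; do
    # </dev/null keeps an awk that ignores --version from waiting on stdin.
    if command -v "$cand" >/dev/null 2>&1 &&
       "$cand" --version </dev/null 2>/dev/null | grep -q '^GNU Awk'; then
      printf '%s\n' "$cand"
      return 0
    fi
  done
  echo "GNU awk not found; on macOS run: brew install gawk" >&2
  return 1
}

# Usage, with the same rg command as above:
#   AWK=$(pick_gnu_awk) && rg "workspace/profiles/jump-profiling-recipe_2024" -t py -t json -t md -t sh -t org -t csv -t nix -l | xargs "$AWK" -i inplace -f update_cpg_location.awk
```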
This command:
- uses ripgrep (rg) to find files containing the old paths;
- -t selects specific file types;
- -l lists only the names of matching files;
- awk -i inplace modifies the files in place.
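Before running the rewrite, it can help to see which distinct old paths actually occur. The sketch below uses plain grep rather than rg, so unlike the command above it searches every file type. list_old_profile_paths is a hypothetical helper, and its pattern assumes one directory between the subset and the file name, as in the mappings listed earlier.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): list each distinct old-style profile path under a
# directory (default: current directory) without modifying anything.
list_old_profile_paths() {
  local root="${1:-.}"
  # -r recurse, -h omit file names, -o print only the matched text, -E extended regex
  grep -rhoE --exclude-dir=.git \
    '/workspace/profiles/jump-profiling-recipe_2024_[a-z0-9]{7}/[A-Z]+/[^/]+/[^/"[:space:]]+[.]parquet' \
    "$root" | sort -u
}
```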
Important: After running the AWK script, always review the changes with git diff to ensure the transformations were applied correctly. The script handles most cases, but edge cases or typos in the original paths may require manual adjustment.
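As a final check (for example in CI), the same search string used with rg above can confirm that no old paths were missed. A minimal sketch using grep; check_no_old_profile_paths is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): return non-zero if any old profile path remains.
check_no_old_profile_paths() {
  local root="${1:-.}"
  # -r recurse, -n show line numbers, -I skip binary files; matches go to stdout
  if grep -rnI --exclude-dir=.git 'workspace/profiles/jump-profiling-recipe_2024' "$root"; then
    echo "Old JUMP profile paths remain (listed above)." >&2
    return 1
  fi
  echo "No old JUMP profile paths found." >&2
  return 0
}
```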
Additional Note
If your repository also references manifests/profile_index.csv, note that the format has changed from CSV to JSON. See jump-cellpainting/datasets#152 and jump-cellpainting/datasets#155 for details.
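To find the files that still need this manifest update, a fixed-string search is enough. This sketch (hypothetical helper name) only locates the references; it does not convert them, since the new JSON layout is described in the linked issues rather than here.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): list files that mention the old CSV manifest.
find_profile_index_refs() {
  local root="${1:-.}"
  # -r recurse, -l print matching file names only, -F treat the pattern as a fixed string
  grep -rlF --exclude-dir=.git 'manifests/profile_index.csv' "$root"
}
```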
Action Required
Please update your code to use the new profile paths. The old paths will be deprecated.
Feel free to reach out if you have any questions or need assistance with the migration.