Commit 2929ecb

📝 updates to account for v12 reload
1 parent b0f6282 commit 2929ecb

2 files changed, 12 insertions(+), 12 deletions(-)

COLLABORATIONS/openTARGETS/README.md

+11 -11
@@ -35,7 +35,7 @@ To create the histologies file, recommended method is to:
 library("tidyr")
 ```
 
-1. Pull the OpenPedCan repo (warning, it's 12GB ): https://github.com/PediatricOpenTargets/OpenPedCan-analysis, or just download the script from `analyses/pedcbio-sample-name/pedcbio_sample_name_col.R`
+1. Pull the OpenPedCan repo (warning, it's 12GB ): https://github.com/d3b-center/OpenPedCan-analysis, or just download the script from `analyses/pedcbio-sample-name/pedcbio_sample_name_col.R`
 1. Export from D3b Warehouse the latest existing cBio IDs to use for population. Ensure that the output is csv double-quoted. Currently that can be obtained using the sql command:
 ```sql
 
@@ -73,7 +73,7 @@ To create the histologies file, recommended method is to:
 1. Get a blacklist from D3b Warehouse, exporting table `bix_workflows.cbio_hide_reasons
 
 ### Run as standalone
-1. Download from https://github.com/PediatricOpenTargets/OpenPedCan-analysis the `analyses/pedcbio-sample-name/pedcbio_sample_name_col.R` or run from repo if you have it
+1. Download from https://github.com/d3b-center/OpenPedCan-analysis the `analyses/pedcbio-sample-name/pedcbio_sample_name_col.R` or run from repo if you have it
 1. Run `Rscript --vanilla pedcbio_sample_name_col.R -i path-to-histolgies-file.tsv -n path-to-cbio-names.csv -b Methylation`
 OR
 ### Run in repo
@@ -89,7 +89,7 @@ Rscript COLLABORATIONS/openTARGETS/merge_rsem_rds.R --first_file gene-expression
 
 
 ### File Transformation
-It's recommended to put datasheets in a dir called `datasheets`, downloaded files in it's own dir (in v12 it's `GF_INPUTS`) and the rest of the processed outputs into it's own dir (`study_build` for v12) to keep things sane and also be able to leverage existing study build script in `scripts/organize_upload_packages.py`
+It's recommended to put datasheets in a dir called `datasheets`, downloaded files in it's own dir (in v12 it's `DOWNLOADED_INPUTS`) and the rest of the processed outputs into it's own dir (`study_build` for v12) to keep things sane and also be able to leverage existing study build script in `scripts/organize_upload_packages.py`
 #### 1. COLLABORATIONS/openTARGETS/clinical_to_datasheets.py
 ```
 usage: clinical_to_datasheets.py [-h] [-f HEAD] [-c CLIN] [-s CL_SUPP]
@@ -117,7 +117,7 @@ optional arguments:
 Outputs a `data_clinical_sample.txt` and `data_clinical_patient.txt` for the cBio package, and a `bs_id_sample_map.txt` mapping file to link BS IDs to gnerated cBioPortal IDs based on the rules for creating a proper somatic event using column `parent_aliquot_id`
 
 Example run:
-`python3 COLLABORATIONS/openTARGETS/clinical_to_datasheets.py -f COLLABORATIONS/openTARGETS/header_desc.tsv -c histologies-formatted-id-added.tsv -b cbio_hide_reasons.tsv 2> clin.errs`
+`python3 ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/clinical_to_datasheets.py -f ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/header_desc.tsv -c histologies-formatted-id-added.tsv -b cbio_hide_reasons.tsv 2> clin.errs`
 
 #### 2. COLLABORATIONS/openTARGETS/rename_filter_maf.py
 
@@ -140,10 +140,10 @@ optional arguments:
 ```
 _NOTE_ for v11 input, I ran the following command `zcat snv-dgd.maf.tsv.gz | perl -e '$skip = <>; $skip= <>; while(<>){print $_;}' | gzip -c >> snv-consensus-plus-hotspots.maf.tsv.gz` to add DGD data
 
-_NOTE_ for v12 input,I would have following command `python3 ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/add_dgd_maf_to_openpedcan.py -i /home/ubuntu/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/maf_openpedcan_v12_header.txt -c openpedcan_v12.maf -t ../bs_id_sample_map.txt -m ../GF_INPUTS/snv-dgd.maf.tsv.gz` to add DGD data, which is more robust - however, there are data issues with DGD, so it was left out
+_NOTE_ for v12 input,I would have following command `python3 ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/add_dgd_maf_to_openpedcan.py -i /home/ubuntu/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/maf_openpedcan_v12_header.txt -c openpedcan_v12.maf -t ../bs_id_sample_map.txt -m ../DOWNLOADED_INPUTS/snv-dgd.maf.tsv.gz` to add DGD data, which is more robust - however, there are data issues with DGD, so it was left out
 
 Example run:
-`python3 COLLABORATIONS/openTARGETS/rename_filter_maf.py -m bs_id_sample_map.txt -v snv-consensus-plus-hotspots.maf.tsv.gz -s 1 -n openpedcan_v12`
+`python3 ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/rename_filter_maf.py -m bs_id_sample_map.txt -v snv-consensus-plus-hotspots.maf.tsv.gz -s 1 -n openpedcan_v12`
 
 #### 3. COLLABORATIONS/openTARGETS/cnv_to_tables.py
 Convert cnv table to cBio format - genes as rows, samples as cols, one for absolute CN, another for GISTIC-style
@@ -163,7 +163,7 @@ optional arguments:
 ```
 
 Example run:
-`python3 COLLABORATIONS/openTARGETS/cnv_to_tables.py -m bs_id_sample_map.txt -c consensus_wgs_plus_cnvkit_wxs.tsv.gz -s openpedcan_v11`
+`python3 ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/cnv_to_tables.py -m bs_id_sample_map.txt -c consensus_wgs_plus_cnvkit_wxs.tsv.gz -s openpedcan_v12`
 
 #### 4. COLLABORATIONS/openTARGETS/rename_export_rsem.R
 Note, I merged the tcga into the main rds. I also needed an instance with _64GB ram_ in order to calc z scores. Update: Can also achieve by setting up 32GB swap space
@@ -189,12 +189,12 @@ Options:
 Show this help message and exit
 ```
 Example run:
-`Rscript COLLABORATIONS/openTARGETS/rename_export_rsem.R --rna_rds gene_tcga_expression_common_merge.rds --map_id bs_id_sample_map.txt --type openpedcan_v11 --computeZscore R 2> rna_convert.errs`
+`Rscript ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/rename_export_rsem.R --rna_rds gene_tcga_expression_common_merge.rds --map_id bs_id_sample_map.txt --type openpedcan_v12 --computeZscore C++ 2> rna_convert.errs`
 
 #### 5. scripts/convert_fusion_as_sv.py
 
 Before running, to leverage an existing fusion conversion, I first ran:
-`COLLABORATIONS/openTARGETS/reformat_cbio_sample_index.py -t bs_id_sample_map.txt -n openpedcan_v12 > fusion_sample_name_input.txt`
+`~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/reformat_cbio_sample_index.py -t bs_id_sample_map.txt -n openpedcan_v12 > fusion_sample_name_input.txt`
 to reformat the sample name index.
 ```
 usage: convert_fusion_as_sv.py [-h] [-t TABLE] [-f FUSION_RESULTS] [-o OUT_DIR] -m MODE
@@ -229,7 +229,7 @@ optional arguments:
 json config file with meta information; see REFS/case_meta_config.json example
 ```
 Example run:
-`python3 scripts/organize_upload_packages.py -o processed -c COLLABORATIONS/openTARGETS/openpedcan_v12_case_meta_config.json`
+`python3 scripts/organize_upload_packages.py -o processed -c ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/openpedcan_v12_case_meta_config.json`
 
 #### 7. COLLABORATIONS/openTARGETS/case_list_from_datasheet.py
 Last step before validation and upload
@@ -251,4 +251,4 @@ optional arguments:
 ```
 
 Example run:
-`python3 COLLABORATIONS/openTARGETS/case_list_from_datasheet.py -d data_clinical_sample.txt -s openpedcan_v12 -c GTEx -m 3`
+`python3 ~/tools/kf-cbioportal-etl/COLLABORATIONS/openTARGETS/case_list_from_datasheet.py -d data_clinical_sample.txt -s openpedcan_v12 -c GTEx -m 3`
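
Note on the README change: the "File Transformation" text recommends three working directories for v12 (`datasheets`, `DOWNLOADED_INPUTS`, `study_build`). A minimal sketch of creating that layout follows. The helper name `make_layout` and the choice of working root are assumptions for illustration and are not part of the repo.

```python
from pathlib import Path

# Directory names come from the README text for v12; the helper is hypothetical.
V12_DIRS = ("datasheets", "DOWNLOADED_INPUTS", "study_build")


def make_layout(root, dirs=V12_DIRS):
    """Create the recommended working directories under root.

    Returns a dict mapping each directory name to its Path.
    """
    root = Path(root)
    created = {}
    for name in dirs:
        path = root / name
        # exist_ok makes the call safe to repeat on an existing layout
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```

Calling `make_layout(".")` from the intended working directory would create any of the three directories that are missing and leave existing ones untouched.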

COLLABORATIONS/openTARGETS/openpedcan_v12_case_meta_config.json

+1 -1
@@ -115,7 +115,7 @@
 },
 "study": {
 "_comment": "see https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#cancer-study for detailed specifics",
-"description": "<a href=\"https://github.com/PediatricOpenTargets/OpenPedCan-analysis\">OpenPedCan</a> is a collaborative project between the National Cancer Institute and the Children's Hospital of Philadelphia as part of the NCI's Childhood Cancer Data Initiative. Here, we harmonize pan-cancer data using <a href=\"https://kidsfirstdrc.org/\">KidsFirst Data Resource Center</a> workflows and harness <a href=\"https://github.com/AlexsLemonade/OpenPBTA-analysis/\">OpenPBTA analytics</a> workflows to scale and add modules across pediatric cancer datasets. This data has been integrated into the pediatric open targets platform to assist in development and query of the FDA's Relevant Pediatric Molecular Targets List (PMTL) to identify new therapeutics for children with cancer. This is the v12 release of this effort, for v10 please see <a href=\"https://pedcbioportal.kidsfirstdrc.org/study/summary?id=ped_opentargets_2021\">OpenPedCan v10</a>. For study release details, please see <a href=\"https://tinyurl.com/55cxz9am\">Release Notes</a>",
+"description": "<a href=\"https://github.com/d3b-center/OpenPedCan-analysis\">OpenPedCan</a> is a collaborative project between the National Cancer Institute and the Children's Hospital of Philadelphia as part of the NCI's Childhood Cancer Data Initiative. Here, we harmonize pan-cancer data using <a href=\"https://kidsfirstdrc.org\">KidsFirst Data Resource Center</a> workflows and harness <a href=\"https://github.com/AlexsLemonade/OpenPBTA-analysis\">OpenPBTA analytics</a> workflows to scale and add modules across pediatric cancer datasets. This data has been integrated into the pediatric open targets platform to assist in development and query of the FDA's Relevant Pediatric Molecular Targets List (PMTL) to identify new therapeutics for children with cancer. This is the v12 release of this effort underlying the <a href=\"https://moleculartargets.ccdi.cancer.gov\">NCI's molecular targets platform</a>. For study release details, please see <a href=\"https://tinyurl.com/55cxz9am\">Release Notes</a>",
 "groups": "PUBLIC",
 "cancer_study_identifier": "openpedcan_v12",
 "reference_genome": "hg38",
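
Note on the config change: the edit to `description` swaps and trims several embedded links, so it can help to list the `href` targets after editing and confirm no old URL remains. A minimal sketch using only the standard library follows. It assumes `study` is a top-level key of the config, which the hunk suggests but does not confirm. `HrefCollector` and `description_links` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def description_links(config_text):
    """Return the link targets found in config["study"]["description"].

    Assumes "study" is a top-level key of the JSON config.
    """
    config = json.loads(config_text)
    collector = HrefCollector()
    collector.feed(config["study"]["description"])
    collector.close()
    return collector.hrefs
```

Reading the config file's text and passing it to `description_links` gives a list that can be checked for leftover `PediatricOpenTargets` URLs.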
