        "_description": "This section of the document is meant to help explain the organization of key-value pairs and how they support the creation of files from https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats as part of the cBioPortal ETL",
        "sections": {
            "merged_x": {
                "_description": "Support information for merged genomic data and supporting meta files.",
                "dir": "Output directory for merged results. Should match the output dir specified by the ETL conversion script at run time",
                "dtypes": {
                    "_description": "Data types - cBio-defined data types. _comment has a link to detailed specifics in each section.",
                    "ext": "File extension of merged outputs from the ETL script",
                    "cbio_name": "cBio output file name - a soft link to the ETL output created inside the study directory",
                    "meta_file_attributes": "Direct key-value pairs used by cBio in a meta_x file used to describe data_x files"
                }
            },
            "study": "Special meta file used to describe the cBio study",
            "case_x": "cBio case lists - sample lists describing which samples have mutation data, sv data, cnv data, etc. See https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#case-lists for specifics",
            "data_sheets": "Clinical patient and sample data, as well as the gene matrix if panel data is present. Distinct from other data_x files in that it contains sample and patient metadata and not genomic data",
            "database_pulls": {
                "_description": "This section is used to support pulling auto-generated clinical data tables and supporting genomics ETL information from the D3b Data Warehouse",
                "manifests": {
                    "<manifest descriptor>": {
                        "_description": "The key for this field is meant to be a convenient descriptor of what sub-study files derive from, as a cBio study may come from many sources",
                        "table": "D3b warehouse table name with relevant file info",
                        "file_types": "Manifests typically contain all possible harmonization outputs. Specifying file_type(s) limits the pull to relevant outputs. The exception is annotated_public_output: the ETL will pull only the maf, as vcfs are included in that query.",
                        "out_file": "Desired output file name"
                    }
                },
                "x_head": "Special header file table for data_clinical(sample/patient). cBio data_clinical headers have 5 header rows, and which columns are used is determined by the x_file table",
                "x_file": "Sample or patient tables with corresponding metadata at the sample and patient levels",
                "genomics_etl": "A helper file with relevant cBio sample names and individual genomic file names for ETL merging",
                "seq_center": "Only if the project has RNA data, a helper file to fill in missing sequencing center information for genomics ETL",
                "gene_file": "Only if the study has panel data, the source information for the gene matrix in the data_sheets section"
            }
        }
    },
    "merged_rsem": {
        "_comment": "see https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#expression-data for detailed specifics",
        "dir": "merged_rsem",
        "dtypes": {
            "counts": {
                "ext": "rsem_merged.txt",
                "cbio_name": "data_rna_seq_v2_mrna.txt",
                "meta_file_attr": {
                    "stable_id": "rna_seq_v2_mrna",
                    "profile_name": "RNA expression",
                    "profile_description": "Expression levels from RNA-Seq (rsem FPKM)",
                    "profile_description": "Expression levels from RNA-Seq, Z scores of log2(FPKM + 1) values",
                    "genetic_alteration_type": "MRNA_EXPRESSION",
                    "datatype": "Z-SCORE",
                    "show_profile_in_analysis_tab": "true"
                }
            }
        }
    },
    "merged_fusion": {
        "_comment": "see https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#fusion-data for detailed specifics",
        "dir": "merged_fusion",
        "dtypes": {
            "fusion": {
                "ext": "fusions.txt",
                "cbio_name": "data_fusions.txt",
                "meta_file_attr": {
                    "stable_id": "fusion",
                    "profile_name": "Predicted RNA fusions",
                    "profile_description": "PBTA fusion data using arriba and STAR Fusion, annotated and filtered using annoFuse. Also contains DGD custom filtered fusions",
                    "genetic_alteration_type": "FUSION",
                    "datatype": "FUSION",
                    "show_profile_in_analysis_tab": "true"
                }
            }
        }
    },
    "data_sheets": {
        "dir": "datasheets",
        "dtypes": {
            "patient": {
                "_comment": "see https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#example-clinical-header for detailed specifics",
                "cbio_name": "data_clinical_patient.txt",
                "meta_file_attr": {
                    "genetic_alteration_type": "CLINICAL",
                    "datatype": "PATIENT_ATTRIBUTES"
                }
            },
            "sample": {
                "_comment": "see https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#clinical-sample-columns for detailed specifics",
                "cbio_name": "data_clinical_sample.txt",
                "meta_file_attr": {
                    "genetic_alteration_type": "CLINICAL",
                    "datatype": "SAMPLE_ATTRIBUTES"
                }
            }
        }
    },
    "study": {
        "_comment": "see https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#cancer-study for detailed specifics",
        "description": "Although myeloid disorders in children may show morphologic similarities to those seen in adults, the TARGET AML initiative (Meshinchi, PI) clearly demonstrated that somatic genomic and transcriptome variants are highly distinct in children and young adults, and in fact there are variants that are uniquely restricted to younger children. The TARGET AML initiative helped identify numerous somatic alterations with high therapeutic potential in younger AML patients. Clinical outcomes in children with myeloid disorders have remained poor, in part due to a lack of deep understanding of the genomic makeup of the disease as well as the host. Comprehensive studies of the host and disease may enable more informed therapies in order to optimize targeting the leukemia while minimizing short- and long-term toxicities, leading to improved survival with minimal morbidities. For updates, please see here: <a href=\"https://tinyurl.com/55cxz9am\">Release Notes</a>",