Additional Notes - tcga
Additional Notes
All the mapping files are available in the repository folder: `pipeline/convert_step2/mapping`
The mapping files used to convert the TCGA data are:
DOID:
- `tcga_doid_mapping.csv`
TCGA projects were mapped to DOID parent (DO slim) terms using the following table, generated from the previous BioMuta mapping:
| DO_slim_id | DO_slim_name | TCGA_project |
|---|---|---|
| DOID:5041 | esophageal cancer | TCGA-ESCA |
| DOID:2531 | hematologic cancer | TCGA-DLBC |
| DOID:9256 | colorectal cancer | TCGA-READ |
| DOID:1319 | brain cancer | TCGA-GBM |
| DOID:1319 | brain cancer | TCGA-LGG |
| DOID:1781 | thyroid cancer | TCGA-THCA |
| DOID:11054 | urinary bladder cancer | TCGA-BLCA |
| DOID:363 | uterine cancer | TCGA-UCEC |
| DOID:169 | neuroendocrine tumor | TCGA-PCPG |
| DOID:4362 | cervical cancer | TCGA-CESC |
| DOID:363 | uterine cancer | TCGA-UCS |
| DOID:3277 | thymus cancer | TCGA-THYM |
| DOID:3571 | liver cancer | TCGA-LIHC |
| DOID:11934 | head and neck cancer | TCGA-HNSC |
| DOID:2174 | ocular cancer | TCGA-UVM |
| DOID:4159 | skin cancer | TCGA-SKCM |
| DOID:9256 | colorectal cancer | TCGA-COAD |
| DOID:3953 | adrenal gland cancer | TCGA-ACC |
| DOID:1793 | pancreatic cancer | TCGA-PAAD |
| DOID:2994 | germ cell cancer | TCGA-TGCT |
| DOID:1324 | lung cancer | TCGA-LUSC |
| DOID:1790 | malignant mesothelioma | TCGA-MESO |
| DOID:2394 | ovarian cancer | TCGA-OV |
| DOID:1115 | sarcoma | TCGA-SARC |
| DOID:263 | kidney cancer | TCGA-KIRP |
| DOID:263 | kidney cancer | TCGA-KICH |
| DOID:10534 | stomach cancer | TCGA-STAD |
| DOID:2531 | hematologic cancer | TCGA-LAML |
| DOID:10283 | prostate cancer | TCGA-PRAD |
| DOID:1324 | lung cancer | TCGA-LUAD |
| DOID:1612 | breast cancer | TCGA-BRCA |
| DOID:263 | kidney cancer | TCGA-KIRC |
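A minimal Python sketch of how this table could be loaded as a project-to-DOID lookup. It assumes `tcga_doid_mapping.csv` is a plain comma-separated file whose columns are named like the table headers (`DO_slim_id`, `DO_slim_name`, `TCGA_project`); that layout and the helper name `load_project_to_doid` are assumptions, and a small inline excerpt stands in for the real file. Exact repeated rows are tolerated, while a project mapped to two different DOID terms raises an error instead of silently keeping one of them.

```python
import csv
import io

# Hypothetical excerpt standing in for tcga_doid_mapping.csv.
# Column names are ASSUMED to match the table headers in this page.
SAMPLE_CSV = """DO_slim_id,DO_slim_name,TCGA_project
DOID:263,kidney cancer,TCGA-KIRP
DOID:263,kidney cancer,TCGA-KICH
DOID:263,kidney cancer,TCGA-KICH
DOID:1324,lung cancer,TCGA-LUAD
DOID:1324,lung cancer,TCGA-LUSC
"""


def load_project_to_doid(handle):
    """Build a TCGA_project -> (DO_slim_id, DO_slim_name) lookup.

    Exact duplicate rows are ignored; a project mapped to two different
    DOID terms raises ValueError rather than keeping whichever came last.
    """
    lookup = {}
    for row in csv.DictReader(handle):
        project = row["TCGA_project"].strip()
        term = (row["DO_slim_id"].strip(), row["DO_slim_name"].strip())
        existing = lookup.get(project)
        if existing is not None and existing != term:
            raise ValueError(f"{project} maps to both {existing} and {term}")
        lookup[project] = term
    return lookup


project_to_doid = load_project_to_doid(io.StringIO(SAMPLE_CSV))
print(project_to_doid["TCGA-KICH"])  # ('DOID:263', 'kidney cancer')
```

To read the real file, the `StringIO` excerpt could be replaced with `open("pipeline/convert_step2/mapping/tcga_doid_mapping.csv", newline="")`, provided the column names match the assumption above.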
UniProt accession:
- `human_protein_transcriptlocus.csv`
Ensembl peptide IDs (prefixed `ENSP`) were mapped to UniProt isoform accessions.
- Mapping was NOT performed to the UniProt canonical accession, because doing so caused a problem in the final dataset: a mutation for the same canonical accession would be listed with different amino acid changes.
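The Python sketch below illustrates, with made-up data, one way the canonical-accession problem can arise: if two Ensembl peptides correspond to different UniProt isoforms whose sequences differ at a given position, collapsing both to the canonical accession places two different amino acid changes under the same accession and position, whereas keying by isoform keeps them apart. All IDs (`ENSP_HYPOTHETICAL_*`, `HYPO1-*`), the mutations, and the helper names are hypothetical; the column layout of `human_protein_transcriptlocus.csv` is not described here, and stripping the `-N` isoform suffix is assumed as the way the canonical accession is obtained.

```python
from collections import defaultdict

# HYPOTHETICAL stand-in for human_protein_transcriptlocus.csv:
# Ensembl peptide ID -> UniProt isoform accession (placeholder IDs).
ENSP_TO_ISOFORM = {
    "ENSP_HYPOTHETICAL_1": "HYPO1-1",
    "ENSP_HYPOTHETICAL_2": "HYPO1-2",
}

# HYPOTHETICAL mutations reported against Ensembl peptides:
# (peptide ID, 1-based position, reference residue, alternate residue).
MUTATIONS = [
    ("ENSP_HYPOTHETICAL_1", 120, "R", "H"),
    ("ENSP_HYPOTHETICAL_2", 120, "G", "S"),
]


def canonical(accession):
    """ASSUMPTION: canonical accession = isoform accession minus its '-N' suffix."""
    return accession.split("-", 1)[0]


def group_changes(mutations, key_fn):
    """Collect the distinct amino acid changes seen at each (accession, position)."""
    changes = defaultdict(set)
    for ensp, pos, ref, alt in mutations:
        accession = key_fn(ENSP_TO_ISOFORM[ensp])
        changes[(accession, pos)].add(f"{ref}{pos}{alt}")
    return changes


def conflicts(changes):
    """Return the (accession, position) keys that carry more than one change."""
    return {key: sorted(v) for key, v in changes.items() if len(v) > 1}


by_isoform = group_changes(MUTATIONS, key_fn=lambda acc: acc)
by_canonical = group_changes(MUTATIONS, key_fn=canonical)

print(conflicts(by_isoform))    # {}
print(conflicts(by_canonical))  # {('HYPO1', 120): ['G120S', 'R120H']}
```

Keying by isoform accession keeps each change tied to the sequence it was called against, which is consistent with the choice described above.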