Additional Notes - tcga

From HIVE Lab

Additional Notes

All the mapping files are available in the repository folder: `pipeline/convert_step2/mapping`

The mapping files used for the TCGA conversion are:

DOID:

  • `tcga_doid_mapping.csv`

TCGA projects were mapped to DOID parent (slim) terms using the following table, which was generated from the previous BioMuta mapping:


DO_slim_id  DO_slim_name            TCGA_project
DOID:5041   esophageal cancer       TCGA-ESCA
DOID:2531   hematologic cancer      TCGA-DLBC
DOID:9256   colorectal cancer       TCGA-READ
DOID:1319   brain cancer            TCGA-GBM
DOID:1319   brain cancer            TCGA-LGG
DOID:1781   thyroid cancer          TCGA-THCA
DOID:11054  urinary bladder cancer  TCGA-BLCA
DOID:363    uterine cancer          TCGA-UCEC
DOID:169    neuroendocrine tumor    TCGA-PCPG
DOID:4362   cervical cancer         TCGA-CESC
DOID:363    uterine cancer          TCGA-UCS
DOID:3277   thymus cancer           TCGA-THYM
DOID:3571   liver cancer            TCGA-LIHC
DOID:11934  head and neck cancer    TCGA-HNSC
DOID:2174   ocular cancer           TCGA-UVM
DOID:4159   skin cancer             TCGA-SKCM
DOID:9256   colorectal cancer       TCGA-COAD
DOID:3953   adrenal gland cancer    TCGA-ACC
DOID:1793   pancreatic cancer       TCGA-PAAD
DOID:2994   germ cell cancer        TCGA-TGCT
DOID:1324   lung cancer             TCGA-LUSC
DOID:1790   malignant mesothelioma  TCGA-MESO
DOID:2394   ovarian cancer          TCGA-OV
DOID:1115   sarcoma                 TCGA-SARC
DOID:263    kidney cancer           TCGA-KIRP
DOID:263    kidney cancer           TCGA-KICH
DOID:10534  stomach cancer          TCGA-STAD
DOID:2531   hematologic cancer      TCGA-LAML
DOID:10283  prostate cancer         TCGA-PRAD
DOID:1324   lung cancer             TCGA-LUAD
DOID:1612   breast cancer           TCGA-BRCA
DOID:263    kidney cancer           TCGA-KIRC
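
As a rough illustration of how the project-to-DOID table could be used, the sketch below loads it into a lookup keyed by TCGA project. It assumes that `tcga_doid_mapping.csv` is comma-separated with a header row matching the column names shown above; the real file layout may differ. The function name `load_project_to_doid` is hypothetical, and the sample rows are copied from the table, with one row repeated on purpose to show that exact repeats are tolerated while conflicting rows are rejected.

```python
import csv
import io

# Assumption: the mapping file is comma-separated with this header.
# The rows are a small sample copied from the table above.
SAMPLE_CSV = """DO_slim_id,DO_slim_name,TCGA_project
DOID:1319,brain cancer,TCGA-GBM
DOID:1319,brain cancer,TCGA-LGG
DOID:263,kidney cancer,TCGA-KICH
DOID:263,kidney cancer,TCGA-KICH
DOID:1612,breast cancer,TCGA-BRCA
"""


def load_project_to_doid(handle):
    """Return {TCGA_project: (DO_slim_id, DO_slim_name)} from an open CSV handle."""
    mapping = {}
    for row in csv.DictReader(handle):
        project = row["TCGA_project"].strip()
        term = (row["DO_slim_id"].strip(), row["DO_slim_name"].strip())
        # An identical repeated row is harmless; a conflicting one is an error.
        if mapping.get(project, term) != term:
            raise ValueError(
                f"{project} maps to both {mapping[project]} and {term}"
            )
        mapping[project] = term
    return mapping


project_to_doid = load_project_to_doid(io.StringIO(SAMPLE_CSV))
print(project_to_doid["TCGA-GBM"])
```

To read the actual file instead of the inline sample, pass an open file handle, for example `open("pipeline/convert_step2/mapping/tcga_doid_mapping.csv", newline="")`, provided the header assumption holds.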

UniProt Accession:

  • `human_protein_transcriptlocus.csv`

Ensembl peptide IDs (prefixed with ENSP) were mapped to UniProt isoform accessions.

  • Mapping was NOT performed to the UniProt canonical accession. Doing so caused a problem in the final dataset: a single mutation would be listed under the same canonical accession with different amino acid changes.
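
The problem with canonical accessions can be sketched with a toy example. Everything below is hypothetical: the identifiers (`ENSP_EXAMPLE_1`, `PXXXXX-1`, `var1`), the amino acid changes, and the helper names are placeholders, not real Ensembl or UniProt records, and the actual columns of `human_protein_transcriptlocus.csv` are not shown here. The only external fact relied on is that a UniProt isoform accession has the form `ACCESSION-N`, so removing the `-N` suffix gives the base accession.

```python
from collections import defaultdict

# Hypothetical ENSP -> UniProt isoform mapping (placeholder identifiers).
ENSP_TO_ISOFORM = {
    "ENSP_EXAMPLE_1": "PXXXXX-1",
    "ENSP_EXAMPLE_2": "PXXXXX-2",
}

# One variant annotated on two peptides of the same gene. The position
# differs because the two isoforms have different sequences.
RECORDS = [
    ("var1", "ENSP_EXAMPLE_1", "R100H"),
    ("var1", "ENSP_EXAMPLE_2", "R58H"),
]


def to_canonical(isoform_accession):
    """Drop the '-N' isoform suffix to get the base accession."""
    return isoform_accession.split("-")[0]


def changes_per_accession(records, collapse_to_canonical):
    """Group amino acid changes by (variant, accession)."""
    grouped = defaultdict(set)
    for variant, ensp, aa_change in records:
        accession = ENSP_TO_ISOFORM[ensp]
        if collapse_to_canonical:
            accession = to_canonical(accession)
        grouped[(variant, accession)].add(aa_change)
    return dict(grouped)


by_isoform = changes_per_accession(RECORDS, collapse_to_canonical=False)
by_canonical = changes_per_accession(RECORDS, collapse_to_canonical=True)

# Isoform-level keys each carry exactly one amino acid change.
print(by_isoform)
# The canonical key carries two different changes for the same variant.
print(by_canonical)
```

Keeping the isoform accession preserves a one-to-one link between a mutation, an accession, and an amino acid change, which is the property the canonical mapping loses in this sketch.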