Additional Notes

From HIVE Lab

All the mapping files are available in the scripts repository in the folder: `pipeline/convert_step2/mapping`

ICGC uses TCGA study terms, so the same TCGA-to-DOID parent term mapping (generated for the previous BioMuta mapping) is reused:

| DO_slim_id | DO_slim_name | TCGA_project |
|---|---|---|
| DOID:5041 | esophageal cancer | TCGA-ESCA |
| DOID:2531 | hematologic cancer | TCGA-DLBC |
| DOID:9256 | colorectal cancer | TCGA-READ |
| DOID:1319 | brain cancer | TCGA-GBM |
| DOID:1319 | brain cancer | TCGA-LGG |
| DOID:1781 | thyroid cancer | TCGA-THCA |
| DOID:11054 | urinary bladder cancer | TCGA-BLCA |
| DOID:363 | uterine cancer | TCGA-UCEC |
| DOID:169 | neuroendocrine tumor | TCGA-PCPG |
| DOID:4362 | cervical cancer | TCGA-CESC |
| DOID:363 | uterine cancer | TCGA-UCS |
| DOID:3277 | thymus cancer | TCGA-THYM |
| DOID:3571 | liver cancer | TCGA-LIHC |
| DOID:11934 | head and neck cancer | TCGA-HNSC |
| DOID:2174 | ocular cancer | TCGA-UVM |
| DOID:4159 | skin cancer | TCGA-SKCM |
| DOID:9256 | colorectal cancer | TCGA-COAD |
| DOID:3953 | adrenal gland cancer | TCGA-ACC |
| DOID:1793 | pancreatic cancer | TCGA-PAAD |
| DOID:2994 | germ cell cancer | TCGA-TGCT |
| DOID:1324 | lung cancer | TCGA-LUSC |
| DOID:1790 | malignant mesothelioma | TCGA-MESO |
| DOID:2394 | ovarian cancer | TCGA-OV |
| DOID:1115 | sarcoma | TCGA-SARC |
| DOID:263 | kidney cancer | TCGA-KIRP |
| DOID:10534 | stomach cancer | TCGA-STAD |
| DOID:2531 | hematologic cancer | TCGA-LAML |
| DOID:10283 | prostate cancer | TCGA-PRAD |
| DOID:1324 | lung cancer | TCGA-LUAD |
| DOID:1612 | breast cancer | TCGA-BRCA |
| DOID:263 | kidney cancer | TCGA-KIRC |
| DOID:263 | kidney cancer | TCGA-KICH |
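
As a rough illustration of how this table can be applied, the sketch below looks up the DO slim parent term for a TCGA project code. It is not taken from the pipeline scripts: the names `TCGA_TO_DO_SLIM` and `do_slim_for` are hypothetical, only a few rows of the table are included, and normalizing the input with `strip()` and `upper()` is an assumption about how project codes might arrive.

```python
# Hypothetical lookup built from a few rows of the table above.
# The complete mapping lives in the mapping folder of the scripts repository.
TCGA_TO_DO_SLIM = {
    "TCGA-GBM": ("DOID:1319", "brain cancer"),
    "TCGA-LGG": ("DOID:1319", "brain cancer"),
    "TCGA-BLCA": ("DOID:11054", "urinary bladder cancer"),
    "TCGA-KIRC": ("DOID:263", "kidney cancer"),
    "TCGA-KIRP": ("DOID:263", "kidney cancer"),
    "TCGA-KICH": ("DOID:263", "kidney cancer"),
}


def do_slim_for(project):
    """Return (DO_slim_id, DO_slim_name) for a TCGA project code, or None."""
    # Assumption: codes may carry stray whitespace or lowercase letters.
    return TCGA_TO_DO_SLIM.get(project.strip().upper())


if __name__ == "__main__":
    for code in ("TCGA-GBM", "tcga-kich", "TCGA-XXXX"):
        print(code, "->", do_slim_for(code))
```

Several TCGA projects share one parent term (for example, the three kidney projects all map to DOID:263), so the lookup is many-to-one and cannot be inverted without losing the project.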

UniProt Accession:

  • `human_protein_transcriptlocus.csv`

Each transcript ID (an Ensembl identifier starting with ENSP) was mapped to its UniProt isoform accession.

Mapping to the UniProt canonical accession was NOT performed, because it caused a problem in the final dataset: a single mutation would be listed under the same canonical accession with different amino acid changes.
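
The effect described above can be sketched with a toy example. Everything in it is an assumption made for illustration: the identifiers are placeholders rather than real records, the mutation label format and the function names are hypothetical, and the canonical accession is derived by dropping the `-N` isoform suffix rather than by whatever lookup the pipeline would use.

```python
from collections import defaultdict

# Placeholder ENSP -> UniProt isoform accession pairs (not real records).
ENSP_TO_ISOFORM = {
    "ENSP00000000001": "P00000-1",
    "ENSP00000000002": "P00000-2",
}

# One genomic mutation annotated on two isoforms of the same protein.
# Tuple layout: (mutation, ENSP ID, amino acid change). Values are made up.
RECORDS = [
    ("chr1:100:C>T", "ENSP00000000001", "R175H"),
    ("chr1:100:C>T", "ENSP00000000002", "R43H"),
]


def canonical(isoform_ac):
    """Strip the isoform suffix, e.g. 'P00000-2' -> 'P00000'."""
    return isoform_ac.split("-")[0]


def group_changes(records, to_accession):
    """Collect amino acid changes per (mutation, accession) key."""
    grouped = defaultdict(set)
    for mutation, ensp, aa_change in records:
        accession = to_accession(ENSP_TO_ISOFORM[ensp])
        grouped[(mutation, accession)].add(aa_change)
    return dict(grouped)


by_isoform = group_changes(RECORDS, lambda ac: ac)
by_canonical = group_changes(RECORDS, canonical)

if __name__ == "__main__":
    print("isoform level:  ", by_isoform)
    print("canonical level:", by_canonical)
```

At the isoform level each key carries exactly one amino acid change. After collapsing to the canonical accession, the single key for `P00000` carries two conflicting changes, which is the ambiguity that isoform-level mapping avoids.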