Additional Notes: Difference between revisions

From HIVE Lab
 
 
From line 3 onward, the previous revision read:

`pipeline/convert_step2/mapping`

The mapping files used for converting the COSMIC TSV are:

'''DOID:'''
* `cosmic_doid_mapping.csv`

COSMIC tissue site terms were mapped to DOID parent terms using the following table (generated from the previous BioMuta mapping):

{| class="wikitable"
! Primary Site
! Top_Level_Organ_system
|-
| NS
| NA
|-
| adrenal_gland
| DOID:3953 / adrenal gland cancer
|-
| autonomic_ganglia
| NA
|-
| biliary_tract
| DOID:4606 / bile duct cancer
|-
| bone
| DOID:184 / bone cancer
|-
| breast
| DOID:1612 / breast cancer
|-
| central_nervous_system
| DOID:1319 / brain cancer
|-
| cervix
| DOID:4362 / cervical cancer
|-
| endometrium
| DOID:363 / uterine cancer
|-
| eye
| DOID:2174 / ocular cancer
|-
| fallopian_tube
| DOID:1964 / fallopian tube cancer
|-
| female_genital_tract_(site_indeterminate)
| NA
|-
| female_genitourinary_system
| NA
|-
| gastrointestinal_tract_(site_indeterminate)
| DOID:3119 / gastrointestinal system cancer
|-
| genital_tract
| NA
|-
| haematopoietic_and_lymphoid_tissue
| DOID:2531 / hematologic cancer
|-
| kidney
| DOID:263 / kidney cancer
|-
| large_intestine
| DOID:9256 / colorectal cancer
|-
| liver
| DOID:3571 / liver cancer
|-
| lung
| DOID:1324 / lung cancer
|-
| mediastinum
| DOID:3565 / meningioma
|-
| meninges
| DOID:3565 / meningioma
|-
| oesophagus
| DOID:5041 / esophageal cancer
|-
| ovary
| DOID:2394 / ovarian cancer
|-
| pancreas
| DOID:1793 / pancreatic cancer
|-
| paratesticular_tissues
| NA
|-
| parathyroid
| DOID:1540 / parathyroid carcinoma
|-
| penis
| DOID:11615 / penile cancer
|-
| pericardium
| NA
|-
| perineum
| DOID:4045 / muscle cancer
|-
| peritoneum
| DOID:1775 / peritoneum cancer
|-
| pituitary
| DOID:1785 / pituitary cancer
|-
| placenta
| DOID:2021 / placenta cancer
|-
| pleura
| DOID:5158 / pleural cancer
|-
| prostate
| DOID:10283 / prostate cancer
|-
| retroperitoneum
| DOID:5875 / retroperitoneal cancer
|-
| salivary_gland
| DOID:8618 / oral cavity cancer
|-
| skin
| DOID:4159 / skin cancer
|-
| small_intestine
| DOID:9253 / gastrointestinal stromal tumor
|-
| soft_tissue
| NA
|-
| stomach
| DOID:10534 / stomach cancer
|-
| testis
| DOID:2998 / testicular cancer
|-
| thymus
| DOID:3277 / thymus cancer
|-
| thyroid
| DOID:1781 / thyroid gland cancer
|-
| upper_aerodigestive_tract
| DOID:8618 / oral cavity cancer
|-
| urinary_tract
| DOID:11054 / urinary bladder cancer
|-
| uterine_adnexa
| NA
|-
| vagina
| DOID:119 / vaginal cancer
|-
| vulva
| DOID:1245 / vulva cancer
|}
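The following is a minimal sketch of how the COSMIC primary-site table could be applied as a lookup. It assumes that `cosmic_doid_mapping.csv` holds this table with one row per primary site and a "DOID / name" cell. The column names `primary_site` and `do_term` are hypothetical and are not the real file header. The sketch splits each cell into its DOID and name, and it leaves sites mapped to NA unmapped.

```python
import csv
import io

# Hypothetical excerpt of cosmic_doid_mapping.csv: the column names
# "primary_site" and "do_term" are assumptions, not the real file header.
SAMPLE_CSV = """primary_site,do_term
NS,NA
adrenal_gland,DOID:3953 / adrenal gland cancer
autonomic_ganglia,NA
large_intestine,DOID:9256 / colorectal cancer
"""


def load_site_mapping(handle):
    """Return {primary_site: (doid, name)}, leaving out sites mapped to NA."""
    mapping = {}
    for row in csv.DictReader(handle):
        term = row["do_term"].strip()
        if term == "NA":
            continue  # this site has no DOID parent term
        # Cells look like "DOID:9256 / colorectal cancer".
        doid, _, name = term.partition(" / ")
        mapping[row["primary_site"].strip()] = (doid, name)
    return mapping


site_to_doid = load_site_mapping(io.StringIO(SAMPLE_CSV))
print(site_to_doid.get("large_intestine"))
print(site_to_doid.get("NS"))
```

With this design, any site mapped to NA simply has no entry, so `dict.get` returns `None`. Downstream code can then treat a missing key as "no DOID parent term" rather than carrying the string "NA" forward.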



Latest revision as of 21:50, 9 October 2024

Additional Notes

All the mapping files are available in the scripts repository in the folder: `pipeline/convert_step2/mapping`

ICGC uses TCGA study terms, so the same TCGA-to-DOID parent term mapping is used (generated from the previous BioMuta mapping):

{| class="wikitable"
! DO_slim_id
! DO_slim_name
! TCGA_project
|-
| DOID:5041
| esophageal cancer
| TCGA-ESCA
|-
| DOID:2531
| hematologic cancer
| TCGA-DLBC
|-
| DOID:9256
| colorectal cancer
| TCGA-READ
|-
| DOID:1319
| brain cancer
| TCGA-GBM
|-
| DOID:1319
| brain cancer
| TCGA-LGG
|-
| DOID:1781
| thyroid cancer
| TCGA-THCA
|-
| DOID:11054
| urinary bladder cancer
| TCGA-BLCA
|-
| DOID:363
| uterine cancer
| TCGA-UCEC
|-
| DOID:169
| neuroendocrine tumor
| TCGA-PCPG
|-
| DOID:4362
| cervical cancer
| TCGA-CESC
|-
| DOID:363
| uterine cancer
| TCGA-UCS
|-
| DOID:3277
| thymus cancer
| TCGA-THYM
|-
| DOID:3571
| liver cancer
| TCGA-LIHC
|-
| DOID:11934
| head and neck cancer
| TCGA-HNSC
|-
| DOID:2174
| ocular cancer
| TCGA-UVM
|-
| DOID:4159
| skin cancer
| TCGA-SKCM
|-
| DOID:9256
| colorectal cancer
| TCGA-COAD
|-
| DOID:3953
| adrenal gland cancer
| TCGA-ACC
|-
| DOID:1793
| pancreatic cancer
| TCGA-PAAD
|-
| DOID:2994
| germ cell cancer
| TCGA-TGCT
|-
| DOID:1324
| lung cancer
| TCGA-LUSC
|-
| DOID:1790
| malignant mesothelioma
| TCGA-MESO
|-
| DOID:2394
| ovarian cancer
| TCGA-OV
|-
| DOID:1115
| sarcoma
| TCGA-SARC
|-
| DOID:263
| kidney cancer
| TCGA-KIRP
|-
| DOID:10534
| stomach cancer
| TCGA-STAD
|-
| DOID:2531
| hematologic cancer
| TCGA-LAML
|-
| DOID:10283
| prostate cancer
| TCGA-PRAD
|-
| DOID:1324
| lung cancer
| TCGA-LUAD
|-
| DOID:1612
| breast cancer
| TCGA-BRCA
|-
| DOID:263
| kidney cancer
| TCGA-KIRC
|-
| DOID:263
| kidney cancer
| TCGA-KICH
|}
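The following is a minimal sketch of using the TCGA table above as a lookup. It assumes the TCGA study term has already been extracted from each ICGC record, which is not covered here. Only a few rows are copied, and the helper names are hypothetical. The reverse index shows that the mapping is many-to-one; for example, TCGA-READ and TCGA-COAD both resolve to DOID:9256.

```python
from collections import defaultdict

# A subset of the DO_slim_id / DO_slim_name / TCGA_project rows above.
TCGA_TO_DO_SLIM = {
    "TCGA-READ": ("DOID:9256", "colorectal cancer"),
    "TCGA-COAD": ("DOID:9256", "colorectal cancer"),
    "TCGA-GBM": ("DOID:1319", "brain cancer"),
    "TCGA-LGG": ("DOID:1319", "brain cancer"),
    "TCGA-BRCA": ("DOID:1612", "breast cancer"),
}


def do_slim_for_project(tcga_project):
    """Return (DO_slim_id, DO_slim_name) for a TCGA study term, or None if unmapped."""
    return TCGA_TO_DO_SLIM.get(tcga_project)


def projects_by_do_slim(table):
    """Invert the table: DO_slim_id -> list of TCGA projects that map to it."""
    index = defaultdict(list)
    for project, (doid, _name) in table.items():
        index[doid].append(project)
    return dict(index)


print(do_slim_for_project("TCGA-COAD"))
print(projects_by_do_slim(TCGA_TO_DO_SLIM)["DOID:1319"])
```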

UniProt accession:

  • `human_protein_transcriptlocus.csv`

The transcript ID (an Ensembl protein ID, prefixed ENSP) was mapped to the UniProt isoform accession.
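The following is a minimal sketch of the ENSP-to-isoform lookup. It assumes that `human_protein_transcriptlocus.csv` has one row per Ensembl protein ID paired with its UniProt isoform accession. The column names `peptide_id` and `isoform_ac` and the IDs in the sample are made-up placeholders, not the file's real contents. Stripping an Ensembl version suffix (such as `.3`) is an added design choice, so lookups still match if versioned IDs appear.

```python
import csv
import io

# Hypothetical excerpt of human_protein_transcriptlocus.csv. The column names
# and the placeholder IDs below are assumptions, not real file contents.
SAMPLE_CSV = """peptide_id,isoform_ac
ENSP00000000001,P00000-1
ENSP00000000002,P00000-2
"""


def strip_version(ensembl_id):
    """Drop an Ensembl version suffix, e.g. 'ENSP00000000002.3' -> 'ENSP00000000002'."""
    return ensembl_id.split(".")[0]


def load_isoform_map(handle):
    """Map each Ensembl protein ID (ENSP...) to its UniProt isoform accession."""
    mapping = {}
    for row in csv.DictReader(handle):
        ensp = strip_version(row["peptide_id"].strip())
        if not ensp.startswith("ENSP"):
            continue  # only Ensembl protein IDs are mapped
        mapping[ensp] = row["isoform_ac"].strip()
    return mapping


def to_isoform(ensp_id, mapping):
    """Return the UniProt isoform accession for an ENSP ID, or None if unmapped."""
    return mapping.get(strip_version(ensp_id))


isoform_map = load_isoform_map(io.StringIO(SAMPLE_CSV))
print(to_isoform("ENSP00000000002.3", isoform_map))
```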

Mapping was NOT performed to the UniProt canonical accession, because doing so caused an issue in the final dataset: a mutation for the same canonical accession would be listed with different amino acid changes.
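The canonical-accession issue can be illustrated with a small sketch. The sketch assumes, as a plausible explanation not stated above, that residue numbering differs between isoforms. Under that assumption, one genomic variant annotated on two isoforms carries two different amino acid changes. Collapsing both isoforms onto the canonical accession (the accession without its `-N` suffix) then yields one key with conflicting changes, while keying by isoform accession does not. All records, accessions, and coordinates are made-up placeholders.

```python
from collections import defaultdict

# Made-up records: one genomic variant annotated on two isoforms of one protein.
records = [
    {"variant": "chr1:1000:A>G", "isoform_ac": "P00000-1", "aa_change": "p.K120E"},
    {"variant": "chr1:1000:A>G", "isoform_ac": "P00000-2", "aa_change": "p.K81E"},
]


def canonical(isoform_ac):
    """Strip the '-N' isoform suffix to get the base UniProt accession."""
    return isoform_ac.split("-")[0]


def conflicting_keys(records, accession_of):
    """Return {(accession, variant): changes} for keys with more than one amino acid change."""
    changes = defaultdict(set)
    for rec in records:
        key = (accession_of(rec["isoform_ac"]), rec["variant"])
        changes[key].add(rec["aa_change"])
    return {key: sorted(aas) for key, aas in changes.items() if len(aas) > 1}


# Keyed by canonical accession: the same variant shows two amino acid changes.
print(conflicting_keys(records, canonical))
# Keyed by isoform accession: no conflicts.
print(conflicting_keys(records, lambda ac: ac))
```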