BioMuta pipeline README

Revision as of 15:56, 30 September 2024

Under construction

This article is still under construction and should not be nominated for deletion.

Originally updated by Ned Cauley (August 2022); currently maintained by Maria Kim (September 2024).

This page will contain an updated version of this BioMuta documentation page.

Description

The BioMuta pipeline gathers mutation data from multiple sources and combines them into a single dataset with a common field structure.

The sources included in BioMuta are:

BioMuta gathers mutation data for the following cancers:

  • Urinary Bladder Cancer (DOID:11054)
  • Breast Cancer (DOID:1612)
  • Colorectal Cancer (DOID:9256)
  • Esophageal Cancer (DOID:5041)
  • Head and Neck Cancer (DOID:11934)
  • Kidney Cancer (DOID:263)
  • Liver Cancer (DOID:3571)
  • Lung Cancer (DOID:1324)
  • Prostate Cancer (DOID:10283)
  • Stomach Cancer (DOID:10534)
  • Thyroid Gland Cancer (DOID:1781)
  • Uterine Cancer (DOID:363)
  • Cervical Cancer (DOID:4362)
  • Brain Cancer (DOID:1319)
  • Hematologic Cancer (DOID:2531)
  • Adrenal Gland Cancer (DOID:3953)
  • Pancreatic Cancer (DOID:1793)
  • Ovarian Cancer (DOID:2394)
  • Skin Cancer (DOID:4159)

Running the Pipeline

To run the BioMuta pipeline, download the scripts from the HIVE Lab GitHub repo: GW HIVE BioMuta Repository.

Pipeline Overview

Step 1: Download

In the downloader step, mutation lists will be downloaded from each source. Refer to each individual source below for downloading instructions.

Download: TCGA

Obtaining data from TCGA involves two parts:

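1. Primary TCGA data

  • Available at the NCI Genomic Data Commons (https://gdc.cancer.gov/).
  • Accessible through the ISB-CGC BigQuery repository (more info TBA).
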
2. TCGA controlled-access data

  • Hosted at dbGaP. For access, see SharePoint (link TBA).

Download: CIViC

Download: COSMIC

Step 2: Convert

In the convert step, all resources are formatted to the BioMuta standard for both data and field structure.
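
As a rough illustration of what a convert step can look like, the sketch below renames source-specific columns to one shared set of field names. It is not the pipeline's actual code: the common field names (`gene`, `chr`, `pos`, `source`), the `example_source` entry and its column names, and the `convert_record` helper are all hypothetical, since the BioMuta field standard is not spelled out on this page.

```python
# Minimal sketch of a convert step, under assumed field names.
# ASSUMPTION: "gene", "chr", "pos", and "source" stand in for the real
# BioMuta standard fields, which this page does not list.
FIELD_MAPS = {
    # Column names as they commonly appear in MAF-style mutation files.
    "tcga": {
        "Hugo_Symbol": "gene",
        "Chromosome": "chr",
        "Start_Position": "pos",
    },
    # Entirely hypothetical second source with different column names.
    "example_source": {
        "gene_name": "gene",
        "chrom": "chr",
        "position": "pos",
    },
}


def convert_record(source, record):
    """Return a copy of `record` that uses the common field names.

    Columns not listed in the mapping for `source` are dropped, and a
    "source" field is added so the origin of each row stays traceable.
    """
    mapping = FIELD_MAPS[source]
    converted = {common: record[raw] for raw, common in mapping.items()}
    converted["source"] = source
    return converted


if __name__ == "__main__":
    row = {"Hugo_Symbol": "TP53", "Chromosome": "17", "Start_Position": "7673803"}
    print(convert_record("tcga", row))
```

Keeping the mappings in one table means a new source only needs a new entry rather than new conversion logic.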

Step 3: Combine

In the combine step, all resources are combined into a master dataset.
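
One way to picture the combine step is as a concatenation of per-source record lists that already share a field structure. The sketch below assumes exactly that; the `combine` helper and the field names in the example are hypothetical, and the real pipeline may also deduplicate or reconcile records, which is not shown here.

```python
from itertools import chain


def combine(datasets):
    """Concatenate per-source record lists into one master list.

    ASSUMPTION: every record is a dict that already uses the same set of
    field names. A record with a different field set raises ValueError
    rather than being merged silently.
    """
    combined = list(chain.from_iterable(datasets))
    if combined:
        expected = set(combined[0])
        for index, record in enumerate(combined):
            if set(record) != expected:
                raise ValueError(
                    f"record {index} has fields {sorted(record)}, "
                    f"expected {sorted(expected)}"
                )
    return combined


if __name__ == "__main__":
    # Hypothetical records; field names are illustrative only.
    source_a = [{"gene": "TP53", "chr": "17", "source": "a"}]
    source_b = [{"gene": "KRAS", "chr": "12", "source": "b"}]
    master = combine([source_a, source_b])
    print(len(master), "records in the master dataset")
```

Failing on a mismatched field set surfaces conversion problems early instead of letting an inconsistent row into the master dataset.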