Dataset Resources

From HIVE Lab

HIVE Team Datasets

BCO HCV

We demonstrated that the IEEE 2791-2020 standard (BioCompute Objects [BCOs]) enables complete and concise communication of next-generation sequencing (NGS) data analysis results. One arm of a clinical trial was replicated using synthetically generated data made to resemble real biological data. Two independent analyses were then carried out, with BCOs as the means of communicating each analysis: one simulating a pharmaceutical regulatory submission to the FDA, and the other simulating the FDA review. The two sets of results were tabulated and compared for concordance: of the 118 simulated patient samples, the final results of 117 (99.15%) were in agreement. This high concordance rate demonstrates that a BCO, when accompanied by a verification kit, can effectively capture and clearly communicate NGS analyses within regulatory submissions. BCOs promote transparency and reproducibility, thereby reinforcing trust in the regulatory submission process.
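The concordance figure quoted above is the number of samples whose final results agree, divided by the total number of samples compared. The following is a minimal sketch of that arithmetic, not the procedure used in the study: the function name `concordance_rate`, the sample IDs, and the result labels are all hypothetical and chosen only to reproduce the 117-of-118 ratio.

```python
def concordance_rate(results_a, results_b):
    """Compare two {sample_id: final_result} mappings.

    Returns (agreeing, total, percent) over the sample IDs present in both.
    """
    shared = sorted(set(results_a) & set(results_b))
    if not shared:
        raise ValueError("no shared sample IDs to compare")
    agreeing = sum(1 for s in shared if results_a[s] == results_b[s])
    return agreeing, len(shared), 100.0 * agreeing / len(shared)


# Hypothetical data: 118 samples, identical calls except for one sample.
submission = {f"sample_{i:03d}": "call_A" for i in range(1, 119)}
review = dict(submission)
review["sample_118"] = "call_B"  # the single discordant sample

agreeing, total, percent = concordance_rate(submission, review)
print(f"{agreeing}/{total} concordant ({percent:.2f}%)")
```

With these made-up inputs the ratio is 117/118, which rounds to the 99.15% reported above.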

  • Baseline 1: Version: 1.0, Size: 48 MB, Format: FASTQ
  • Baseline 2: Version: 1.0, Size: 48 MB, Format: FASTQ
  • Treatment Failure 1: Version: 1.0, Size: 48 MB, Format: FASTQ
  • Treatment Failure 2: Version: 1.0, Size: 48 MB, Format: FASTQ

Full Dataset Download:

  • manifest.json: Version: 1.0, Size: 163K, Format: JSON
  • hcvALL.zip: Version: 1.0, Size: 4.4G, Format: ZIP

Citation: https://doi.org/10.1101/2020.12.07.415059

GFKB

GutFeelingKB (the Gut Feeling KnowledgeBase, GFKB) is a reference database of the healthy human gut microbiome. It was generated by the metagenomic analysis pipeline described in our paper, https://doi.org/10.1371/journal.pone.0206484; the pipeline includes three tools that are integrated into the HIVE platform. To create GutFeelingKB, we analyzed 49 samples from healthy individuals sequenced at GWU and 49 samples from healthy individuals taken from the Human Microbiome Project.

Polyester Simulated RNA-seq Reads for Chromosome 22

Simulated RNA-seq reads were generated with the R package polyester for Chromosome 22 of the human reference genome GRCh38. Two samples were generated, each containing two unique transcripts expressed at 20-fold higher than normal levels to serve as positive controls. These reads can be used to test RNA-seq analysis pipelines and to gauge how much an analysis varies in recovering the 20-fold difference in the positive control transcripts between samples.

  • sample_01_01.fasta: Forward reads for sample 1. Size: 1.7 GB, Format: FASTA
  • sample_01_2.fasta: Reverse reads for sample 1. Size: 1.7 GB, Format: FASTA
  • sample_02_01.fasta: Forward reads for sample 2. Size: 1.8 GB, Format: FASTA
  • sample_02_02.fasta: Reverse reads for sample 2. Size: 1.8 GB, Format: FASTA
  • sim_tx_info.txt: Summary of fold changes per transcript. Size: 142 KB, Format: TXT
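As a rough sanity check on the positive controls, the simulated reads can be tallied per transcript in each sample and compared as a ratio. The sketch below is illustrative only and makes several assumptions: that each FASTA header has the form `>readN/TRANSCRIPT_ID;mate1:...;mate2:...` (check the actual files before relying on this), that raw read counts are an acceptable proxy for expression, and that a pseudocount of 1 is a reasonable guard against division by zero. The function names are hypothetical, and no library-size normalization is applied.

```python
from collections import Counter


def count_reads_per_transcript(lines):
    """Tally reads per transcript from FASTA header lines.

    Assumes headers look like '>read1/TRANSCRIPT_ID;mate1:1-100;mate2:200-300'.
    Header lines without a '/' are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if "/" not in header:
            continue  # does not match the assumed layout
        transcript_id = header.split("/", 1)[1].split(";", 1)[0]
        counts[transcript_id] += 1
    return counts


def fold_changes(counts_a, counts_b, pseudocount=1.0):
    """Return {transcript_id: (a + pseudocount) / (b + pseudocount)}."""
    ids = set(counts_a) | set(counts_b)
    return {
        t: (counts_a[t] + pseudocount) / (counts_b[t] + pseudocount)
        for t in ids
    }


# Tiny in-memory example with made-up transcript IDs.
demo = [
    ">read1/txA;mate1:1-100;mate2:200-300", "ACGT",
    ">read2/txA;mate1:5-104;mate2:210-309", "ACGT",
    ">read3/txB;mate1:1-100;mate2:150-249", "GGCC",
]
demo_counts = count_reads_per_transcript(demo)
print(dict(demo_counts))
```

For the real data, the forward-read file of each sample could be opened and passed to `count_reads_per_transcript` line by line, and transcripts whose ratio is far from 1 compared against the fold changes listed in sim_tx_info.txt.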

More info: https://bioconductor.org/packages/release/bioc/html/polyester.html