Metagenomic resources

From HIVE Lab
Revision as of 17:58, 10 September 2025 by Jkeeney (Formatting updates)

The Mazumder lab has developed several open source resources for metagenomic analysis, listed on this page.


GutFeeling Knowledgebase (GFKB)

We have developed a proof-of-concept gut microbiome monitoring system using a sequencing and analysis pipeline developed under our previous I-Corps award (see Metagenomics Pipeline below).

From the individuals enrolled in our study, we have collected three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, and questionnaires on gastrointestinal symptoms, perceived stress, physical activity, and sleep. We have also begun analyzing fecal samples from the Human Microbiome Project and their associated metadata. The real value of our prototype will come from integrating these data into a single knowledgebase of comparable samples processed with our optimized pipeline.

Objective

The GutFeeling Knowledgebase (GFKB) is a reference database of human gut microbiomes from both healthy individuals and individuals diagnosed with a disease or condition. The GFKB is generated by the metagenomic analysis pipeline described in our paper (doi: 10.1371/journal.pone.0206484) and includes three tools integrated into the HIVE platform. The aim of this database is to catalog the bacterial organisms found in the human digestive tract as the lab continues its metagenomic research on various diseases and conditions (e.g., epilepsy and prediabetes). We hope to identify key gut organisms in order to understand how imbalances in them may affect human health. Currently, the HIVE lab has documented over 500 bacterial organisms in this database, and we plan to add more as we proceed with our current project: predicting intervention outcomes in prediabetic patients from the relative abundances of their gut microbiota.
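Relative abundance here means each organism's share of the reads assigned in a sample. A minimal Python sketch of that calculation, assuming per-organism read counts are already available; the function name and counts are hypothetical, and this is not necessarily how the GFKB pipeline normalizes its data:

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-organism read counts into relative abundances summing to 1.

    read_counts maps an organism name to the number of reads assigned to it
    (hypothetical input format).
    """
    total = sum(read_counts.values())
    if total == 0:
        # With no assigned reads, every share would be 0/0.
        raise ValueError("no reads assigned; relative abundance is undefined")
    return {organism: count / total for organism, count in read_counts.items()}


if __name__ == "__main__":
    # Hypothetical read counts for one fecal sample (illustrative only).
    counts = {"Bacteroides": 600, "Faecalibacterium": 300, "Escherichia": 100}
    for organism, share in relative_abundance(counts).items():
        print(f"{organism}: {share:.1%}")
```

Working with shares rather than raw counts makes samples sequenced to different depths comparable.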

GFKB downloads

Version | Content file(s) | Format | File size | Release notes (plain text) | Date created
v1.0 | RLDA KI Analysis | PDF | 393 KB | N/A | May 14, 2021
v1.0 | ML MATLAB Tutorial | PDF | 5.2 KB | N/A | January 6, 2021
v5.0 | GFKB_v5-PreDiabetes.csv | CSV | 57 KB | GutFeeling Knowledge Base Notes v5.0 | January 18, 2023
v4.0 | GutFeelingKnowledgeBase-v4-Master_List.csv | CSV | 290 KB | GutFeeling Knowledge Base Notes v4.0 | March 31, 2020
v4.0 | GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv | CSV | 99 KB | GutFeeling Knowledge Base Notes v4.0 | March 31, 2020
v3.0 | GutFeelingKnowledgeBase-v3.csv | CSV | 44 KB | GutFeeling Knowledge Base Notes v3.0 | August 30, 2019
v2.6 | GutFeelingKnowledgeBase-v2.6.csv | CSV | 44 KB | GutFeeling Knowledge Base Notes v2.6 | July 23, 2018
v2.6 | HumanGutDB-v2.6.fasta-v2.6.csv | FASTA | 549 MB | HumanGutDB v2.6 Notes | July 23, 2018
v2.0 | GutFeelingKnowledgeBase-v2.0.csv | CSV | 249 KB | GutFeeling Knowledge Base Notes v2.0 | 2017
v2.0 | HumanGutDB-v2.0.fasta | FASTA | 533 MB | HumanGutDB v2.0 Notes | 2017
v2.0 | blockList-v2.0.csv | CSV | 16 KB | Black List Notes v2.0 | 2017
v2.0 | unalignedContigsGFKB-v2.0.fasta | FASTA | 3.2 GB | Unaligned Contigs GFKB Notes | 2017



Filtered NT

The Filtered NT dataset is generated from the complete nucleotide (nt) file provided by NCBI by excluding every sequence whose taxonomy name is on a list of unwanted names or is a descendant (child taxon) of one of those unwanted taxa.
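A minimal sketch of this kind of lineage-based exclusion, assuming the taxonomy is available as a child-to-parent map of NCBI tax IDs (as in NCBI's nodes.dmp, where the root is its own parent) and that the unwanted taxonomy names have already been resolved to tax IDs. The function names and toy taxonomy are hypothetical, not the lab's actual filter. Following the pipeline description below, it also drops sequences whose lineage cannot be resolved.

```python
from typing import Dict, List, Optional, Set, Tuple


def lineage(taxid: int, parent: Dict[int, int]) -> Optional[List[int]]:
    """Walk from taxid up to the root; return None if the chain is broken."""
    path = [taxid]
    while True:
        up = parent.get(path[-1])
        if up is None:
            return None  # unknown tax ID: no clear lineage
        if up == path[-1]:
            return path  # reached the root (its own parent)
        if up in path:
            return None  # malformed taxonomy containing a cycle
        path.append(up)


def filter_nt(records: List[Tuple[str, int]],
              parent: Dict[int, int],
              unwanted: Set[int]) -> List[Tuple[str, int]]:
    """Keep (accession, taxid) records whose lineage avoids every unwanted tax ID."""
    kept = []
    for accession, taxid in records:
        path = lineage(taxid, parent)
        if path is None:
            continue  # drop sequences without a resolvable lineage
        if unwanted.isdisjoint(path):
            kept.append((accession, taxid))
    return kept


if __name__ == "__main__":
    # Hypothetical toy taxonomy: 5 is an unwanted taxon and 6 is its child.
    toy_parent = {1: 1, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5}
    toy_records = [("seqA", 3), ("seqB", 6), ("seqC", 4), ("seqD", 5), ("seqE", 99)]
    print(filter_nt(toy_records, toy_parent, unwanted={5}))
```

Checking the whole lineage, rather than only the sequence's own tax ID, is what removes child taxa of an unwanted name.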

Metagenomics Pipeline

We use a two-step pipeline for metagenomic analysis: CensuScope and HIVE-Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and searches them with BLAST against a reference database. Our reference database is a filtered version of the NCBI nucleotide (nt) database from which all sequences lacking a clear taxonomic lineage have been removed. All artificial sequences are also removed, either by our automated filter or manually once an artificial sequence is identified during post-analysis processing. The sequences identified by CensuScope are then used as references for HIVE-Hexagon alignments. HIVE-Hexagon, a k-mer-based aligner, has been reported to be more sensitive and faster than standard alignment algorithms (see Publications), while reducing computational cost, memory requirements, and processing time.
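The idea behind k-mer-based alignment can be illustrated with a generic seed-and-vote sketch: index every k-mer of the reference sequences, look up the read's k-mers, and vote for the (reference, diagonal) placement that the most seeds agree on, where the diagonal is the reference offset minus the read offset. This is a simplified illustration under our own assumptions, not HIVE-Hexagon's actual algorithm; the function names and sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Index = Dict[str, List[Tuple[str, int]]]


def build_kmer_index(references: Dict[str, str], k: int) -> Index:
    """Map every k-mer to the (reference id, offset) positions where it occurs."""
    index: Index = defaultdict(list)
    for ref_id, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((ref_id, i))
    return index


def best_diagonal(read: str, index: Index,
                  k: int) -> Optional[Tuple[Tuple[str, int], int]]:
    """Return ((ref_id, diagonal), votes) for the best-supported placement, or None."""
    votes: Counter = Counter()
    for j in range(len(read) - k + 1):
        # .get avoids inserting empty lists into the defaultdict.
        for ref_id, i in index.get(read[j:j + k], ()):
            votes[(ref_id, i - j)] += 1
    if not votes:
        return None  # no shared k-mers: read cannot be seeded
    return votes.most_common(1)[0]


if __name__ == "__main__":
    refs = {"ref1": "ACGTACGTTAGC", "ref2": "TTTTGGGGCCCC"}  # toy references
    idx = build_kmer_index(refs, k=4)
    print(best_diagonal("GTTAGC", idx, k=4))  # (('ref1', 6), 3)
```

Here all three k-mers of the read agree that it starts at offset 6 of ref1. A full aligner would then extend and score such candidate placements; the index lookup is what lets it skip most of the reference.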

Publications

Please cite one or more of the following:

  1. King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. doi: 10.1371/journal.pone.0206484
  2. Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014;15(1):918. PMID: 25232094
  3. Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS ONE. 2014;9(6):e99033. PMID: 24918764
  4. Simonyan V, Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes. 2014 Sep 30;5(4):957-981. PMID: 271953


Funding

Current/past: NSF, Otsuka, MGPC

Acknowledgements

We would like to thank the following individuals for their significant work in curation and annotation of the GFKB:

Stephanie Singleton
Lindsay Hopson
Jiuge (April) Yang
Tyson Dawson
Cameron Sabet
Yukta Chidanandan
Valery Simonyan
Nicole Post
Ben Osborne
Sophie Halkett
Miguel Mazumder

Questions / Comments

If you have any questions or comments regarding the GFKB, please contact Raja Mazumder (mazumder@gwmail.gwu.edu).

CensuScope

Description coming soon.

Code repository: https://github.com/GW-HIVE/CensuScope
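Pending the full description, the census step summarized under Metagenomics Pipeline (randomly sample a user-defined number of reads, then classify each against a reference database) can be sketched as follows. This is a minimal illustration, not the code in the repository: the classify callable is a hypothetical stand-in for the BLAST search, and proportions are computed only over reads that received a hit.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Optional


def census(reads: List[str], n_samples: int,
           classify: Callable[[str], Optional[str]],
           seed: int = 0) -> Dict[str, float]:
    """Randomly sample reads, classify each one, and return taxon proportions.

    classify returns a taxon name for a read, or None when the read has no hit
    (hypothetical stand-in for a BLAST search against the reference database).
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    sample = rng.sample(reads, min(n_samples, len(reads)))
    hits = Counter(t for t in (classify(r) for r in sample) if t is not None)
    total = sum(hits.values())
    if total == 0:
        return {}  # nothing classified
    return {taxon: count / total for taxon, count in hits.items()}


if __name__ == "__main__":
    # Hypothetical reads and lookup table standing in for BLAST results.
    lookup = {"ACGT": "taxon_A", "GGCC": "taxon_B"}
    toy_reads = ["ACGT"] * 70 + ["GGCC"] * 30 + ["NNNN"] * 10
    print(census(toy_reads, n_samples=40, classify=lookup.get, seed=1))
```

Classifying a random sample rather than every read is what keeps the census fast; the sample size trades run time against the precision of the estimated proportions.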


Publication:

Shamsaddini et al. (BMC Genomics, 2014)



slimNT

Description coming soon.

Current slimNT database:

https://hive.biochemistry.gwu.edu/static/slimNT.fa.gz

Current slimNT taxonomy database:

https://hive.biochemistry.gwu.edu/static/slimNT.db.gz
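A minimal Python sketch for streaming records from the slimNT sequence download, assuming slimNT.fa.gz is a gzip-compressed FASTA file as its extension suggests; the function names are hypothetical. It reads the compressed file directly instead of decompressing it to disk first.

```python
import gzip
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())  # sequence may span many lines
    if header is not None:
        yield header, "".join(chunks)


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fasta(handle)
```

After downloading the file, records can be iterated with for header, seq in read_fasta_gz("slimNT.fa.gz"). Because records are yielded one at a time, memory use stays small even for a large database.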

Publications:

Shamsaddini et al.

Santana-Quintero et al.

Simonyan et al.

Simonyan and Mazumder

Funding

LOI ID #L02496974; NSF Lineage Award #1546491