Metagenomic resources
The Mazumder lab has developed several open-source resources for metagenomic analysis, listed on this page.
GutFeeling Knowledgebase (GFKB)
We have developed a proof-of-concept gut microbiome monitoring system using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below).
From the individuals enrolled in our study, we have collected three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, and questionnaires on gastrointestinal symptoms, perceived stress, physical activity, and sleep. We have also begun analyzing fecal samples from the Human Microbiome Project and their associated metadata. Integrating these data into a single knowledgebase of comparable samples, all processed with our optimized pipeline, will provide the real value of our prototype.
Objective
The Gut Feeling Knowledgebase (GFKB) is a reference database of human gut microbiomes from both healthy individuals and those diagnosed with a disease or condition. The GFKB is generated by a metagenomic analysis pipeline described in our paper (doi: 10.1371/journal.pone.0206484), and includes three tools which are integrated in the HIVE platform. The aim of this database is to catalog bacterial organisms found within the human digestive tract as the lab continues to conduct metagenomic research on various diseases and conditions (e.g. epilepsy and pre-diabetes). Our hope is to identify key organisms found within the gut as a means of understanding how imbalances in them may affect human health. Currently, the HIVE lab has documented over 500 bacterial organisms within this database, and we hope to continue adding more as we proceed with our current project on predicting intervention outcomes for pre-diabetic patients from the relative abundances of their gut microbiome.
GFKB downloads
Filtered NT
The Filtered NT dataset is generated from the complete nucleotide (NT) file provided by NCBI by excluding every sequence whose taxonomy name is on a list of unwanted names or is a child of one of those names.
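The exclusion rule above can be illustrated with a minimal sketch. This is not the lab's filter: the parent map, the taxonomy names, the sequence identifiers, and the function names `descendants` and `filter_sequences` are all hypothetical, and the sketch assumes that a sequence with no taxonomy assignment is also dropped.

```python
from collections import defaultdict

def descendants(parent_of, unwanted_roots):
    """Return the unwanted taxa plus every taxon below them in the tree."""
    # Invert the child -> parent map so the tree can be walked downward.
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    excluded = set()
    stack = list(unwanted_roots)
    while stack:
        taxon = stack.pop()
        if taxon in excluded:
            continue
        excluded.add(taxon)
        stack.extend(children[taxon])
    return excluded

def filter_sequences(records, taxon_of, excluded):
    """Yield (seq_id, sequence) pairs whose taxon is known and not excluded."""
    for seq_id, sequence in records:
        taxon = taxon_of.get(seq_id)
        # Assumption: sequences without any taxonomy assignment are dropped too.
        if taxon is not None and taxon not in excluded:
            yield seq_id, sequence

# Toy taxonomy (child -> parent); names are illustrative only.
parent_of = {
    "other sequences": "root",
    "artificial sequences": "other sequences",
    "synthetic construct": "artificial sequences",
    "Bacteria": "root",
    "Escherichia coli": "Bacteria",
}
taxon_of = {"seq1": "Escherichia coli", "seq2": "synthetic construct"}
records = [("seq1", "ACGT"), ("seq2", "GGCC"), ("seq3", "TTAA")]

excluded = descendants(parent_of, {"other sequences"})
kept = list(filter_sequences(records, taxon_of, excluded))
```

Expanding the unwanted names into a full set once, before scanning the sequences, keeps the per-sequence check to a single set lookup, which matters for a file the size of NT.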
Metagenomics Pipeline
We use a two-step pipeline for metagenomic analysis: CensuScope followed by HIVE-Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference database. Our reference database (a filtered version of NTdb) is the NCBI Nucleotide database with all sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed, either by our automated filter or manually once an artificial sequence is identified during post-analysis processing. Sequences identified by CensuScope are then used as references in HIVE-Hexagon alignments. HIVE-Hexagon, a k-mer-based aligner, is reported to be more sensitive and faster than standard alignment algorithms, and it reduces computational cost, memory requirements, and processing time (see the HIVE-Hexagon publication listed below).
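The census step can be sketched as follows. This is a simplified illustration under stated assumptions, not the CensuScope implementation: `classify` is a hypothetical stand-in for the BLAST search against the reference database, and the `iterations` and `seed` parameters and the toy reads are invented for the example.

```python
import random
from collections import Counter

def census(reads, classify, sample_size, iterations=1, seed=None):
    """Tally taxa found in random subsamples of the reads.

    classify is a hypothetical callable that maps a read to a taxon name,
    or to None when the read has no hit.
    """
    rng = random.Random(seed)  # a seeded generator makes the census repeatable
    counts = Counter()
    size = min(sample_size, len(reads))
    for _ in range(iterations):
        # Sample without replacement within each iteration.
        for read in rng.sample(reads, size):
            taxon = classify(read)
            if taxon is not None:
                counts[taxon] += 1
    return counts

# Toy input: 70 reads from one taxon and 30 from another.
reads = ["AAAA"] * 70 + ["CCCC"] * 30
toy_hits = {"AAAA": "taxon_A", "CCCC": "taxon_B"}

counts = census(reads, toy_hits.get, sample_size=20, iterations=5, seed=1)
total = sum(counts.values())
# Approximate relative abundance of each taxon among the sampled reads.
proportions = {taxon: n / total for taxon, n in counts.items()}
```

Because only a sample of the reads is searched, the cost of the search step depends on the sample size and the number of iterations rather than on the full read count; the taxa that receive hits would then serve as the reference set for the alignment step.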
Publications
Please use one or more of the following for citation(s):
- King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE 2019. doi: 10.1371/journal.pone.0206484
- Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014;15(1):918. PMID: 25232094
- Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS ONE. 2014;9(6):e99033. PMID: 24918764
- Simonyan V, Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes. 2014;5(4):957-981. PMID: 271953
Funding
Current/past: NSF, Otsuka, MGPC
Acknowledgements
We would like to thank the following individuals for their significant work in curation and annotation of the GFKB:
Stephanie Singleton
Lindsay Hopson
Jiuge (April) Yang
Tyson Dawson
Cameron Sabet
Yukta Chidanandan
Valery Simonyan
Nicole Post
Ben Osborne
Sophie Halkett
Miguel Mazumder
Questions / Comments
If you have any questions or comments regarding GutFeelingKB, please contact Raja Mazumder (mazumder@gwmail.gwu.edu).
CensuScope
Description coming soon.
Code repository: https://github.com/GW-HIVE/CensuScope
Publication:
slimNT
Description coming soon.
Current Slim NT database:
https://hive.biochemistry.gwu.edu/static/slimNT.fa.gz
Current Slim NT taxonomy database:
https://hive.biochemistry.gwu.edu/static/slimNT.db.gz
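Once the sequence file has been downloaded, it can be read without decompressing it to disk. The sketch below assumes that slimNT.fa.gz is a standard gzip-compressed FASTA file, as its extension suggests; the function name `read_fasta_gz` is hypothetical, and the format of slimNT.db.gz is not described here, so it is not handled.

```python
import gzip

def read_fasta_gz(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the final record after the last line has been read.
    if header is not None:
        yield header, "".join(chunks)
```

For example, `sum(1 for _ in read_fasta_gz("slimNT.fa.gz"))` would count the sequences, assuming the file was saved to the working directory. Reading the file as a stream keeps memory use to one record at a time.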
Publications:
Simonyan V, Mazumder R
Funding
LOI_ID#L02496974, NSF_Lineage_Award #1546491