HIVE Lab - User contributions [en]

Metagenomic resources

2025-09-17T13:11:36Z

Jkeeney: /* slimNT */

The Mazumder lab has developed several open source resources for metagenomic analysis, listed on this page.

 

== GutFeeling Knowledgebase (GFKB) ==
We have developed a proof-of-concept gut microbiome monitoring system using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below).

We have collected from the individuals enrolled in our study the following: three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, gastrointestinal symptoms questionnaires, perceived stress questionnaires, physical activity questionnaires, and sleep questionnaires. We have also begun the analysis of fecal samples from the Human Microbiome Project and the associated metadata. The integration of this data into a single knowledgebase of comparable samples using our optimized pipeline will provide the real value of our prototype.

'''Objective'''

The GutFeeling Knowledgebase (GFKB) is a reference database of human gut microbiomes from both healthy individuals and those diagnosed with a disease or condition. The GFKB is generated by a metagenomic analysis pipeline described in our paper (doi: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206484 10.1371/journal.pone.0206484]), and includes three tools which are integrated in the HIVE platform. The aim of this database is to catalog bacterial organisms found within the human digestive tract as the lab continues to conduct metagenomic research on various diseases and conditions (e.g. epilepsy and pre-diabetes). Our hope is to identify key organisms found within the gut as a means to understand how their imbalances may impact human health. Currently, the HIVE lab has documented over 500 bacterial organisms within this database, and we hope to continue adding more organisms as we proceed with our current project on predicting intervention outcomes of pre-diabetic patients using their relative gut microbiome abundances.

'''GFKB downloads'''

{| class="wikitable" style="margin:auto"
|-
! Version !! Content Files !! Format !! File Size !! Release Notes (Plain Text) !! Date Created
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/RLDA_KI_Analysis.pdf RLDA KI Analysis] || pdf || 393KB || N/A || May 14 2021
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf ML MatLab Tutorial] || pdf || 5.2KB || N/A || January 6 2021
|-
| v5.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf GFKB_v5-PreDiabetes.csv] || csv || 57KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v5.0] || January 18 2023
|-
| v4.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Master_List.csv GutFeelingKnowledgeBase-v4-Master_List.csv] [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv] || csv || 290KB 99KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v4.0] || March 31 2020
|-
| v3.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v3.csv GutFeelingKnowledgeBase-v3.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV3 GutFeeling Knowledge Base Notes v3.0] || August 30 2019
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.6.csv GutFeelingKnowledgeBase-v2.6.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2_6 GutFeeling Knowledge Base Notes v2.6] || July 23 2018
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.6.fasta HumanGutDB-v2.6.fasta] || fasta || 549MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2_6 HumanGutDB v2.6 Notes] || July 23 2018
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.0.csv GutFeelingKnowledgeBase-v2.0.csv] || csv || 249KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2 GutFeeling Knowledge Base Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.0.fasta HumanGutDB-v2.0.fasta] || fasta || 533MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2 HumanGutDB v2.0 Notes] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/blackList-v2.0.csv blockList-v2.0.csv] || csv || 16KB || [https://hivelab.biochemistry.gwu.edu/blackListNotesV2 Black List Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/unalignedContigsGFKB-v2.0.fasta unalignedContigsGFKB-v2.0.fasta] || fasta || 3.2GB || [https://hivelab.biochemistry.gwu.edu/unalignedContigsGFKBNotesV2 Unaligned Contigs GFKB Notes] || 2017
|-
|}

 
 
== Filtered NT ==
The Filtered NT dataset is generated from the complete nucleotide (NT) file provided by NCBI by excluding every sequence that is assigned to an unwanted taxonomy name or to any child taxonomy name beneath an unwanted one.
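The exclusion rule can be illustrated with a minimal sketch. It assumes the taxonomy is available as a child-to-parent mapping and that each sequence accession maps to one taxonomy name; the function names, toy taxonomy, and accessions are hypothetical and are not taken from the lab's filtering code. Dropping records with no taxonomy assignment is also an assumption of this sketch.

```python
from collections import defaultdict


def descendants(parent_of, roots):
    """Return the given taxa plus every taxon found beneath them."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    excluded, stack = set(), list(roots)
    while stack:
        taxon = stack.pop()
        if taxon in excluded:
            continue  # already visited; also guards against self-parented roots
        excluded.add(taxon)
        stack.extend(children[taxon])
    return excluded


def filter_records(records, taxon_of, parent_of, unwanted):
    """Keep (accession, sequence) pairs whose taxon is known and not excluded."""
    excluded = descendants(parent_of, unwanted)
    kept = []
    for accession, sequence in records:
        taxon = taxon_of.get(accession)
        # Assumption: records without a taxonomy assignment are dropped too.
        if taxon is not None and taxon not in excluded:
            kept.append((accession, sequence))
    return kept


# Toy data (hypothetical names and accessions).
parent_of = {
    "vectors": "artificial sequences",
    "artificial sequences": "other sequences",
    "Escherichia coli": "Escherichia",
    "Escherichia": "Bacteria",
}
taxon_of = {"A1": "Escherichia coli", "A2": "vectors", "A3": "artificial sequences"}
records = [("A1", "ACGT"), ("A2", "GGCC"), ("A3", "TTAA"), ("A4", "CCCC")]

kept = filter_records(records, taxon_of, parent_of, {"artificial sequences"})
```

Here only `A1` survives: `A2` and `A3` fall under the unwanted name, and `A4` has no taxonomy assignment.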

'''Metagenomics Pipeline'''

We use a two-step pipeline for metagenomic analysis: CensuScope followed by HIVE-hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference database. Our reference database (a filtered version of the NT database) is the NCBI nucleotide database with all sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed, either by our automated filter or manually once an artificial sequence is identified during post-analysis processing. Sequences identified by CensuScope are used as references in HIVE-hexagon alignments. HIVE-hexagon, a k-mer based aligner, is more sensitive and faster than current standard alignment algorithms, and it reduces computational cost, memory requirements, and processing time.
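To give a feel for what "k-mer based" means, the following is a generic seed-and-vote sketch. It is not HIVE-hexagon's algorithm: the function names, the toy references, and the choice of k are all assumptions made for illustration. Each reference is indexed by its k-mers, and a read is placed where the most of its k-mers agree on the same reference offset.

```python
from collections import defaultdict


def build_kmer_index(references, k):
    """Map every k-mer to the (reference name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, sequence in references.items():
        for pos in range(len(sequence) - k + 1):
            index[sequence[pos:pos + k]].append((name, pos))
    return index


def best_placement(read, index, k):
    """Return the (reference, start offset) supported by the most read k-mers."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for name, pos in index.get(read[offset:offset + k], ()):
            # A k-mer at read offset `offset` and reference position `pos`
            # implies the read starts at reference position `pos - offset`.
            votes[(name, pos - offset)] += 1
    if not votes:
        return None
    return max(votes, key=votes.get)


# Toy references standing in for sequences shortlisted by the first step.
references = {"refA": "ACGTACGGTCA", "refB": "TTTTGGGGCCCC"}
index = build_kmer_index(references, k=4)
placement = best_placement("CGGTCA", index, k=4)
```

In this toy case all three k-mers of the read agree on `refA` at offset 5. A real aligner would extend and score the alignment around such seeds rather than stop at the vote.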

'''Publications'''

Please use one or more of the following for citation(s):

# [https://orcid.org/0000-0003-1409-4549 King CH], Desai H, Sylvetsky AC, [http://orcid.org/0000-0001-6897-5419 LoTempio J], Ayanyan S, Carrie J, [http://orcid.org/0000-0002-0836-3389 Crandall K], Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, [http://orcid.org/0000-0001-8823-9945 Mazumder R]. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE 2019. [https://doi.org/10.1371/journal.pone.0206484 doi: 10.1371/journal.pone.0206484]
# Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014;15(1):918. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25336203 25336203]
# Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS ONE. 2014;9(6):e99033. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/24918764 24918764]
# Simonyan V, Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes. 2014 Sep 30;5(4):957-981. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25271953 25271953]

'''Funding'''

Current/past: NSF, Otsuka, MGPC

'''Acknowledgements'''

We would like to thank the following individuals for their significant work in curation and annotation of the GFKB:

* [https://orcid.org/0000-0003-0256-9834 Stephanie Singleton]
* [https://orcid.org/0000-0002-1586-1693 Lindsay Hopson]
* [https://orcid.org/0000-0002-6497-0714 Jiuge (April) Yang]
* [https://orcid.org/0000-0003-4888-9673 Tyson Dawson]
* [https://orcid.org/0000-0003-2299-1426 Cameron Sabet]
* [https://orcid.org/0000-0001-5703-5667 Yukta Chidanandan]
* [https://orcid.org/0000-0002-2577-3240 Valery Simonyan]
* [https://orcid.org/0000-0002-0457-7056 Nicole Post]
* [https://orcid.org/0000-0002-9007-8746 Ben Osborne]
* [https://orcid.org/0000-0001-9721-3181 Sophie Halkett]
* [https://orcid.org/0000-0003-1181-8118 Miguel Mazumder]

'''Questions / Comments'''

If you have any questions or comments regarding GutFeelingKB, please contact Raja Mazumder (mazumder@gwmail.gwu.edu).
 
 

== CensuScope ==
CensuScope is a tool for rapidly profiling metagenomic samples. It works by bootstrapping the data, then carrying out subsample aggregation to estimate sample composition. The tool is many orders of magnitude faster than brute-force alignment against the NT database, and has greater than 99% accuracy for species present at 1% of the composition or higher. Because the tool is so lightweight, the computational resources needed to run it are minimal: a typical consumer laptop can run it, assuming the database to be searched is stored on the laptop. The user can adjust the number of iterations and the number of samples per iteration, or use machine learning to determine how many cycles to run.
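The sample-then-aggregate idea can be sketched as follows. This is an illustration only, not CensuScope's code: the `census` function, its parameters, and the toy classifier that stands in for a BLAST search are hypothetical, and drawing reads with replacement is an assumption based on the usual meaning of bootstrapping.

```python
import random
from collections import Counter


def census(reads, classify, iterations, sample_size, seed=0):
    """Estimate taxon fractions by averaging over repeated random subsamples."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(iterations):
        # Assumption: reads are drawn with replacement, as in a standard bootstrap.
        sample = rng.choices(reads, k=sample_size)
        counts = Counter(classify(read) for read in sample)
        for taxon, n in counts.items():
            totals[taxon] += n / sample_size
    # Aggregate: mean per-iteration fraction for each taxon.
    return {taxon: total / iterations for taxon, total in totals.items()}


def toy_classify(read):
    """Stand-in for a database search: the taxon is encoded in the read label."""
    return read.split(":")[0]


# Toy sample: 70% of reads from one organism, 30% from another.
reads = ["Escherichia coli:read%d" % i for i in range(70)]
reads += ["Bacteroides fragilis:read%d" % i for i in range(30)]

profile = census(reads, toy_classify, iterations=50, sample_size=200)
```

Only `iterations * sample_size` reads are ever classified, which is where the speed-up over aligning every read comes from. The estimated fractions should land close to the true 70/30 split.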

'''Code repository:'''
https://github.com/GW-HIVE/CensuScope

'''Publication:'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

 
 

== slimNT ==
Because the NCBI nucleotide database ("NT") has grown so large in recent years, it has become difficult to work with. slimNT is an attempt to take a contextually relevant slice of that database using a hierarchical clustering approach. The steps of this approach are as follows:

# Take the representative non-viral proteomes at 75% cutoff [https://proteininformationresource.org/rps/ from PIR]
# Map their proteome IDs to genome accessions and retrieve the genomes
# If the genus and species are not present in the above list, get the UniProt reference proteome ID
# Map these proteome IDs to genome accessions and retrieve the genomes
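
The selection logic in the steps above can be sketched as follows, under one reading of step 3: a UniProt reference proteome is added only when its genus and species are not already covered by a representative proteome. All identifiers, the in-memory dictionaries, and the `select_genomes` function are hypothetical; the actual build retrieves this information from PIR and UniProt.

```python
def select_genomes(representative, reference, proteome_to_genome):
    """Pick genome accessions for slimNT-style selection.

    representative, reference: dict of proteome ID -> (genus, species)
    proteome_to_genome: dict of proteome ID -> genome accession
    Returns (genome accessions, proteome IDs with no known accession).
    """
    # Step 1: start from the representative proteomes.
    chosen = dict(representative)
    covered = set(representative.values())

    # Step 3: add reference proteomes for organisms not yet covered.
    for proteome_id, organism in reference.items():
        if organism not in covered:
            chosen[proteome_id] = organism
            covered.add(organism)

    # Steps 2 and 4: map proteome IDs to genome accessions.
    genomes, unmapped = [], []
    for proteome_id in chosen:
        accession = proteome_to_genome.get(proteome_id)
        if accession is None:
            unmapped.append(proteome_id)
        else:
            genomes.append(accession)
    return genomes, unmapped


# Hypothetical identifiers for illustration only.
representative = {"RP_1": ("Escherichia", "coli")}
reference = {
    "REF_1": ("Escherichia", "coli"),       # already covered, skipped
    "REF_2": ("Bacteroides", "fragilis"),   # not covered, added
    "REF_3": ("Akkermansia", "muciniphila"),
}
proteome_to_genome = {"RP_1": "GENOME_A", "REF_2": "GENOME_B"}

genomes, unmapped = select_genomes(representative, reference, proteome_to_genome)
```

Reporting unmapped proteome IDs separately, rather than silently dropping them, makes it easier to see which organisms are missing from the final database.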

'''Current slimNT database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.fa.gz (32.1GB)

'''Current slimNT taxonomy database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.db.gz (16.8GB)
 
 
'''Publications:'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

[https://doi.org/10.1371/journal.pone.0099033 Santana-Quintero ''et al''.]

[https://doi.org/10.1093/database/baw022 Simonyan ''et al''.]

[https://doi.org/10.3390/genes5040957 Simonyan V, Mazumder R]
 
 
'''Funding'''

LOI_ID#L02496974, NSF_Lineage_Award #1546491
</td></tr></table>

Metagenomic resources

2025-09-17T13:10:38Z

Jkeeney: /* slimNT */

</td></tr></table>

Metagenomic resources

2025-09-17T13:10:05Z

Jkeeney: /* slimNT */

</td></tr></table>

Metagenomic resources

2025-09-17T12:59:53Z

Jkeeney: /* CensuScope */


'''GFKB downloads'''

{| class="wikitable" style="margin:auto"
|-
! Version !! Content Files !! Format !! File Size !! Release Notes (Plain Text) !! Date Created
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/RLDA_KI_Analysis.pdf RLDA KI Analysis] || pdf || 393KB || N/A || May 14 2021
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf ML MatLab Tutorial] || pdf || 5.2KB || N/A || January 6 2021
|-
| v5.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf GFKB_v5-PreDiabetes.csv] || csv || 57KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v5.0] || January 18 2023
|-
| v4.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Master_List.csv GutFeelingKnowledgeBase-v4-Master_List.csv] [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv] || csv || 290KB 99KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v4.0] || March 31 2020
|-
| v3.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v3.csv GutFeelingKnowledgeBase-v3.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV3 GutFeeling Knowledge Base Notes v3.0] || August 30 2019
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.6.csv GutFeelingKnowledgeBase-v2.6.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2_6 GutFeeling Knowledge Base Notes v2.6] || July 23 2018
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.6.fasta HumanGutDB-v2.6.fasta] || fasta || 549MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2_6 HumanGutDB v2.6 Notes] || July 23 2018
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.0.csv GutFeelingKnowledgeBase-v2.0.csv] || csv || 249KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2 GutFeeling Knowledge Base Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.0.fasta HumanGutDB-v2.0.fasta] || fasta || 533MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2 HumanGutDB v2.0 Notes] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/blackList-v2.0.csv blockList-v2.0.csv] || csv || 16KB || [https://hivelab.biochemistry.gwu.edu/blackListNotesV2 Black List Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/unalignedContigsGFKB-v2.0.fasta unalignedContigsGFKB-v2.0.fasta] || fasta || 3.2GB || [https://hivelab.biochemistry.gwu.edu/unalignedContigsGFKBNotesV2 Unaligned Contigs GFKB Notes] || 2017
|-
|}

 
 
== Filtered NT ==
The Filtered NT dataset is generated from the complete nucleotide file provided by NCBI by excluding every sequence whose taxonomy name is on a list of unwanted names or is a child of one of those names.
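
The exclusion rule can be sketched in Python as below. This is a minimal illustration, not the lab's actual filter: it assumes taxa are identified by integer IDs rather than names, that a child-to-parent table and a sequence-to-taxon table are already loaded in memory, and the function names `excluded_taxa` and `filter_records` are hypothetical.

```python
from collections import defaultdict


def excluded_taxa(parent_of, unwanted):
    """Return the unwanted taxon IDs plus all of their descendants."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if child != parent:  # a root node may be listed as its own parent
            children[parent].append(child)
    excluded, stack = set(), list(unwanted)
    while stack:
        taxon = stack.pop()
        if taxon not in excluded:
            excluded.add(taxon)
            stack.extend(children[taxon])
    return excluded


def filter_records(records, seq_to_taxon, parent_of, unwanted):
    """Yield (seq_id, sequence) pairs whose taxon is not excluded.

    Sequences without a taxon mapping are kept; that is a choice of
    this sketch, not something stated on this page.
    """
    excluded = excluded_taxa(parent_of, unwanted)
    for seq_id, sequence in records:
        if seq_to_taxon.get(seq_id) not in excluded:
            yield seq_id, sequence


# Toy taxonomy: 1 is the root, 2 and 4 are its children, 3 is a child of 2.
parent_of = {1: 1, 2: 1, 3: 2, 4: 1}
records = [("seqA", "ACGT"), ("seqB", "GGCC"), ("seqC", "TTAA")]
seq_to_taxon = {"seqA": 3, "seqB": 4, "seqC": 2}
kept = list(filter_records(records, seq_to_taxon, parent_of, unwanted=[2]))
```

Computing the excluded set once up front means each sequence costs a single set lookup, which matters for a file the size of NT.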

'''Metagenomics Pipeline'''

We use a two-step pipeline for metagenomic analysis: CensuScope and Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference DB. Our reference database (a filtered version of NTdb) is the NCBI Nucleotide db with all of the sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed either by our automated filter or manually, once an artificial sequence is identified during post-analysis processing. Sequences identified by CensuScope are used as references in Hexagon alignments. HIVE-hexagon, a K-mer based aligner, is more sensitive and faster than current standard alignment algorithms. HIVE-hexagon offers a decrease in computational cost, memory requirement and time for processing.

'''Publications'''

Please use one or more of the following for citation(s):

# [https://orcid.org/0000-0003-1409-4549 King CH], Desai H, Sylvetsky AC, [http://orcid.org/0000-0001-6897-5419 LoTempio J], Ayanyan S, Carrie J, [http://orcid.org/0000-0002-0836-3389 Crandall K], Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, [http://orcid.org/0000-0001-8823-9945 Mazumder R]. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE 2019. [https://doi.org/10.1371/journal.pone.0206484 doi: 10.1371/journal.pone.0206484]
# Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014;15(1):918. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25336203 25232094]
# Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS One. 2014;9(6):e99033. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/24918764 24918764]
# Simonyan V and Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes, 2014 Sep 30;5(4): 957-981. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25271953 25271953]

'''Funding'''

Current/past: NSF, Otsuka, MGPC

'''Acknowledgements'''

We would like to thank the following individuals for their significant work in curation and annotation of the GFKB:

[https://orcid.org/0000-0003-0256-9834 Stephanie Singleton] 
[https://orcid.org/0000-0002-1586-1693 Lindsay Hopson] 
[https://orcid.org/0000-0002-6497-0714 Jiuge (April) Yang] 
[https://orcid.org/0000-0003-4888-9673 Tyson Dawson] 
[https://orcid.org/0000-0003-2299-1426 Cameron Sabet] 
[https://orcid.org/0000-0001-5703-5667 Yukta Chidanandan] 
[https://orcid.org/0000-0002-2577-3240 Valery Simonyan] 
[https://orcid.org/0000-0002-0457-7056 Nicole Post] 
[https://orcid.org/0000-0002-9007-8746 Ben Osborne] 
[https://orcid.org/0000-0001-9721-3181 Sophie Halkett] 
[https://orcid.org/0000-0003-1181-8118 Miguel Mazumder] 

'''Questions / Comments'''

If you have any questions or comments regarding GutFeelingKB, please contact Raja Mazumder (mazumder@gwmail.gwu.edu).
 
 

== CensuScope ==
CensuScope is a tool to rapidly profile metagenomic samples. The tool works by bootstrapping the data, then carrying out subsample aggregation to estimate sample composition. The tool is many orders of magnitude faster than brute-force alignment against the NT database, and has greater than 99% accuracy for species present at 1% of the composition or higher. Because the tool is so lightweight, the computational resources needed to run it are minimal: a typical consumer laptop can run it, provided the database to be searched is stored on the laptop. The user can adjust the number of iterations and samples per iteration used, or they can use machine learning to determine how many cycles to run.
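
The idea of repeated random subsampling followed by aggregation can be sketched in Python as below. This is not the CensuScope implementation (see the code repository for that): the function `census_profile`, its parameters, and the `classify` callable standing in for the database search are all hypothetical, and hits are simply pooled across iterations.

```python
import random
from collections import Counter


def census_profile(reads, classify, iterations=10, sample_size=1000, seed=0):
    """Estimate taxon proportions from repeated random subsamples of reads.

    classify: hypothetical callable mapping a read to a taxon label,
    or to None when the read has no hit in the reference database.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(iterations):
        # Draw one subsample without replacement, capped at the read count.
        subsample = rng.sample(reads, min(sample_size, len(reads)))
        # Pool hits across iterations; reads with no hit count as "unassigned".
        totals.update(classify(read) or "unassigned" for read in subsample)
    n = sum(totals.values())
    return {taxon: count / n for taxon, count in totals.items()}


# Toy data: 80 reads from one taxon, 20 from another, and a lookup-table
# "classifier" in place of a real alignment search.
reads = ["readA"] * 80 + ["readB"] * 20
lookup = {"readA": "taxon_A", "readB": "taxon_B"}
profile = census_profile(reads, lookup.get, iterations=5, sample_size=100)
```

Only `iterations * sample_size` reads are ever searched against the database, which is where the saving over aligning every read comes from.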

'''Code repository:'''
https://github.com/GW-HIVE/CensuScope

'''Publication:'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

 
 

== slimNT ==
Description coming soon.

'''Current Slim NT database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.fa.gz

'''Current Slim NT taxonomy database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.db.gz
 
 
'''Publications:'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

[https://doi.org/10.1371/journal.pone.0099033 Santana-Quintero ''et al''.]

[https://doi.org/10.1093/database/baw022 Simonyan ''et al''.]

[https://doi.org/10.3390/genes5040957 Simonyan V, Mazumder R]
 
 
'''Funding'''

LOI_ID#L02496974, NSF_Lineage_Award #1546491
</td></tr></table>

Metagenomic resources

2025-09-10T17:58:59Z

Jkeeney: Formatting updates

The Mazumder lab has developed several open source resources for metagenomic analysis, listed on this page.

 

== GutFeeling Knowledgebase (GFKB) ==
We have developed a proof-of-concept gut microbiome monitoring system using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below).

We have collected from the individuals enrolled in our study the following: three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, gastrointestinal symptoms questionnaires, perceived stress questionnaires, physical activity questionnaires, and sleep questionnaires. We have also begun the analysis of fecal samples from the Human Microbiome Project and the associated metadata. The integration of this data into a single knowledgebase of comparable samples using our optimized pipeline will provide the real value of our prototype.

'''Objective'''

The Gut Feeling Knowledgebase (GFKB) is a reference database of human gut microbiomes from both healthy individuals and those diagnosed with a disease or condition. The GFKB is generated by a metagenomic analysis pipeline described in our paper (doi: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206484 10.1371/journal.pone.0206484]), and includes three tools which are integrated in the HIVE platform. The aim of this database is to catalog bacterial organisms found within the human digestive tract as the lab continues to conduct metagenomic research on various diseases and conditions (e.g. epilepsy and pre-diabetes). Our hope is to identify key organisms found within the gut as a means to understand how their imbalances may impact human health. Currently, the HIVE lab has documented over 500 bacterial organisms within this database, and we hope to continue adding more organisms as we proceed with our current project in predicting intervention outcomes of pre-diabetic patients using their relative gut microbiome abundances.

'''GFKB downloads'''

{| class="wikitable" style="margin:auto"
|-
! Version !! Content Files !! Format !! File Size !! Release Notes (Plain Text) !! Date Created
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/RLDA_KI_Analysis.pdf RLDA KI Analysis] || pdf || 393KB || N/A || May 14 2021
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf ML MatLab Tutorial] || pdf || 5.2KB || N/A || January 6 2021
|-
| v5.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf GFKB_v5-PreDiabetes.csv] || csv || 57KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v5.0] || January 18 2023
|-
| v4.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Master_List.csv GutFeelingKnowledgeBase-v4-Master_List.csv] [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv] || csv || 290KB 99KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v4.0] || March 31 2020
|-
| v3.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v3.csv GutFeelingKnowledgeBase-v3.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV3 GutFeeling Knowledge Base Notes v3.0] || August 30 2019
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.6.csv GutFeelingKnowledgeBase-v2.6.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2_6 GutFeeling Knowledge Base Notes v2.6] || July 23 2018
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.6.fasta HumanGutDB-v2.6.fasta] || fasta || 549MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2_6 HumanGutDB v2.6 Notes] || July 23 2018
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.0.csv GutFeelingKnowledgeBase-v2.0.csv] || csv || 249KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2 GutFeeling Knowledge Base Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.0.fasta HumanGutDB-v2.0.fasta] || fasta || 533MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2 HumanGutDB v2.0 Notes] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/blackList-v2.0.csv blockList-v2.0.csv] || csv || 16KB || [https://hivelab.biochemistry.gwu.edu/blackListNotesV2 Black List Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/unalignedContigsGFKB-v2.0.fasta unalignedContigsGFKB-v2.0.fasta] || fasta || 3.2GB || [https://hivelab.biochemistry.gwu.edu/unalignedContigsGFKBNotesV2 Unaligned Contigs GFKB Notes] || 2017
|-
|}

 
 
== Filtered NT ==
The Filtered NT dataset is generated from the complete nucleotide file provided by NCBI by excluding every sequence whose taxonomy name is on a list of unwanted names or is a child of one of those names.

'''Metagenomics Pipeline'''

We use a two-step pipeline for metagenomic analysis: CensuScope and Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference DB. Our reference database (a filtered version of NTdb) is the NCBI Nucleotide db with all of the sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed either by our automated filter or manually, once an artificial sequence is identified during post-analysis processing. Sequences identified by CensuScope are used as references in Hexagon alignments. HIVE-hexagon, a K-mer based aligner, is more sensitive and faster than current standard alignment algorithms. HIVE-hexagon offers a decrease in computational cost, memory requirement and time for processing.

'''Publications'''

Please use one or more of the following for citation(s):

# [https://orcid.org/0000-0003-1409-4549 King CH], Desai H, Sylvetsky AC, [http://orcid.org/0000-0001-6897-5419 LoTempio J], Ayanyan S, Carrie J, [http://orcid.org/0000-0002-0836-3389 Crandall K], Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, [http://orcid.org/0000-0001-8823-9945 Mazumder R]. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE 2019. [https://doi.org/10.1371/journal.pone.0206484 doi: 10.1371/journal.pone.0206484]
# Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014;15(1):918. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25336203 25232094]
# Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS One. 2014;9(6):e99033. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/24918764 24918764]
# Simonyan V and Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes, 2014 Sep 30;5(4): 957-981. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25271953 25271953]

'''Funding'''

Current/past: NSF, Otsuka, MGPC

'''Acknowledgements'''

We would like to thank the following individuals for their significant work in curation and annotation of the GFKB:

[https://orcid.org/0000-0003-0256-9834 Stephanie Singleton] 
[https://orcid.org/0000-0002-1586-1693 Lindsay Hopson] 
[https://orcid.org/0000-0002-6497-0714 Jiuge (April) Yang] 
[https://orcid.org/0000-0003-4888-9673 Tyson Dawson] 
[https://orcid.org/0000-0003-2299-1426 Cameron Sabet] 
[https://orcid.org/0000-0001-5703-5667 Yukta Chidanandan] 
[https://orcid.org/0000-0002-2577-3240 Valery Simonyan] 
[https://orcid.org/0000-0002-0457-7056 Nicole Post] 
[https://orcid.org/0000-0002-9007-8746 Ben Osborne] 
[https://orcid.org/0000-0001-9721-3181 Sophie Halkett] 
[https://orcid.org/0000-0003-1181-8118 Miguel Mazumder] 

'''Questions / Comments'''

If you have any questions or comments regarding GutFeelingKB, please contact Raja Mazumder (mazumder@gwmail.gwu.edu).
 
 

== CensuScope ==
Description coming soon.

'''Code repository:'''
https://github.com/GW-HIVE/CensuScope

'''Publication:'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

 
 

== slimNT ==
Description coming soon.

'''Current Slim NT database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.fa.gz

'''Current Slim NT taxonomy database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.db.gz
 
 
'''Publications:'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

[https://doi.org/10.1371/journal.pone.0099033 Santana-Quintero ''et al''.]

[https://doi.org/10.1093/database/baw022 Simonyan ''et al''.]

[https://doi.org/10.3390/genes5040957 Simonyan V, Mazumder R]
 
 
'''Funding'''

LOI_ID#L02496974, NSF_Lineage_Award #1546491
</td></tr></table>

Dataset Resources

2025-09-10T17:49:52Z

Jkeeney: Update GFKB download link

<h2>HIVE Team Datasets</h2>


<h2>BCO HCV</h2>

We demonstrated that use of the IEEE 2791-2020 Standard (BioCompute Objects [BCO]) enables complete and concise communication of NGS data analysis results. One arm of a clinical trial was replicated using synthetically generated data made to resemble real biological data. Two separate, independent analyses were then carried out using BCOs as the tool for communication of analysis: one to simulate a pharmaceutical regulatory submission to the FDA, and another to simulate the FDA review. The two results were compared and tabulated for concordance analysis: of the 118 simulated patient samples generated, the final results of 117 (99.15%) were in agreement. This high concordance rate demonstrates the ability of a BCO, when a verification kit is included, to effectively capture and clearly communicate NGS analyses within regulatory submissions. BCOs promote transparency and support reproducibility, thereby reinforcing trust in the regulatory submission process.

<ul>
<li>Baseline 1: Version: 1.0, Size: 48 MB, Format: FASTQ</li>
<li>Baseline 2: Version: 1.0, Size: 48 MB, Format: FASTQ</li>
<li>Treatment Failure 1: Version: 1.0, Size: 48 MB, Format: FASTQ</li>
<li>Treatment Failure 2: Version: 1.0, Size: 48 MB, Format: FASTQ</li>
</ul>
<h3>Full Dataset Download:</h3>
<ul>
<li>manifest.json: Version: 1.0, Size: 163K, Format: JSON</li>
<li>hcvALL.zip: Version: 1.0, Size: 4.4G, Format: ZIP</li>
</ul>
Citation: https://doi.org/10.1101/2020.12.07.415059


<h2>GFKB</h2>

The GutFeeling Knowledgebase (GFKB) is a reference database of the healthy human gut microbiome. It is generated by a metagenomic analysis pipeline described in our paper https://doi.org/10.1371/journal.pone.0206484, and includes three tools which are integrated in the HIVE platform. To create GutFeelingKB, 49 healthy samples sequenced at GWU and 49 healthy samples taken from the Human Microbiome Project were analyzed.

<ul>
<li>Version: 4.0</li>
<li>Citation: https://pubmed.ncbi.nlm.nih.gov/31509535/ PMID: 31509535</li>
<li>Downloads Link: https://hivelab.biochemistry.gwu.edu/wiki/Metagenomic_resources#GutFeeling_Knowledgebase_(GFKB)</li>
</ul>


<h2>Polyester Simulated RNA-seq Reads for Chromosome 22</h2>


Simulated RNA-seq reads were generated using the R package polyester for Chromosome 22 of the human reference genome GRCh38. Two samples were generated, each containing two unique transcripts expressed 20-fold higher than normal to serve as positive controls. These reads can be used to test RNA-seq analysis pipelines and to gauge how much variability an analysis introduces when recovering the 20-fold difference in the positive control transcripts between samples.


<ul style="font-weight: normal;">
<li>Version: 1.0 More info: [https://bioconductor.org/packages/release/bioc/html/polyester.html Bioconductor polyester page]</li>
<li>[sample_01_01.fasta]: Forward reads for sample 1. Size: 1.7 GB, Format: FASTA</li>
<li>[sample_01_2.fasta]: Reverse reads for sample 1. Size: 1.7 GB, Format: FASTA</li>
<li>[sample_02_01.fasta]: Forward reads for sample 2. Size: 1.8 GB, Format: FASTA</li>
<li>[sample_02_02.fasta]: Reverse reads for sample 2. Size: 1.8 GB, Format: FASTA</li>
<li>[sim_tx_info.txt]: Summary of fold changes per transcript. Size: 142 KB, Format: TXT</li>
</ul>

Metagenomic resources

2025-09-10T17:48:05Z

Jkeeney: Added links to censuscope and slimnt

The Mazumder lab has developed several open source resources for metagenomic analysis, listed on this page.

 

=== GutFeeling Knowledgebase (GFKB) ===
We have developed a proof-of-concept gut microbiome monitoring system using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below).

We have collected from the individuals enrolled in our study the following: three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, gastrointestinal symptoms questionnaires, perceived stress questionnaires, physical activity questionnaires, and sleep questionnaires. We have also begun the analysis of fecal samples from the Human Microbiome Project and the associated metadata. The integration of this data into a single knowledgebase of comparable samples using our optimized pipeline will provide the real value of our prototype.

'''Objective'''

The Gut Feeling Knowledgebase (GFKB) is a reference database of human gut microbiomes from both healthy individuals and those diagnosed with a disease or condition. The GFKB is generated by a metagenomic analysis pipeline described in our paper (doi: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206484 10.1371/journal.pone.0206484]), and includes three tools which are integrated in the HIVE platform. The aim of this database is to catalog bacterial organisms found within the human digestive tract as the lab continues to conduct metagenomic research on various diseases and conditions (e.g. epilepsy and pre-diabetes). Our hope is to identify key organisms found within the gut as a means to understand how their imbalances may impact human health. Currently, the HIVE lab has documented over 500 bacterial organisms within this database, and we hope to continue adding more organisms as we proceed with our current project in predicting intervention outcomes of pre-diabetic patients using their relative gut microbiome abundances.

'''GFKB downloads'''

{| class="wikitable" style="margin:auto"
|-
! Version !! Content Files !! Format !! File Size !! Release Notes (Plain Text) !! Date Created
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/RLDA_KI_Analysis.pdf RLDA KI Analysis] || pdf || 393KB || N/A || May 14 2021
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf ML MatLab Tutorial] || pdf || 5.2KB || N/A || January 6 2021
|-
| v5.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf GFKB_v5-PreDiabetes.csv] || csv || 57KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v5.0] || January 18 2023
|-
| v4.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Master_List.csv GutFeelingKnowledgeBase-v4-Master_List.csv] [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv] || csv || 290KB 99KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v4.0] || March 31 2020
|-
| v3.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v3.csv GutFeelingKnowledgeBase-v3.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV3 GutFeeling Knowledge Base Notes v3.0] || August 30 2019
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.6.csv GutFeelingKnowledgeBase-v2.6.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2_6 GutFeeling Knowledge Base Notes v2.6] || July 23 2018
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.6.fasta HumanGutDB-v2.6.fasta] || fasta || 549MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2_6 HumanGutDB v2.6 Notes] || July 23 2018
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.0.csv GutFeelingKnowledgeBase-v2.0.csv] || csv || 249KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2 GutFeeling Knowledge Base Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.0.fasta HumanGutDB-v2.0.fasta] || fasta || 533MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2 HumanGutDB v2.0 Notes] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/blackList-v2.0.csv blockList-v2.0.csv] || csv || 16KB || [https://hivelab.biochemistry.gwu.edu/blackListNotesV2 Black List Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/unalignedContigsGFKB-v2.0.fasta unalignedContigsGFKB-v2.0.fasta] || fasta || 3.2GB || [https://hivelab.biochemistry.gwu.edu/unalignedContigsGFKBNotesV2 Unaligned Contigs GFKB Notes] || 2017
|-
|}

 
 

=== Filtered NT ===
The Filtered NT dataset is generated from the complete nucleotide file provided by NCBI by excluding every sequence whose taxonomy name is on a list of unwanted names or is a child of one of those names.

'''Metagenomics Pipeline'''

We use a two-step pipeline for metagenomic analysis: CensuScope and Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference DB. Our reference database (a filtered version of NTdb) is the NCBI Nucleotide db with all of the sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed either by our automated filter or manually, once an artificial sequence is identified during post-analysis processing. Sequences identified by CensuScope are used as references in Hexagon alignments. HIVE-hexagon, a K-mer based aligner, is more sensitive and faster than current standard alignment algorithms. HIVE-hexagon offers a decrease in computational cost, memory requirement and time for processing.

'''Publications'''

Please use one or more of the following for citation(s):

# [https://orcid.org/0000-0003-1409-4549 King CH], Desai H, Sylvetsky AC, [http://orcid.org/0000-0001-6897-5419 LoTempio J], Ayanyan S, Carrie J, [http://orcid.org/0000-0002-0836-3389 Crandall K], Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, [http://orcid.org/0000-0001-8823-9945 Mazumder R]. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE 2019. [https://doi.org/10.1371/journal.pone.0206484 doi: 10.1371/journal.pone.0206484]
# Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014;15(1):918. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25336203 25232094]
# Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS One. 2014;9(6):e99033. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/24918764 24918764]
# Simonyan V and Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes, 2014 Sep 30;5(4): 957-981. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25271953 25271953]

'''Funding'''

Current/past: NSF, Otsuka, MGPC

'''Acknowledgements'''

We would like to thank the following individuals for their significant work in curation and annotation of the GFKB:

[https://orcid.org/0000-0003-0256-9834 Stephanie Singleton] 
[https://orcid.org/0000-0002-1586-1693 Lindsay Hopson] 
[https://orcid.org/0000-0002-6497-0714 Jiuge (April) Yang] 
[https://orcid.org/0000-0003-4888-9673 Tyson Dawson] 
[https://orcid.org/0000-0003-2299-1426 Cameron Sabet] 
[https://orcid.org/0000-0001-5703-5667 Yukta Chidanandan] 
[https://orcid.org/0000-0002-2577-3240 Valery Simonyan] 
[https://orcid.org/0000-0002-0457-7056 Nicole Post] 
[https://orcid.org/0000-0002-9007-8746 Ben Osborne] 
[https://orcid.org/0000-0001-9721-3181 Sophie Halkett] 
[https://orcid.org/0000-0003-1181-8118 Miguel Mazumder] 

'''Questions / Comments'''

If you have any questions or comments regarding GutFeelingKB, please contact Raja Mazumder (mazumder@gwmail.gwu.edu).
 
 

=== CensuScope ===
Description coming soon.

'''Code repository:'''
https://github.com/GW-HIVE/CensuScope

'''Publication:'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

 
 

=== slimNT ===
Description coming soon.

'''Current Slim NT database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.fa.gz

'''Current Slim NT taxonomy database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.db.gz
 
 
'''Publications:'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

[https://doi.org/10.1371/journal.pone.0099033 Santana-Quintero ''et al''.]

[https://doi.org/10.1093/database/baw022 Simonyan ''et al''.]

[https://doi.org/10.3390/genes5040957 Simonyan V, Mazumder R]
 
 
'''Funding'''

LOI_ID#L02496974, NSF_Lineage_Award #1546491
</td></tr></table>

Metagenomic resources

2025-09-10T17:35:26Z

Jkeeney: Built page for representing all Mazumder lab metagenomic resources

The Mazumder lab has developed several open source resources for metagenomic analysis, listed on this page.

 

=== GutFeeling Knowledgebase (GFKB) ===
We have developed a proof-of-concept gut microbiome monitoring system using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below).

We have collected from the individuals enrolled in our study the following: three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, gastrointestinal symptoms questionnaires, perceived stress questionnaires, physical activity questionnaires, and sleep questionnaires. We have also begun the analysis of fecal samples from the Human Microbiome Project and the associated metadata. The integration of this data into a single knowledgebase of comparable samples using our optimized pipeline will provide the real value of our prototype.

'''Objective'''

The Gut Feeling Knowledgebase (GFKB) is a reference database of human gut microbiomes from both healthy individuals and those diagnosed with a disease or condition. The GFKB is generated by a metagenomic analysis pipeline described in our paper (doi: [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206484 10.1371/journal.pone.0206484]), and includes three tools which are integrated in the HIVE platform. The aim of this database is to catalog bacterial organisms found within the human digestive tract as the lab continues to conduct metagenomic research on various diseases and conditions (e.g. epilepsy and pre-diabetes). Our hope is to identify key organisms found within the gut as a means to understand how their imbalances may impact human health. Currently, the HIVE lab has documented over 500 bacterial organisms within this database, and we hope to continue adding more organisms as we proceed with our current project in predicting intervention outcomes of pre-diabetic patients using their relative gut microbiome abundances.

'''GFKB downloads'''

{| class="wikitable" style="margin:auto"
|-
! Version !! Content Files !! Format !! File Size !! Release Notes (Plain Text) !! Date Created
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/RLDA_KI_Analysis.pdf RLDA KI Analysis] || pdf || 393KB || N/A || May 14 2021
|-
| v1.0 || [https://hivelab.biochemistry.gwu.edu/docs/ML_Matlab_Tutorial.pdf ML MatLab Tutorial] || pdf || 5.2KB || N/A || January 6 2021
|-
| v5.0 || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GFKB_v5-PreDiabetes.csv] || csv || 57KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v5.0] || January 18 2023
|-
| v4.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Master_List.csv GutFeelingKnowledgeBase-v4-Master_List.csv] [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv GutFeelingKnowledgeBase-v4-Epilepsy_Data.csv] || csv || 290KB 99KB || [https://hivelab.biochemistry.gwu.edu/docs/GFKB_v5_PreDiabetes.csv GutFeeling Knowledge Base Notes v4.0] || March 31 2020
|-
| v3.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v3.csv GutFeelingKnowledgeBase-v3.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV3 GutFeeling Knowledge Base Notes v3.0] || August 30 2019
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.6.csv GutFeelingKnowledgeBase-v2.6.csv] || csv || 44KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2_6 GutFeeling Knowledge Base Notes v2.6] || July 23 2018
|-
| v2.6 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.6.fasta HumanGutDB-v2.6.fasta] || fasta || 549MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2_6 HumanGutDB v2.6 Notes] || July 23 2018
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/GutFeelingKnowledgeBase-v2.0.csv GutFeelingKnowledgeBase-v2.0.csv] || csv || 249KB || [https://hivelab.biochemistry.gwu.edu/GutFeelingKnowledgeBaseNotesV2 GutFeeling Knowledge Base Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/HumanGutDB-v2.0.fasta HumanGutDB-v2.0.fasta] || fasta || 533MB || [https://hivelab.biochemistry.gwu.edu/HumanGutDBv2 HumanGutDB v2.0 Notes] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/blackList-v2.0.csv blockList-v2.0.csv] || csv || 16KB || [https://hivelab.biochemistry.gwu.edu/blackListNotesV2 Black List Notes v2.0] || 2017
|-
| v2.0 || [https://hivelab.biochemistry.gwu.edu/docs/unalignedContigsGFKB-v2.0.fasta unalignedContigsGFKB-v2.0.fasta] || fasta || 3.2GB || [https://hivelab.biochemistry.gwu.edu/unalignedContigsGFKBNotesV2 Unaligned Contigs GFKB Notes] || 2017
|-
|}

 
 

=== Filtered NT ===
The Filtered NT dataset is generated from the complete nucleotide (NT) file provided by NCBI by excluding every sequence that is assigned to an unwanted taxonomy name or to any child taxon of an unwanted name.
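The exclusion rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual filter: it assumes taxa are identified by numeric IDs rather than names, and the names `parent_of`, `taxid_of`, `collect_excluded`, and `filter_records` are hypothetical. It also assumes the taxonomy is a tree whose root is its own parent.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def collect_excluded(parent_of: Dict[int, int], unwanted: Set[int]) -> Set[int]:
    """Return the unwanted taxon IDs plus every descendant of them."""
    excluded = set(unwanted)
    for taxid in parent_of:
        # Walk up the lineage until we hit an excluded taxon or the root.
        path = []
        node = taxid
        while node not in excluded and node in parent_of and parent_of[node] != node:
            path.append(node)
            node = parent_of[node]
        if node in excluded:
            # Everything on the path sits below an excluded taxon.
            excluded.update(path)
    return excluded


def filter_records(
    records: Iterable[Tuple[str, str]],
    taxid_of: Dict[str, int],
    excluded: Set[int],
) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs whose taxon is not excluded.

    Records without a known taxon are kept here; that is a choice made
    for this sketch only.
    """
    for header, seq in records:
        if taxid_of.get(header) not in excluded:
            yield header, seq


# Toy taxonomy: 1 is the root; 10 is unwanted and 11 is its child.
parent_of = {1: 1, 2: 1, 10: 2, 11: 10, 20: 1, 21: 20}
excluded = collect_excluded(parent_of, {10})

records = [("seqA", "ACGT"), ("seqB", "GGCC"), ("seqC", "TTAA")]
taxid_of = {"seqA": 11, "seqB": 21, "seqC": 10}
kept = list(filter_records(records, taxid_of, excluded))
```

In this toy example only `seqB` survives, because `seqA` belongs to a child of the unwanted taxon and `seqC` belongs to the unwanted taxon itself.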

'''Metagenomics Pipeline'''

We use a two-step pipeline for metagenomic analysis: CensuScope and Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference DB. Our reference database (a filtered version of NTdb) is the NCBI Nucleotide db with all of the sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed, either by our automated filter or manually once an artificial sequence is identified during post-analysis processing. Sequences identified by CensuScope are used as references in Hexagon alignments. HIVE-hexagon, a k-mer based aligner, is more sensitive and faster than current standard alignment algorithms. HIVE-hexagon offers a decrease in computational cost, memory requirement and time for processing.
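The census idea behind the first step can be illustrated with a short Python sketch. It is not CensuScope itself: the `classify` callable is a hypothetical stand-in for the BLAST search against the reference database, and the function name `census` and its parameters are assumptions made for this example.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Optional


def census(
    reads: List[str],
    sample_size: int,
    classify: Callable[[str], Optional[str]],
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate taxon fractions from a random subsample of reads.

    `classify` stands in for the BLAST step and returns a taxon label,
    or None when a read has no hit.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    k = min(sample_size, len(reads))
    sample = rng.sample(reads, k)  # sampling without replacement

    hits: Counter = Counter()
    for read in sample:
        taxon = classify(read)
        if taxon is not None:
            hits[taxon] += 1

    total = sum(hits.values())
    return {taxon: n / total for taxon, n in hits.items()} if total else {}


# Toy input: 30 reads from one taxon and 10 from another.
toy_reads = ["A"] * 30 + ["B"] * 10
toy_lookup = {"A": "taxon_A", "B": "taxon_B"}
full_profile = census(toy_reads, 40, toy_lookup.get)
small_profile = census(toy_reads, 8, toy_lookup.get)
```

Sampling all 40 toy reads reproduces the true 3:1 composition, while the 8-read sample gives only an estimate of it. In the pipeline described above, the taxa found in this step determine which reference sequences are passed to the Hexagon alignment.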

'''Publications'''

Please use one or more of the following for citation(s):

# [https://orcid.org/0000-0003-1409-4549 King CH], Desai H, Sylvetsky AC, [http://orcid.org/0000-0001-6897-5419 LoTempio J], Ayanyan S, Carrie J, [http://orcid.org/0000-0002-0836-3389 Crandall K], Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, [http://orcid.org/0000-0001-8823-9945 Mazumder R]. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE 2019. [https://doi.org/10.1371/journal.pone.0206484 doi: 10.1371/journal.pone.0206484]
# Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014;15(1):918. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25336203 25336203]
# Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS ONE. 2014;9(6):e99033. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/24918764 24918764]
# Simonyan V and Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes. 2014 Sep 30;5(4):957-981. PMID: [http://www.ncbi.nlm.nih.gov/pubmed/25271953 25271953]

'''Funding'''

Current/past: NSF, Otsuka, MGPC

'''Acknowledgements'''

We would like to thank the following individuals for their significant work in curation and annotation of the GFKB:

[https://orcid.org/0000-0003-0256-9834 Stephanie Singleton] 
[https://orcid.org/0000-0002-1586-1693 Lindsay Hopson] 
[https://orcid.org/0000-0002-6497-0714 Jiuge (April) Yang] 
[https://orcid.org/0000-0003-4888-9673 Tyson Dawson] 
[https://orcid.org/0000-0003-2299-1426 Cameron Sabet] 
[https://orcid.org/0000-0001-5703-5667 Yukta Chidanandan] 
[https://orcid.org/0000-0002-2577-3240 Valery Simonyan] 
[https://orcid.org/0000-0002-0457-7056 Nicole Post] 
[https://orcid.org/0000-0002-9007-8746 Ben Osborne] 
[https://orcid.org/0000-0001-9721-3181 Sophie Halkett] 
[https://orcid.org/0000-0003-1181-8118 Miguel Mazumder] 

'''Questions / Comments'''

If you have any questions or comments regarding GutFeelingKB, please contact Raja Mazumder (mazumder@gwmail.gwu.edu).
 
 

=== CensuScope ===
Coming soon!
 
 

=== slimNT ===
Coming soon!

Dataset Resources

2025-08-29T14:29:41Z

Jkeeney:

<h2>HIVE Team Datasets</h2>


<h2>BCO HCV</h2>

We demonstrated that the use of the IEEE 2791-2020 Standard (BioCompute Objects [BCO]) enables complete and concise communication of NGS data analysis results. One arm of a clinical trial was replicated using synthetically generated data made to resemble real biological data. Two separate, independent analyses were then carried out using BCOs as the tool for communication of analysis: one to simulate a pharmaceutical regulatory submission to the FDA, and another to simulate the FDA review. The two results were compared and tabulated for concordance analysis: of the 118 simulated patient samples generated, the final results of 117 (99.15%) were in agreement. This high concordance rate demonstrates the ability of a BCO, when a verification kit is included, to effectively capture and clearly communicate NGS analyses within regulatory submissions. BCO promotes transparency and induces reproducibility, thereby reinforcing trust in the regulatory submission process.

<ul>
<li>Base line 1: Version: 1.0, Size: 48 MB, Format: FASTQ</li>
<li>Base line 2: Version: 1.0, Size: 48 MB, Format: FASTQ</li>
<li>Treatment Failure 1: Version: 1.0, Size: 48 MB, Format: FASTQ</li>
<li>Treatment Failure 2: Version: 1.0, Size: 48 MB, Format: FASTQ</li>
</ul>
<h3>Full Dataset Download:</h3>
<ul>
<li>manifest.json: Version: 1.0, Size: 163K, Format: JSON</li>
<li>hcvALL.zip: Version: 1.0, Size: 4.4G, Format: ZIP</li>
</ul>
Citation: https://doi.org/10.1101/2020.12.07.415059


<h2>GFKB</h2>

The GutFeeling Knowledgebase (GFKB) is a reference database of the healthy human gut microbiome. It is generated by a metagenomic analysis pipeline described in our paper https://doi.org/10.1371/journal.pone.0206484, and includes three tools which are integrated in the HIVE platform. To create GutFeelingKB, 49 healthy samples sequenced at GWU and 49 healthy samples taken from the Human Microbiome Project were analyzed.

<ul>
<li>Version: 4.0</li>
<li>Citation: https://pubmed.ncbi.nlm.nih.gov/31509535/ PMID: 31509535</li>
<li>Downloads Link: https://hivelab.biochemistry.gwu.edu/wiki/Gfkb</li>
</ul>


<h2>Polyester Simulated RNA-seq Reads for Chromosome 22</h2>


Simulated RNA-seq reads were generated using the R package polyester for Chromosome 22 of the human reference genome GRCh38. Two samples were generated, each containing two unique transcripts that are expressed 20-fold higher than normal to serve as positive controls. These reads can be used to test RNA-seq analysis pipelines and to gauge how much an analysis varies in recovering the 20-fold difference of the positive-control transcripts between samples.


<ul style="font-weight: normal;">
<li>Version: 1.0 More info: [https://bioconductor.org/packages/release/bioc/html/polyester.html Bioconductor polyester page]</li>
<li>[sample_01_01.fasta]: Forward reads for sample 1. Size: 1.7 GB, Format: FASTA</li>
<li>[sample_01_2.fasta]: Reverse reads for sample 1. Size: 1.7 GB, Format: FASTA</li>
<li>[sample_02_01.fasta]: Forward reads for sample 2. Size: 1.8 GB, Format: FASTA</li>
<li>[sample_02_02.fasta]: Reverse reads for sample 2. Size: 1.8 GB, Format: FASTA</li>
<li>[sim_tx_info.txt]: Summary of fold changes per transcript. Size: 142 KB, Format: TXT</li>
</ul>

Gfkb

2025-08-29T14:18:36Z

Jkeeney:

<table width=1100><tr><td>
<h5>Gutfeeling KnowledgeBase</h5> We have developed a proof-of-concept gut microbiome monitoring system prototype using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below).

 We have collected from the individuals enrolled in our study the following: three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, gastrointestinal symptoms questionnaires, perceived stress questionnaires, physical activity questionnaires, and sleep questionnaires. We have also begun the analysis of fecal samples from the Human Microbiome Project and the associated metadata. The integration of this data into a single knowledgebase of comparable samples using our optimized pipeline will provide the real value of our prototype.

'''HIVE Metagenomics Pipeline'''

We use a two-step pipeline for metagenomic analysis: CensuScope and Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference DB. Our reference database (a filtered version of NTdb) is the NCBI Nucleotide db with all of the sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed, either by our automated filter or manually once an artificial sequence is identified during post-analysis processing. Sequences identified by CensuScope are used as references in Hexagon alignments. HIVE-hexagon, a k-mer based aligner, is more sensitive and faster than current standard alignment algorithms. HIVE-hexagon offers a decrease in computational cost, memory requirement and time for processing.
 
 

'''Current Slim NT database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.fa.gz

'''Current Slim NT taxonomy database:'''

https://hive.biochemistry.gwu.edu/static/slimNT.db.gz
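After downloading, the sequence file can be inspected without decompressing it to disk. The sketch below assumes that `slimNT.fa.gz` is a gzip-compressed FASTA file, as its extension suggests; the helper name `read_fasta_gz` is hypothetical, and the demo runs on a tiny stand-in file rather than on the real download. The format of `slimNT.db.gz` is not covered here.

```python
import gzip
import os
import tempfile
from typing import Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Stream (header, sequence) pairs from a gzip-compressed FASTA file."""
    header = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Demo on a small stand-in file; replace demo_path with the local path
# of the downloaded slimNT.fa.gz to summarize the real database.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.fa.gz")
with gzip.open(demo_path, "wt") as out:
    out.write(">seq1 example\nACGT\nAC\n>seq2\nGG\n")

records = list(read_fasta_gz(demo_path))
total_bases = sum(len(seq) for _, seq in records)
print(len(records), total_bases)
```

Because the function is a generator, a large file can be summarized record by record instead of being loaded into memory at once; the `list(...)` call in the demo is only appropriate for small inputs.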
 
 
'''Publications'''

[https://doi.org/10.1186/1471-2164-15-918 Shamsaddini ''et al''.]

[https://doi.org/10.1371/journal.pone.0099033 Santana-Quintero ''et al''.]

[https://doi.org/10.1093/database/baw022 Simonyan ''et al''.]

[https://doi.org/10.3390/genes5040957 Simonyan V, Mazumder R]
 
 
'''Funding'''

LOI_ID#L02496974, NSF_Lineage_Award #1546491
</td></tr></table>

Gfkb

2025-08-29T14:06:31Z

Jkeeney: Added links to current SlimNT files.

<table width=1100><tr><td>
<h5>Gutfeeling KnowledgeBase</h5> We have developed a proof-of-concept gut microbiome monitoring system prototype using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below).

 We have collected from the individuals enrolled in our study the following: three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, gastrointestinal symptoms questionnaires, perceived stress questionnaires, physical activity questionnaires, and sleep questionnaires. We have also begun the analysis of fecal samples from the Human Microbiome Project and the associated metadata. The integration of this data into a single knowledgebase of comparable samples using our optimized pipeline will provide the real value of our prototype.

'''HIVE Metagenomics Pipeline'''

We use a two-step pipeline for metagenomic analysis: CensuScope and Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference DB. Our reference database (a filtered version of NTdb) is the NCBI Nucleotide db with all of the sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed, either by our automated filter or manually once an artificial sequence is identified during post-analysis processing. Sequences identified by CensuScope are used as references in Hexagon alignments. HIVE-hexagon, a k-mer based aligner, is more sensitive and faster than current standard alignment algorithms. HIVE-hexagon offers a decrease in computational cost, memory requirement and time for processing.

'''Publications'''

<a target="_blank" href="https://doi.org/10.1186/1471-2164-15-918">Shamsaddini ''et al''.</a>

<a target="_blank" href="https://doi.org/10.1371/journal.pone.0099033">Santana-Quintero ''et al''.</a>

<a target="_blank" href="https://doi.org/10.1093/database/baw022">Simonyan ''et al''.</a>

<a target="_blank" href="https://doi.org/10.3390/genes5040957">Simonyan V, Mazumder R</a>

Current Slim NT database:

https://hive.biochemistry.gwu.edu/static/slimNT.fa.gz

Current Slim NT taxonomy database:

https://hive.biochemistry.gwu.edu/static/slimNT.db.gz

'''Funding'''

LOI_ID#L02496974, NSF_Lineage_Award #1546491
</td></tr></table>

Gfkb

2025-08-27T18:26:48Z

Jkeeney: Created page with "<table width=1100><tr><td> <h5>Gutfeeling KnowledgeBase</h5> We have developed a proof-of-concept gut microbiome monitoring system prototype using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below). We have collected from the individuals enrolled in our study the following: three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questio..."

<table width=1100><tr><td>
 
<h5>Gutfeeling KnowledgeBase</h5>
 

We have developed a proof-of-concept gut microbiome monitoring system prototype using a sequencing and analysis pipeline implemented during our previous I-Corps award (see below).
 
We have collected from the individuals enrolled in our study the following: three separate fecal samples for metagenomic sequencing, anthropometric measurements, a diet history questionnaire, gastrointestinal symptoms questionnaires, perceived stress questionnaires, physical activity questionnaires, and sleep questionnaires. We have also begun the analysis of fecal samples from the Human Microbiome Project and the associated metadata. The integration of this data into a single knowledgebase of comparable samples using our optimized pipeline will provide the real value of our prototype.

 
HIVE Metagenomics Pipeline
 
We use a two-step pipeline for metagenomic analysis: CensuScope and Hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and BLASTs them against a reference DB.
Our reference database (a filtered version of NTdb) is the NCBI Nucleotide db with all of the sequences lacking a clear taxonomic lineage filtered out. All artificial sequences have been removed, either by our automated filter or manually once an artificial sequence is identified during post-analysis processing.
Sequences identified by CensuScope are used as references in Hexagon alignments. HIVE-hexagon, a k-mer based aligner, is more sensitive and faster than current standard alignment algorithms. HIVE-hexagon offers a decrease in computational cost, memory requirement and time for processing.

 
Publications
 
<a target="_blank" href="https://doi.org/10.1186/1471-2164-15-918">Shamsaddini et al.</a>
 
<a target="_blank" href="https://doi.org/10.1371/journal.pone.0099033">Santana-Quintero et al.</a>
 
<a target="_blank" href="https://doi.org/10.1093/database/baw022">Simonyan et al.</a>
 
<a target="_blank" href="https://doi.org/10.3390/genes5040957">Simonyan V, Mazumder R</a>
 

 
Funding
 
LOI_ID#L02496974, NSF_Lineage_Award #1546491



</td></tr></table>

2025 Bioinformatics Symposium

2025-04-21T19:07:23Z

Jkeeney: Update discussion panel

{{DISPLAYTITLE: 2025 Bioinformatics Symposium}}

'''Title''': 2025 Inaugural GW Bioinformatics Symposium

'''When''': April 29th 2025, 9am to 6pm

'''Venue''': Talks: SEH B1220. Refreshments, lunch, and posters: Green wall area (SEH B1167)

'''Join us for a full-day, all-hands GW Bioinformatics Symposium featuring posters, talks, and roundtable discussions. Open to GW students, staff, and faculty!'''

'''REGISTRATION IS CLOSED. Event Registration:''' Space is limited. Please register by '''April 12th, 2025''' for the event through this '''<big>[https://docs.google.com/forms/d/e/1FAIpQLSd_VfXIL_S59cVgOBxx_0b0E-wMBphWbBVuK6-JOSm9-cqiJA/viewform?usp=sharing form]</big>'''. If you encounter any issues, please email your name and the lab you are representing to Raja Mazumder (mazumder@gwu.edu), and he will register you.

'''ABSTRACT SUBMISSION IS CLOSED. Abstract submission:''' Please submit your abstract by '''April 12th, 2025''' through [https://cri-datacap.org/surveys/?s=8LFM3TKDWY338KDC '''<big>REDCap</big>'''].

'''OPT-OUT LUNCH PICKUP.''' We’re excited by the overwhelming response. Over 120 participants from nearly all GW schools have signed up. Please note that seating in the main room is limited to the first 90 attendees. We encourage you to arrive early to secure a seat. For those who arrive later, we’re working to set up a spillover room with TV monitors so everyone can still follow the sessions. Lunch will be provided for all attendees who plan to stay for most of the day. If you’re attending only briefly and do not plan to pick up lunch, please let us know to help minimize food waste. You can either fill out this [https://docs.google.com/forms/d/e/1FAIpQLScWaw6JFsbtQgcdsk9XCZQMJ0mMb3Z0UR5uHZnTZ1DgBYJlDQ/viewform?usp=header form] or email mazumder@gwu.edu. We appreciate your understanding and cooperation.

== Abstract/Overview ==

The GW Bioinformatics Symposium on April 29, 2025, is a full-day event designed to bring together faculty, staff, and student bioinformatics researchers, as well as researchers who use bioinformatics in their labs, from across GW to foster networking, collaboration, and knowledge exchange. The symposium will feature talks from GW labs that focus on bioinformatics and related research, poster presentations, roundtable discussions, and sessions on resources, funding and career opportunities in bioinformatics. Topics will span bioinformatics, computational methods, IT/security, and training, highlighting the breadth of bioinformatics in various GW schools and centers. This event offers a unique opportunity for attendees to engage in meaningful discussions, explore potential collaborations, and stay informed about the latest advancements in the field. The symposium is a great way to connect with the GW bioinformatics community.

== Poster ==
Participants are invited to submit a brief poster abstract by March 31st at 11:59 PM (ET). We encourage submissions from bioinformatics labs and also other labs that do not primarily focus on bioinformatics but have research relevant to bioinformatics topics. A select few will be chosen for lightning talks. Due to the limited number of poster boards, priority will be given to ensure each lab/group has at least one designated board. If the number of submitted poster abstracts exceeds the available poster boards, additional posters may be printed as flyers with QR codes, enabling attendees to scan, view, or download them electronically.

'''Size:''' Poster sizes can be up to 42 (width) x 36 (height) inches.

'''Poster Abstract Submission Portal:''' [https://cri-datacap.org/surveys/?s=8LFM3TKDWY338KDC Click here].

'''Poster Printing Instructions'''

Download the poster template from GW Research Day Resources: [https://guides.himmelfarb.gwu.edu/ResearchDay/poster-design-layout Poster Design & Layout].

After you create the PPT for your poster, request free poster printing from Gelman Library [https://library.gwu.edu/3-d-and-large-format-printing using this form].

Submit your printing request by April 15th.

== Schedule ([https://hivelab.biochemistry.gwu.edu/wiki/2025_Bioinformatics_Symposium#Talk_Titles Talk Titles]) ==

{| class="wikitable"
|+
!Time
!Duration
!Topic
!Presenter(s)
|-
| colspan="4" |'''Morning Session'''
'''Topics: Registration, introduction, and health-related topics.'''
|-
|8:30 - 9:00 AM
|30 min
|Registration & coffee
Lead: Raechelle McCants, Jewel Dias

* Registration
* Coffee
* Poster Setup
* Slide/AV setup & check
|
|-
|9 - 11:00 AM
|120 min
|Welcome
Rong Li (Chair, Dept. BMM, SMHS)

Alison Hall (Senior Assoc Dean for Res, SMHS)

Raja Mazumder

Anelia Horvath

Talks (12 mins + questions)

Session chairs: Anelia Horvath, Raja Mazumder
|Anelia Horvath (Biochemistry)
Ljubica Caldovic (Children’s)

Dae Young Kim (Children’s; Muhammad Rahman lab)

Seth Berger (Children’s)

Raja Mazumder (Biochemistry)

Yi-Wen Chen (Children’s)

Aintzane Santaquiteria Gil (Biology, Orti Lab)

''*Marc Garbey (Neurology)''

Two 5 mins Poster Flash Talks (TBD)
|-
|11 - 11:15 AM
|15 min
|Refreshment Break.
|
|-
|11:15 - 12:30 PM
|75 min
|Talks (12 mins + questions)
Session chairs: Ali Rahnavard, Ljubica Caldovic
|''Jo Lynne Rokita (Children’s)''
Max Alekseyev (Milken)

Erika Hubbard (Crandall lab; Milken)

Ali R. Taheriyoun (Rahnavard Lab; Milken)

Hiroki Morizono (Children’s)

One 5 mins Poster Flash Talk (TBD)
|-
|12:30 - 2 PM
|90 min
|Lunch and poster session
Lead: Anelia Horvath, Raechelle McCants, Jewel Dias
|'''Poster Judging Committee:'''
Anelia Horvath

Ali Rahnavard

Hiroki Morizono

Yi-Wen Chen

Jimmy Saw
|-
| colspan="4" |'''Afternoon Session'''
'''Topics: Breadth of bioinformatics in biological research; IT/security; Training'''
|-
|2:00 - 3:30 PM
|90 min
|Talks (12 mins + questions)
Session Chairs: Howie Huang, Jimmy Saw, Chen Zeng
|Mohammad Hammas Saeed (Howie Huang Lab, Engineering)
Nan Wu (ECE, Engineering)

Ayah Zirikly (Computer Science, GW/JHU)

Chen Zeng (Physics)

Weiqun Peng (Physics)

Shekhar Nagar (Jimmy Saw Lab, Biology)

Two 5 mins Poster Flash Talks

|-
|3:30 - 4:30 PM
|60 min
|Session Chair: Jonathon Keeney.
Co-chairs: Hiroki Morizono, Anelia Horvath

* 5 min. intros: IT, omics support, and related topics

* Questions from audience
** Round table discussion
** Careers in bioinformatics
** Funding opportunities
|Clark Gaylord (Director, Research Technology Services)
Brian Choi (MFA)

Anelia Horvath (MGPC core/Bioinformatics support)

Jack Villani (GW Genomics Core)

Ali Rahnavard (CBI Analytics)

Hiroki Morizono (Children's)
|-
|4:30 - 6:00 PM
|
|Networking event, poster prizes, and refreshments
|Anelia Horvath
Jonathon Keeney

Raja Mazumder

Keith Crandall
|}
<nowiki>*</nowiki>Talk titles TBD

== Presentation/Discussion Sessions ==
There will be a Q&A session and a networking event at the end of the symposium.

== Scientific Organizing Committee ==

Raja Mazumder (Symposium Chair), Anelia Horvath, Hiroki Morizono, Ljubica Caldovic, Keith Crandall, Jorge Sepulveda, Howie Huang, Chen Zeng, Jimmy Saw, Clark Gaylord.

== Logistics Organizing Committee ==

Raja Mazumder, Anelia Horvath, Raechelle McCants, Jewel Dias. Student volunteers: Jane, Sofia, Allison, Chloe, Trupri and Lincoln.

== Talk Titles ==
{| class="wikitable"
|+
!Name
!Department
!School
!Title
|-
|Jo Lynne Rokita
|Pediatrics
|CNH
|Accelerating discovery and target identification for pediatric brain tumors through open-source platforms and tools
|-
|Erika Hubbard
|Bioinformatics and Biostatistics
|SPH
|Machine Learning to Determine Endotypes of Lupus
|-
|Raja Mazumder
|Biochemistry and Molecular Medicine
|SMHS
|Integrating Biomedical Knowledgebases and Clinical Data for ML/AI-Powered Insights
|-
|Jack Villani
|GW Genomics Core
|SPH
|GW Genomics Core: An Introduction & Overview (panel discussion)
|-
| Ayah Zirikly || Computer Science || SEAS/Johns Hopkins University || Developments in NLP and AI for Mental Health: Insights from the Last Decade and Future Directions – A Focus on the CLPsych Workshop
|-
| Weiqun Peng || Department of Physics || CCAS || Finding structures and their associated functions in genome-wide profiles of chromatin architecture
|-
| Nan Wu || Electrical and Computer Engineering || SEAS || Directed Graph Representation Learning for Circuits, Boolean Networks, and Beyond
|-
| Seth Berger || Biochemistry and Molecular Medicine / Pediatrics || SMHS || Blindspots in Clinical Genetic Testing: Integration of Multiomics to Improve Diagnostic Yields
|-
| Ali Reza Taheriyoun || Biostatistics and Bioinformatics || SPH || Dynamics of Gut Microbiome and Metabolome of Moderate and Severe Obesity Patients Under Sleeve Gastrectomy
|-
| Yi-Wen Chen || Biochemistry and Molecular Medicine / Pediatrics || SMHS || From gene to treatment: omics approaches for understanding facioscapulohumeral muscular dystrophy
|-
| Mohammad Saeed || Computer Science || SEAS || Biases in AI-Driven Healthcare: Challenges and Implications for Clinical Decision-Making
|-
| Shekhar Nagar || Biological Sciences || CCAS || Metabolic flexibility and dissemination of antibiotic resistomes from Actinobacteria in Hawaii hydrothermal steam vents
|-
|Dae Young Kim
|Center for Translational Research
|CNH
|mhGPT: A Lightweight Domain-Specific Language Model for Mental Health Analysis
|-
|Anelia Horvath
|Biochemistry and Molecular Medicine
|SMHS
|AI driven Functional SNV Discovery from long read Single-Cell RNA-Seq Data
|-
|Ljubica Caldovic
|Center for Genetic Medicine Research
|CNH
|Active Learning of Data Science and Bioinformatics
|-
|Max Alekseyev
|Mathematics / Biostatistics & Bioinformatics
|CCAS/SPH
|Bioinformatics Meets Quantum Informatics: from Genome Rearrangements to Weingarten Calculus
|-
|Aintzane Santaquiteria Gil
|Department of Biological Sciences
|CCAS
|Using comparative genomics to link genes with convergently evolved traits.
|-
|Chen Zeng
|Department of Physics
|CCAS
|Modeling RNA-protein Interactions with network guided machine learning
|-
|Mohammad Hammas Saeed
|Electrical and Computer Engineering
|SEAS
|AI for Good: Leveraging Graph-Based Methods and Large Language Models to Address Real-World Challenges
|-
|Hiroki Morizono
|Center for Genetic Medicine Research
|CNH
|Biomedical data resources at Children's National
|-
|Marc Garbey/Henry Kaminski
|Neurology & Rehabilitation Medicine
|MFA
|Moving towards a digital twin for myasthenia gravis ''(tentative title)''
|}

== Acknowledgments ==

Sponsors: Dept. of Biochemistry and Molecular Medicine (coffee, refreshments, lunch, poster prizes), IBS (poster boards), Milken Institute School of Public Health (happy hour, poster prizes).

== Contact ==
'''For questions about registration, abstract submission or general inquiries, please contact:'''

Raja Mazumder: mazumder@gwu.edu

== Poster Presentations ==
{| class="wikitable"
!Poster Number
! Name !! Presentation Title
|-
|1
| Sunisha Harish || AI-Driven Drug Response Prediction in Cancer Using Long-Read Single-Cell RNA-Seq
|-
|2
| Dae Young Kim || mhGPT: A Lightweight Domain-Specific Language Model for Mental Health Analysis
|-
|3
| Vania Ballesteros Prieto || Uncovering the Contributions of Expressed Genetic Variants, Isoforms, and RNA Editing to Tumor Heterogeneity via Long-Read Single-Cell RNA-Seq Analysis
|-
|4
| Sarah Tiufekchiev-Grieco || Promoting Resolution of Inflammation as a Potential Therapy for DMD
|-
|5
| Karli Gilbert || Machine Learning Models Predict Treatment Outcome from Serum Proteins in Patients with Myasthenia Gravis that received Thymectomy
|-
|6
| Reny Mathew || Identification of anti-helminthic drug resistance associated Quantitative trait loci (QTLs) in the canine hookworm, Ancylostoma caninum: A pooled-sequencing approach
|-
|7
| Jo Lynne Rokita || Accelerating discovery and target identification for pediatric brain tumors through open-source platforms and tools
|-
|8
| Henry Kaminski || Moving towards a digital twin for myasthenia gravis
|-
|9
| Huai Chin Chiang || Single-Cell Transcriptomic and Phenotypic Profiling Reveals T Cell Dysfunction in BRCA1 Mutation Carriers
|-
|10
| Lori Krammer || GW-FEAST: a federated ecosystem for data analysis and machine learning
|-
|11
| Medha Kurukunda || Analyzing the Use of Artificial Intelligence to Enhance the Identification of Food Insecure Areas in Washington, D.C.
|-
|12
| Christie Rose Woodside || Bridging Genomics and Preparedness: Regulatory-Grade Genomics and Quality Control Metrics and Analysis for Emerging and Circulating Avian Influenza in 2024-2025
|-
|13
| Aiste Gulla, MD, PhD || Clinical Outcomes and Long-Term Survival of Pancreatic Cancers by Histological Sub-Type in the Epic Cosmos Database: Results from 2010-2025
|-
|14
| Jane Ulianova || Comparison of alignment performance between the T2T-CHM13 and GRCh38/hg38 reference genome assemblies for RNAseq
|-
|15
| Zhe Yu || Automated Tracking of Freezing Behavior in Paired House Mice Using DeepLabCut
|-
|16
| Zhe Yu || Behavioral Bioinformatics for Temporal Analysis of Freezing Behavior in Dyad Mice
|-
|17
| Karim Ismat || Generation of a single nuclei RNA sequencing atlas of dysferlin-deficient skeletal muscle
|-
|18
| Kai Leung (Adam) Wong || An Experience of carrying out GPU-accelerated Genomic Analysis on Pegasus
|-
|19
| Gabriel Batzli || Defining macrophage heterogeneity in murine skin wounds during inflammation
|-
|20
| Hovhannes Arestakesyan || Recurrent Somatic scSNVs in Single-Cell RNA-Seq: Insights into Tumor Heterogeneity and RNA-Level Variants
|-
|21
| Chloe Sachs || Secretome distinguishes spectrum of NF1 associated peripheral nerve sheath tumors
|-
|22
| Nikhil Arethiya || A Time-Series Approach to Glucose-Based Participant Classification
|-
|23
| Siera Martinez || Hetero-GNN Link Prediction of RNA Editing in Single Cells
|-
|24
| Renxi Li || Thirty-day outcomes of infrainguinal bypass surgery with concurrent iliac artery stenting in patients with chronic limb-threatening ischemia
|-
|25
| Matthew Mollerus || ResLens: Detecting Antibiotic Resistance Genes with Large Language Models
|-
|26
| Parimala Nagaraj || Cybersecurity at the Intersection of Genomics and Data Science: Securing the Future of Bioinformatics
|-
|27
| Shekhar Nagar || Metabolic flexibility and dissemination of antibiotic resistomes from Actinobacteria in Hawaii hydrothermal steam vents
|-
|28
| Cristina Fenollar Ferrer || Functional impact of PIP2 on the Serotonin Transporter (SERT)
|-
|29
| Ali Taheriyoun || Dynamics of gut microbiome and metabolome of obesity patients under sleeve gastrectomy
|-
|30
| Irene Zohn || Next Generation sequencing approaches to understand developmental defects
|-
|31
| Max Alekseyev || Bioinformatics meets Quantum Informatics: from genome rearrangements to Weingarten calculus
|-
|32
| Lausanne Lee Oliver || Phylogenetic analysis of novel phages from Hawaiian fumaroles
|-
|33
|Mahdi Baghbanzadeh
|seqLens: optimizing language models for genomic predictions
|-
|34
|Dezhao Fu
|varLens - enhances genetic testing using language models
|-
|35
|Lilly Shaw
|Uncovering Shared and Unique Biomarkers Across 23 Cancer Types Using The Cancer Genome Atlas (TCGA)
|-
|36
|Daniall Masood
|BiomarkerKB: A Comprehensive Biomarker Knowledgebase
|-
|37
|Ljubica Caldovic
|Active Learning of Data Science and Bioinformatics
|-
|38
|Urnisha Bhuiyan
|GlyGen: A Comprehensive Resource for Glycoscience Data Integration and Discovery
|-
|39
|Surajit Bhattacharya
|Redefining Human Airway Biology in Children from The Top Down: Unique Features of the Nasal Airway Epithelium.
|-
|40
|Anelia Horvath
|A Machine Learning Approach to Functional SNV Discovery via Isoform-Aware Single-Cell RNA-Seq
|-
|41
|Christie Rose Woodside
|Enhanced QC Metrics for Reference-Grade Genomic Data
|-
|42
|Yi-Wen Chen
|From gene to treatment: omics approaches for understanding facioscapulohumeral muscular dystrophy
|-
|43
|Pia Sen
|Investigating the role of bacteriophage diversity in Hawaiian steam vent microbial communities
|-
|44
|Emily Williams*
|TRIM28 regulates endogenous retroviral element expression in prostate cancer
|-
|45
|Alexander Thiersch
|Patient-centric approaches for antipsychotic medication research: an application of the Desirability of Outcome Ranking (DOOR) and Global Benefit-Risk (GBR) Score
|-
|46
|Cadina Powell
|Be Smart And Use Smartphones for Telemedicine: Narrative Review
|-
|47
|Xinyang Zhang
|Meta-analytic microbiome target discovery for immune checkpoint inhibitor response in advanced melanoma
|-
|48
|Cyrus Chun Hong Au Yeung
|Leveraging Large Language Models for Scalable Glycan-Disease Relation Extraction
|-
|49
|Ashley Garrison
|Gut Microbiome Composition as an Indicator of Preclinical Alzheimer's Disease
|-
|50
|Chelcie Puetz
|Combined Neuroinflammatory and Neurovascular Molecular Screening for Early Detection of Blood-Brain Barrier Dysfunction in Patients with Traumatic Brain Injury
|}

2025 Bioinformatics Symposium

2025-04-15T18:01:07Z

Jkeeney: /* Schedule (Talk Titles) */ Removed Adam Ciarleglio per his request.

{{DISPLAYTITLE: 2025 Bioinformatics Symposium}}

'''Title''': 2025 Inaugural GW Bioinformatics Symposium

'''When''': April 29th, 2025, 9am to 6pm

'''Venue''': Talks: SEH B1220. Refreshments, lunch, and posters: Green wall area (SEH B1167)

'''Join us for a full-day, all-hands GW Bioinformatics Symposium featuring posters, talks, and roundtable discussions. Open to GW students, staff, and faculty!'''

'''REGISTRATION IS CLOSED. Event Registration:''' Space is limited. Please register by '''April 12th, 2025''' for the event through this '''<big>[https://docs.google.com/forms/d/e/1FAIpQLSd_VfXIL_S59cVgOBxx_0b0E-wMBphWbBVuK6-JOSm9-cqiJA/viewform?usp=sharing form]</big>'''. If you encounter any issues, please email Raja Mazumder (mazumder@gwu.edu) with your name and the lab you are representing, and he will register you.

'''ABSTRACT SUBMISSION IS CLOSED. Abstract submission:''' Please submit your abstract by '''April 12th, 2025''' through [https://cri-datacap.org/surveys/?s=8LFM3TKDWY338KDC '''<big>REDCap</big>'''].

'''OPT-OUT LUNCH PICKUP.''' We’re excited by the overwhelming response. Over 120 participants from nearly all GW schools have signed up. Please note that seating in the main room is limited to the first 90 attendees. We encourage you to arrive early to secure a seat. For those who arrive later, we’re working to set up a spillover room with TV monitors so everyone can still follow the sessions. Lunch will be provided for all attendees who plan to stay for most of the day. If you’re attending only briefly and do not plan to pick up lunch, please let us know to help minimize food waste. You can either fill out this [https://docs.google.com/forms/d/e/1FAIpQLScWaw6JFsbtQgcdsk9XCZQMJ0mMb3Z0UR5uHZnTZ1DgBYJlDQ/viewform?usp=header form] or email mazumder@gwu.edu. We appreciate your understanding and cooperation.

== Abstract/Overview ==

The GW Bioinformatics Symposium on April 29, 2025, is a full-day event designed to bring together faculty, staff, and student bioinformatics researchers, as well as researchers who use bioinformatics in their labs, from across GW to foster networking, collaboration, and knowledge exchange. The symposium will feature talks from GW labs that focus on bioinformatics and related research, poster presentations, roundtable discussions, and sessions on resources, funding, and career opportunities in bioinformatics. Topics will span bioinformatics, computational methods, IT/security, and training, highlighting the breadth of bioinformatics across GW schools and centers. This event offers a unique opportunity for attendees to engage in meaningful discussions, explore potential collaborations, and stay informed about the latest advancements in the field. The symposium is a great way to connect with the GW bioinformatics community.

== Poster ==
Participants are invited to submit a brief poster abstract by March 31st at 11:59 PM (ET). We encourage submissions from bioinformatics labs and also other labs that do not primarily focus on bioinformatics but have research relevant to bioinformatics topics. A select few will be chosen for lightning talks. Due to the limited number of poster boards, priority will be given to ensure each lab/group has at least one designated board. If the number of submitted poster abstracts exceeds the available poster boards, additional posters may be printed as flyers with QR codes, enabling attendees to scan, view, or download them electronically.

'''Size:''' Poster sizes can be up to 42 (width) x 36 (height) inches.

'''Poster Abstract Submission Portal:''' [https://cri-datacap.org/surveys/?s=8LFM3TKDWY338KDC Click here].

'''Poster Printing Instructions'''

Download the poster template from GW Research Day Resources: [https://guides.himmelfarb.gwu.edu/ResearchDay/poster-design-layout Poster Design & Layout].

After you create the PPT for your poster, request free poster printing from Gelman Library [https://library.gwu.edu/3-d-and-large-format-printing using this form].

Submit your printing request by April 15th.

== Schedule ([https://hivelab.biochemistry.gwu.edu/wiki/2025_Bioinformatics_Symposium#Talk_Titles Talk Titles]) ==

{| class="wikitable"
|+
!Time
!Duration
!Topic
!Presenter(s)
|-
| colspan="4" |'''Morning Session'''
'''Topics: Registration, introduction, and health-related topics'''
|-
|8:30 - 9:00 AM
|30 min
|Registration & coffee
Lead: Raechelle McCants, Sunisha Harish

* Registration
* Coffee
* Poster Setup
* Slide/AV setup & check
|
|-
|9:00 - 11:00 AM
|120 min
|Welcome
Rong Li (Chair, Dept. BMM, SMHS)

Alison Hall (Senior Assoc Dean for Res, SMHS)

Raja Mazumder

Anelia Horvath

Talks

Session chairs: Anelia Horvath, Raja Mazumder
|Anelia Horvath (Biochemistry)
Ljubica Caldovic (Children’s)

Dae Young Kim (Children’s; Muhammad Rahman lab)

Seth Berger (Children’s)

Raja Mazumder (Biochemistry)

Yi-Wen Chen (Children’s)

Guillermo Orti (Biology)

''Marc Garbey (Neurology)''

''Additional presenters TBD''
|-
|11:00 - 11:15 AM
|15 min
|Refreshment Break.
|
|-
|11:15 AM - 12:30 PM
|75 min
|Talks
Session chairs: Ali Rahnavard, Ljubica Caldovic
|''Jo Lynne Rokita (Children’s)''
Max Alekseyev (Milken)

Erika Hubbard (Crandall lab; Milken)

Ali R. Taheriyoun (Rahnavard Lab; Milken)

Hiroki Morizono (Children’s)

''Additional presenters TBD''
|-
|12:30 - 2:00 PM
|90 min
|Lunch and poster session
Lead: Raechelle McCants, Jewel Dias
|'''Poster Judging Committee:'''
Ali Rahnavard

Hiroki Morizono

Yi-Wen Chen

Jimmy Saw
|-
| colspan="4" |'''Afternoon Session'''
'''Topics: Breadth of bioinformatics in biological research; IT/security; Training'''
|-
|2:00 - 3:30 PM
|90 min
|Talks
Session Chairs: Howie Huang, Jimmy Saw, Chen Zeng
|Howie Huang (Engineering)
Nan Wu (ECE, Engineering)

Ayah Zirikly (Computer Science, GW/JHU)

Chen Zeng (Physics)

Weiqun Peng (Physics)

Xiangyun Qiu (Physics)

Shekhar Nagar (Jimmy Saw Lab, Biology)

''Some speakers might be moved to the morning sessions''

|-
|3:30 - 4:30 PM
|60 min
|Session Chair: Jonathon Keeney.
Co-chairs: Hiroki Morizono, Anelia Horvath
* Talks on IT, omics support, and related topics
* Round table discussion
* Careers in bioinformatics
* Funding opportunities
* Poster awards
|Clark Gaylord (Director, Research Technology Services)
Brian Choi (MFA)

Anelia Horvath (MGPC core/Bioinformatics support)

Jack Villani (GW Genomics Core)

Ali Rahnavard (CBI Analytics)
|-
|4:30 - 6:00 PM
|90 min
|Networking event, poster prizes, and refreshments
|Keith Crandall
|}

== Presentation/Discussion Sessions ==
There will be a Q&A session and a networking event at the end of the symposium.

== Scientific Organizing Committee ==

Raja Mazumder (Symposium Chair), Anelia Horvath, Hiroki Morizono, Ljubica Caldovic, Keith Crandall, Jorge Sepulveda, Howie Huang, Chen Zeng, Jimmy Saw, Clark Gaylord.

== Logistics Organizing Committee ==

Raja Mazumder, Anelia Horvath, Raechelle McCants, Jewel Das. Student volunteers: Jane, Sofia, Allison, Chloe, Trupri and Lincoln.

== Talk Titles ==
{| class="wikitable"
|+
!Name
!Department
!School
!Title
|-
|Jo Lynne Rokita
|Pediatrics
|CNH
|Accelerating discovery and target identification for pediatric brain tumors through open-source platforms and tools
|-
|Erika Hubbard
|Bioinformatics and Biostatistics
|SPH
|Machine Learning to Determine Endotypes of Lupus
|-
|Raja Mazumder
|Biochemistry and Molecular Medicine
|SMHS
|Integrating Biomedical Knowledgebases and Clinical Data for ML/AI-Powered Insights
|-
|Jack Villani
|GW Genomics Core
|SPH
|GW Genomics Core: An Introduction & Overview (panel discussion)
|-
| Ayah Zirikly || Computer Science || SEAS/Johns Hopkins University || Developments in NLP and AI for Mental Health: Insights from the Last Decade and Future Directions – A Focus on the CLPsych Workshop
|-
| Weiqun Peng || Physics || CCAS || Finding structures and their associated functions in genome-wide profiles of chromatin architecture
|-
| Nan Wu || Electrical and Computer Engineering || SEAS || Directed Graph Representation Learning for Circuits, Boolean Networks, and Beyond
|-
| Seth Berger || Biochemistry and Molecular Medicine / Pediatrics || SMHS || Blindspots in Clinical Genetic Testing: Integration of Multiomics to Improve Diagnostic Yields
|-
| Ali Reza Taheriyoun || Biostatistics and Bioinformatics || SPH || Dynamics of Gut Microbiome and Metabolome of Moderate and Severe Obesity Patients Under Sleeve Gastrectomy
|-
| Yi-Wen Chen || Biochemistry and Molecular Medicine / Pediatrics || SMHS || From gene to treatment: omics approaches for understanding facioscapulohumeral muscular dystrophy
|-
| Mohammad Saeed || Computer Science || SEAS || Biases in AI-Driven Healthcare: Challenges and Implications for Clinical Decision-Making
|-
| Shekhar Nagar || Biological Sciences || CCAS || Metabolic flexibility and dissemination of antibiotic resistomes from Actinobacteria in Hawaii hydrothermal steam vents
|-
|Dae Young Kim
|Center for Translational Research
|CNH
|mhGPT: A Lightweight Domain-Specific Language Model for Mental Health Analysis
|-
|Anelia Horvath
|Biochemistry and Molecular Medicine
|SMHS
|AI driven Functional SNV Discovery from long read Single-Cell RNA-Seq Data
|}

== Acknowledgments ==

Sponsors: Dept. of Biochemistry and Molecular Medicine (coffee, refreshments, lunch, poster prizes), IBS (poster boards), Milken Institute School of Public Health (happy hour, poster prizes).

== Contact ==
'''For questions about registration, abstract submission or general inquiries, please contact:'''

Raja Mazumder: mazumder@gwu.edu

== Poster Presentations ==
{| class="wikitable"
!Poster Number
! Name !! Presentation Title
|-
|1
| Sunisha Harish || AI-Driven Drug Response Prediction in Cancer Using Long-Read Single-Cell RNA-Seq
|-
|2
| Dae Young Kim || mhGPT: A Lightweight Domain-Specific Language Model for Mental Health Analysis
|-
|3
| Vania Ballesteros Prieto || Uncovering the Contributions of Expressed Genetic Variants, Isoforms, and RNA Editing to Tumor Heterogeneity via Long-Read Single-Cell RNA-Seq Analysis
|-
|4
| Sarah Tiufekchiev-Grieco || Promoting Resolution of Inflammation as a Potential Therapy for DMD
|-
|5
| Karli Gilbert || Machine Learning Models Predict Treatment Outcome from Serum Proteins in Patients with Myasthenia Gravis that received Thymectomy
|-
|6
| Reny Mathew || Identification of anti-helminthic drug resistance associated Quantitative trait loci (QTLs) in the canine hookworm, Ancylostoma caninum: A pooled-sequencing approach
|-
|7
| Jo Lynne Rokita || Accelerating discovery and target identification for pediatric brain tumors through open-source platforms and tools
|-
|8
| Henry Kaminski || Moving towards a digital twin for myasthenia gravis
|-
|9
| Huai Chin Chiang || Single-Cell Transcriptomic and Phenotypic Profiling Reveals T Cell Dysfunction in BRCA1 Mutation Carriers
|-
|10
| Lori Krammer || GW-FEAST: a federated ecosystem for data analysis and machine learning
|-
|11
| Medha Kurukunda || Analyzing the Use of Artificial Intelligence to Enhance the Identification of Food Insecure Areas in Washington, D.C.
|-
|12
| Christie Rose Woodside || Bridging Genomics and Preparedness: Regulatory-Grade Genomics and Quality Control Metrics and Analysis for Emerging and Circulating Avian Influenza in 2024-2025
|-
|13
| Aiste Gulla, MD, PhD || Clinical Outcomes and Long-Term Survival of Pancreatic Cancers by Histological Sub-Type in the Epic Cosmos Database: Results from 2010-2025
|-
|14
| Jane Ulianova || Comparison of alignment performance between the T2T-CHM13 and GRCh38/hg38 reference genome assemblies for RNAseq
|-
|15
| Zhe Yu || Automated Tracking of Freezing Behavior in Paired House Mice Using DeepLabCut
|-
|16
| Zhe Yu || Behavioral Bioinformatics for Temporal Analysis of Freezing Behavior in Dyad Mice
|-
|17
| Karim Ismat || Generation of a single nuclei RNA sequencing atlas of dysferlin-deficient skeletal muscle
|-
|18
| Kai Leung (Adam) Wong || An Experience of carrying out GPU-accelerated Genomic Analysis on Pegasus
|-
|19
| Gabriel Batzli || Defining macrophage heterogeneity in murine skin wounds during inflammation
|-
|20
| Hovhannes Arestakesyan || Recurrent Somatic scSNVs in Single-Cell RNA-Seq: Insights into Tumor Heterogeneity and RNA-Level Variants
|-
|21
| Chloe Sachs || Secretome distinguishes spectrum of NF1 associated peripheral nerve sheath tumors
|-
|22
| Nikhil Arethiya || A Time-Series Approach to Glucose-Based Participant Classification
|-
|23
| Siera Martinez || Hetero-GNN Link Prediction of RNA Editing in Single Cells
|-
|24
| Renxi Li || Thirty-day outcomes of infrainguinal bypass surgery with concurrent iliac artery stenting in patients with chronic limb-threatening ischemia
|-
|25
| Matthew Mollerus || ResLens: Detecting Antibiotic Resistance Genes with Large Language Models
|-
|26
| Parimala Nagaraj || Cybersecurity at the Intersection of Genomics and Data Science: Securing the Future of Bioinformatics
|-
|27
| Shekhar Nagar || Metabolic flexibility and dissemination of antibiotic resistomes from Actinobacteria in Hawaii hydrothermal steam vents
|-
|28
| Cristina Fenollar Ferrer || Functional impact of PIP2 on the Serotonin Transporter (SERT)
|-
|29
| Ali Taheriyoun || Dynamics of gut microbiome and metabolome of obesity patients under sleeve gastrectomy
|-
|30
| Irene Zohn || Next Generation sequencing approaches to understand developmental defects
|-
|31
| Max Alekseyev || Bioinformatics meets Quantum Informatics: from genome rearrangements to Weingarten calculus
|-
|32
| Lausanne Lee Oliver || Phylogenetic analysis of novel phages from Hawaiian fumaroles
|-
|33
|Mahdi Baghbanzadeh
|seqLens: optimizing language models for genomic predictions
|-
|34
|Dezhao Fu
|varLens - enhancers genetic testing using language models
|-
|35
|Lilly Shaw
|Uncovering Shared and Unique Biomarkers Across 23 Cancer Types Using The Cancer Genome Atlas (TCGA)
|-
|36
|Daniall Masood
|BiomarkerKB: A Comprehensive Biomarker Knowledgebase
|-
|37
|Ljubica Caldovic
|Active Learning of Data Science and Bioinformatics
|-
|38
|Urnisha Bhuiyan
|GlyGen: A Comprehensive Resource for Glycoscience Data Integration and Discovery
|-
|39
|Surajit Bhattacharya
|Redefining Human Airway Biology in Children from The Top Down: Unique Features of the Nasal Airway Epithelium.
|-
|40
|Anelia Horvath
|A Machine Learning Approach to Functional SNV Discovery via Isoform-Aware Single-Cell RNA-Seq
|-
|41
|Christie Rose Woodside
|Enhanced QC Metrics for Reference-Grade Genomic Data
|-
|42
|Yi-Wen Chen
|From gene to treatment: omics approaches for understanding facioscapulohumeral muscular dystrophy
|-
|43
|Pia Sen
|Investigating the role of bacteriophage diversity in Hawaiian steam vent microbial communities
|}

Publications

2025-02-13T21:30:44Z

Jkeeney: Added two preprints.

<h2>HIVE Platform Publications</h2>

Please cite use of HIVE with:
<ul>
<li>Simonyan V and Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes, 2014 Sep 30;5(4): 957-981. [https://www.ncbi.nlm.nih.gov/pubmed/25271953 PMID: 25271953]</li>
<li>Simonyan V, Chumakov K, Dingerdissen H, et al. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis. Database (Oxford). 2016; 2016:baw022. [https://www.ncbi.nlm.nih.gov/pubmed/26989153 PMID: 26989153]</li>
</ul>

<h2>HIVE Team Publications</h2>
<ul>
<li>Keeney ''et al''. Olduvai domain expression downregulates mitochondrial pathways: implications for human brain evolution and neoteny. October 22, 2024. https://doi.org/10.1101/2024.10.21.619278</li>
<li>Clarke ''et al''. Playbook Workflow Builder: Interactive Construction of Bioinformatics Workflows from a Network of Microservices. June 09, 2024. https://doi.org/10.1101/2024.06.08.598037</li>
<li>Martinez K, Agirre J, Akune Y, Aoki-Kinoshita KF, Arighi C, Axelsen KB, Bolton E, Bordeleau E, Edwards NJ, Fadda E, Feizi T, Hayes C, Ives CM, Joshi HJ, Krishna Prasad K, Kossida S, Lisacek F, Liu Y, Lütteke T, Ma J, Malik A, Martin M, Mehta AY, Neelamegham S, Panneerselvam K, Ranzinger R, Ricard-Blum S, Sanou G, Shanker V, Thomas PD, Tiemeyer M, Urban J, Vita R, Vora J, Yamamoto Y, Mazumder R. Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy. Database (Oxford). 2024 Aug 13;2024:baae073. [https://pubmed.ncbi.nlm.nih.gov/39137905/ PMID: 39137905].</li>
<li>Kim S, Mazumder R. Enhancing scientific reproducibility through automated BioCompute Object creation using Retrieval-Augmented Generation from publications. Computer Science, Computation and Language. https://doi.org/10.48550/arXiv.2409.15076</li>
<li>Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://www.ncbi.nlm.nih.gov/pubmed/38313584 PMID: 38313584].</li>
<li>Keeney JG, Gulzar N, Baker JB, Klempir O, Hannigan GD, Bitton DA, Maritz JM, King CHS 4th, Patel JA, Duncan P, Mazumder R. Communicating computational workflows in a regulatory environment. Drug Discov Today. 2024 Jan 12; 103884. [https://www.ncbi.nlm.nih.gov/pubmed/38219969 PMID: 38219969].</li>
<li>Sylvetsky AC, Clement RA, Stearrett N, Issa NT, Dore FJ, Mazumder R, King CH, Hubal MJ, Walter PJ, Cai H, Sen S, Rother KI, Crandall KA. Consumption of sucralose and acesulfame-potassium containing diet soda alters the relative abundance of microbial taxa at the species level: findings of two pilot studies. Appl Physiol Nutr Metab. 2024 Jan 1; 49(1):125-134. [https://www.ncbi.nlm.nih.gov/pubmed/37902107 PMID: 37902107].</li>
<li>Vora J, Navelkar R, Vijay-Shanker K, Edwards N, Martinez K, Ding X, Wang T, Su P, Ross K, Lisacek F, Hayes C, Kahsay R, Ranzinger R, Tiemeyer M, Mazumder R. The glycan structure dictionary-a dictionary describing commonly used glycan structure terms. Glycobiology. 2023 Feb 17; cwad014 [https://www.ncbi.nlm.nih.gov/pubmed/36799723 PMID: 36799723].</li>
<li>Lisacek F, Tiemeyer M, Mazumder R, Aoki-Kinoshita KF. Worldwide Glycoscience Informatics Infrastructure: The GlySpace Alliance. JACS Au. eCollection 2023 Jan 23; [https://www.ncbi.nlm.nih.gov/pubmed/36711080 PMID: 36711080].</li>
<li>Datta Chaudhuri R, Datta R, Rana S, Kar A, Vinh Nguyen Lam P, Mazumder R, Mohanty S, Sarkar S. Cardiomyocyte-specific regression of nitrosative stress-mediated S-Nitrosylation of IKKγ alleviates pathological cardiac hypertrophy. Cell Signal. 2022 Oct; 98:110403 [https://www.ncbi.nlm.nih.gov/pubmed/35835332 PMID: 35835332].</li>
<li>Dahlin M, Singleton SS, David JA, Basuchoudhary A, Wickström R, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumour necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. 2022 Jun; 80:104061. https://doi.org/10.1016/j.ebiom.2022.104061 [https://www.ncbi.nlm.nih.gov/pubmed/35598439 PMID: 35598439].</li>
<li>Lyman DF, Bell A, Black A, Dingerdissen H, Cauley E, Gogate N, Liu D, Joseph A, Kahsay R, Crichton DJ, Mehta A, Mazumder R. Modeling and integration of N-glycan biomarkers in a comprehensive biomarker data model. Glycobiology. August 2022; [https://academic.oup.com/glycob/article/32/10/855/6655823?login=false PMID: 35925813].</li>
<li>Torcivia J, Abdilleh K, Seidl F, Shahzada O, Rodriguez R, Pot D, Mazumder R. Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers. Onco (Basel). June 2022; 2(2):129-144. [https://www.ncbi.nlm.nih.gov/pubmed/37841494 PMID: 37841494].</li>
<li>King CH, Keeney J, Guimera N, Das S, Weber M, Fochtman B, Walderhaug MO, Talwar S, Patel JA, Mazumder R, Donaldson EF. Communicating regulatory high-throughput sequencing data using BioCompute Objects. Drug Discov Today. 2022 Jan 22; [https://www.ncbi.nlm.nih.gov/pubmed/35077912 PMID: 35077912].</li>
<li>Wang Z, Hopson L, Singleton S, Yang X, Jogunoori W, Mazumder R, Obias V, Lin P, Nguyen BN, Yao M, Miller L, White J, Rao S, Mishra L. Mice with dysfunctional TGF-β signaling develop altered intestinal microbiome and colorectal cancer resistant to 5FU. Biochim Biophys Acta Mol Basis Dis. 2021 Oct 1; 1867(10):166179. [https://www.ncbi.nlm.nih.gov/pubmed/34082069 PMID: 34082069].</li>
<li>Lyman D, Natale D, Schriml L, Anton K, Crichton DC, Mazumder R. Analysis of Biomarker Data Towards Development of a Molecular Biomarker Ontology. Proceedings of the International Conference on Biomedical Ontologies 2021 (ICBO 2021) co-located with the Workshop on Ontologies for the Behavioural and Social Sciences (OntoBess 2021) as part of the Bolzano Summer of Knowledge (BOSK 2021) Bozen-Bolzano, Italy. 2021 Sep 16-18; [https://ceur-ws.org/Vol-3073/paper13.pdf https://ceur-ws.org/Vol-3073/paper13.pdf].</li>
<li>Patel JA, Dean DA, King CH, Xiao N, Koc S, Minina E, Golikov A, Brooks P, Kahsay R, Navelkar R, Ray M, Roberson D, Armstrong C, Mazumder R, Keeney J. Bioinformatics tools developed to support BioCompute Objects. Database (Oxford). 2021 March 31; [https://www.ncbi.nlm.nih.gov/pubmed/33784373 PMID: 33784373].</li>
<li>Hora B, Gulzar N, Chen Y, Karagiannis K, Cai F, Su C, Smith K, Simonyan V, Shah SA, Ahmed M, Sanchez AM, Stone M, Cohen MS, Denny TN, Mazumder R, Gao F. Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing. mSphere. 2020 Oct 14; [https://www.ncbi.nlm.nih.gov/pubmed/33055255 PMID: 33055255].</li>
<li>Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://www.ncbi.nlm.nih.gov/pubmed/33814114 PMID: 33814114].</li>
<li>Torcivia J, Mazumder R. Scanning window analysis of non-coding regions within normal-tumor whole-genome sequence samples. Briefings in Bioinformatics. 2020 Sep 17; [https://www.ncbi.nlm.nih.gov/pubmed/32940334 PMID: 32940334].</li>
<li>Gogate N, Lyman D, Bell A, Cauley E, Crandall KA, Joseph A, Kahsay R, Natale DA, Schriml LM, Sen S, Mazumder R. COVID-19 biomarkers and their overlap with comorbidities in a disease biomarker data model. Brief Bioinform. 2021 May 20; bbab191. doi: 10.1093/bib/bbab191. [https://www.ncbi.nlm.nih.gov/pubmed/34015823 PMID: 34015823].</li>
<li>Kahsay R, Vora J, Navelkar R, Mousavi R, Fochtman BC, Holmes X, Pattabiraman N, Ranzinger R, Mahadik R, Williamson T, Kulkarni S, Agarwal G, Martin M, Vasudev P, Garcia L, Edwards N, Zhang W, Natale DA, Ross K, Aoki-Kinoshita KF, Campbell MP, York WS, Mazumder R. GlyGen data model and processing workflow. Bioinformatics. 2020; [https://www.ncbi.nlm.nih.gov/pubmed/32324859 PMID: 32324859].</li>
<li>Kurnat-Thoma E, Baranova A, Baird P, Brodsky E, Butte AJ, Cheema AK, Cheng F, Dutta S, Grant C, Giordano J, Maitland-van der Zee AH, Fridsma DB, Jarrin R, Kann MG, Keeney J, Loscalzo J, Madhavan G, Maron BA, McBride DK, McKean M, Mun SK, Palmer JC, Patel B, Parakh K, Pariser AR, Pristipino C, Radstake TRDJ, Rajasimha HK, Rouse WB, Rozman D, Saleh A, Schmidt HHHW, Schultz N, Sethi T, Silverman EK, Skopac J, Svab I, Trujillo S, Valentine JE, Verma D, West BJ, Vasudevan S. Recent Advances in Systems and Network Medicine: Meeting Report from the First International Conference in Systems and Network Medicine. Syst Med (New Rochelle). 2020; 3(1):22-35. [https://www.ncbi.nlm.nih.gov/pubmed/32226924 PMID: 32226924].</li>
<li>Dingerdissen HM, Bastian F, Vijay-Shanker K, Robinson-Rechavi M, Bell A, Gogate N, Gupta S, Holmes E, Kahsay R, Keeney J, Kincaid H, King CH, Liu D, Crichton DJ, Mazumder R. OncoMX: A Knowledgebase for Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data. JCO Clin Cancer Inform. 2020; 4:210-220. [https://www.ncbi.nlm.nih.gov/pubmed/32142370 PMID: 32142370].</li>
<li>Aoki-Kinoshita KF, Lisacek F, Mazumder R, York WS, Packer NH. The GlySpace Alliance: toward a collaborative global glycoinformatics community. Glycobiology. 2020; 30(2):70-71. [https://www.ncbi.nlm.nih.gov/pubmed/31573039 PMID: 31573039].</li>
<li>York WS, Mazumder R, Ranzinger R, et al. GlyGen: Computational and Informatics Resources for Glycoscience. Glycobiology. 2019. https://doi.org/10.1093/glycob/cwz080 [https://www.ncbi.nlm.nih.gov/pubmed/31616925 PMID: 31616925].</li>
<li>King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://www.ncbi.nlm.nih.gov/pubmed/31509535 PMID: 31509535].</li>
<li>Fan Y, Hu Y, Yan C, Goldman R, Pan Y, Mazumder R, Dingerdissen H. Loss and gain of N-linked glycosylation sequons due to single-nucleotide variation in cancer. Scientific Reports. 2018; 8(1):4322. [https://www.ncbi.nlm.nih.gov/pubmed/29531238 PMID: 29531238].</li>
<li>Baekdoo Kim, Thahmina Ali, Changsu Dong, Carlos Lijeron, Raja Mazumder, Claudia Wultsch, and Konstantinos Krampis. miCloud: A Plug-n-Play, Extensible, On-Premises Bioinformatics Cloud for Seamless Execution of Complex Next-Generation Sequencing Data Analysis Pipelines. Journal of Computational Biology. 2018. http://doi.org/10.1089/cmb.2018.0218</li>
<li>Alterovitz G, Dean D A, Goble C, Crusoe M R, Soiland-Reyes S, Bell A, Hayes A, King C H S, Taylor D, Johanson E, Thompson E E, Donaldson E, Morizono H, Tsang H S, Goecks J, Yao J, Almeida J S, Krampis K, Guo L, Walderhaug M, Walsh P, Kahsay R, Gottipati S, Bloom T, Lai Y, Simonyan V, Mazumder R. Enabling Precision Medicine via standard communication of HTS provenance, analysis, and results. PLOS Biology; 16(12): e3000099. 2018. https://doi.org/10.1371/journal.pbio.3000099</li>
<li>Hu Y, Dingerdissen H, Gupta S, Kahsay R, Shanker V, Wan Q, Yan C, Mazumder R. Identification of key differentially expressed MicroRNAs in cancer patients through pan-cancer analysis. Computers in Biology and Medicine 2018; vol: 103 pp: 183-197. [https://www.ncbi.nlm.nih.gov/pubmed/30384176 PMID: 30384176].</li>
<li>Dingerdissen H, Torcivia-Rodriguez J, Hu Y, Chang T-C, Mazumder R, Kahsay R. BioMuta and BioXpress: mutation and expression knowledgebases for cancer biomarker discovery. Nucleic Acids Research. 2017. [https://pubmed.ncbi.nlm.nih.gov/30053270/ PMID: 30053270].</li>
<li>Karagiannis K, Simonyan V, Chumakov K, Mazumder R. Separation and assembly of deep sequencing data into discrete sub-population genomes. Nucleic Acids Research. 45(19):10989-11003. 2017. [https://www.ncbi.nlm.nih.gov/pubmed/28977510 PMID: 28977510].</li>
<li>Chen J, Zaidi S, Rao S, Chen J-S, Phan L, Farci P, Su X, Shetty K, White J, Zamboni F, Wu X, Rashid A, Pattabiraman N, Mazumder R, Horvath A, Wu R-C, Li S, Xiao C, Deng C-X, Wheeler D A, Mishra B, Akbani R, Mishra L. Analysis of Genomes and Transcriptomes of Hepatocellular Carcinomas Identifies Mutations and Gene Expression Changes in the Transforming Growth Factor beta Pathway. Gastroenterology. 2017; S0016-5085(17)36144-9. [https://www.ncbi.nlm.nih.gov/pubmed/28918914 PMID: 28918914].</li>
<li>Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinformatics. 2017; 18(1):391. [https://www.ncbi.nlm.nih.gov/pubmed/28865429 PMID: 28865429].</li>
<li>Gannavaram S, Torcivia J, Gasparyan L, Kaul A, Ismail N, Simonyan V, Nakhasi HL. Whole genome sequencing of live attenuated Leishmania donovani parasites reveals novel biomarkers of attenuation and enables product characterization. Sci Rep. 2017; 7(1):4718. [https://www.ncbi.nlm.nih.gov/pubmed/28680050 PMID: 28680050].</li>
<li>Simonyan V, Chumakov K, Donaldson E, Karagiannis K, Lam PV, Dingerdissen H, Voskanian A. HIVE-heptagon: A sensible variant-calling algorithm with post-alignment quality controls. Genomics. 2017; 109(3-4):131-140. [https://www.ncbi.nlm.nih.gov/pubmed/28188908 PMID: 28188908].</li>
<li>Pan Y, Yan C, Fan Y, Pan Q, Wan Q, Torcivia-Rodriguez J, Mazumder R. Distribution bias analysis of germline and somatic single-nucleotide variations that impact protein functional site and neighboring amino acids. Scientific Reports. 2017; 7:42169 [https://www.ncbi.nlm.nih.gov/pubmed/28176830 PMID: 28176830].</li>
<li>Gulzar N, Dingerdissen H, Yan C, Mazumder R. Impact of Nonsynonymous Single-Nucleotide Variations on Post-Translational Modification Sites in Human Proteins. Methods Mol Biol. 2017; 1558:159-190. [https://www.ncbi.nlm.nih.gov/pubmed/28150238 PMID: 28150238].</li>
<li>Simonyan V, Goecks J, Mazumder R. BioCompute objects - a step towards evaluation and validation of bio-medical scientific computations. PDA J Pharm Sci Technol. 2017; 71(2):136-146 [https://www.ncbi.nlm.nih.gov/pubmed/27974626 PMID: 27974626].</li>
<li>Yan C, Pattabiraman N, Goecks J, Lam P, Nayak A, Pan Y, Torcivia-Rodriguez J, Voskanian A, Wan Q, Mazumder R. Impact of germline and somatic missense variations on drug binding sites. Pharmacogenomics J. 2017; 17(2):128-136 [https://www.ncbi.nlm.nih.gov/pubmed/26810135 PMID: 26810135].</li>
<li>Novatt H, Theisen TC, Massie T, Simonyan V, Voskanian-Kordi A, Renn LA, Rabin RL. Distinct Patterns of Expression of Transcription Factors in Response to Interferon Beta and Interferon lambda-1. J Interferon Cytokine Res. 2016; 36(10):589-598 [https://www.ncbi.nlm.nih.gov/pubmed/27447339 PMID: 27447339].</li>
<li>Chen C, Huang H, Mazumder R, Natale DA, McGarvey PB, Zhang J, Polson SW, Wang Y, Wu CH, UniProt Consortium. Computational clustering for viral reference proteomes. Bioinformatics. 2016; 32(13):2041-3. [https://www.ncbi.nlm.nih.gov/pubmed/27153712 PMID: 27153712].</li>
<li>Mahmood AS, Wu TJ, Mazumder R, Vijay-Shanker K. DiMeX: A text-mining system for mutation-disease association extraction. PLoS One. 2016; 11(4):e0152725. [https://www.ncbi.nlm.nih.gov/pubmed/27073839 PMID: 27073839].</li>
<li>Goldweber S, Theodore J, Torcivia-Rodriguez J, Simonyan V, Mazumder R. Pubcast and Genecast: Browsing and exploring publications and associated curated content in biology through mobile devices. IEEE/ACM Trans Comput Biol Bioinform. 2016; 14(2):498-500. [https://www.ncbi.nlm.nih.gov/pubmed/28113865 PMID: 28113865].</li>
<li>Laassri M, Zagorodnyaya T, Plant EP, Petrovskaya S, Bidzhieva B, Ye Z, Simonyan V, Chumakov K. Deep Sequencing for Evaluation of Genetic Stability of Influenza A/California/07/2009 (H1N1) Vaccine Viruses. PLoS One. 2015; 10(9):e0138650. [https://www.ncbi.nlm.nih.gov/pubmed/26407068 PMID: 26407068].</li>
<li>Sauder CJ, Ngo L, Simonyan V, Cong Y, Zhang C, Link M, Malik T, Rubin SA. Generation and propagation of recombinant mumps viruses exhibiting an additional U residue in the homopolymeric U tract of the F gene-end signal. Virus Genes. 2015; 51(1):12-24. [https://www.ncbi.nlm.nih.gov/pubmed/25962759 PMID: 25962759].</li>
<li>Wu T-J, Schriml LM, Chen Q-R, Colbert M, Crichton DJ, Finney R, Hu Y, Kibbe WA, Kincaid H, Meerzaman D, Mitraka E, Pan Y, Smith KM, Srivastava S, Ward S, Yan C, Mazumder R. Generating a focused view of Disease Ontology cancer terms for pan-cancer data integration and analysis. Database (Oxford). 2015; 2015:bav032. [https://www.ncbi.nlm.nih.gov/pubmed/25841438 PMID: 25841438].</li>
<li>Wan Q, Dingerdissen H, Fan Y, Gulzar N, Pan Y, Wu T-J, Yan C, Zhang H, Mazumder R. BioXpress: An integrated RNA-seq derived gene expression database for pan-cancer analysis. Database (Oxford). 2015; 2015:bav019. [https://www.ncbi.nlm.nih.gov/pubmed/25819073 PMID: 25819073].</li>
<li>Kumari P, Mazumder R, Simonyan V, Krampis K. Advantages of distributed and parallel algorithms that leverage Cloud Computing platforms for large-scale genome assembly. F1000Research. 2015; 4(20). [https://hsrc.himmelfarb.gwu.edu/cgi/viewcontent.cgi?article=1167&context=smhs_biochem_facpubs https://hsrc.himmelfarb.gwu.edu/cgi/viewcontent.cgi?article=1167&context=smhs_biochem_facpubs].</li>
<li>Abunimer A, Dingerdissen H, Torcivia-Rodriguez J, Vinh Nguyen Lam P, Mazumder R. Non-synonymous Single-Nucleotide Variations as Cardiovascular System Disease Biomarkers and Their Roles in Bridging Genomic and Proteomic Technologies. Biomarkers in Cardiovascular Disease. 2015. [https://link.springer.com/referenceworkentry/10.1007/978-94-007-7741-5_40-1 Springer Nature link].</li>
<li>Adhikari S, Chetram MA, Woodrick J, Mitra PS, Manthena PV, Khatkar P, Dakshanamurthy S, Dixon M, Karmahapatra SK, Nuthalapati NK, Gupta S, Narasimhan G, Mazumder R, Loffredo CA, Uren A, Roy R. Germ-line variants of human N-methylpurine DNA glycosylase show impaired DNA repair activity and facilitate 1,N6 ethenoadenine induced mutations. J Biol Chem. 2014; 290(8):4966-80. [https://www.ncbi.nlm.nih.gov/pubmed/25538240 PMID: 25538240].</li>
<li>Wilson CA, Simonyan V. FDA's Activities Supporting Regulatory Application of "Next Gen" Sequencing Technologies. PDA J Pharm Sci Technol. 2014; 68(6):626-630. [https://www.ncbi.nlm.nih.gov/pubmed/25475637 PMID: 25475637].</li>
<li>Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014; 15(1):918. [https://www.ncbi.nlm.nih.gov/pubmed/25336203 PMID: 25336203].</li>
<li>Pan Y, Karagiannis K, Zhang H, Dingerdissen H, Shamsaddini A, Wan Q, Simonyan V, Mazumder R. Human germline and pan-cancer variomes and their distinct functional profiles. Nucleic Acids Research. 2014; 42(18):11570-88. [https://www.ncbi.nlm.nih.gov/pubmed/25232094 PMID: 25232094].</li>
<li>Nayak A, Pattabiraman N, Fadra N, Goldman R, Pond S, Mazumder R. Structure-function analysis of hepatitis C virus envelope glycoproteins E1 and E2. J Biomol Struct Dyn. 2014; 33(8):1682-94. [https://www.ncbi.nlm.nih.gov/pubmed/25245635 PMID: 25245635].</li>
<li>Faison WJ, Rostovtsev A, Castro-Nallar E, Crandall KA, Chumakov K, Simonyan V, Mazumder R. Whole genome single-nucleotide variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes. Genomics. 2014; 104(1):1-7. [https://www.ncbi.nlm.nih.gov/pubmed/24930720 PMID: 24930720].</li>
<li>Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLoS One. 2014; 9(6):e99033. [https://www.ncbi.nlm.nih.gov/pubmed/24918764 PMID: 24918764].</li>
<li>Dingerdissen H, Weaver DS, Karp PD, Pan Y, Simonyan V, Mazumder R. A framework for application of metabolic modeling in yeast to predict the effects of nsSNV in human orthologs. Biol Direct. 2014; 9:9. [https://www.ncbi.nlm.nih.gov/pubmed/24894379 PMID: 24894379].</li>
<li>Bidzhieva B, Zagorodnyaya T, Karagiannis K, Simonyan V, Laassri M, Chumakov K. Deep sequencing approach for genetic stability evaluation of influenza A viruses. J Virol Methods. 2014; 199:68-75. [https://www.ncbi.nlm.nih.gov/pubmed/24406624 PMID: 24406624].</li>
<li>Abunimer A, Smith K, Wu T-J, Lam P, Simonyan V, Mazumder R. Single-nucleotide variations in cardiac arrhythmias: prospects for genomics and proteomics based variation detection. Genes. 2014; 5(2):254-69. [https://www.ncbi.nlm.nih.gov/pubmed/24705329 PMID: 24705329].</li>
<li>Wu T-J, Shamsaddini A, Pan Y, Smith K, Crichton DJ, Simonyan V, Mazumder R. A framework for organizing cancer related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE). Database (Oxford). 2014; 2014:bau022. [https://www.ncbi.nlm.nih.gov/pubmed/24667251 PMID: 24667251].</li>
<li>Dabrazhynetskaya A, Soika V, Volokhov D, Simonyan V, Chizhikov V. Genome Sequence of Mycoplasma hyorhinis Strain DBS 1050. Genome Announc. 2014; 2(2):e00127-14. [https://www.ncbi.nlm.nih.gov/pubmed/24604646 PMID: 24604646].</li>
<li>Cole C, Krampis K, Karagiannis K, Almeida J, Faison WJ, Motwani M, Wan Q, Golikov A, Pan Y, Simonyan V, Mazumder R. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data. BMC Bioinformatics. 2014; 15:28. [https://www.ncbi.nlm.nih.gov/pubmed/24467687 PMID: 24467687].</li>
<li>Mudvari P, Kowsari K, Cole C, Mazumder R, Horvath A. Extraction of molecular features through exome to transcriptome alignment. J Metabol Sys Biol. 2013; 1(1):7. [https://www.ncbi.nlm.nih.gov/pubmed/24791251 PMID: 24791251].</li>
<li>Basuchoudhary A, Simonyan V, Mazumder R. Community annotation and the evolution of cooperation: How patience matters. Open Bioinformatics Journal. 2013; 7:9-18.</li>
<li>Karagiannis K, Simonyan V, Mazumder R. SNVDis: A Proteome-wide Analysis Service for Evaluating nsSNVs in Protein Functional Sites and Pathways. Genomics Proteomics Bioinformatics. 2013; 11(2):122-126. [https://www.ncbi.nlm.nih.gov/pubmed/23618375 PMID: 23618375].</li>
<li>Lam PV, Goldman R, Karagiannis K, Narsule T, Simonyan V, Soika V, Mazumder R. Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes. Genomics Proteomics Bioinformatics. 2013; 11(2):96-104. [https://www.ncbi.nlm.nih.gov/pubmed/23459159 PMID: 23459159].</li>
<li>Dingerdissen H, Motwani M, Karagiannis K, Simonyan V, Mazumder R. Proteome-wide analysis of nonsynonymous single-nucleotide variations in active sites of human proteins. FEBS J. 2013; 280(6):1542-1562. [https://www.ncbi.nlm.nih.gov/pubmed/23350563 PMID: 23350563].</li>
<li>Gaudet P, Arighi C, Bastian F, Bateman A, Blake JA, Cherry MJ, D'Eustachio P, Finn R, Giglio M, Hirschman L, Kania R, Klimke W, Martin MJ, Karsch-Mizrachi I, Munoz-Torres M, Natale D, O'Donovan C, Ouellette F, Pruitt KD, Robinson-Rechavi M, Sansone SA, Schofield P, Sutton G, Van Auken K, Vasudevan S, Wu C, Young J, Mazumder R. Recent advances in biocuration: meeting report from the Fifth International Biocuration Conference. Database (Oxford). 2012; 2012:bas036. [https://www.ncbi.nlm.nih.gov/pubmed/23110974 PMID: 23110974].</li>
<li>Volokhov DV, Simonyan V, Davidson MK, Chizhikov VE. RNA polymerase beta subunit (rpoB) gene and the 16S-23S rRNA intergenic transcribed spacer region (ITS) as complementary molecular markers in addition to the 16S rRNA gene phylogenetic analysis and identification of the species of the family Mycoplasmataceae. Mol Phylogenet Evol. 2012; 62(1):515-28. [https://www.ncbi.nlm.nih.gov/pubmed/22115576 PMID: 22115576].</li>
<li>Mazumder R, Morampudi KS, Motwani M, Vasudevan S, Goldman R. Proteome-wide analysis of single-nucleotide variations in the N-glycosylation sequon of human genes. PLoS One. 2012; 7(5):e36212. [https://www.ncbi.nlm.nih.gov/pubmed/22586465 PMID: 22586465].</li>
</ul>