Volunteership Spring 2026

2026-03-27T17:19:33Z

Lorikrammer: /* Recent Publications: */

PredictMod Publications & Multimedia

2026-03-27T17:18:55Z

Lorikrammer: /* MultiMedia */

<small> Go Back to the [[PredictMod|PredictMod Home Page]]. </small>

== PredictMod Publications ==

* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. Network and Systems Medicine. 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].
* Krammer L, Aggarwal V, Bhuiyan U, McNeely P, Mazumder R. PredictMod: a machine learning-based platform for predicting and sharing intervention outcomes in patients. Poster presented at: 22nd International Conference on Artificial Intelligence in Medicine; July 9-12, 2024; Salt Lake City, Utah, USA.
* Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://pubmed.ncbi.nlm.nih.gov/38313584/ PMID: 38313584].
* Bhuiyan, U. in Biochemistry and Molecular Medicine, Vol. Masters 72 (George Washington University, Washington, DC; 2023).
* Dahlin M, Singleton S, David J, Basuchoudhary A, Wickström R, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumor necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. June 2022; vol: 80. [https://linkinghub.elsevier.com/retrieve/pii/S2352396422002420 https://doi.org/10.1016/j.ebiom.2022.104061].
* Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://pubmed.ncbi.nlm.nih.gov/33814114/ PMID: 33814114].
* King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://pubmed.ncbi.nlm.nih.gov/31509535/ PMID: 31509535].

== MultiMedia ==
* '''Talk Data Podcast | MDClone''' Featuring Lori Krammer | Published March 4th, 2026 <br/>[https://www.linkedin.com/posts/lori-krammer_syntheticdata-machinelearning-healthcareinnovation-activity-7442231985365770242-xzV0?utm_source=share&utm_medium=member_desktop&rcm=ACoAACerBVYBiJq4wwQ4cu1WPEc-RZ1z7ZHiMhQ LinkedIn Post]. Listen on [https://open.spotify.com/show/68biApf6cwsE50bAnAdj1R Spotify] or [https://podcasts.apple.com/us/podcast/talk-data/id1653305563 Apple Podcasts]. This video is part of the [[PredictMod|PredictMod Project]].

* '''Microbiome: VA AI Tech Sprint 2021 | Phase 2 Demo''' Featuring Stephanie Singleton, Edited by James Ziegler | Published December 8th, 2020 <br />[https://www.youtube.com/embed/K2S7YrIBN_0 Phase 2 Demo]. View our [https://youtu.be/RRm6-kCGegE MATLAB Prototype Demo]. View our [https://tinyurl.com/phase-2-demo-slides Phase 2 Demo Slides]. This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].

* '''VA AI Tech Sprint Phase 3 Final Demo | GWU HIVE''' Presented by Stephanie Singleton, James Ziegler, Edited by James Ziegler | Published April 20th, 2021 <br />[https://www.youtube.com/embed/CgIwy_zfn9g Phase 3 Demo]. View our [https://tinyurl.com/Final-Demo-Materials materials]. This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].

PredictMod Publications & Multimedia

2026-03-27T17:18:30Z

Lorikrammer: /* MultiMedia */

PredictMod Publications & Multimedia

2026-03-27T17:17:29Z

Lorikrammer: /* MultiMedia */

PredictMod Publications & Multimedia

2026-03-12T17:32:09Z

Lorikrammer:

PredictMod

2026-03-09T13:58:19Z

Lorikrammer:

GW-FEAST

2026-03-09T13:54:58Z

Lorikrammer:

PredictMod

2026-03-09T13:54:25Z

Lorikrammer:

Publications

2026-03-09T13:53:59Z

Lorikrammer:

All publications listed on this page should follow a modified National Library of Medicine (NLM) citation format, adapted for clarity and consistency. Here is the suggested format:<blockquote>''Author(s). Title of article. Journal Name. Year Month Day;Volume(Issue):Page range. PMID: [if available] DOI: [if no PMID]''</blockquote>Some guidelines:

* If a PubMed ID (PMID) is available, include it and omit the DOI.
* If no PMID is available, include the DOI instead.
* Journal names should be spelled out in full unless the journal is widely recognized by its acronym (e.g., ''PLoS'').
* Use full publication dates when available (e.g., 2025 Mar 28); if only the year is known, include the year alone.
* Include all author names in the order listed in the publication.
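The guidelines above can be sketched as a small helper that assembles a citation string. This is only an illustrative sketch, not a tool used by this wiki: the function name <code>format_citation</code> and its parameters are hypothetical, and it assumes the caller already supplies the date in the "2025 Mar 28" style.

```python
from typing import Optional, Sequence


def format_citation(
    authors: Sequence[str],
    title: str,
    journal: str,
    date: str,
    volume: Optional[str] = None,
    issue: Optional[str] = None,
    pages: Optional[str] = None,
    pmid: Optional[str] = None,
    doi: Optional[str] = None,
) -> str:
    """Build one citation string in the modified NLM layout (hypothetical helper)."""
    # Authors stay in publication order, comma-separated.
    parts = [
        ", ".join(authors) + ".",
        title.rstrip(".") + ".",
        journal.rstrip(".") + ".",
    ]

    # Date, followed by the optional ;Volume(Issue):Pages locator.
    locator = date
    if volume:
        locator += ";" + volume
        if issue:
            locator += "(" + issue + ")"
    if pages:
        locator += ":" + pages
    parts.append(locator + ".")

    # Prefer the PMID; fall back to the DOI only when no PMID exists.
    if pmid:
        parts.append("PMID: " + pmid)
    elif doi:
        parts.append("DOI: " + doi)

    return " ".join(parts)


example = format_citation(
    ["Simonyan V", "Mazumder R"],
    "High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis",
    "Genes",
    "2014 Sep 30",
    volume="5",
    issue="4",
    pages="957-981",
    pmid="25271953",
    doi="ignored-when-pmid-is-present",  # placeholder value, dropped because a PMID exists
)
print(example)
```

Passing both identifiers shows the rule in action: the DOI is dropped whenever a PMID is present, and it appears only when <code>pmid</code> is omitted.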
<h2>HIVE Platform Publications</h2>

<p>Please cite use of HIVE with:</p>
<ul>
<li>Simonyan V, Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes. 2014 Sep 30;5(4):957-981. [https://www.ncbi.nlm.nih.gov/pubmed/25271953 PMID: 25271953]</li>
<li>Simonyan V, Chumakov K, Dingerdissen H, et al. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis. Database (Oxford). 2016; 2016:baw022. [https://www.ncbi.nlm.nih.gov/pubmed/26989153 PMID: 26989153]</li>
</ul>

<h2>HIVE Team Publications</h2>
<ul>
<li>Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].</li>
<li>Mazumder R, Keeney J, Johnson L, Krammer L, McNeely P, Sepulveda J, Hangen D, Martin M, Jyothi D, De Almeida J, McGarvey P, Alaoui A, Cha S, Sedrakyan A, Shoelle E, Matheny M, LeNoue-Newton M, Winter R, Deppen S, Simonyan V, Horvath A. From use cases to infrastructure: a cross-institutional survey of priorities in data-driven biomedical research. J Am Med Inform Assoc. 2026 Jan 20:ocag001. Epub ahead of print. [https://pubmed.ncbi.nlm.nih.gov/41556955/ PMID: 41556955].</li>
<li>Krammer L, McNeely PM, Bhuiyan U, Singleton SS, Arethiya N, Argaw A, Aggarwal V, Basuchoudhary A, Mazumder M, David J, Agrawal S, Sen S, Mazumder R. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. Network and Systems Medicine. 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].</li>
<li>Kahsay R, Bhuiyan U, Au CCH, Edwards N, Johnson L, Kulkarni S, Martinez K, Ranzinger R, Vijay-Shanker K, Vora J, Warner K, Tiemeyer M, Mazumder R. GlycoSiteMiner: an ML/AI-assisted literature mining-based pipeline for extracting glycosylation sites from PubMed abstracts. Glycobiology. 2025 May 22. [https://pubmed.ncbi.nlm.nih.gov/40401984/ PMID: 40401984].</li>
<li>Aoki-Kinoshita KF, Lisacek F, Mazumder R, Ranzinger R, Tiemeyer M, Yamada I, Packer NH. Meeting report of the GlySpace alliance and GaLSIC symposium. Glycobiology. 2025 Mar 28:cwaf019. [https://pubmed.ncbi.nlm.nih.gov/40156285/ PMID: 40156285].</li>
<li>Clarke DJB, Evangelista JE, Xie Z, Marino GB, Byrd AI, Maurya MR, Srinivasan S, Yu K, Petrosyan V, Roth ME, Milinkov M, King CH, Vora JK, Keeney J, Nemarich C, Khan W, Lachmann A, Ahmed N, Agris A, Pan J, Ramachandran S, Fahy E, Esquivel E, Mihajlovic A, Jevtic B, Milinovic V, Kim S, McNeely P, Wang T, Wenger E, Brown MA, Sickler A, Zhu Y, Jenkins SL, Blood PD, Taylor DM, Resnick AC, Mazumder R, Milosavljevic A, Subramaniam S, Ma'ayan A. Playbook workflow builder: Interactive construction of bioinformatics workflows. PLoS Comput Biol. 2025 Apr 3;21(4):e1012901. [https://pubmed.ncbi.nlm.nih.gov/40179105/ PMID: 40179105].</li>
<li>Keeney ''et al''. Olduvai domain expression downregulates mitochondrial pathways: implications for human brain evolution and neoteny. bioRxiv. 2024 Oct 22. Preprint. [https://doi.org/10.1101/2024.10.21.619278 DOI: 10.1101/2024.10.21.619278].</li>
<li>Martinez K, Agirre J, Akune Y, Aoki-Kinoshita KF, Arighi C, Axelsen KB, Bolton E, Bordeleau E, Edwards NJ, Fadda E, Feizi T, Hayes C, Ives CM, Joshi HJ, Krishna Prasad K, Kossida S, Lisacek F, Liu Y, Lütteke T, Ma J, Malik A, Martin M, Mehta AY, Neelamegham S, Panneerselvam K, Ranzinger R, Ricard-Blum S, Sanou G, Shanker V, Thomas PD, Tiemeyer M, Urban J, Vita R, Vora J, Yamamoto Y, Mazumder R. Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy. Database (Oxford). 2024 Aug 13;2024:baae073. [https://pubmed.ncbi.nlm.nih.gov/39137905/ PMID: 39137905].</li>
<li>Kim S, Mazumder R. Enhancing scientific reproducibility through automated BioCompute Object creation using Retrieval-Augmented Generation from publications. Computer Science, Computation and Language. https://doi.org/10.48550/arXiv.2409.15076</li>
<li>Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://www.ncbi.nlm.nih.gov/pubmed/38313584 PMID: 38313584].</li>
<li>Keeney JG, Gulzar N, Baker JB, Klempir O, Hannigan GD, Bitton DA, Maritz JM, King CHS 4th, Patel JA, Duncan P, Mazumder R. Communicating computational workflows in a regulatory environment. Drug Discov Today. 2024 Jan 12; 103884. [https://www.ncbi.nlm.nih.gov/pubmed/38219969 PMID: 38219969].</li>
<li>Sylvetsky AC, Clement RA, Stearrett N, Issa NT, Dore FJ, Mazumder R, King CH, Hubal MJ, Walter PJ, Cai H, Sen S, Rother KI, Crandall KA. Consumption of sucralose and acesulfame-potassium containing diet soda alters the relative abundance of microbial taxa at the species level: findings of two pilot studies. Appl Physiol Nutr Metab. 2024 Jan 1; 49(1):125-134. [https://www.ncbi.nlm.nih.gov/pubmed/37902107 PMID: 37902107].</li>
<li>Vora J, Navelkar R, Vijay-Shanker K, Edwards N, Martinez K, Ding X, Wang T, Su P, Ross K, Lisacek F, Hayes C, Kahsay R, Ranzinger R, Tiemeyer M, Mazumder R. The glycan structure dictionary-a dictionary describing commonly used glycan structure terms. Glycobiology. 2023 Feb 17; cwad014 [https://www.ncbi.nlm.nih.gov/pubmed/36799723 PMID: 36799723].</li>
<li>Lisacek F, Tiemeyer M, Mazumder R, Aoki-Kinoshita KF. Worldwide Glycoscience Informatics Infrastructure: The GlySpace Alliance. JACS Au. eCollection 2023 Jan 23; [https://www.ncbi.nlm.nih.gov/pubmed/36711080 PMID: 36711080].</li>
<li>Datta Chaudhuri R, Datta R, Rana S, Kar A, Vinh Nguyen Lam P, Mazumder R, Mohanty S, Sarkar S. Cardiomyocyte-specific regression of nitrosative stress-mediated S-Nitrosylation of IKKγ alleviates pathological cardiac hypertrophy. Cell Signal. 2022 Oct; 98:110403 [https://www.ncbi.nlm.nih.gov/pubmed/35835332 PMID: 35835332].</li>
<li>Dahlin M, Singleton SS, David JA, Basuchoudhary A, Wickström R, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumour necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. 2022 Jun; 80:104061. [https://www.ncbi.nlm.nih.gov/pubmed/35598439 PMID: 35598439].</li>
<li>Lyman DF, Bell A, Black A, Dingerdissen H, Cauley E, Gogate N, Liu D, Joseph A, Kahsay R, Crichton DJ, Mehta A, Mazumder R. Modeling and integration of N-glycan biomarkers in a comprehensive biomarker data model. Glycobiology. August 2022; [https://academic.oup.com/glycob/article/32/10/855/6655823?login=false PMID: 35925813].</li>
<li>Torcivia J, Abdilleh K, Seidl F, Shahzada O, Rodriguez R, Pot D, Mazumder R. Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers. Onco (Basel). June 2022; 2(2):129-144. [https://www.ncbi.nlm.nih.gov/pubmed/37841494 PMID: 37841494].</li>
<li>Dahlin M, Singleton S, David J, Basuchoudhary A, Wickström R, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumor necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. June 2022; vol: 80. [https://doi.org/10.1016/j.ebiom.2022.104061 https://doi.org/10.1016/j.ebiom.2022.104061].</li>
<li>King CH, Keeney J, Guimera N, Das S, Weber M, Fochtman B, Walderhaug MO, Talwar S, Patel JA, Mazumder R, Donaldson EF. Communicating regulatory high-throughput sequencing data using BioCompute Objects. Drug Discov Today. 2022 Jan 22; [https://www.ncbi.nlm.nih.gov/pubmed/35077912 PMID: 35077912].</li>
<li>Wang Z, Hopson L, Singleton S, Yang X, Jogunoori W, Mazumder R, Obias V, Lin P, Nguyen BN, Yao M, Miller L, White J, Rao S, Mishra L. Mice with dysfunctional TGF-β signaling develop altered intestinal microbiome and colorectal cancer resistant to 5FU. Biochim Biophys Acta Mol Basis Dis. 2021 Oct 1; 1867(10):166179. [https://www.ncbi.nlm.nih.gov/pubmed/34082069 PMID: 34082069].</li>
<li>Lyman D, Natale D, Schriml L, Anton K, Crichton DC, Mazumder R. Analysis of Biomarker Data Towards Development of a Molecular Biomarker Ontology. Proceedings of the International Conference on Biomedical Ontologies 2021 (ICBO 2021) co-located with the Workshop on Ontologies for the Behavioural and Social Sciences (OntoBess 2021) as part of the Bolzano Summer of Knowledge (BOSK 2021) Bozen-Bolzano, Italy. 2021 Sep 16-18; [https://ceur-ws.org/Vol-3073/paper13.pdf https://ceur-ws.org/Vol-3073/paper13.pdf].</li>
<li>Patel JA, Dean DA, King CH, Xiao N, Koc S, Minina E, Golikov A, Brooks P, Kahsay R, Navelkar R, Ray M, Roberson D, Armstrong C, Mazumder R, Keeney J. Bioinformatics tools developed to support BioCompute Objects. Database (Oxford). 2021 March 31; [https://www.ncbi.nlm.nih.gov/pubmed/33784373 PMID: 33784373].</li>
<li>Hora B, Gulzar N, Chen Y, Karagiannis K, Cai F, Su C, Smith K, Simonyan V, Shah SA, Ahmed M, Sanchez AM, Stone M, Cohen MS, Denny TN, Mazumder R, Gao F. Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing. mSphere. 2020 Oct 14; [https://www.ncbi.nlm.nih.gov/pubmed/33055255 PMID: 33055255].</li>
<li>Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://www.ncbi.nlm.nih.gov/pubmed/33814114 PMID: 33814114].</li>
<li>Torcivia J, Mazumder R. Scanning window analysis of non-coding regions within normal-tumor whole-genome sequence samples. Briefings in Bioinformatics. 2020 Sep 17; [https://www.ncbi.nlm.nih.gov/pubmed/32940334 PMID: 32940334].</li>
<li>Gogate N, Lyman D, Bell A, Cauley E, Crandall KA, Joseph A, Kahsay R, Natale DA, Schriml LM, Sen S, Mazumder R. COVID-19 biomarkers and their overlap with comorbidities in a disease biomarker data model. Brief Bioinform. 2021 May 20; bbab191. doi: 10.1093/bib/bbab191. [https://www.ncbi.nlm.nih.gov/pubmed/34015823 PMID: 34015823].</li>
<li>Kahsay R, Vora J, Navelkar R, Mousavi R, Fochtman BC, Holmes X, Pattabiraman N, Ranzinger R, Mahadik R, Williamson T, Kulkarni S, Agarwal G, Martin M, Vasudev P, Garcia L, Edwards N, Zhang W, Natale DA, Ross K, Aoki-Kinoshita KF, Campbell MP, York WS, Mazumder R. GlyGen data model and processing workflow. Bioinformatics. 2020; [https://www.ncbi.nlm.nih.gov/pubmed/32324859 PMID: 32324859].</li>
<li>Kurnat-Thoma E, Baranova A, Baird P, Brodsky E, Butte AJ, Cheema AK, Cheng F, Dutta S, Grant C, Giordano J, Maitland-van der Zee AH, Fridsma DB, Jarrin R, Kann MG, Keeney J, Loscalzo J, Madhavan G, Maron BA, McBride DK, McKean M, Mun SK, Palmer JC, Patel B, Parakh K, Pariser AR, Pristipino C, Radstake TRDJ, Rajasimha HK, Rouse WB, Rozman D, Saleh A, Schmidt HHHW, Schultz N, Sethi T, Silverman EK, Skopac J, Svab I, Trujillo S, Valentine JE, Verma D, West BJ, Vasudevan S. Recent Advances in Systems and Network Medicine: Meeting Report from the First International Conference in Systems and Network Medicine. Syst Med (New Rochelle). 2020; 3(1):22-35. [https://www.ncbi.nlm.nih.gov/pubmed/32226924 PMID: 32226924].</li>
<li>Dingerdissen HM, Bastian F, Vijay-Shanker K, Robinson-Rechavi M, Bell A, Gogate N, Gupta S, Holmes E, Kahsay R, Keeney J, Kincaid H, King CH, Liu D, Crichton DJ, Mazumder R. OncoMX: A Knowledgebase for Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data. JCO Clin Cancer Inform. 2020; 4:210-220. [https://www.ncbi.nlm.nih.gov/pubmed/32142370 PMID: 32142370].</li>
<li>Aoki-Kinoshita KF, Lisacek F, Mazumder R, York WS, Packer NH. The GlySpace Alliance: toward a collaborative global glycoinformatics community. Glycobiology. 2020; 30(2):70-71. [https://www.ncbi.nlm.nih.gov/pubmed/31573039 PMID: 31573039].</li>
<li>York WS, Mazumder R, Ranzinger R, et al. GlyGen: Computational and Informatics Resources for Glycoscience. Glycobiology. 2019. https://doi.org/10.1093/glycob/cwz080 [https://www.ncbi.nlm.nih.gov/pubmed/31616925 PMID: 31616925].</li>
<li>King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://www.ncbi.nlm.nih.gov/pubmed/31509535 PMID: 31509535].</li>
<li>Fan Y, Hu Y, Yan C, Goldman R, Pan Y, Mazumder R, Dingerdissen H. Loss and gain of N-linked glycosylation sequons due to single-nucleotide variation in cancer. Scientific Reports. 2018; 8:4322. [https://www.ncbi.nlm.nih.gov/pubmed/29531238 PMID: 29531238].</li>
<li>Kim B, Ali T, Dong C, Lijeron C, Mazumder R, Wultsch C, Krampis K. miCloud: A Plug-n-Play, Extensible, On-Premises Bioinformatics Cloud for Seamless Execution of Complex Next-Generation Sequencing Data Analysis Pipelines. Journal of Computational Biology. 2018. http://doi.org/10.1089/cmb.2018.0218</li>
<li>Alterovitz G, Dean DA, Goble C, Crusoe MR, Soiland-Reyes S, Bell A, Hayes A, King CHS, Taylor D, Johanson E, Thompson EE, Donaldson E, Morizono H, Tsang HS, Goecks J, Yao J, Almeida JS, Krampis K, Guo L, Walderhaug M, Walsh P, Kahsay R, Gottipati S, Bloom T, Lai Y, Simonyan V, Mazumder R. Enabling Precision Medicine via standard communication of HTS provenance, analysis, and results. PLOS Biology. 2018; 16(12):e3000099. https://doi.org/10.1371/journal.pbio.3000099</li>
<li>Hu Y, Dingerdissen H, Gupta S, Kahsay R, Shanker V, Wan Q, Yan C, Mazumder R. Identification of key differentially expressed MicroRNAs in cancer patients through pan-cancer analysis. Computers in Biology and Medicine. 2018; 103:183-197. [https://www.ncbi.nlm.nih.gov/pubmed/30384176 PMID: 30384176].</li>
<li>Dingerdissen H, Torcivia-Rodriguez J, Hu Y, Chang T-C, Mazumder R, Kahsay R. BioMuta and BioXpress: mutation and expression knowledgebases for cancer biomarker discovery. Nucleic Acids Research. 2017. [https://pubmed.ncbi.nlm.nih.gov/30053270/ PMID: 30053270].</li>
<li>Karagiannis K, Simonyan V, Chumakov K, Mazumder R. Separation and assembly of deep sequencing data into discrete sub-population genomes. Nucleic Acids Research. 2017; 45(19):10989-11003. [https://www.ncbi.nlm.nih.gov/pubmed/28977510 PMID: 28977510].</li>
<li>Chen J, Zaidi S, Rao S, Chen J-S, Phan L, Farci P, Su X, Shetty K, White J, Zamboni F, Wu X, Rashid A, Pattabiraman N, Mazumder R, Horvath A, Wu R-C, Li S, Xiao C, Deng C-X, Wheeler D A, Mishra B, Akbani R, Mishra L. Analysis of Genomes and Transcriptomes of Hepatocellular Carcinomas Identifies Mutations and Gene Expression Changes in the Transforming Growth Factor beta Pathway. Gastroenterology. 2017; S0016-5085(17)36144-9. [https://www.ncbi.nlm.nih.gov/pubmed/28918914 PMID: 28918914].</li>
<li>Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinformatics. 2017; 18(1):391. [https://www.ncbi.nlm.nih.gov/pubmed/28865429 PMID: 28865429].</li>
<li>Gannavaram S, Torcivia J, Gasparyan L, Kaul A, Ismail N, Simonyan V, Nakhasi HL. Whole genome sequencing of live attenuated Leishmania donovani parasites reveals novel biomarkers of attenuation and enables product characterization. Sci Rep. 2017; 7(1):4718. [https://www.ncbi.nlm.nih.gov/pubmed/28680050 PMID: 28680050].</li>
<li>Simonyan V, Chumakov K, Donaldson E, Karagiannis K, Lam PV, Dingerdissen H, Voskanian A. HIVE-heptagon: A sensible variant-calling algorithm with post-alignment quality controls. Genomics. 2017; 109(3-4):131-140. [https://www.ncbi.nlm.nih.gov/pubmed/28188908 PMID: 28188908].</li>
<li>Pan Y, Yan C, Fan Y, Pan Q, Wan Q, Torcivia-Rodriguez J, Mazumder R. Distribution bias analysis of germline and somatic single-nucleotide variations that impact protein functional site and neighboring amino acids. Scientific Reports. 2017; 7:42169. [https://www.ncbi.nlm.nih.gov/pubmed/28176830 PMID: 28176830].</li>
<li>Gulzar N, Dingerdissen H, Yan C, Mazumder R. Impact of Nonsynonymous Single-Nucleotide Variations on Post-Translational Modification Sites in Human Proteins. Methods Mol Biol. 2017; 1558:159-190. [https://www.ncbi.nlm.nih.gov/pubmed/28150238 PMID: 28150238].</li>
<li>Simonyan V, Goecks J, Mazumder R. BioCompute objects - a step towards evaluation and validation of bio-medical scientific computations. PDA J Pharm Sci Technol. 2017; 71(2):136-146 [https://www.ncbi.nlm.nih.gov/pubmed/27974626 PMID: 27974626].</li>
<li>Yan C, Pattabiraman N, Goecks J, Lam P, Nayak A, Pan Y, Torcivia-Rodriguez J, Voskanian A, Wan Q, Mazumder R. Impact of germline and somatic missense variations on drug binding sites. Pharmacogenomics J. 2017; 17(2):128-136 [https://www.ncbi.nlm.nih.gov/pubmed/26810135 PMID: 26810135].</li>
<li>Novatt H, Theisen TC, Massie T, Simonyan V, Voskanian-Kordi A, Renn LA, Rabin RL. Distinct Patterns of Expression of Transcription Factors in Response to Interferon Beta and Interferon lambda-1. J Interferon Cytokine Res. 2016; 36(10):589-598 [https://www.ncbi.nlm.nih.gov/pubmed/27447339 PMID: 27447339].</li>
<li>Chen C, Huang H, Mazumder R, Natale DA, McGarvey PB, Zhang J, Polson SW, Wang Y, Wu CH, UniProt Consortium. Computational clustering for viral reference proteomes. Bioinformatics. 2016; 32(13):2041-3 [https://www.ncbi.nlm.nih.gov/pubmed/27153712 PMID: 27153712].</li>
<li>Mahmood AS, Wu TJ, Mazumder R, Vijay-Shanker K. DiMeX: A text-mining system for mutation-disease association extraction. PLoS One. 2016; 11(4):e0152725 [https://www.ncbi.nlm.nih.gov/pubmed/27073839 PMID: 27073839].</li>
<li>Goldweber S, Theodore J, Torcivia-Rodriguez J, Simonyan V, Mazumder R. Pubcast and Genecast: Browsing and exploring publications and associated curated content in biology through mobile devices. IEEE/ACM Trans Comput Biol Bioinform. 2016; 14(2):498-500 [https://www.ncbi.nlm.nih.gov/pubmed/28113865 PMID: 28113865].</li>
<li>Laassri M, Zagorodnyaya T, Plant EP, Petrovskaya S, Bidzhieva B, Ye Z, Simonyan V, Chumakov K. Deep Sequencing for Evaluation of Genetic Stability of Influenza A/California/07/2009 (H1N1) Vaccine Viruses. PLoS One. 2015; 10(9):e0138650. [https://www.ncbi.nlm.nih.gov/pubmed/26407068 PMID: 26407068].</li>
<li>Sauder CJ, Ngo L, Simonyan V, Cong Y, Zhang C, Link M, Malik T, Rubin SA. Generation and propagation of recombinant mumps viruses exhibiting an additional U residue in the homopolymeric U tract of the F gene-end signal. Virus Genes. 2015; 51(1):12-24. [https://www.ncbi.nlm.nih.gov/pubmed/25962759 PMID: 25962759].</li>
<li>Wu T-J, Schriml LM, Chen Q-R, Colbert M, Crichton DJ, Finney R, Hu Y, Kibbe WA, Kincaid H, Meerzaman D, Mitraka E, Pan Y, Smith KM, Srivastava S, Ward S, Yan C, Mazumder R. Generating a focused view of Disease Ontology cancer terms for pan-cancer data integration and analysis. Database (Oxford). 2015; 2015:bav032. [https://www.ncbi.nlm.nih.gov/pubmed/25841438 PMID: 25841438].</li>
<li>Wan Q, Dingerdissen H, Fan Y, Gulzar N, Pan Y, Wu T-J, Yang C, Zhang H, Mazumder R. BioXpress: An integrated RNA-seq derived gene expression database for pan-cancer analysis. Database (Oxford). 2015; 2015. pii: bav019 [https://www.ncbi.nlm.nih.gov/pubmed/25819073 PMID: 25819073].</li>
<li>Kumari P, Mazumder R, Simonyan V, Krampis K. Advantages of distributed and parallel algorithms that leverage Cloud Computing platforms for large-scale genome assembly. F1000Research. 2015; 4(20). [https://hsrc.himmelfarb.gwu.edu/cgi/viewcontent.cgi?article=1167&context=smhs_biochem_facpubs https://hsrc.himmelfarb.gwu.edu/cgi/viewcontent.cgi?article=1167&context=smhs_biochem_facpubs].</li>
<li>Abunimer A, Dingerdissen H, Torcivia-Rodriguez J, Vinh Nguyen Lam P, Mazumder R. Non-synonymous Single-Nucleotide Variations as Cardiovascular System Disease Biomarkers and Their Roles in Bridging Genomic and Proteomic Technologies. Biomarkers in Cardiovascular Disease. 2015. [https://link.springer.com/referenceworkentry/10.1007/978-94-007-7741-5_40-1 Springer Nature link].</li>
<li>Adhikari S, Chetram MA, Woodrick J, Mitra PS, Manthena PV, Khatkar P, Dakshanamurthy S, Dixon M, Karmahapatra SK, Nuthalapati NK, Gupta S, Narasimhan G, Mazumder R, Loffredo CA, Uren A, Roy R. Germ-line variants of human N-methylpurine DNA glycosylase show impaired DNA repair activity and facilitate 1,N6 ethenoadenine induced mutations. J Biol Chem. 2014; 290(8):4966-80. [https://www.ncbi.nlm.nih.gov/pubmed/25538240 PMID: 25538240].</li>
<li>Wilson CA and Simonyan V. FDA's Activities Supporting Regulatory Application of "Next Gen" Sequencing Technologies. PDA J Pharm Sci Technol. 2014; 68(6):626-630. [https://www.ncbi.nlm.nih.gov/pubmed/25475637 PMID: 25475637].</li>
<li>Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014; 15(1):918. [https://www.ncbi.nlm.nih.gov/pubmed/25336203 PMID: 25336203].</li>
<li>Pan Y, Karagiannis K, Zhang H, Dingerdissen H, Shamsaddini A, Wan Q, Simonyan V, Mazumder R. Human germline and pan-cancer variomes and their distinct functional profiles. Nucleic Acids Research. 2014; 42(18):11570-88. [https://www.ncbi.nlm.nih.gov/pubmed/25232094 PMID: 25232094].</li>
<li>Nayak A, Pattabiraman N, Fadra N, Goldman R, Pond S, Mazumder R. Structure-function analysis of hepatitis C virus envelope glycoproteins E1 and E2. J Biomol Struct Dyn. 2014; 33(8):1682-94. [https://www.ncbi.nlm.nih.gov/pubmed/25245635 PMID: 25245635].</li>
<li>Faison WJ, Rostovtsev A, Castro-Nallar E, Crandall KA, Chumakov K, Simonyan V, Mazumder R. Whole genome single-nucleotide variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes. Genomics. 2014; 104(1):1-7. [https://www.ncbi.nlm.nih.gov/pubmed/24930720 PMID: 24930720].</li>
<li>Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS One. 2014; 9(6):e99033. [https://www.ncbi.nlm.nih.gov/pubmed/24918764 PMID: 24918764].</li>
<li>Dingerdissen H, Weaver DS, Karp PD, Pan Y, Simonyan V, Mazumder R. A framework for application of metabolic modeling in yeast to predict the effects of nsSNV in human orthologs. Biol Direct. 2014; 9:9. [https://www.ncbi.nlm.nih.gov/pubmed/24894379 PMID: 24894379].</li>
<li>Bidzhieva B, Zagorodnyaya T, Karagiannis K, Simonyan V, Laassri M, Chumakov K. Deep sequencing approach for genetic stability evaluation of influenza A viruses. J Virol Methods. 2014; 199:68-75. [https://www.ncbi.nlm.nih.gov/pubmed/24406624 PMID: 24406624].</li>
<li>Abunimer A, Smith K, Wu T-J, Lam P, Simonyan V, Mazumder R. Single-nucleotide variations in cardiac arrhythmias: prospects for genomics and proteomics based variation detection. Genes. 2014; 5(2):254-69. [https://www.ncbi.nlm.nih.gov/pubmed/24705329 PMID: 24705329].</li>
<li>Wu T-J, Shamsaddini A, Pan Y, Smith K, Crichton DJ, Simonyan V, Mazumder R. A framework for organizing cancer related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE). Database. 2014; 2014:bau022. [https://www.ncbi.nlm.nih.gov/pubmed/24667251 PMID: 24667251].</li>
<li>Dabrazhynetskaya A, Soika V, Volokhov D, Simonyan V, Chizhikov V. Genome Sequence of Mycoplasma hyorhinis Strain DBS 1050. Genome Announc. 2014; 2(2):pii: e00127-14. [https://www.ncbi.nlm.nih.gov/pubmed/24604646 PMID: 24604646].</li>
<li>Cole C, Krampis K, Karagiannis K, Almeida J, Faison JW, Motwani M, Wan Q, Golikov A, Pan Y, Simonyan V, Mazumder R. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data. BMC Bioinformatics. 2014; 15:28. [https://www.ncbi.nlm.nih.gov/pubmed/24467687 PMID: 24467687].</li>
<li>Mudvari P, Kowsari K, Cole C, Mazumder R, Horvath A. Extraction of molecular features through exome to transcriptome alignment. J Metabol Sys Biol. 2013; 1(1):7. [https://www.ncbi.nlm.nih.gov/pubmed/24791251 PMID: 24791251].</li>
<li>Basuchoudhary A, Simonyan V, Mazumder R. Community annotation and the evolution of cooperation: How patience matters. Open Bioinformatics Journal. 2013; 7:9-18.</li>
<li>Karagiannis K, Simonyan V, Mazumder R. SNVDis: A Proteome-wide Analysis Service for Evaluating nsSNVs in Protein Functional Sites and Pathways. Genomics Proteomics Bioinformatics. 2013; 11(2):122-126. [https://www.ncbi.nlm.nih.gov/pubmed/23618375 PMID: 23618375].</li>
<li>Lam PV, Goldman R, Karagiannis K, Narsule T, Simonyan V, Soika V, Mazumder R. Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes. Genomics Proteomics Bioinformatics. 2013; 11(2):96-104. [https://www.ncbi.nlm.nih.gov/pubmed/23459159 PMID: 23459159].</li>
<li>Dingerdissen H, Motwani M, Karagiannis K, Simonyan V, Mazumder R. Proteome-wide analysis of nonsynonymous single-nucleotide variations in active sites of human proteins. FEBS J. 2013; 280(6):1542-1562. [https://www.ncbi.nlm.nih.gov/pubmed/23350563 PMID: 23350563].</li>
<li>Gaudet P, Arighi C, Bastian F, Bateman A, Blake JA, Cherry MJ, D'Eustachio P, Finn R, Giglio M, Hirschman L, Kania R, Klimke W, Martin MJ, Karsch-Mizrachi I, Munoz-Torres M, Natale D, O'Donovan C, Ouellette F, Pruitt KD, Robinson-Rechavi M, Sansone SA, Schofield P, Sutton G, Van Auken K, Vasudevan S, Wu C, Young J, Mazumder R. Recent advances in biocuration: meeting report from the Fifth International Biocuration Conference. Database (Oxford). 2012; 2012:bas036. [https://www.ncbi.nlm.nih.gov/pubmed/23110974 PMID: 23110974].</li>
<li>Volokhov DV, Simonyan V, Davidson MK, Chizhikov VE. RNA polymerase beta subunit (rpoB) gene and the 16S-23S rRNA intergenic transcribed spacer region (ITS) as complementary molecular markers in addition to the 16S rRNA gene phylogenetic analysis and identification of the species of the family Mycoplasmataceae. Mol Phylogenet Evol. 2012; 62(1):515-28. [https://www.ncbi.nlm.nih.gov/pubmed/22115576 PMID: 22115576].</li>
<li>Mazumder R, Morampudi KS, Motwani M, Vasudevan S, Goldman R. Proteome-wide analysis of single-nucleotide variations in the N-glycosylation sequon of human genes. PLoS One. 2012; 7(5):e36212. [https://www.ncbi.nlm.nih.gov/pubmed/22586465 PMID: 22586465].</li>
</ul>

GW-FEAST

2026-03-09T13:50:15Z

Lorikrammer:

GW-FEAST

2026-03-09T13:49:04Z

Lorikrammer:

PredictMod ML Pipeline Tutorial

2026-02-24T16:42:18Z

Lorikrammer:

<small>Go Back to [[Modeling Tutorials]].</small><h1>Integration of a Machine Learning-based Approach for Predictive Clinical Decision-making using Python</h1>

<h2>Summary</h2>
<h3>Part I. Machine Learning Using Python</h3>
<ol>
<li>What is Python?</li>
<li>Objectives</li>
<li>Methodology of the Machine Learning Algorithms</li>
<li>Software Installation</li>
<li>Downloading the Input Files for Synthetic Data Generation</li>
<li>Downloading the Input Files for Model Training</li>
</ol>

<h3>Part II. Using Python scripts to detect signal difference in the Electronic Healthcare Records of responsive and unresponsive patients</h3>
<ol>
<li>Process</li>
<li>Interpreting the Results</li>
<li>Further Analysis and Next Steps</li>
</ol>

<h2>Part I. Machine Learning Using Python</h2>

<h3>1. What is Python?</h3>
<p>Python is a versatile programming language that supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It is widely used for tasks such as data manipulation, web development, scientific computing, and automation. Python’s extensive standard library and external packages make it particularly useful for data analysis, machine learning, and visualization. Through libraries like NumPy, pandas, matplotlib, and scikit-learn, Python excels at handling large datasets, building models, and visualizing results. Additionally, Python can easily interface with programs written in other languages and supports the integration of a wide range of toolkits to extend its functionality.</p>

<h3>2. Objectives</h3>
<p>This protocol provides a proof of concept: a Python workflow for building predictive machine learning models from a labeled dataset. In this tutorial, patient data serves as the example input, and the output indicates whether the patient is a responder or a non-responder to the assigned treatment. The concepts apply to most binary classification datasets and can be reused for future model development.</p>

<p>Two major machine learning concepts will be applied to this system as follows:</p>
<ol>
<li>Create a synthetic data set to be used as an input to a machine learning model to ensure consistency during the model training steps</li>
<li>Input patient data through a series of machine learning classification models to predict whether or not a treatment is effective before dietary or medical intervention (e.g. responder vs. non-responder)</li>
</ol>

<p>This tutorial uses patient data provided by <a href="https://synthea.mitre.org/downloads">Synthea</a>. The process used to retrieve and filter the Synthea data in MATLAB is described at this <a href="https://docs.google.com/document/d/1yfUjoaU0lfTx8blTCgZehR7Qdn0C0iTU3VTPAag9ITI/edit?usp=sharing">link</a>. This tutorial uses its own retrieval and filtering process, written in Python. If interested, the full synthetic data generation process can be found at this <a href="https://github.com/GW-HIVE/PredictMod/tree/main/flask_backend/models/Diabetes_EHR_v1">link</a>.</p>

<h3>3. Methodology of the Machine Learning Algorithms</h3>
<p>1. Generating Synthetic Data: To generate synthetic data, the mean, standard deviation, and covariance are calculated for each variable (BMI, glucose, etc.) in the patient dataset. The algorithm also designates each variable as continuous or discrete. A “noise” data set is then generated from these statistics. The “noise” data is refined as two classifier neural networks compete to label it appropriately based on the training set. Once the synthetic data is labeled appropriately, it is stored in a matrix, and the process is repeated until a sufficient number of values has been generated. This algorithm is similar to a Generative Adversarial Network (GAN). More information on how GANs work can be found at the following <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">link</a>. The purpose of the synthetic data generation step in this tutorial is to ensure that the multiple models we will apply to the dataset can handle the input data. Some traditional machine learning models cannot handle NaN/NULL values because of the mathematical operations involved. Languages such as MATLAB have built-in toolkits that account for missing values, but such toolkits can take years to develop. Rather than relying on comparable handling in every Python model, we avoid the issue by generating synthetic data that contains no missing values.</p>
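<p>The statistical first step described above can be sketched as follows, assuming NumPy is installed. The sketch estimates the mean, standard deviation, and covariance from the complete rows of a small made-up patient matrix and draws a “noise” sample from the matching multivariate normal distribution. The toy values, the two columns, and the choice to drop incomplete rows are illustrative assumptions; the sketch does not reproduce <code>Synthetic_EHR_data_diabetes.py</code> or its adversarial refinement step.</p>

```python
import numpy as np

# Toy patient matrix (columns: BMI, glucose); values are made up for illustration.
patients = np.array([
    [31.2, 110.0],
    [28.4, np.nan],   # a missing glucose reading
    [35.1, 124.0],
    [29.9, 105.0],
    [33.0, 118.0],
])

# Keep only complete rows so the statistics are not contaminated by NaN.
complete = patients[~np.isnan(patients).any(axis=1)]

mean = complete.mean(axis=0)            # per-variable mean
std = complete.std(axis=0, ddof=1)      # per-variable sample standard deviation
cov = np.cov(complete, rowvar=False)    # covariance between variables

# Draw a "noise" data set that follows the estimated distribution.
rng = np.random.default_rng(seed=0)
noise = rng.multivariate_normal(mean, cov, size=10)

print(noise.shape)            # (10, 2)
print(np.isnan(noise).any())  # False
```

<p>Because the samples are drawn from the fitted distribution rather than copied from the input, the generated rows contain no missing values even though the input does.</p>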

<p>2. Classifying Training Data: A classification system uses a neural network or decision tree to create a binary classifier that predicts whether a patient (a row of data in the dataset) will be responsive (the label) to the standard Type II diabetes intervention plan. This plan involves non-invasive lifestyle changes such as diet and exercise. There are two identifiable classes of prediabetic individuals who follow the intervention plan: responders and non-responders. Responders are individuals who remain at prediabetic levels or return to normal levels, while non-responders are individuals who develop diabetic levels after following the intervention plan. The machine learning algorithm is given a fraction of the original patient dataset, known as the “training set”, and uses it to train a model that can then predict the label of new patient data (the “test set”) without being shown that label.</p>
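<p>The training-set and test-set idea above can be sketched with scikit-learn. This is a minimal illustration on randomly generated data, not the tutorial's patient data; the 80/20 split, the feature count, and the choice of a decision tree are assumptions made for the example.</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the patient table: 200 rows, 6 numeric features, binary label
# (1 = responder, 0 = non-responder). The data here is randomly generated.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hold out 20% of the rows as the test set; the model never sees their labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Predict labels for the unseen rows and compare them with the true labels.
predicted = model.predict(X_test)
accuracy = (predicted == y_test).mean()
print(f"{len(X_train)} training rows, {len(X_test)} test rows")
print(f"test accuracy: {accuracy:.3f}")
```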

<h3>4. Software Installation</h3>
<p>This section lists the software required to run the machine learning algorithms and gives step-by-step instructions for setting up your environment. Follow the instructions below to ensure that the necessary libraries and software are installed correctly.</p>

<p><strong>NOTE:</strong> Newer versions of the software and libraries may have been released since this tutorial was written. If you find that functions or methods no longer work, troubleshoot on forums such as Stack Overflow or use older versions of the software.</p>

<ol>
<li>Install Python: First, ensure that Python is installed on your system. This tutorial uses Python 3.11. You can download Python from the official <a href="https://www.python.org/downloads/">website</a>.</li>
<li>Install VSCode from the official <a href="https://code.visualstudio.com/download">website</a>. Ensure you download the correct version for your operating system (OS).</li>
<li>Configure VSCode to run Python by following the instructions on the official website: <a href="https://code.visualstudio.com/docs/python/python-tutorial">https://code.visualstudio.com/docs/python/python-tutorial</a>. It is highly recommended to install Pylance, an extension that works alongside Python in Visual Studio Code to provide performant language support. To add Pylance, open Visual Studio Code, click Extensions on the left-hand side, search for Pylance, and install it.</li>
<li>Install the Python libraries required for this tutorial with pip, the package installer for Python, which handles libraries that are not included in the default Python installation. Run the following command:
<pre>pip install tensorflow imageio matplotlib numpy pandas scikit-learn</pre>''If for any reason Pylance cannot resolve an import of one of the required libraries, check which libraries you have installed by typing <code>pip list</code> in the Visual Studio Code terminal.'' </li>
</ol>
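<p>As an alternative to reading through <code>pip list</code>, a short script can report which of the required libraries Python can actually find. This is a minimal sketch using only the standard library; the helper name <code>check_modules</code> is made up for this example. Note that the scikit-learn package is imported under the name <code>sklearn</code>.</p>

```python
import importlib.util

# Import names for the packages installed above; note that the
# scikit-learn package is imported as "sklearn".
REQUIRED_MODULES = ["tensorflow", "imageio", "matplotlib", "numpy", "pandas", "sklearn"]

def check_modules(names):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'installed' if found else 'MISSING'}")
```

<p>Run the script with the same Python interpreter that VSCode is configured to use; a library reported as missing can be installed with <code>pip install</code> as shown above.</p>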

<h3>5. Downloading the Input Files for Synthetic Data Generation</h3>
<p>Download the required material for this tutorial from this <a href="https://drive.google.com/drive/folders/1U-TIZe-Iqmziijiiw-1VHZNaGhIXUerQ?usp=drive_link">link</a>. The Synthetic Generation folder contains the Python file and the input Excel files required to generate the dataset that we will use.</p>
<ul>
<li><strong>Python Project Materials List:</strong>
<ul>
<li><strong>Synthetic_EHR_data_diabetes.py:</strong> A Python script that generates synthetic Electronic Health Record (EHR) data for diabetes-related studies, using GAN techniques</li>
</ul>
</li>
<li><strong>Excel Data Files:</strong>
<ul>
<li><strong>label_non_responsive.xlsx:</strong> Contains labels for non-responsive patient data</li>
<li><strong>label_responsive.xlsx:</strong> Contains labels for responsive patient data</li>
<li><strong>data_non_responsive.xlsx:</strong> Contains observational data related to patients non-responsive to the treatment</li>
<li><strong>data_responsive.xlsx:</strong> Contains observational data related to patients responsive to the treatment</li>
<li><strong>var_list_.xlsx:</strong> A list of variables or features that are present in the dataset. This can be used to identify key variables of interest during analysis.</li>
</ul>
</li>
<li><strong>Documentation:</strong>
<ul>
<li><strong>README.md:</strong> A markdown file providing an overview of the project, explaining the purpose of the scripts, and instructions on how to use the code and data.</li>
</ul>
</li>
</ul>
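<p>Before running anything, it can help to confirm that the download is complete. The sketch below checks a folder for the script and Excel files listed above, using only the standard library. The helper name <code>missing_inputs</code> and the local folder name are assumptions for illustration; point it at wherever you saved the files.</p>

```python
from pathlib import Path

# File names as listed above.
EXPECTED_FILES = [
    "Synthetic_EHR_data_diabetes.py",
    "label_non_responsive.xlsx",
    "label_responsive.xlsx",
    "data_non_responsive.xlsx",
    "data_responsive.xlsx",
    "var_list_.xlsx",
]

def missing_inputs(folder):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]

if __name__ == "__main__":
    # "Synthetic Generation" is an assumed local folder name; use your own path.
    print(missing_inputs("Synthetic Generation"))
```

<p>An empty list means every expected file was found.</p>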

<h3>6. Downloading the Input Files for Model Training</h3>
<p>Download the required material for this tutorial from this link. The model training folder contains the Python file and the input Excel files required to test multiple models and report performance metrics.</p>

<h2>Part II. Using Python scripts to detect signal difference in the Electronic Healthcare Records of responsive and unresponsive patients</h2>

<h3>Process</h3>
<ol>
<li>Create a GAN model that takes the input data and generates synthetic data from the same distribution as the initial data, but without NaN/NULL values, to avoid errors in later steps.</li>
<li>Run the multi-model analysis Python script, which tests a wide variety of machine learning models and outputs the accuracy and RMSE (Root Mean Squared Error) for each.</li>
</ol>

<h3>Step 1: Creating Synthetic Data</h3>

<ol>
<li><strong>Open Visual Studio Code (VSCode)</strong>, go to the top-left corner, and click <strong>File → Open Folder</strong>.</li>
<li>Navigate to the folder where you’ve downloaded the <strong>Synthetic_EHR_data_diabetes.py</strong> script.</li>
<li>Once the folder is open, you should see the file explorer in VSCode.</li>
<li>Double-click on the <strong>Synthetic_EHR_data_diabetes.py</strong> file to open it in the center of the page.</li>
<li>In the top-right of VSCode, click the <strong>Run Python</strong> button to execute the script.</li>
<li>Ensure that you are generating synthetic data for both the responsive and non-responsive datasets. You can achieve this by adjusting the files the script is reading:
<ul>
<li>Change <code>data_responsive.xlsx</code> to <code>data_non_responsive.xlsx</code>.</li>
<li>Likewise, change <code>label_responsive.xlsx</code> to <code>label_non_responsive.xlsx</code>.</li>
</ul>
</li>
<li>Adjust the output file name at line 167 of the script to differentiate between the responsive and non-responsive datasets:
<ul>
<li>For the <strong>responsive dataset</strong>: <code>df.to_excel('EHR_responsive_at_epoch_{:04d}.xlsx'.format(epoch))</code></li>
<li>For the <strong>non-responsive dataset</strong>: <code>df.to_excel('EHR_non_responsive_at_epoch_{:04d}.xlsx'.format(epoch))</code></li>
</ul>
</li>
<li>After running the script, concatenate the two synthetic files (responsive and non-responsive) and make sure the <strong>response column</strong> is populated correctly:
<ul>
<li><code>1</code> for the responsive dataset.</li>
<li><code>0</code> for the non-responsive dataset.</li>
</ul>
</li>
</ol>
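<p>The file naming and concatenation steps above can be sketched with pandas. The epoch number, the feature columns, the values, and the column name <code>response</code> are assumptions made for this example; in practice the two frames would be read from the Excel files written by the script.</p>

```python
import pandas as pd

epoch = 50  # hypothetical epoch number
responsive_name = "EHR_responsive_at_epoch_{:04d}.xlsx".format(epoch)
non_responsive_name = "EHR_non_responsive_at_epoch_{:04d}.xlsx".format(epoch)
print(responsive_name)  # EHR_responsive_at_epoch_0050.xlsx

# In practice these frames would come from pd.read_excel(responsive_name) and
# pd.read_excel(non_responsive_name); small made-up frames are used here.
responsive = pd.DataFrame({"BMI": [27.1, 29.4], "glucose": [101.0, 98.5]})
non_responsive = pd.DataFrame({"BMI": [34.8, 36.2], "glucose": [131.0, 127.5]})

# Label each group before combining: 1 = responsive, 0 = non-responsive.
responsive["response"] = 1
non_responsive["response"] = 0

# Stack the two groups into one table with a fresh row index.
combined = pd.concat([responsive, non_responsive], ignore_index=True)
print(combined)
```

<p>Setting the label on each frame before concatenating keeps every row paired with the correct class, regardless of the order in which the files are combined.</p>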

<h3>Step 2: Running the Multi-Model Analysis</h3>

<p>The multi-model analysis consists of the following steps:</p>

<ol>
<li><strong>Load the dataset</strong> and extract <strong>X (features)</strong> and <strong>y (labels)</strong>.</li>
<li><strong>Split the data</strong> into training and testing sets.</li>
<li>Define a list of machine learning models to test.</li>
<li>Train each model and evaluate its performance using accuracy and RMSE.</li>
<li>Print the results for each model.</li>
</ol>

<p>Once you run the script, the performance metrics (accuracy and RMSE) will be printed in the terminal of VSCode. These metrics will provide a baseline to identify the most effective model for further analysis, including parameter and hyperparameter tuning.</p>
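<p>The five steps listed above can be sketched as a short loop with scikit-learn. This is an illustrative outline on randomly generated data, not the tutorial's actual multi-model script; the model list, the split ratio, and the hyperparameters are assumptions.</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Randomly generated stand-in for the combined synthetic EHR table.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Models to compare; any scikit-learn classifier can be added here.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    accuracy = accuracy_score(y_test, predicted)
    rmse = float(np.sqrt(mean_squared_error(y_test, predicted)))
    results[name] = (accuracy, rmse)
    print(f"{name}: accuracy={accuracy:.3f}, RMSE={rmse:.4f}")
```

<p>Keeping the models in a dictionary means the same fit, predict, and score code runs for every model, so the metrics are directly comparable.</p>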

<h3>Interpreting the Results</h3>

<h4>1. Logistic Regression:</h4>
<ul>
<li><strong>Accuracy:</strong> ~89.5%</li>
<li><strong>RMSE:</strong> Moderate</li>
<li><strong>Misclassification:</strong> 21 incorrect predictions (11 false positives and 10 false negatives).</li>
<li><strong>Next Steps:</strong> Consider feature scaling, regularization, and hyperparameter tuning.</li>
</ul>

<h4>2. Decision Tree:</h4>
<ul>
<li><strong>Accuracy:</strong> 99%</li>
<li><strong>RMSE:</strong> Low</li>
<li><strong>Misclassification:</strong> 2 false negatives.</li>
<li><strong>Next Steps:</strong> Prune the tree to avoid overfitting. Hyperparameter tuning and ensemble techniques like boosting or bagging could help further.</li>
</ul>

<h4>3. Random Forest:</h4>
<ul>
<li><strong>Accuracy:</strong> 98%</li>
<li><strong>RMSE:</strong> Low</li>
<li><strong>Misclassification:</strong> 4 samples misclassified.</li>
<li><strong>Next Steps:</strong> Hyperparameter tuning (number of trees, maximum depth, etc.) and feature importance analysis.</li>
</ul>

<h4>4. Gradient Boosting:</h4>
<ul>
<li><strong>Accuracy:</strong> 99.5%</li>
<li><strong>RMSE:</strong> Lowest among the models.</li>
<li><strong>Misclassification:</strong> 1 false negative.</li>
<li><strong>Next Steps:</strong> Tune learning rate, number of boosting stages, and depth. Dimensionality reduction may also help.</li>
</ul>

<h4>5. K-Nearest Neighbors (KNN):</h4>
<ul>
<li><strong>Accuracy:</strong> 94.5%</li>
<li><strong>RMSE:</strong> 0.2345 (slightly higher compared to tree-based methods).</li>
<li><strong>Misclassification:</strong> 11 false negatives.</li>
<li><strong>Next Steps:</strong> Scale the data, tune the number of neighbors (k), and explore distance metrics.</li>
</ul>

<h4>6. Support Vector Machine (SVM):</h4>
<ul>
<li><strong>Accuracy:</strong> 81.5%</li>
<li><strong>RMSE:</strong> 0.4301 (highest RMSE).</li>
<li><strong>Misclassification:</strong> Significant number of false negatives (37).</li>
<li><strong>Next Steps:</strong> Tune kernel type, C parameter, and gamma. Try dimensionality reduction techniques.</li>
</ul>

<h4>7. Extra Trees:</h4>
<ul>
<li><strong>Accuracy:</strong> 97.5%</li>
<li><strong>RMSE:</strong> 0.1581.</li>
<li><strong>Misclassification:</strong> 5 false negatives.</li>
<li><strong>Next Steps:</strong> Hyperparameter tuning (number of trees, depth), ensemble techniques, or feature selection.</li>
</ul>

<h4>8. AdaBoost:</h4>
<ul>
<li><strong>Accuracy:</strong> 98.5%</li>
<li><strong>RMSE:</strong> 0.1225 (low RMSE).</li>
<li><strong>Misclassification:</strong> Only 3 misclassified samples.</li>
<li><strong>Next Steps:</strong> Tune learning rate and number of estimators. Consider ensemble techniques or cross-validation.</li>
</ul>
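<p>For 0/1 labels, accuracy and RMSE are two views of the same quantity: each wrong prediction contributes a squared error of exactly 1, so the RMSE is the square root of the error rate. The sketch below works through that arithmetic for the models whose RMSE is reported numerically above. The test-set size of 200 is an assumption inferred from the reported accuracies and misclassification counts, not a figure stated in the tutorial.</p>

```python
import math

TEST_SET_SIZE = 200  # assumption inferred from the reported figures

def accuracy_and_rmse(misclassified, total=TEST_SET_SIZE):
    """Accuracy and RMSE for 0/1 labels given the number of wrong predictions."""
    error_rate = misclassified / total
    # Each wrong 0/1 prediction contributes a squared error of exactly 1,
    # so the mean squared error equals the error rate.
    return 1 - error_rate, math.sqrt(error_rate)

# Misclassification counts as reported in the results above.
reported_errors = {"KNN": 11, "SVM": 37, "Extra Trees": 5, "AdaBoost": 3}
for name, errors in reported_errors.items():
    accuracy, rmse = accuracy_and_rmse(errors)
    print(f"{name}: accuracy={accuracy:.1%}, RMSE={rmse:.4f}")
```

<p>Under that assumption, the computed values match the accuracy and RMSE figures listed above for these four models. RMSE therefore ranks the models in the same order as accuracy here and adds no independent information.</p>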

<h3>Further Analysis and Next Steps</h3>

<p>While most models performed exceptionally well, additional techniques could further enhance results:</p>
<ol>
<li><strong>Hyperparameter Tuning:</strong> Fine-tuning model parameters using techniques like <em>Grid Search</em> or <em>Random Search</em> will likely yield improvements. Parameters like learning rate, depth, number of estimators, and regularization strength could be optimized.</li>
<li><strong>Cross-Validation:</strong> Apply k-fold cross-validation for a more reliable estimate of model performance and to avoid overfitting on the test set.</li>
<li><strong>Feature Selection and Dimensionality Reduction:</strong> Apply Principal Component Analysis (PCA) or feature selection methods to reduce noise, improve computational efficiency, and enhance predictive power, especially for models like <strong>KNN</strong> and <strong>SVM</strong>.</li>
<li><strong>Handling Class Imbalance:</strong> Techniques like <em>SMOTE</em> or class weighting could be useful if class imbalance is present in the dataset.</li>
</ol>
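<p>Hyperparameter tuning and cross-validation, as listed above, can be combined in a few lines with scikit-learn. The sketch below tunes the number of neighbors for a scaled KNN classifier using grid search with 5-fold cross-validation. The data is randomly generated, and the grid values and step names are arbitrary choices for illustration.</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Randomly generated stand-in data; replace with the real features and labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scaling lives inside the pipeline so it is re-fit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values for k; the grid itself is an arbitrary illustration.
param_grid = {"knn__n_neighbors": [3, 5, 7, 9]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("best k:", search.best_params_["knn__n_neighbors"])

# 5-fold cross-validated accuracy of the tuned pipeline.
scores = cross_val_score(search.best_estimator_, X, y, cv=5)
print("mean CV accuracy:", round(scores.mean(), 3))
```

<p>Because the best setting is selected and scored on the same data, the final estimate is somewhat optimistic; a separate held-out test set or nested cross-validation gives a less biased figure.</p>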

<p>By applying these further techniques, we can continue to refine the model performance, increase predictive accuracy, and reduce error metrics across the board.</p>