<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://hivelab.biochemistry.gwu.edu/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Lorikrammer</id>
	<title>HIVE Lab - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://hivelab.biochemistry.gwu.edu/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Lorikrammer"/>
	<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/Special:Contributions/Lorikrammer"/>
	<updated>2026-07-02T02:39:29Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Recommended_Publications_for_Intervention_Outcome_Prediction_Models&amp;diff=1307</id>
		<title>Recommended Publications for Intervention Outcome Prediction Models</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Recommended_Publications_for_Intervention_Outcome_Prediction_Models&amp;diff=1307"/>
		<updated>2026-06-01T13:41:26Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* PMID 37313409 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt; Go Back to [[PredictMod|PredictMod Project]]. &amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following publications have been evaluated for their usefulness in providing publicly available datasets for intervention outcome prediction models. &lt;br /&gt;
&lt;br /&gt;
== Breast Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/36358741/ PMID 36358741] ===&lt;br /&gt;
The data was collected from the breast cancer cohort of The Cancer Genome Atlas (TCGA) database. The cohort size was 399 patients. &lt;br /&gt;
&lt;br /&gt;
All the genomic data was found at this link: https://www.cbioportal.org/study/summary?id=brca_tcga&lt;br /&gt;
&lt;br /&gt;
[[File:36358741_example_table.png|frameless|601x601px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Curator comments:&amp;lt;/u&amp;gt; I followed the link to the study summary page, which lists the cancer type, data types, and mutations, and looked for where the response/non-response status could be found. Because the response indicator for this PMID was “Patients had no disease progression after first-line chemotherapy during the 150 months follow-up period”, I looked for data on disease progression and found it in the KM plots for overall survival and disease-free months. Isolating each data point on these plots would give the patient ID, cancer ID, and other data corresponding to that patient’s responder status. For example, the data point for patient ID TCGA-B6-A0I5 showed the highest overall percentage disease free after 281 months. Tracing this ID on the clinical data tab showed the specific mutations and genome alterations for that patient, two important factors examined in the paper.&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/36257316/ PMID 36257316] ===&lt;br /&gt;
The purpose of this study was to explore how ferroptosis (a type of cell death linked to iron and lipid metabolism) differs across triple-negative breast cancer (TNBC) tumors, and to test whether blocking the enzyme GPX4 could make TNBC cells more sensitive to immunotherapy (anti–PD-1).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Condition&amp;lt;/u&amp;gt;: 465 TNBC patients from a large multi-omics dataset, including 360 with transcriptomic data, 279 with whole-exome sequencing (WES), 401 with somatic copy-number alteration (SCNA) data, and 330 with metabolomic data. Additional validation was done in LAR-like TNBC mouse models (TS/A cells in BALB/c mice). Clinical data came from the I-SPY2 cohort and related GEO datasets (GSE173839, GSE124821, GSE176078).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Intervention (drug treatment)&amp;lt;/u&amp;gt;: Combination of GPX4 inhibitors (&#039;&#039;RSL3, ML162&#039;&#039;) and anti–PD-1 immunotherapy, tested both alone and together. Mice received treatments of: GPX4 inhibitor only, anti–PD-1 only, or the combination of both.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;# of Patients / Samples&amp;lt;/u&amp;gt;: 465 patients in the TNBC multi-omics cohort. 8 mice per treatment group in the in-vivo study.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Primary Endpoint&amp;lt;/u&amp;gt;: To see whether GPX4 inhibition could slow tumor growth, make the tumor microenvironment more immune-active, and improve tumor control when combined with anti–PD-1 therapy.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Responder / Non-Responder Definition&amp;lt;/u&amp;gt;: Responders: Tumors where GPX4 inhibition caused ferroptosis, reduced growth, and made the tumor environment more inflammatory. Non-Responders: Tumors with high GSH metabolism (glutathione pathway) that resisted ferroptosis and showed poor response to immunotherapy.&lt;br /&gt;
&lt;br /&gt;
== Lung Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/27613525/ PMID 27613525] ===&lt;br /&gt;
&#039;&#039;&#039;THIS DATA HAS BEEN USED TO TRAIN MODELS FOR THE PREDICTMOD PROJECT.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Detailed clinico-pathological information was collected from 6 cohorts derived from public datasets, excluding cases that received additional chemotherapy or other treatment. The properties and metadata included in the datasets are in similar formats. With some filtering, the general dataset looks like this:&lt;br /&gt;
&lt;br /&gt;
[[File:27613525_example_table.png|frameless|626x626px]]&lt;br /&gt;
&lt;br /&gt;
The two-gene classifier was then calculated for the samples, which were sorted into low, medium, or high categories. 253 genes were selected based on literature research, and array data were analyzed for patients with early-stage lung cancer.&lt;br /&gt;
&lt;br /&gt;
The results were validated using qRT-PCR in the same sample population; the qRT-PCR measurements were significantly correlated (p &amp;lt; 0.05) with the microarray data for all 20 genes that were most significantly associated with relapse-free survival (RFS) in univariable Cox regression. 7 of the 20 genes measured were significantly associated with RFS. For two of these, DUSP6 and ACTN4, high expression indicated a worse prognosis in the Japan and NCI-MD/Norway cohorts. This was validated in the other four cohorts, and a fixed-effects meta-analysis of the datasets demonstrated no heterogeneity or inconsistency. Therefore, the two-gene classifier reliably and precisely identified stage I + II and stage I patients at high risk for death.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Curator comments&amp;lt;/u&amp;gt;: The article focuses on the predictive capability of two genes, DUSP6 and ACTN4, which can give an accurate indication of the patient’s prognosis. Diagnostic biomarkers are not ideal for our purpose of studying the results of interventions; however, after looking at the datasets themselves, it would be interesting to consider their usefulness for our purposes, because the datasets contain metadata such as age at surgery and whether the patient had a history of smoking. We could use the exact details of the surgery as our intervention and use the dead or alive category as the responder/non-responder property, based on the type of surgery and taking into consideration the severity of the condition.&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/38935373/ PMID 38935373] ===&lt;br /&gt;
The purpose of this study was to see if a newer type of radiation called intensity-modulated radiotherapy (IMRT) works better and is safer than an older method, three-dimensional conformal radiotherapy (3D-CRT), when both are given with chemotherapy to people with advanced non-small cell lung cancer (NSCLC) that cannot be removed by surgery.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Condition&amp;lt;/u&amp;gt;: Patients with locally advanced NSCLC who could not have surgery, enrolled in the NRG Oncology–RTOG 0617 phase 3 clinical trial. 483 patients (average age 64 years; 40% female) received carboplatin and paclitaxel chemotherapy along with radiation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Intervention (treatment comparison)&amp;lt;/u&amp;gt;: Compared IMRT vs 3D-CRT, both given with the same chemotherapy. Looked at how much radiation reached the heart and lungs — measured as lung V5 and heart V40 — and how those doses affected survival and side effects.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;# of Patients / Samples&amp;lt;/u&amp;gt;: Total: 483 (IMRT = 228 (47%); 3D-CRT = 255 (53%)). Follow-up time: about 5 years.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Primary Endpoint&amp;lt;/u&amp;gt;: Measured overall survival (OS), progression-free survival (PFS), local failure, new cancers, and severe side effects (grade ≥ 3). Focused on whether heart and lung radiation dose affected long-term outcomes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Responder / Non-Responder Definition&amp;lt;/u&amp;gt;: Responders: Patients treated with IMRT whose heart V40 &amp;lt; 20% had fewer severe lung side effects (pneumonitis) and lived longer (2.5 years vs 1.7 years). Non-Responders: Patients with heart V40 ≥ 20% or who received 3D-CRT, showing more pneumonitis and lower survival (HR 1.34 [95% CI 1.06–1.70]; p = 0.01).&lt;br /&gt;
&lt;br /&gt;
== Liver Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/34975338/ PMID 34975338] ===&lt;br /&gt;
RNA-seq data from publicly available datasets were used to compute a stemness index (mRNAsi) for hepatocellular carcinoma (HCC) patients. Stemness refers to the stem cell-like properties of a subpopulation of tumor cells known as cancer stem cells (CSCs), which are responsible for initiating and sustaining tumor growth. Using the index, HCC patients were categorized into two stemness subtypes, which would essentially determine the effectiveness of and sensitivity to therapies such as immunotherapy. &lt;br /&gt;
&lt;br /&gt;
The data is publicly available. The National Cancer Institute GDC Data Portal provides extensive metadata, such as demographics, race, age (Days to Birth), Days To Death, Vital Status, Primary Diagnosis, Disease Response, and Therapeutic Agents. The format is hard to judge because I would need to fully download the data before seeing it in JSON or CSV, but by compiling the variables above and pulling the relevant data, a table like the following could be produced: &lt;br /&gt;
&lt;br /&gt;
[[File:34975338 example table.png|frameless|592x592px]]&lt;br /&gt;
&lt;br /&gt;
With some formatting, the data would be usable for PredictMod if restricted to a specific treatment; the article notes that erlotinib was analyzed. This also relies on the assumption that the patients’ overall survival (OS) in HCC is equivalent to the Disease Response variable in the actual data. The data also appears to have gone through multiple bioinformatics pipelines, so the disease response could have been calculated using different biomarkers, and that data is not accessible.&lt;br /&gt;
&lt;br /&gt;
== Ovarian Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [http://pubmed.ncbi.nlm.nih.gov/35671108/ PMID 35671108] ===&lt;br /&gt;
&#039;&#039;&#039;THIS DATA HAS BEEN USED TO TRAIN MODELS FOR THE PREDICTMOD PROJECT.&#039;&#039;&#039;  &lt;br /&gt;
&lt;br /&gt;
See https://hivelab.biochemistry.gwu.edu/predictmod/models/Ovarian-Cancer-Methylation and https://hivelab.biochemistry.gwu.edu/predictmod/models/Ovarian-Cancer-RNAseq. &lt;br /&gt;
&lt;br /&gt;
This study investigates the combination of epigenetic priming (guadecitabine) and immunotherapy (pembrolizumab) in patients with platinum-resistant ovarian, fallopian tube, or primary peritoneal cancer (N=35). The study defines clear response criteria, classifying patients as Responders (durable clinical benefit, CBR; receiving ≥6 cycles) and Non-Responders. The primary endpoint is objective response rate (ORR by RECIST 1.1). All high-throughput sequencing data (methylomic and transcriptomic) is publicly available in GEO (GSE186825, GSE188250), making it suitable for intervention outcome prediction modelling.&lt;br /&gt;
&lt;br /&gt;
== Esophageal Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/37313409/ PMID 37313409] ===&lt;br /&gt;
&#039;&#039;&#039;THIS DATA HAS BEEN USED TO TRAIN MODELS FOR THE PREDICTMOD PROJECT.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Single Cell RNA Sequencing Data: This paper selects transcriptomic data from the GSE78220 cohort, which included melanoma patients who received anti-PD-1 checkpoint inhibition therapy before treatment.&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Recommended_Publications_for_Intervention_Outcome_Prediction_Models&amp;diff=1306</id>
		<title>Recommended Publications for Intervention Outcome Prediction Models</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Recommended_Publications_for_Intervention_Outcome_Prediction_Models&amp;diff=1306"/>
		<updated>2026-06-01T13:41:08Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* PMID 27613525 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt; Go Back to [[PredictMod|PredictMod Project]]. &amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following publications have been evaluated for their usefulness in providing publicly available datasets for intervention outcome prediction models. &lt;br /&gt;
&lt;br /&gt;
== Breast Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/36358741/ PMID 36358741] ===&lt;br /&gt;
The data was collected from the breast cancer cohort of The Cancer Genome Atlas (TCGA) database. The cohort size was 399 patients. &lt;br /&gt;
&lt;br /&gt;
All the genomic data was found at this link: https://www.cbioportal.org/study/summary?id=brca_tcga&lt;br /&gt;
&lt;br /&gt;
[[File:36358741_example_table.png|frameless|601x601px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Curator comments:&amp;lt;/u&amp;gt; I followed the link to the study summary page, which lists the cancer type, data types, and mutations, and looked for where the response/non-response status could be found. Because the response indicator for this PMID was “Patients had no disease progression after first-line chemotherapy during the 150 months follow-up period”, I looked for data on disease progression and found it in the KM plots for overall survival and disease-free months. Isolating each data point on these plots would give the patient ID, cancer ID, and other data corresponding to that patient’s responder status. For example, the data point for patient ID TCGA-B6-A0I5 showed the highest overall percentage disease free after 281 months. Tracing this ID on the clinical data tab showed the specific mutations and genome alterations for that patient, two important factors examined in the paper.&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/36257316/ PMID 36257316] ===&lt;br /&gt;
The purpose of this study was to explore how ferroptosis (a type of cell death linked to iron and lipid metabolism) differs across triple-negative breast cancer (TNBC) tumors, and to test whether blocking the enzyme GPX4 could make TNBC cells more sensitive to immunotherapy (anti–PD-1).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Condition&amp;lt;/u&amp;gt;: 465 TNBC patients from a large multi-omics dataset, including 360 with transcriptomic data, 279 with whole-exome sequencing (WES), 401 with somatic copy-number alteration (SCNA) data, and 330 with metabolomic data. Additional validation was done in LAR-like TNBC mouse models (TS/A cells in BALB/c mice). Clinical data came from the I-SPY2 cohort and related GEO datasets (GSE173839, GSE124821, GSE176078).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Intervention (drug treatment)&amp;lt;/u&amp;gt;: Combination of GPX4 inhibitors (&#039;&#039;RSL3, ML162&#039;&#039;) and anti–PD-1 immunotherapy, tested both alone and together. Mice received treatments of: GPX4 inhibitor only, anti–PD-1 only, or the combination of both.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;# of Patients / Samples&amp;lt;/u&amp;gt;: 465 patients in the TNBC multi-omics cohort. 8 mice per treatment group in the in-vivo study.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Primary Endpoint&amp;lt;/u&amp;gt;: To see whether GPX4 inhibition could slow tumor growth, make the tumor microenvironment more immune-active, and improve tumor control when combined with anti–PD-1 therapy.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Responder / Non-Responder Definition&amp;lt;/u&amp;gt;: Responders: Tumors where GPX4 inhibition caused ferroptosis, reduced growth, and made the tumor environment more inflammatory. Non-Responders: Tumors with high GSH metabolism (glutathione pathway) that resisted ferroptosis and showed poor response to immunotherapy.&lt;br /&gt;
&lt;br /&gt;
== Lung Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/27613525/ PMID 27613525] ===&lt;br /&gt;
&#039;&#039;&#039;THIS DATA HAS BEEN USED TO TRAIN MODELS FOR THE PREDICTMOD PROJECT.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Detailed clinico-pathological information was collected from 6 cohorts derived from public datasets, excluding cases that received additional chemotherapy or other treatment. The properties and metadata included in the datasets are in similar formats. With some filtering, the general dataset looks like this:&lt;br /&gt;
&lt;br /&gt;
[[File:27613525_example_table.png|frameless|626x626px]]&lt;br /&gt;
&lt;br /&gt;
The two-gene classifier was then calculated for the samples, which were sorted into low, medium, or high categories. 253 genes were selected based on literature research, and array data were analyzed for patients with early-stage lung cancer.&lt;br /&gt;
&lt;br /&gt;
The results were validated using qRT-PCR in the same sample population; the qRT-PCR measurements were significantly correlated (p &amp;lt; 0.05) with the microarray data for all 20 genes that were most significantly associated with relapse-free survival (RFS) in univariable Cox regression. 7 of the 20 genes measured were significantly associated with RFS. For two of these, DUSP6 and ACTN4, high expression indicated a worse prognosis in the Japan and NCI-MD/Norway cohorts. This was validated in the other four cohorts, and a fixed-effects meta-analysis of the datasets demonstrated no heterogeneity or inconsistency. Therefore, the two-gene classifier reliably and precisely identified stage I + II and stage I patients at high risk for death.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Curator comments&amp;lt;/u&amp;gt;: The article focuses on the predictive capability of two genes, DUSP6 and ACTN4, which can give an accurate indication of the patient’s prognosis. Diagnostic biomarkers are not ideal for our purpose of studying the results of interventions; however, after looking at the datasets themselves, it would be interesting to consider their usefulness for our purposes, because the datasets contain metadata such as age at surgery and whether the patient had a history of smoking. We could use the exact details of the surgery as our intervention and use the dead or alive category as the responder/non-responder property, based on the type of surgery and taking into consideration the severity of the condition.&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/38935373/ PMID 38935373] ===&lt;br /&gt;
The purpose of this study was to see if a newer type of radiation called intensity-modulated radiotherapy (IMRT) works better and is safer than an older method, three-dimensional conformal radiotherapy (3D-CRT), when both are given with chemotherapy to people with advanced non-small cell lung cancer (NSCLC) that cannot be removed by surgery.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Condition&amp;lt;/u&amp;gt;: Patients with locally advanced NSCLC who could not have surgery, enrolled in the NRG Oncology–RTOG 0617 phase 3 clinical trial. 483 patients (average age 64 years; 40% female) received carboplatin and paclitaxel chemotherapy along with radiation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Intervention (treatment comparison)&amp;lt;/u&amp;gt;: Compared IMRT vs 3D-CRT, both given with the same chemotherapy. Looked at how much radiation reached the heart and lungs — measured as lung V5 and heart V40 — and how those doses affected survival and side effects.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;# of Patients / Samples&amp;lt;/u&amp;gt;: Total: 483 (IMRT = 228 (47%); 3D-CRT = 255 (53%)). Follow-up time: about 5 years.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Primary Endpoint&amp;lt;/u&amp;gt;: Measured overall survival (OS), progression-free survival (PFS), local failure, new cancers, and severe side effects (grade ≥ 3). Focused on whether heart and lung radiation dose affected long-term outcomes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;Responder / Non-Responder Definition&amp;lt;/u&amp;gt;: Responders: Patients treated with IMRT whose heart V40 &amp;lt; 20% had fewer severe lung side effects (pneumonitis) and lived longer (2.5 years vs 1.7 years). Non-Responders: Patients with heart V40 ≥ 20% or who received 3D-CRT, showing more pneumonitis and lower survival (HR 1.34 [95% CI 1.06–1.70]; p = 0.01).&lt;br /&gt;
&lt;br /&gt;
== Liver Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/34975338/ PMID 34975338] ===&lt;br /&gt;
RNA-seq data from publicly available datasets were used to compute a stemness index (mRNAsi) for hepatocellular carcinoma (HCC) patients. Stemness refers to the stem cell-like properties of a subpopulation of tumor cells known as cancer stem cells (CSCs), which are responsible for initiating and sustaining tumor growth. Using the index, HCC patients were categorized into two stemness subtypes, which would essentially determine the effectiveness of and sensitivity to therapies such as immunotherapy. &lt;br /&gt;
&lt;br /&gt;
The data is publicly available. The National Cancer Institute GDC Data Portal provides extensive metadata, such as demographics, race, age (Days to Birth), Days To Death, Vital Status, Primary Diagnosis, Disease Response, and Therapeutic Agents. The format is hard to judge because I would need to fully download the data before seeing it in JSON or CSV, but by compiling the variables above and pulling the relevant data, a table like the following could be produced: &lt;br /&gt;
&lt;br /&gt;
[[File:34975338 example table.png|frameless|592x592px]]&lt;br /&gt;
&lt;br /&gt;
With some formatting, the data would be usable for PredictMod if restricted to a specific treatment; the article notes that erlotinib was analyzed. This also relies on the assumption that the patients’ overall survival (OS) in HCC is equivalent to the Disease Response variable in the actual data. The data also appears to have gone through multiple bioinformatics pipelines, so the disease response could have been calculated using different biomarkers, and that data is not accessible.&lt;br /&gt;
&lt;br /&gt;
== Ovarian Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [http://pubmed.ncbi.nlm.nih.gov/35671108/ PMID 35671108] ===&lt;br /&gt;
&#039;&#039;&#039;THIS DATA HAS BEEN USED TO TRAIN MODELS FOR THE PREDICTMOD PROJECT.&#039;&#039;&#039;  &lt;br /&gt;
&lt;br /&gt;
See https://hivelab.biochemistry.gwu.edu/predictmod/models/Ovarian-Cancer-Methylation and https://hivelab.biochemistry.gwu.edu/predictmod/models/Ovarian-Cancer-RNAseq. &lt;br /&gt;
&lt;br /&gt;
This study investigates the combination of epigenetic priming (guadecitabine) and immunotherapy (pembrolizumab) in patients with platinum-resistant ovarian, fallopian tube, or primary peritoneal cancer (N=35). The study defines clear response criteria, classifying patients as Responders (durable clinical benefit, CBR; receiving ≥6 cycles) and Non-Responders. The primary endpoint is objective response rate (ORR by RECIST 1.1). All high-throughput sequencing data (methylomic and transcriptomic) is publicly available in GEO (GSE186825, GSE188250), making it suitable for intervention outcome prediction modelling.&lt;br /&gt;
&lt;br /&gt;
== Esophageal Cancer ==&lt;br /&gt;
&lt;br /&gt;
=== [https://pubmed.ncbi.nlm.nih.gov/37313409/ PMID 37313409] ===&lt;br /&gt;
Single Cell RNA Sequencing Data: This paper selects transcriptomic data from the GSE78220 cohort, which included melanoma patients who received anti-PD-1 checkpoint inhibition therapy before treatment.&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1305</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1305"/>
		<updated>2026-05-29T17:17:03Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and your preferred projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
June 1, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications. &lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area, organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Sujeet Kulkarni &lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Neha Rao&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Navya Sinha&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|John McCaffrey*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Jovanna Aragon&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay&lt;br /&gt;
|GlycoSiteMiner Curation Project &lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;GW Masters Degree Student&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1303</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1303"/>
		<updated>2026-05-29T14:28:57Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date June 1, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications. &lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project. Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area, organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mappings and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Sujeet Kulkarni &lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Neha Rao&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Navya Sinha&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|John McCaffrey*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;GW Master&#039;s degree student&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1299</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1299"/>
		<updated>2026-05-28T15:49:46Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 2026 Summer Volunteer Program Details */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date June 1, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications. &lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
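&lt;br /&gt;
The step of comparing LLM results with curated data could be sketched roughly as follows. This is a minimal illustration, not the GlyGen pipeline: the function names, the dictionary layout, and the sample mappings are all assumptions made for the example, and the LLM call itself is left out.&lt;br /&gt;
&lt;br /&gt;
```python
def normalize(term):
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(term.lower().split())


def compare_mappings(llm_mapping, curated_mapping):
    """Compare LLM-proposed accessions with manually curated ones.

    Both arguments map a free-text term to an accession string (hypothetical layout).
    Returns (agreement_rate, mismatches) over the terms in the curated set.
    """
    llm_norm = {normalize(term): acc for term, acc in llm_mapping.items()}
    mismatches = []
    matched = 0
    for term, expected in curated_mapping.items():
        proposed = llm_norm.get(normalize(term))
        if proposed == expected:
            matched += 1
        else:
            # Keep the disagreement so a curator can review it by hand.
            mismatches.append((term, expected, proposed))
    rate = matched / len(curated_mapping) if curated_mapping else 0.0
    return rate, mismatches


# Illustrative data only: 9606 is the NCBI Taxonomy ID for Homo sapiens,
# and 10090 (Mus musculus) stands in for a wrong LLM answer.
curated = {"human": "9606", "man": "9606", "h. sapiens": "9606"}
llm_output = {"Human": "9606", "man": "9606", "H.  sapiens": "10090"}

rate, mismatches = compare_mappings(llm_output, curated)
```
&lt;br /&gt;
The list of mismatches is the part that would feed manual validation: each entry records the term, the curated accession, and what the model proposed instead.&lt;br /&gt;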
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using GlyTableMaker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors&lt;br /&gt;
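&lt;br /&gt;
The first step above (keyword filtering through the PubMed web API) might look like the sketch below, which uses the NCBI E-utilities ESearch endpoint. The helper names, the choice to restrict keywords to the title/abstract field, and the default result limit are assumptions for illustration, not project requirements.&lt;br /&gt;
&lt;br /&gt;
```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(keywords, retmax=20):
    """Build an ESearch URL that requires every keyword in the title or abstract."""
    term = " AND ".join(f"{kw}[Title/Abstract]" for kw in keywords)
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_pmids(response_text):
    """Extract the list of PMIDs from an ESearch JSON response body."""
    return json.loads(response_text)["esearchresult"]["idlist"]


def search_pubmed(keywords, retmax=20):
    """Run the search over the network and return the matching PMIDs."""
    with urlopen(build_esearch_url(keywords, retmax)) as response:
        return parse_pmids(response.read().decode("utf-8"))
```
&lt;br /&gt;
A call such as search_pubmed(["GlyGen", "glycan"]) would return a list of PMID strings. Abstracts and author affiliations for those PMIDs would then need a separate fetch step, which is not shown here.&lt;br /&gt;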
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mappings and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Neha Rao&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Navya Sinha&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|John McCaffrey&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;Master&#039;s degree student&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1296</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1296"/>
		<updated>2026-05-26T19:38:31Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using GlyTableMaker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area, organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Caleb Hailu&lt;br /&gt;
|Pending&lt;br /&gt;
|Pending&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au-Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;Master&#039;s degree student.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1295</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1295"/>
		<updated>2026-05-26T19:37:19Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using GlyTableMaker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Caleb Hailu&lt;br /&gt;
|Pending&lt;br /&gt;
|Pending&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran**&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au-Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;Master&#039;s degree student.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1294</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1294"/>
		<updated>2026-05-26T19:35:04Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training an LLM to recommend intervention outcome prediction datasets.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Caleb Hailu&lt;br /&gt;
|Pending&lt;br /&gt;
|Pending&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran**&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;Master&#039;s degree student&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1287</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1287"/>
		<updated>2026-05-21T16:27:44Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs used to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area, organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran**&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1283</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1283"/>
		<updated>2026-05-19T19:54:00Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs used to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g., Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1282</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1282"/>
		<updated>2026-05-14T20:53:57Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training an LLM to recommend datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a ChatGPT prompt so that the mapping is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g., Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1281</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1281"/>
		<updated>2026-05-14T20:48:18Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 5. PredictMod Machine Learning (ML) Modeling Project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either &#039;&#039;&#039;task 1&#039;&#039;&#039; or &#039;&#039;&#039;task 2&#039;&#039;&#039;. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2. GlycoSiteMiner Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a ChatGPT prompt so that the mapping produced is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;4. GlyGen Publication Analysis Project&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 5. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;7. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This volunteership is currently paused.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1280</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1280"/>
		<updated>2026-05-14T20:48:06Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either &#039;&#039;&#039;task 1&#039;&#039;&#039; or &#039;&#039;&#039;task 2&#039;&#039;&#039;. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2. GlycoSiteMiner Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a ChatGPT prompt that produces reliable and accurate mappings.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;4. GlyGen Publication Analysis Project&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 5. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, MPH, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documenting the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;7. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This volunteership is currently paused.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1279</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1279"/>
		<updated>2026-05-12T20:43:26Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training an LLM to recommend intervention outcome prediction datasets.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area, organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either &#039;&#039;&#039;task 1&#039;&#039;&#039; or &#039;&#039;&#039;task 2&#039;&#039;&#039;. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2. GlycoSiteMiner Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a ChatGPT prompt that produces reliable and accurate mappings.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;4. GlyGen Publication Analysis Project&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 5. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documenting the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;7. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This volunteership is currently paused.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link:&#039;&#039;&#039; TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1277</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1277"/>
		<updated>2026-05-11T15:53:14Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs used to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either &#039;&#039;&#039;task 1&#039;&#039;&#039; or &#039;&#039;&#039;task 2&#039;&#039;&#039;. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlycoSiteMiner Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Curating predicted sites using GlyTableMaker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 4. GlyGen Publication Analysis Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 5. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 7. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
This volunteership is currently paused.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link:&#039;&#039;&#039; TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1276</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1276"/>
		<updated>2026-05-08T17:58:14Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 5. PredictMod Machine Learning (ML) Modeling Project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs used to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either &#039;&#039;&#039;task 1&#039;&#039;&#039; or &#039;&#039;&#039;task 2&#039;&#039;&#039;. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlycoSiteMiner Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare the results with manually curated data. Another aspect of the work is developing and fine-tuning a ChatGPT prompt that produces reliable and accurate mappings.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Curating predicted sites using GlyTableMaker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 4. GlyGen Publication Analysis Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 5. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documenting the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 7. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
This volunteership is currently paused.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1275</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1275"/>
		<updated>2026-05-08T17:57:29Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications. &lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training an LLM to recommend intervention outcome prediction datasets.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area, organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either &#039;&#039;&#039;task 1&#039;&#039;&#039; or &#039;&#039;&#039;task 2&#039;&#039;&#039;. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlycoSiteMiner Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare the results with manually curated data. Another aspect of the work is developing and fine-tuning a ChatGPT prompt that produces reliable and accurate mappings.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Curating predicted sites using GlyTableMaker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 4. GlyGen Publication Analysis Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 5. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documenting the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 7. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
This volunteership is currently paused.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1263</id>
		<title>PredictMod</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1263"/>
		<updated>2026-04-29T16:23:35Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to PredictMod Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the PredictMod project. This wiki system provides complementary information to the [https://hivelab.biochemistry.gwu.edu/predictmod/ PredictMod Portal].&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[About PredictMod|About]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
PredictMod (https://hivelab.biochemistry.gwu.edu/predictmod) is an application designed to provide clinicians with a powerful decision-making tool that enhances clinical understanding of patient-level data. Through the use of the open-source PredictMod platform, clinicians, patients, and researchers will access predictive ML models based on real-world data. The platform empowers users with limited experience in bioinformatics to leverage the power of predictive modeling, providing a collaborative solution for improving patient outcomes. The PredictMod platform utilizes ML tools and complex datasets based on electronic medical records (EMR), gut microbiome, and other -omics data to forecast patient outcomes, often in response to treatment for a particular condition. &lt;br /&gt;
&lt;br /&gt;
While our primary conditions of interest are prediabetes and cancer, the tool is designed to be used for a variety of conditions, interventions, and data types. The agnostic nature of the platform allows for widespread use and relevance to all fields within the scope of medicine.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Complete our [https://docs.google.com/forms/d/e/1FAIpQLSeJPwBcdG4LnP4AvwFtb8I-7Q8UH5aH8XVKBOBpvKW4J6Q2jA/viewform?usp=sharing&amp;amp;ouid=108976537849882609491 use case survey].&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod User Guide|User Guide]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
[[PredictMod User Guide|This document]] contains resources for users of the PredictMod Platform.  &lt;br /&gt;
&lt;br /&gt;
=== Quick links for model users: ===&lt;br /&gt;
*[[PredictMod User Guide|User Guide]]&lt;br /&gt;
*[[PredictMod Frequently Asked Questions|Frequently Asked Questions]]&lt;br /&gt;
*[[PredictMod Contact Us|Contact Us]]&lt;br /&gt;
*[[PredictMod BCOs]]&lt;br /&gt;
&lt;br /&gt;
=== Quick links for model submitters: ===&lt;br /&gt;
* [[How to Find and Extract Machine-Usable Data from Scientific Literature]]&lt;br /&gt;
* [[Recommended Publications for Intervention Outcome Prediction Models]]&lt;br /&gt;
* [[Model Training and Validation]]&lt;br /&gt;
* [[Modeling Tutorials|PredictMod Modeling Tutorials]]&lt;br /&gt;
* [[Augmenting real data with synthetic data|Augmenting Real Data with Synthetic Data]]&lt;br /&gt;
* [[PredictMod Model Submission]]&lt;br /&gt;
* [[AI-READI Dataset Overview]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod Publications &amp;amp; Multimedia|Publications &amp;amp; MultiMedia]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recent Publications: ====&lt;br /&gt;
&lt;br /&gt;
* Talk Data Podcast | MDClone Featuring Lori Krammer | Published March 4th, 2026 &amp;lt;br/&amp;gt;[https://www.linkedin.com/posts/lori-krammer_syntheticdata-machinelearning-healthcareinnovation-activity-7442231985365770242-xzV0?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACerBVYBiJq4wwQ4cu1WPEc-RZ1z7ZHiMhQ LinkedIn post]. Listen on [https://open.spotify.com/show/68biApf6cwsE50bAnAdj1R Spotify] or [https://podcasts.apple.com/us/podcast/talk-data/id1653305563 Apple Podcasts].&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. &#039;&#039;NSM.&#039;&#039; 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI:10.14293/NSM.25.1.0007]&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;Current and Former Contributors&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;The George Washington University &amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt; Raja Mazumder &amp;lt;br /&amp;gt; &lt;br /&gt;
Pat McNeely &amp;lt;br /&amp;gt;  &lt;br /&gt;
Urnisha Bhuiyan &amp;lt;br /&amp;gt;&lt;br /&gt;
Lori Krammer &amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;External Collaborators&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
Sabyasachi Sen, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Jorge Sepulveda, &amp;lt;em&amp;gt;Medical Faculty Associates&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Atin Basu Choudhary, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
John David, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Vinod Aggarwal, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Former Contributors&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Miguel Mazumder&amp;lt;br /&amp;gt;&lt;br /&gt;
Abel Argaw&amp;lt;br /&amp;gt;&lt;br /&gt;
Stephanie Singleton&amp;lt;br /&amp;gt;&lt;br /&gt;
Sangeeta Agarwal&amp;lt;br /&amp;gt;&lt;br /&gt;
Zacharie Savarie&amp;lt;br /&amp;gt;&lt;br /&gt;
Janet Chrosniak&amp;lt;br /&amp;gt;&lt;br /&gt;
Josh Hakakian&amp;lt;br /&amp;gt;&lt;br /&gt;
Nicole Richmond&amp;lt;br /&amp;gt;&lt;br /&gt;
Wilma Jogunoori&amp;lt;br /&amp;gt;&lt;br /&gt;
Arad Jain&amp;lt;br /&amp;gt;&lt;br /&gt;
Hadley King&amp;lt;br /&amp;gt;&lt;br /&gt;
Robel Kahsay&amp;lt;p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Special thanks to our interns and volunteers.&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1252</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1252"/>
		<updated>2026-04-14T20:08:15Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 2026 Summer Volunteer Program Details */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and your preferred projects in order of preference. An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications. &lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area, organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either &#039;&#039;&#039;task 1&#039;&#039;&#039; or &#039;&#039;&#039;task 2&#039;&#039;&#039;. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2. GlycoSiteMiner Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a prompt for ChatGPT to ensure that the mapping it produces is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
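&lt;br /&gt;
The quality-control step described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the GlyGen pipeline: the normalize and evaluate helpers, the small synonym table, and the sample records are all hypothetical, and a plain dictionary lookup stands in for the LLM call. The one outside fact used is that NCBI Taxonomy ID 9606 denotes Homo sapiens.&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical synonym table standing in for the LLM mapping step.
# NCBI Taxonomy ID 9606 is Homo sapiens; the table itself is illustrative.
SYNONYMS = {
    "human": "9606",
    "man": "9606",
    "h. sapiens": "9606",
    "homo sapiens": "9606",
}


def normalize(free_text):
    """Map a free-text species label to a taxonomy ID, or None if unknown."""
    key = " ".join(free_text.lower().split())  # lowercase, collapse whitespace
    return SYNONYMS.get(key)


def evaluate(records, curated):
    """Compare predicted IDs with manually curated IDs.

    records: dict of record ID to free-text label
    curated: dict of record ID to manually curated taxonomy ID
    Returns (accuracy, list of record IDs needing manual review).
    """
    to_review = []
    correct = 0
    for record_id, label in records.items():
        predicted = normalize(label)
        if predicted == curated.get(record_id):
            correct += 1
        else:
            to_review.append(record_id)
    accuracy = correct / len(records) if records else 0.0
    return accuracy, to_review


# Made-up sample data for illustration only.
records = {"r1": "Human", "r2": "  H.  sapiens ", "r3": "hooman"}
curated = {"r1": "9606", "r2": "9606", "r3": "9606"}
accuracy, to_review = evaluate(records, curated)
print(f"accuracy={accuracy:.2f}, needs review: {to_review}")
```
&lt;br /&gt;
Records whose predicted accession disagrees with the curated one are collected for manual review, which is one simple way to surface the edge cases that need refinement.&lt;br /&gt;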
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;4. GlyGen Publication Analysis Project&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors&lt;br /&gt;
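&lt;br /&gt;
The first step above can be sketched in Python using the NCBI E-utilities ESearch endpoint. This is a hedged starting point rather than the project's actual code: the function names and example keywords are hypothetical, and steps 2 and 3 (abstract analysis and co-author filtering) are not covered.&lt;br /&gt;
&lt;br /&gt;
```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities ESearch endpoint for querying PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(keywords):
    """Join keywords with OR, restricting each to title/abstract."""
    return " OR ".join(f'"{kw}"[Title/Abstract]' for kw in keywords)


def build_esearch_url(keywords, retmax=20):
    """Build an ESearch URL that returns matching PMIDs as JSON."""
    params = {
        "db": "pubmed",
        "term": build_query(keywords),
        "retmode": "json",
        "retmax": str(retmax),
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_pmids(keywords, retmax=20):
    """Call ESearch and return the list of PMIDs (requires network access)."""
    with urllib.request.urlopen(build_esearch_url(keywords, retmax)) as resp:
        payload = json.load(resp)
    return payload["esearchresult"]["idlist"]


# Example keywords are placeholders; building the URL needs no network access.
print(build_esearch_url(["GlyGen", "glycoinformatics"], retmax=5))
```
&lt;br /&gt;
Keeping URL construction separate from the network call makes the query logic easy to check offline before any requests are sent.&lt;br /&gt;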
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 5. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;7. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This volunteership is currently paused.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Projects&amp;diff=1249</id>
		<title>Projects</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Projects&amp;diff=1249"/>
		<updated>2026-04-14T14:11:42Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Current Projects&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row2&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://hive.biochemistry.gwu.edu/dna.cgi?cmd=main The High-performance Integrated Virtual Environment (HIVE) platform]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
HIVE is a cloud-based environment optimized for the storage and analysis of extra-large data, such as biomedical data, clinical data, next-generation sequencing (NGS) data, mass spectrometry files, confocal microscopy images, post-market surveillance data, medical recall data, and many others. HIVE provides secure web access for authorized users to deposit, retrieve, annotate and compute on Big Data, and analyze the outcomes using web user interfaces. [https://docs.google.com/document/d/1F5iq00uKkJfdSsbwanvKOy-nPnwijH56mwbwa_HhzfY/edit?tab=t.0#heading=h.7dlfmngwfzih More here].&lt;br /&gt;
&lt;br /&gt;
The HIVE platform and associated algorithms, such as CensuScope and HIVE-Hexagon, are used to support the metagenomics analysis infrastructure.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[GW-HIVE WIKI]]&lt;br /&gt;
&lt;br /&gt;
[[METAGENOMICS WIKI]]&lt;br /&gt;
        &lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
	&amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://data.argosdb.org/ FDA-ARGOS Project (Food and Drug Administration-dAtabase for Regulatory-Grade micrObial Sequences)]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The FDA-ARGOS Project (Food and Drug Administration-dAtabase for Regulatory-Grade micrObial Sequences) is a collaborative effort to create a high-quality genomic database for identifying and characterizing microbial pathogens. Developed in partnership with the FDA, University of Maryland, and NCBI, the project provides regulatory-grade genomic data, crucial for public health and diagnostic use. Expanded in 2021 with support from GWU, Temple University, and Embleema, FDA-ARGOS aims to enhance infectious disease research through rigorous quality control protocols. ArgosDB hosts this data, offering downloadable sequences and reproducible workflows for research and regulatory applications. [https://www.fda.gov/medical-devices/science-and-research-medical-devices/database-reference-grade-microbial-sequences-fda-argos More here].&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[FDA-ARGOS WIKI]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://www.biocomputeobject.org/ BioCompute Objects (BCO)]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
BioCompute is an FDA-funded project to establish a framework for community-based development of standards for harmonization of High-throughput Sequencing (HTS), standardization of data formats, promotion of interoperability, and bioinformatics verification protocols. The BioCompute Object (BCO) was developed in the High-throughput Sequencing Computational Standards for Regulatory Sciences (HTS-CSRS) initiative in the BioCompute Objects Portal (BOP), a web portal that serves as a collaborative ground to encourage dialogue and facilitate interoperability between different bioinformatic pipelines, industries, and developers. HIVE capabilities have been leveraged to support the development of the BCO. The BCO is versatile and adaptable to other common HTS analysis platforms. [https://docs.google.com/document/d/1WQFZm_PFiQXob4NyOKq6y-2ywnbmNoFHSS27fYf3l4Y/edit?tab=t.0#heading=h.bs8eki17tykx More here].&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[https://wiki.biocomputeobject.org/Main_Page BIOCOMPUTE OBJECTS WIKI]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://www.glygen.org/ GlyGen]&amp;lt;/h3&amp;gt;&lt;br /&gt;
	&amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
GlyGen (gly-glycobiology; gen-information), https://www.glygen.org/, is an advanced glycoinformatics resource developed to facilitate discovery in basic and translational glycobiology research and to enhance the integration of multidisciplinary information from diverse resources. GlyGen includes knowledge about molecular, biophysical and functional properties of glycans, genes, and proteins organized in pathways and ontologies, plus a rapidly growing body of biological big data related to cancer mutation and expression. GlyGen adopts an innovative user-driven approach for implementing and prioritizing knowledge-dissemination tools to address the questions and needs of the glycobiology community. GlyGen is funded by the National Institute of General Medical Sciences under grant # 1R24GM146616 - 01 and the National Institutes of Health Office of Strategic Coordination - The Common Fund under grant # 1OT2OD032092. More information about GlyGen: https://www.glygen.org/about/ &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[https://wiki.glygen.org/Main_Page GlyGen WIKI]&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://hivelab.biochemistry.gwu.edu/predictmod PredictMod]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
PredictMod is an application designed to predict the outcome of an intervention before a patient initiates treatment. Our goal is to provide clinicians with a powerful decision-making tool that enhances clinical understanding of patient-level data. The PredictMod platform uses machine learning tools and complex datasets based on electronic health records, gut microbiome, and -omics data to forecast patient outcomes, often in response to treatment for a particular condition. While our primary condition of interest is prediabetes, the tool is designed to be used for a variety of conditions, interventions, and data types. &amp;lt;br&amp;gt; &amp;lt;br&amp;gt;&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/PredictMod PredictMod WIKI]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[GW-FEAST]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The GW Federated Ecosystems for Analytics and Standardized Technologies (GW-FEAST) project is part of the ARPA-H FEAST performer team initiative that includes academic and industry partners. The goal of the ARPA-H performer teams is “to create bridges across data silos to make health data more accessible and usable”. &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/GW-FEAST GW-FEAST WIKI]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://biomarkerkb.org/ Biomarker Knowledgebase]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The Biomarker Partnership is a CFDE-sponsored project to develop a knowledgebase that will organize and integrate biomarker data from different public sources. The data will be connected to contextual information to show a novel systems-level view of biomarkers. The motivation for this project is to improve the harmonization and organization of biomarker data. This will be done by mapping biomarkers from public sources to, and across, CF data elements. This mapping will bridge knowledge across multiple DCCs and biomedical disciplines.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[https://wiki.biomarkerkb.org/Main_Page BioMarkerKB WIKI]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f5faff; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Volunteership Semesters&amp;lt;/div&amp;gt;[[Volunteership Summer 2026]]&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026]]&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025]]&lt;br /&gt;
&lt;br /&gt;
[[Volunteership 2025|Volunteership Summer 2025]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Past Projects&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row2&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://hivelab.tst.biochemistry.gwu.edu/gfkb Gut Microbiome Analytic System (Microbiome)]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The HIVE team received NSF funding to develop a Gut Microbiome Monitoring System (GutFeeling), a tool that, when used over time, will allow users to adjust their dietary habits (such as consumption of probiotics and prebiotics) and other lifestyle habits to help restore their normal microbiome. A major bottleneck, the rapid analysis of large amounts of metagenomic data, has been resolved by our group through the development of a novel algorithm and accompanying software called CensuScope. Through analysis of healthy gut microbiome data, we are actively developing a Knowledge Base (GutFeelingKB) to not only provide a clearer picture of an ideal personalized microbiome but also establish baseline characteristics for each customer. The Mazumder Lab is collaborating with the Milken School of Public Health and Kamtek Sequencing Facility to investigate the relationship between bacterial species commonly present in the digestive tract, diet, physical activity, lifestyle habits, and metabolic risk factors. [https://docs.google.com/document/d/18WyVTJrrf-FR0sHt634vO8Lwel-4OQxP9sNar7gYYro/edit?tab=t.0#heading=h.7qbm3f7lky31 More here].&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
	&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;HIVE-EQAPOL Project on HIVE NGS Data Processing and Analysis&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
For this project, our group works closely with the External Quality Assurance Program Oversight Laboratory (EQAPOL) team to analyze, store, and track HIV NGS data. Reliable identification of strains is critical for developing new assays, validating assay platforms, helping regulators evaluate test kits, monitoring HIV drug resistance, and informing vaccine development. The HIVE tools and platform are used for virus identification, recombination analysis, and clone discovery.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://www.oncomx.org/ OncoMX]&amp;lt;/h3&amp;gt;&lt;br /&gt;
	&amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The OncoMX mission is to create an integrated cancer mutation and expression resource for exploring cancer biomarkers. OncoMX is a collaboration between the George Washington University (GW), NASA&#039;s Jet Propulsion Laboratory (JPL), the Swiss Institute of Bioinformatics (SIB), and the University of Delaware (UD). The core knowledgebase of OncoMX is derived from the BioMuta and BioXpress integrated cancer mutation and expression databases, which are actively maintained. Normal expression data from Bgee and custom text mining software augment the cancer data to improve functional interpretation of the reported variants and expression profiles. All data are wrapped into the OncoMX database and web portal and mapped to additional functional information from the NCI Early Detection Research Network (EDRN) and Reactome. It is expected that the large-scale integration of cancer data and supporting information, provided by OncoMX with direct community feedback, will benefit cancer research by improving synthesis of information and may make earlier detection a reality.&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://hive.biochemistry.gwu.edu/dna.cgi?cmd=main Glycoproteomics Characterization Workflow and Data-Analysis Pipeline for Vaccines and Biosimilars]&amp;lt;/h3&amp;gt;&lt;br /&gt;
	&amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
In this FDA-funded project, we are extending High-performance Integrated Virtual Environment (HIVE) capabilities through the development and integration of software tools and datasets for comparative analysis of glycoproteins. Glycomic analysis has many angles and has been extensively reviewed in recent literature. We propose to rely on the independent development of the glycomics field and incorporate these approaches in the HIVE pipeline as they mature while we develop a standardized glycoinformatics pipeline that will benefit investigators and regulators at the FDA.&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;RESOURCES&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Tool Resources]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Tool Resources]]&#039;&#039;&amp;lt;br&amp;gt;There are a variety of bioinformatic tool resources developed by our team.&lt;br /&gt;
&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Dataset Resources]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Dataset Resources]]&#039;&#039;&amp;lt;br&amp;gt;There are a variety of bioinformatic dataset resources integrated by our team.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Projects&amp;diff=1248</id>
		<title>Projects</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Projects&amp;diff=1248"/>
		<updated>2026-04-14T14:08:23Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Current Projects&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row2&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://hive.biochemistry.gwu.edu/dna.cgi?cmd=main The High-performance Integrated Virtual Environment (HIVE) platform]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
HIVE is a cloud-based environment optimized for the storage and analysis of extra-large data, such as biomedical data, clinical data, next-generation sequencing (NGS) data, mass spectrometry files, confocal microscopy images, post-market surveillance data, medical recall data, and many others. HIVE provides secure web access for authorized users to deposit, retrieve, annotate and compute on Big Data, and analyze the outcomes using web user interfaces. [https://docs.google.com/document/d/1F5iq00uKkJfdSsbwanvKOy-nPnwijH56mwbwa_HhzfY/edit?tab=t.0#heading=h.7dlfmngwfzih More here].&lt;br /&gt;
&lt;br /&gt;
The HIVE platform and associated algorithms, such as CensuScope and HIVE-Hexagon, are used to support the metagenomics analysis infrastructure.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[GW-HIVE WIKI]]&lt;br /&gt;
&lt;br /&gt;
[[METAGENOMICS WIKI]]&lt;br /&gt;
        &lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
	&amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://data.argosdb.org/ FDA-ARGOS Project (Food and Drug Administration-dAtabase for Regulatory-Grade micrObial Sequences)]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The FDA-ARGOS Project (Food and Drug Administration-dAtabase for Regulatory-Grade micrObial Sequences) is a collaborative effort to create a high-quality genomic database for identifying and characterizing microbial pathogens. Developed in partnership with the FDA, University of Maryland, and NCBI, the project provides regulatory-grade genomic data, crucial for public health and diagnostic use. Expanded in 2021 with support from GWU, Temple University, and Embleema, FDA-ARGOS aims to enhance infectious disease research through rigorous quality control protocols. ArgosDB hosts this data, offering downloadable sequences and reproducible workflows for research and regulatory applications. [https://www.fda.gov/medical-devices/science-and-research-medical-devices/database-reference-grade-microbial-sequences-fda-argos More here].&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[FDA-ARGOS WIKI]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://www.biocomputeobject.org/ BioCompute Objects (BCO)]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
BioCompute is an FDA-funded project to establish a framework for community-based development of standards for harmonization of High-throughput Sequencing (HTS), standardization of data formats, promotion of interoperability, and bioinformatics verification protocols. The BioCompute Object (BCO) was developed in the High-throughput Sequencing Computational Standards for Regulatory Sciences (HTS-CSRS) initiative in the BioCompute Objects Portal (BOP), a web portal that serves as a collaborative ground to encourage dialogue and facilitate interoperability between different bioinformatic pipelines, industries, and developers. HIVE capabilities have been leveraged to support the development of the BCO. The BCO is versatile and adaptable to other common HTS analysis platforms. [https://docs.google.com/document/d/1WQFZm_PFiQXob4NyOKq6y-2ywnbmNoFHSS27fYf3l4Y/edit?tab=t.0#heading=h.bs8eki17tykx More here].&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[https://wiki.biocomputeobject.org/Main_Page BIOCOMPUTE OBJECTS WIKI]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://www.glygen.org/ GlyGen]&amp;lt;/h3&amp;gt;&lt;br /&gt;
	&amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
GlyGen (gly-glycobiology; gen-information), https://www.glygen.org/, is an advanced glycoinformatics resource developed to facilitate discovery in basic and translational glycobiology research and to enhance the integration of multidisciplinary information from diverse resources. GlyGen includes knowledge about molecular, biophysical and functional properties of glycans, genes, and proteins organized in pathways and ontologies, plus a rapidly growing body of biological big data related to cancer mutation and expression. GlyGen adopts an innovative user-driven approach for implementing and prioritizing knowledge-dissemination tools to address the questions and needs of the glycobiology community. GlyGen is funded by the National Institute of General Medical Sciences under grant # 1R24GM146616 - 01 and the National Institutes of Health Office of Strategic Coordination - The Common Fund under grant # 1OT2OD032092. More information about GlyGen: https://www.glygen.org/about/ &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[https://wiki.glygen.org/Main_Page GlyGen WIKI]&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://hivelab.biochemistry.gwu.edu/predictmod PredictMod]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
PredictMod is an application designed to predict the outcome of an intervention before a patient initiates treatment. Our goal is to provide clinicians with a powerful decision-making tool that enhances clinical understanding of patient-level data. The PredictMod platform uses machine learning tools and complex datasets based on electronic health records, gut microbiome, and -omics data to forecast patient outcomes, often in response to treatment for a particular condition. While our primary condition of interest is prediabetes, the tool is designed to be used for a variety of conditions, interventions, and data types. &amp;lt;br&amp;gt; &amp;lt;br&amp;gt;&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/PredictMod PredictMod WIKI]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[GW-FEAST]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The GW Federated Ecosystems for Analytics and Standardized Technologies (GW-FEAST) project is part of the ARPA-H FEAST performer team initiative that includes academic and industry partners. The goal of the ARPA-H performer teams is “to create bridges across data silos to make health data more accessible and usable”. &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/GW-FEAST GW-FEAST WIKI]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://biomarkerkb.org/ Biomarker Knowledgebase]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The Biomarker Partnership is a CFDE-sponsored project to develop a knowledgebase that will organize and integrate biomarker data from different public sources. The data will be connected to contextual information to show a novel systems-level view of biomarkers. The motivation for this project is to improve the harmonization and organization of biomarker data. This will be done by mapping biomarkers from public sources to, and across, CF data elements. This mapping will bridge knowledge across multiple DCCs and biomedical disciplines.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[https://wiki.biomarkerkb.org/Main_Page BioMarkerKB WIKI]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f5faff; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Volunteership Semesters&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Past Projects&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row2&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://hivelab.tst.biochemistry.gwu.edu/gfkb Gut Microbiome Analytic System (Microbiome)]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The HIVE team received NSF funding to develop a Gut Microbiome Monitoring System (GutFeeling), a tool that, when used over time, will allow users to adjust their dietary habits (such as consumption of probiotics and prebiotics) and other lifestyle habits to help restore their normal microbiome. Rapid analysis of the large amount of metagenomic data, a major bottleneck, has been resolved by our group through the development of a novel algorithm and accompanying software called CensuScope. Through analysis of healthy gut microbiome data, we are actively developing a Knowledge Base (GutFeelingKB) to not only provide a clearer picture of an ideal personalized microbiome but also establish baseline characteristics for each customer. The Mazumder Lab is collaborating with the Milken School of Public Health and Kamtek Sequencing Facility to investigate the relationship between bacterial species commonly present in the digestive tract, diet, physical activity, lifestyle habits, and metabolic risk factors. [https://docs.google.com/document/d/18WyVTJrrf-FR0sHt634vO8Lwel-4OQxP9sNar7gYYro/edit?tab=t.0#heading=h.7qbm3f7lky31 More here].&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
	&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;HIVE-EQAPOL Project on HIVE NGS Data Processing and Analysis&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
For this project, our group works closely with the External Quality Assurance Program Oversight Laboratory (EQAPOL) team to analyze, store, and track HIV NGS data. Reliable identification of strains is critical for developing new assays, validating assay platforms, assisting regulators to evaluate test kits, monitoring HIV drug resistance, and informing vaccine development. The HIVE tools and platform are used for virus identification, recombination analysis, and clone discovery.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://www.oncomx.org/ OncoMX]&amp;lt;/h3&amp;gt;&lt;br /&gt;
	&amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
The OncoMX mission is to create an integrated cancer mutation and expression resource for exploring cancer biomarkers. OncoMX is a collaboration between the George Washington University (GW), NASA&#039;s Jet Propulsion Laboratory (JPL), the Swiss Institute of Bioinformatics (SIB), and the University of Delaware (UD). The core knowledgebase of OncoMX is derived from BioMuta and BioXpress integrated cancer mutation and expression databases which are actively maintained. Normal expression data from Bgee and custom text mining software augment the cancer data to improve functional interpretation of the reported variants and expression profiles. All data are wrapped into the OncoMX database and web portal, mapped to additional functional information from NCI Early Detection Research Network (EDRN) and Reactome. It is expected that the large-scale integration of cancer data and supporting information, provided by OncoMX with direct community feedback, will benefit cancer research by improving synthesis of information and may make earlier detection a reality.&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[https://hive.biochemistry.gwu.edu/dna.cgi?cmd=main Glycoproteomics Characterization Workflow and Data-Analysis Pipeline for Vaccines and Biosimilars]&amp;lt;/h3&amp;gt;&lt;br /&gt;
	&amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
In this FDA-funded project, we are extending High-performance Integrated Virtual Environment (HIVE) capabilities through the development and integration of software tools and datasets for comparative analysis of glycoproteins. Glycomic analysis has many angles and has been extensively reviewed in recent literature. We propose to rely on the independent development of the glycomics field and incorporate these approaches in the HIVE pipeline as they mature while we develop a standardized glycoinformatics pipeline that will benefit investigators and regulators at the FDA.&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;RESOURCES&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Tool Resources]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Tool Resources]]&#039;&#039;&amp;lt;br&amp;gt;There are a variety of bioinformatic tool resources developed by our team.&lt;br /&gt;
&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Dataset Resources]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Dataset Resources]]&#039;&#039;&amp;lt;br&amp;gt;There are a variety of bioinformatic dataset resources integrated by our team.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1224</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1224"/>
		<updated>2026-04-03T15:00:06Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 5. BioCompute Objects User Research Project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and your projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training LLM-based recommendation of intervention outcome prediction datasets.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora, Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
# Review existing published biomarkers for correctness and validity&lt;br /&gt;
#* Validate biomarker–disease associations using primary literature&lt;br /&gt;
#* Assess evidence strength and/or add excerpts from listed papers serving as evidence&lt;br /&gt;
#* Identify outdated, conflicting, or unsupported biomarker claims&lt;br /&gt;
# Biocurate biomarkers from publications based on disease and entity type&lt;br /&gt;
#* Identify and curate novel biomarkers from recent publications&lt;br /&gt;
#* Standardize biomarker representation using controlled vocabularies and ontologies&lt;br /&gt;
#* Classify biomarkers by type and disease context&lt;br /&gt;
# Review and Map Electronic Health Records Normal Entity Data&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# Biocuration and LLM Benchmarking&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines. This involves reviewing biomedical publications, extracting biomarker entities and supporting evidence, and matching entities to their corresponding ontology terms.&lt;br /&gt;
#* Depending on the volunteer&#039;s background and completion of the first task, volunteers may assist in applying an LLM workflow to extract disease-specific biomarkers from literature and comparing model outputs against the manually curated benchmark sets.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Deliverables&#039;&#039;: curated biomarker tables, documentation of curation criteria, benchmark datasets for evaluation, and an LLM performance report.&lt;br /&gt;
&lt;br /&gt;
This volunteership involves human-in-the-loop interactions with freely available LLMs. &lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or would like to know more, please reach out to jeetvora@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a prompt for ChatGPT to ensure that the mapping it produces is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
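&lt;br /&gt;
As a rough illustration of the manual validation step described above, the sketch below compares model-proposed accessions against a manually curated reference and lists the terms that need curator review. It is only a sketch under stated assumptions: the function names, the dictionary layout, and the stand-in model output are hypothetical and are not taken from the GlyGen curation pipeline.&lt;br /&gt;

```python
# Hypothetical sketch: none of these names come from the GlyGen pipeline.

def normalize(term):
    """Lowercase a free-text term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def score_mapping(predicted, curated):
    """Compare model-proposed accessions with manually curated ones.

    Both arguments map a free-text term to an accession string.
    Returns (accuracy, mismatches); each mismatch is a tuple of
    (term, curated accession, predicted accession or None).
    """
    predicted_by_term = {normalize(t): acc for t, acc in predicted.items()}
    mismatches = []
    for term, expected in curated.items():
        got = predicted_by_term.get(normalize(term))
        if got != expected:
            mismatches.append((term, expected, got))
    total = len(curated)
    accuracy = (total - len(mismatches)) / total if total else 0.0
    return accuracy, mismatches


# 9606 is the NCBI Taxonomy ID for Homo sapiens.
curated = {"human": "9606", "man": "9606", "h. sapiens": "9606"}
# Stand-in for model output; a real run would come from the LLM workflow.
predicted = {"Human": "9606", "man": "9606"}

accuracy, mismatches = score_mapping(predicted, curated)
print(f"accuracy={accuracy:.2f}")
for term, expected, got in mismatches:
    print(f"review: {term!r} expected {expected}, got {got}")
```

Terms that the model did not map at all are reported with a predicted value of None, so they land in the same review list as incorrect mappings.&lt;br /&gt;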
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
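&lt;br /&gt;
For applicants wondering what the keyword filtering step might look like, here is a minimal sketch that assumes the NCBI E-utilities ESearch endpoint is used as the PubMed web API. The helper names and example keywords are hypothetical placeholders, and the project team may choose a different approach.&lt;br /&gt;

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities ESearch endpoint for querying PubMed.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(keywords, max_results=20):
    """Build an ESearch URL matching any keyword in titles or abstracts."""
    term = " OR ".join(f'"{kw}"[Title/Abstract]' for kw in keywords)
    query = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": max_results}
    )
    return f"{ESEARCH}?{query}"


def fetch_pmids(url):
    """Run the search and return the list of matching PMIDs as strings."""
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


# Example keywords are placeholders chosen for illustration.
url = build_esearch_url(["GlyGen", "glycoinformatics"], max_results=5)
print(url)
# fetch_pmids(url) would perform the request; it needs network access.
```

The URL is built separately from the request so that queries can be inspected and logged before any call is made. NCBI asks clients without an API key to stay at or below three requests per second.&lt;br /&gt;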
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
This volunteership is currently paused.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1223</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1223"/>
		<updated>2026-04-03T14:52:52Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 2. GlyGen Biocuration Project Ideas */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and your projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training LLM-based recommendation of intervention outcome prediction datasets.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora, Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
# Review existing published biomarkers for correctness and validity&lt;br /&gt;
#* Validate biomarker–disease associations using primary literature&lt;br /&gt;
#* Assess evidence strength and/or add excerpts from listed papers serving as evidence&lt;br /&gt;
#* Identify outdated, conflicting, or unsupported biomarker claims&lt;br /&gt;
# Biocurate biomarkers from publications based on disease and entity type&lt;br /&gt;
#* Identify and curate novel biomarkers from recent publications&lt;br /&gt;
#* Standardize biomarker representation using controlled vocabularies and ontologies&lt;br /&gt;
#* Classify biomarkers by type and disease context&lt;br /&gt;
# Review and Map Electronic Health Records Normal Entity Data&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# Biocuration and LLM Benchmarking&lt;br /&gt;
#*Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines. This involves reviewing biomedical publications, extracting biomarker entities and supporting evidence, and matching entities to their corresponding ontology terms&lt;br /&gt;
#*Depending on the volunteer&#039;s background and completion of the first task, volunteers may assist in applying an LLM workflow to extract disease-specific biomarkers from literature and comparing model outputs against the manually curated benchmark sets. &#039;&#039;Deliverables&#039;&#039;: curated biomarker tables, documentation of curation criteria, benchmark datasets for evaluation, and an LLM performance report.&lt;br /&gt;
&lt;br /&gt;
This volunteership involves human-in-the-loop interactions with freely available LLMs. &lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or would like to know more, please reach out to jeetvora@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that the mapping produced is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;6. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Standard Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1221</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1221"/>
		<updated>2026-04-03T14:43:09Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 1. BiomarkerKB Biocuration Project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications. &lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
&lt;br /&gt;
# Review existing published biomarkers for correctness and validity&lt;br /&gt;
#* Validate biomarker–disease associations using primary literature&lt;br /&gt;
#* Assess evidence strength and/or add excerpts from listed papers serving as evidence&lt;br /&gt;
#* Identify outdated, conflicting, or unsupported biomarker claims&lt;br /&gt;
# Biocurate biomarkers from publications based on disease and entity type&lt;br /&gt;
#* Identify and curate novel biomarkers from recent publications&lt;br /&gt;
#* Standardize biomarker representation using controlled vocabularies and ontologies&lt;br /&gt;
#* Classify biomarkers by type and disease context&lt;br /&gt;
# Review and Map Electronic Health Records Normal Entity Data&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# Biocuration and LLM Benchmarking&lt;br /&gt;
#*Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines. This involves reviewing biomedical publications, extracting biomarker entities and supporting evidence, and matching entities to their corresponding ontology terms&lt;br /&gt;
#*Depending on the volunteer&#039;s background and completion of the first task, volunteers may assist in applying an LLM workflow to extract disease-specific biomarkers from literature and comparing model outputs against the manually curated benchmark sets. &#039;&#039;Deliverables&#039;&#039;: curated biomarker tables, documentation of curation criteria, benchmark datasets for evaluation, and an LLM performance report.&lt;br /&gt;
&lt;br /&gt;
This volunteership involves human-in-the-loop interactions with freely available LLMs. &lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or would like to know more, please reach out to jeetvora@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that the mapping produced is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;6. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Standard Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1220</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1220"/>
		<updated>2026-04-03T14:37:40Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Volunteer Expectations */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications. &lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to support training an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project ====&lt;br /&gt;
&lt;br /&gt;
# Review existing published biomarkers for correctness and validity&lt;br /&gt;
#* Validate biomarker–disease associations using primary literature&lt;br /&gt;
#* Assess evidence strength and/or add excerpts from listed papers serving as evidence&lt;br /&gt;
#* Identify outdated, conflicting, or unsupported biomarker claims&lt;br /&gt;
# Biocurate biomarkers from publications based on disease and entity type&lt;br /&gt;
#* Identify and curate novel biomarkers from recent publications&lt;br /&gt;
#* Standardize biomarker representation using controlled vocabularies and ontologies&lt;br /&gt;
#* Classify biomarkers by type and disease context&lt;br /&gt;
# Review and Map Electronic Health Records Normal Entity Data&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# Biocuration and LLM Benchmarking&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines. This involves reviewing biomedical publications, extracting biomarker entities and supporting evidence, and matching entities to their corresponding ontology terms.&lt;br /&gt;
#* Depending on the volunteer&#039;s background and completion of the first task, volunteers may assist in applying an LLM workflow to extract disease-specific biomarkers from literature and in comparing model outputs against the manually curated benchmark sets. &#039;&#039;Deliverables&#039;&#039;: curated biomarker tables, documentation of curation criteria, benchmark datasets for evaluation, and an LLM performance report.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or would like to know more, please reach out to jeetvora@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a prompt for ChatGPT so that the mappings it produces are reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1212</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1212"/>
		<updated>2026-04-02T18:35:26Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 5. BioCompute Objects User Research Project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and your preferred projects in order of preference. An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to support training an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1211</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1211"/>
		<updated>2026-04-02T18:35:16Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 4. PredictMod Machine Learning (ML) Modeling Project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and your preferred projects in order of preference. An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to support training an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress reports, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;6. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1210</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1210"/>
		<updated>2026-04-02T18:34:41Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 5. BioCompute Objects User Research Project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. An acceptance letter or email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress reports, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;6. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1209</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1209"/>
		<updated>2026-04-02T18:30:47Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Presentation &amp;amp; Slide Submission */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. An acceptance letter or email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress reports, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;6. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1208</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1208"/>
		<updated>2026-04-02T18:30:17Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* 4. PredictMod Machine Learning (ML) Modeling Project */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. An acceptance letter or email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress reports, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;6. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the final week of the program.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Standard Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1207</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1207"/>
		<updated>2026-04-02T18:29:35Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. An acceptance letter or email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training an LLM to recommend intervention outcome prediction datasets.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress reports, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the final week of the program.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Standard Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Modeling_Tutorials&amp;diff=1206</id>
		<title>Modeling Tutorials</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Modeling_Tutorials&amp;diff=1206"/>
		<updated>2026-04-01T20:33:07Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Modeling &amp;amp; Evaluation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;Go Back to [[PredictMod|PredictMod Project]].&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step Tutorials ==&lt;br /&gt;
The following tutorials are recommended for those interested in creating and submitting models to the PredictMod platform. &lt;br /&gt;
&lt;br /&gt;
# [[Metagenomic ML Tutorial]]&lt;br /&gt;
# [[PredictMod ML Pipeline Tutorial|EHR ML Tutorial]]&lt;br /&gt;
&lt;br /&gt;
== Recommended Pipeline Steps ==&lt;br /&gt;
Please use the existing tutorials on the wiki and supplement your pipeline with the information below. Please indicate which of this content, if any, is more useful than what is online so that we can revise accordingly. &lt;br /&gt;
&lt;br /&gt;
=== Environment Setup ===&lt;br /&gt;
&lt;br /&gt;
# Install VS Code, Miniconda, and Python (version 3.11 is recommended).&lt;br /&gt;
## Miniconda is great for creating isolated environments. Learn more at https://www.anaconda.com/docs/getting-started/working-with-conda/conda-intro-tutorial. &lt;br /&gt;
# Make sure the relevant packages are installed:&lt;br /&gt;
## scikit-learn, matplotlib, numpy, pandas, shap, xgboost, seaborn, and joblib.&lt;br /&gt;
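&lt;br /&gt;
A quick way to confirm that the active environment contains the packages listed above is sketched below. It uses only the Python standard library; the helper name find_missing_packages is hypothetical and not part of any PredictMod tooling.&lt;br /&gt;

```python
import importlib.util

# Maps each package name listed above to the name used in an import statement.
# Only scikit-learn differs: it is imported as "sklearn".
PACKAGES = {
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "shap": "shap",
    "xgboost": "xgboost",
    "seaborn": "seaborn",
    "joblib": "joblib",
}


def find_missing_packages(packages=PACKAGES):
    """Return the names of packages that cannot be found in this environment."""
    return [
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are installed.")
```

Run the script inside the conda environment created for the project, then install anything it reports as missing before continuing.&lt;br /&gt;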
&lt;br /&gt;
=== Data Preparation ===&lt;br /&gt;
&lt;br /&gt;
# Import the dataset. The ML-ready dataset should:&lt;br /&gt;
## Include all required variables for model training&lt;br /&gt;
## Include a clear responder/non-responder (R/NR) column&lt;br /&gt;
# Run initial descriptive statistics&lt;br /&gt;
# Split the data into train/test&lt;br /&gt;
# Handle missing data (if needed)&lt;br /&gt;
## Simple imputation (median/mode) should be sufficient for first-time modelers; KNN imputation is recommended but more complex.&lt;br /&gt;
# Conduct feature selection&lt;br /&gt;
## Select a method (stepwise selection using random forest (RF) or logistic regression (LOGR) is recommended)&lt;br /&gt;
## Select a threshold (a maximum threshold of 10 or 15, or a percentage threshold of 5% or 1% is recommended)&lt;br /&gt;
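&lt;br /&gt;
The split and imputation steps above can be sketched without any third-party packages. The example below is a minimal illustration, assuming rows are stored as dictionaries and missing values as None; the toy data, column names, and helper names are all hypothetical. It computes the medians from the training rows only and then applies them to both splits, which is one common way to keep test data from influencing preprocessing. In a real pipeline, the equivalent scikit-learn tools (train_test_split and SimpleImputer) would normally be used instead.&lt;br /&gt;

```python
import random
import statistics


def train_test_split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_medians(train_rows, features):
    """Compute one median per feature using the training rows only."""
    medians = {}
    for name in features:
        observed = [row[name] for row in train_rows if row[name] is not None]
        medians[name] = statistics.median(observed)
    return medians


def impute(rows, medians):
    """Return new rows with missing (None) feature values replaced by the medians."""
    return [
        {
            key: (medians[key] if key in medians and value is None else value)
            for key, value in row.items()
        }
        for row in rows
    ]


# Hypothetical toy dataset: two features plus an R/NR label column.
rows = []
for i in range(12):
    rows.append({
        "age": 50 + i,
        "bmi": None if i % 4 == 0 else 20.0 + i,  # every fourth value is missing
        "response": "R" if i % 2 else "NR",
    })

train, test = train_test_split_rows(rows)
medians = fit_medians(train, ["age", "bmi"])
train_imputed = impute(train, medians)
test_imputed = impute(test, medians)
print("Train rows:", len(train_imputed), "Test rows:", len(test_imputed))
print("Medians learned from training data:", medians)
```

The label column is left untouched because only the names passed to fit_medians are imputed.&lt;br /&gt;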
&lt;br /&gt;
=== Modeling &amp;amp; Evaluation ===&lt;br /&gt;
&lt;br /&gt;
# Select at least 3 model algorithms for training and comparison (I recommend RF, gradient boosting (XGB), &amp;amp; LOGR)&lt;br /&gt;
## Decision tree classifier (DTC) &amp;amp; support vector machine (SVM) algorithms are also recommended.&lt;br /&gt;
## Information about each model algorithm is thoroughly documented at https://scikit-learn.org/stable/api/sklearn.ensemble.html.&lt;br /&gt;
# Record accuracy, AUROC, f1, precision, recall, and confusion matrices for each model.&lt;br /&gt;
# Generate a SHAP summary plot to visualize feature importance&lt;br /&gt;
# Recommendations for improving model performance:&lt;br /&gt;
## Conduct hyperparameter tuning using random search, grid search, cross validation, etc.&lt;br /&gt;
## For those interested in more advanced methods, explainability tools such as SHAP or LIME are recommended.&lt;br /&gt;
## For small datasets, oversampling techniques such as SMOTE or GANs are recommended.&lt;br /&gt;
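&lt;br /&gt;
To make the recorded metrics concrete, the sketch below computes accuracy, precision, recall, F1, and a confusion matrix by hand for a binary R/NR problem. It is an illustration only: the labels are made up, the function names are hypothetical, and R is assumed to be the positive class. AUROC is omitted because it needs predicted probabilities rather than class labels. In practice, the sklearn.metrics module provides equivalents such as accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, and roc_auc_score.&lt;br /&gt;

```python
def confusion_counts(y_true, y_pred, positive="R"):
    """Count true/false positives and negatives, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_summary(y_true, y_pred, positive="R"):
    """Return the core classification metrics as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_ratio(2 * precision * recall, precision + recall),
        # Rows are true classes (NR, R); columns are predicted classes (NR, R).
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Hypothetical true labels and model predictions for eight patients.
y_true = ["R", "R", "R", "NR", "NR", "NR", "NR", "R"]
y_pred = ["R", "R", "NR", "NR", "NR", "R", "NR", "R"]

summary = classification_summary(y_true, y_pred)
for name, value in summary.items():
    print(name, value)
```

Recording the same dictionary for each algorithm makes it straightforward to compare models side by side.&lt;br /&gt;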
&lt;br /&gt;
=== Single-Patient Predictions ===&lt;br /&gt;
&lt;br /&gt;
# Once trained, create a separate .py script to run predictions using the model of choice.&lt;br /&gt;
## Save the fitted scaler, trained model, and feature names to pickle files using joblib, then load them into the new file.&lt;br /&gt;
# Check for missing data and extra columns, and reorder columns as needed to match the training dataset.&lt;br /&gt;
# Apply the saved scaler using .transform().&lt;br /&gt;
# Generate a class prediction and print the results clearly.&lt;br /&gt;
## OPTIONAL: generate a probability score in addition to the class prediction.&lt;br /&gt;
# Validate the script by testing several single-patient predictions.&lt;br /&gt;
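&lt;br /&gt;
The column checks in the steps above can be sketched as a small helper that runs before the saved scaler and model are applied. This is a minimal illustration, assuming the patient record is a dictionary and the training feature order is available as a list; the function name, feature names, and patient values are hypothetical. The returned list would then be passed to the saved scaler via .transform() and on to the model.&lt;br /&gt;

```python
def align_features(record, feature_names):
    """Return the record values in training order, plus any ignored extra columns."""
    # Fail loudly if a required feature is absent or empty.
    missing = [name for name in feature_names if record.get(name) is None]
    if missing:
        raise ValueError(f"Missing required features: {missing}")
    # Columns the model never saw are reported rather than silently used.
    extra = sorted(set(record) - set(feature_names))
    ordered = [record[name] for name in feature_names]
    return ordered, extra


# Hypothetical feature order, standing in for the list saved at training time.
feature_names = ["age", "bmi", "glucose"]
# Hypothetical single-patient record with shuffled columns and one extra column.
patient = {"glucose": 5.4, "patient_id": "P001", "age": 61, "bmi": 27.3}

row, ignored = align_features(patient, feature_names)
print("Model input:", row)
print("Ignored columns:", ignored)
```

Raising an error on missing features, instead of imputing silently, keeps single-patient predictions from being made on incomplete input without the user noticing.&lt;br /&gt;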
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
# Each step of your script should include a comment indicating the intended function.&lt;br /&gt;
# Each model must be accompanied by a [https://wiki.biocomputeobject.org/Main_Page BioCompute Object].&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1205</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1205"/>
		<updated>2026-04-01T20:00:03Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Agenda (All times are in Eastern Standard Time) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, noon (email your updated resume and your projects in order of preference). An acceptance letter or email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training an LLM to recommend intervention outcome prediction datasets.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student would be doing manual curation for about 4 weeks, with regular check-ins with the POCs to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, Urnisha Bhuiyan &lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced. &lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu). &lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; For anyone interested in ARGOS, you may be assigned to another project of your choice. This project is contingent on a contract extension. Please complete your project selection in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected to be added to the database. Explain their importance and what value they would hold for the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If the student has any other ideas or methods they want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check if it will be feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod; GlyGen&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang**&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside; Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi**&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Spring 2026 Symposium ==&lt;br /&gt;
The Spring symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; April 15th, 2026 (Wednesday)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|4:00-4:05 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Welcome &amp;amp; Introduction&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|-&lt;br /&gt;
|4:05-4:30 PM&lt;br /&gt;
|GlyGen&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Diya Kamalabharathy; Isaac Kim&lt;br /&gt;
|-&lt;br /&gt;
|4:30-4:45 PM&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|-&lt;br /&gt;
|4:45-5:10 PM&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Muthusekaran; Conner Cognata&lt;br /&gt;
|-&lt;br /&gt;
|5:10-5:30 PM&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 15 + 5 mins QA - group presentation &lt;br /&gt;
|Diya Kamalabharathy; Sampurna Chakravorty; Ashley Tien&lt;br /&gt;
|-&lt;br /&gt;
|5:30-5:45 PM&lt;br /&gt;
| PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation&lt;br /&gt;
|Vishal Bakshi&lt;br /&gt;
|-&lt;br /&gt;
|5:45-6:00 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Remarks&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1186</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1186"/>
		<updated>2026-03-27T17:34:31Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training an LLM to recommend datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using previously identified, publicly available -omics datasets (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the final week of the program.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1185</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1185"/>
		<updated>2026-03-27T17:25:21Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for training an LLM to recommend datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POCs to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
&lt;br /&gt;
::: The data is available, along with preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that the mapping produced is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using previously identified, publicly available -omics datasets (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension. If you are interested in ARGOS, you may be assigned to another project of your choice, so please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and familiarity with basic bioinformatics platforms and methods.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would hold for the scientific community if added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the final week of the program.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod; GlyGen&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang**&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside; Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi**&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Spring 2026 Symposium ==&lt;br /&gt;
The Spring symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; April 15th, 2026 (Wednesday)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|4:00-4:05 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Welcome &amp;amp; Introduction&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|-&lt;br /&gt;
|4:05-4:30 PM&lt;br /&gt;
|GlyGen&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Diya Kamalabharathy; Isaac Kim&lt;br /&gt;
|-&lt;br /&gt;
|4:30-4:45 PM&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|-&lt;br /&gt;
|4:45-5:10 PM&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Muthusekaran; Conner Cognata&lt;br /&gt;
|-&lt;br /&gt;
|5:10-5:30 PM&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 15 + 5 mins QA - group presentation&lt;br /&gt;
|Diya Kamalabharathy; Sampurna Chakravorty; Ashley Tien&lt;br /&gt;
|-&lt;br /&gt;
|5:30-5:55 PM&lt;br /&gt;
| -&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Bakshi; Miao Wang&lt;br /&gt;
|-&lt;br /&gt;
|5:55-6:00 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Remarks&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1184</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1184"/>
		<updated>2026-03-27T17:21:33Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: Created page with &amp;quot;Test&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Test&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1183</id>
		<title>PredictMod</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1183"/>
		<updated>2026-03-27T17:20:15Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to PredictMod Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the PredictMod project. This wiki system provides complementary information to the [https://hivelab.biochemistry.gwu.edu/predictmod/ PredictMod Portal].&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[About PredictMod|About]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
PredictMod (https://hivelab.biochemistry.gwu.edu/predictmod) is an application designed to provide clinicians with a powerful decision-making tool that enhances clinical understanding of patient-level data. Through the open-source PredictMod platform, clinicians, patients, and researchers can access predictive ML models based on real-world data. The platform enables users with limited bioinformatics experience to apply predictive modeling, providing a collaborative solution for improving patient outcomes. The PredictMod platform uses ML tools and complex datasets based on electronic medical records (EMR), gut microbiome, and other -omics data to forecast patient outcomes, often in response to treatment for a particular condition. &lt;br /&gt;
&lt;br /&gt;
While our primary conditions of interest are prediabetes and cancer, the tool is designed to be used for a variety of conditions, interventions, and data types. Because the platform is agnostic to condition, intervention, and data type, it can be applied across fields of medicine.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod User Guide|User Guide]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
[[PredictMod User Guide|This document]] contains resources for users of the PredictMod Platform.  &lt;br /&gt;
&lt;br /&gt;
=== Quick links for model users: ===&lt;br /&gt;
*[[PredictMod User Guide|User Guide]]&lt;br /&gt;
*[[PredictMod Frequently Asked Questions|Frequently Asked Questions]]&lt;br /&gt;
*[[PredictMod Contact Us|Contact Us]]&lt;br /&gt;
*[[PredictMod BCOs]]&lt;br /&gt;
&lt;br /&gt;
=== Quick links for model submitters: ===&lt;br /&gt;
* [[How to Find and Extract Machine-Usable Data from Scientific Literature]]&lt;br /&gt;
* [[Recommended Publications for Intervention Outcome Prediction Models]]&lt;br /&gt;
* [[Model Training and Validation]]&lt;br /&gt;
* [[Modeling Tutorials|PredictMod Modeling Tutorials]]&lt;br /&gt;
* [[Augmenting real data with synthetic data|Augmenting Real Data with Synthetic Data]]&lt;br /&gt;
* [[PredictMod Model Submission]]&lt;br /&gt;
* [[AI-READI Dataset Overview]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod Publications &amp;amp; Multimedia|Publications &amp;amp; MultiMedia]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recent Publications: ====&lt;br /&gt;
&lt;br /&gt;
* Talk Data Podcast | MDClone Featuring Lori Krammer | Published March 4th, 2026 &amp;lt;br/&amp;gt;[https://www.linkedin.com/posts/lori-krammer_syntheticdata-machinelearning-healthcareinnovation-activity-7442231985365770242-xzV0?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACerBVYBiJq4wwQ4cu1WPEc-RZ1z7ZHiMhQ LinkedIn Post]. Listen on [https://open.spotify.com/show/68biApf6cwsE50bAnAdj1R Spotify] or [https://podcasts.apple.com/us/podcast/talk-data/id1653305563 Apple Podcasts].&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. &#039;&#039;NSM.&#039;&#039; 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007]&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;Current and Former Contributors&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;The George Washington University &amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt; Raja Mazumder &amp;lt;br /&amp;gt; &lt;br /&gt;
Pat McNeely &amp;lt;br /&amp;gt;  &lt;br /&gt;
Urnisha Bhuiyan &amp;lt;br /&amp;gt;&lt;br /&gt;
Lori Krammer &amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;External Collaborators&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
Sabyasachi Sen, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Jorge Sepulveda, &amp;lt;em&amp;gt;Medical Faculty Associates&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Atin Basu Choudhary, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
John David, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Vinod Aggarwal, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Former Contributors&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Miguel Mazumder&amp;lt;br /&amp;gt;&lt;br /&gt;
Abel Argaw&amp;lt;br /&amp;gt;&lt;br /&gt;
Stephanie Singleton&amp;lt;br /&amp;gt;&lt;br /&gt;
Sangeeta Agarwal&amp;lt;br /&amp;gt;&lt;br /&gt;
Zacharie Savarie&amp;lt;br /&amp;gt;&lt;br /&gt;
Janet Chrosniak&amp;lt;br /&amp;gt;&lt;br /&gt;
Josh Hakakian&amp;lt;br /&amp;gt;&lt;br /&gt;
Nicole Richmond&amp;lt;br /&amp;gt;&lt;br /&gt;
Wilma Jogunoori&amp;lt;br /&amp;gt;&lt;br /&gt;
Arad Jain&amp;lt;br /&amp;gt;&lt;br /&gt;
Hadley King&amp;lt;br /&amp;gt;&lt;br /&gt;
Robel Kahsay&amp;lt;p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Special thanks to our interns and volunteers.&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1182</id>
		<title>PredictMod</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1182"/>
		<updated>2026-03-27T17:19:53Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to PredictMod Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the PredictMod project. This wiki system provides complementary information to the [https://hivelab.biochemistry.gwu.edu/predictmod/ PredictMod Portal].&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[About PredictMod|About]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
PredictMod (https://hivelab.biochemistry.gwu.edu/predictmod) is an application designed to provide clinicians with a powerful decision-making tool that enhances clinical understanding of patient-level data. Through the open-source PredictMod platform, clinicians, patients, and researchers can access predictive ML models based on real-world data. The platform enables users with limited bioinformatics experience to apply predictive modeling, providing a collaborative solution for improving patient outcomes. The PredictMod platform uses ML tools and complex datasets based on electronic medical records (EMR), gut microbiome, and other -omics data to forecast patient outcomes, often in response to treatment for a particular condition. &lt;br /&gt;
&lt;br /&gt;
While our primary conditions of interest are prediabetes and cancer, the tool is designed to be used for a variety of conditions, interventions, and data types. Because the platform is agnostic to condition, intervention, and data type, it can be applied across fields of medicine.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod User Guide|User Guide]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
[[PredictMod User Guide|This document]] contains resources for users of the PredictMod Platform.  &lt;br /&gt;
&lt;br /&gt;
=== Quick links for model users: ===&lt;br /&gt;
*[[PredictMod User Guide|User Guide]]&lt;br /&gt;
*[[PredictMod Frequently Asked Questions|Frequently Asked Questions]]&lt;br /&gt;
*[[PredictMod Contact Us|Contact Us]]&lt;br /&gt;
*[[PredictMod BCOs]]&lt;br /&gt;
&lt;br /&gt;
=== Quick links for model submitters: ===&lt;br /&gt;
* [[How to Find and Extract Machine-Usable Data from Scientific Literature]]&lt;br /&gt;
* [[Recommended Publications for Intervention Outcome Prediction Models]]&lt;br /&gt;
* [[Model Training and Validation]]&lt;br /&gt;
* [[Modeling Tutorials|PredictMod Modeling Tutorials]]&lt;br /&gt;
* [[Augmenting real data with synthetic data|Augmenting Real Data with Synthetic Data]]&lt;br /&gt;
* [[PredictMod Model Submission]]&lt;br /&gt;
* [[AI-READI Dataset Overview]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod Publications &amp;amp; Multimedia|Publications &amp;amp; MultiMedia]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recent Publications: ====&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Talk Data Podcast | MDClone&#039;&#039;&#039; Featuring Lori Krammer | Published March 4th, 2026 &amp;lt;br/&amp;gt;[https://www.linkedin.com/posts/lori-krammer_syntheticdata-machinelearning-healthcareinnovation-activity-7442231985365770242-xzV0?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACerBVYBiJq4wwQ4cu1WPEc-RZ1z7ZHiMhQ LinkedIn Post]. Listen on [https://open.spotify.com/show/68biApf6cwsE50bAnAdj1R Spotify] or [https://podcasts.apple.com/us/podcast/talk-data/id1653305563 Apple Podcasts].&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. &#039;&#039;NSM.&#039;&#039; 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007]&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;Current and Former Contributors&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;The George Washington University &amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt; Raja Mazumder &amp;lt;br /&amp;gt; &lt;br /&gt;
Pat McNeely &amp;lt;br /&amp;gt;  &lt;br /&gt;
Urnisha Bhuiyan &amp;lt;br /&amp;gt;&lt;br /&gt;
Lori Krammer &amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;External Collaborators&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
Sabyasachi Sen, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Jorge Sepulveda, &amp;lt;em&amp;gt;Medical Faculty Associates&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Atin Basu Choudhary, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
John David, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Vinod Aggarwal, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Former Contributors&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Miguel Mazumder&amp;lt;br /&amp;gt;&lt;br /&gt;
Abel Argaw&amp;lt;br /&amp;gt;&lt;br /&gt;
Stephanie Singleton&amp;lt;br /&amp;gt;&lt;br /&gt;
Sangeeta Agarwal&amp;lt;br /&amp;gt;&lt;br /&gt;
Zacharie Savarie&amp;lt;br /&amp;gt;&lt;br /&gt;
Janet Chrosniak&amp;lt;br /&amp;gt;&lt;br /&gt;
Josh Hakakian&amp;lt;br /&amp;gt;&lt;br /&gt;
Nicole Richmond&amp;lt;br /&amp;gt;&lt;br /&gt;
Wilma Jogunoori&amp;lt;br /&amp;gt;&lt;br /&gt;
Arad Jain&amp;lt;br /&amp;gt;&lt;br /&gt;
Hadley King&amp;lt;br /&amp;gt;&lt;br /&gt;
Robel Kahsay&amp;lt;p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Special thanks to our interns and volunteers.&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1181</id>
		<title>PredictMod</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1181"/>
		<updated>2026-03-27T17:19:33Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* Recent Publications: */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to PredictMod Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the PredictMod project. This wiki system provides complementary information to the [https://hivelab.biochemistry.gwu.edu/predictmod/ PredictMod Portal].&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[About PredictMod|About]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
PredictMod (https://hivelab.biochemistry.gwu.edu/predictmod) is an application designed to provide clinicians with a powerful decision-making tool that enhances clinical understanding of patient-level data. Through the open-source PredictMod platform, clinicians, patients, and researchers can access predictive ML models based on real-world data. The platform enables users with limited bioinformatics experience to apply predictive modeling, providing a collaborative solution for improving patient outcomes. The PredictMod platform uses ML tools and complex datasets based on electronic medical records (EMR), gut microbiome, and other -omics data to forecast patient outcomes, often in response to treatment for a particular condition. &lt;br /&gt;
&lt;br /&gt;
While our primary conditions of interest are prediabetes and cancer, the tool is designed to be used for a variety of conditions, interventions, and data types. Because the platform is agnostic to condition, intervention, and data type, it can be applied across fields of medicine.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod User Guide|User Guide]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
[[PredictMod User Guide|This document]] contains resources for users of the PredictMod Platform.  &lt;br /&gt;
&lt;br /&gt;
=== Quick links for model users: ===&lt;br /&gt;
*[[PredictMod User Guide|User Guide]]&lt;br /&gt;
*[[PredictMod Frequently Asked Questions|Frequently Asked Questions]]&lt;br /&gt;
*[[PredictMod Contact Us|Contact Us]]&lt;br /&gt;
*[[PredictMod BCOs]]&lt;br /&gt;
&lt;br /&gt;
=== Quick links for model submitters: ===&lt;br /&gt;
* [[How to Find and Extract Machine-Usable Data from Scientific Literature]]&lt;br /&gt;
* [[Recommended Publications for Intervention Outcome Prediction Models]]&lt;br /&gt;
* [[Model Training and Validation]]&lt;br /&gt;
* [[Modeling Tutorials|PredictMod Modeling Tutorials]]&lt;br /&gt;
* [[Augmenting real data with synthetic data|Augmenting Real Data with Synthetic Data]]&lt;br /&gt;
* [[PredictMod Model Submission]]&lt;br /&gt;
* [[AI-READI Dataset Overview]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod Publications &amp;amp; Multimedia|Publications &amp;amp; MultiMedia]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recent Publications: ====&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Talk Data Podcast | MDClone&#039;&#039;&#039; Featuring Lori Krammer | Published March 4th, 2026 &amp;lt;br/&amp;gt;[https://www.linkedin.com/posts/lori-krammer_syntheticdata-machinelearning-healthcareinnovation-activity-7442231985365770242-xzV0?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACerBVYBiJq4wwQ4cu1WPEc-RZ1z7ZHiMhQ LinkedIn Post]. Listen on [https://open.spotify.com/show/68biApf6cwsE50bAnAdj1R Spotify] or [https://podcasts.apple.com/us/podcast/talk-data/id1653305563 Apple Podcasts].&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. &#039;&#039;NSM.&#039;&#039; 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007]&lt;br /&gt;
* Krammer L, Aggarwal V, Bhuiyan U, McNeely P, Mazumder R. PredictMod: a machine learning-based platform for predicting and sharing intervention outcomes in patients. Poster presented at: 22nd International Conference on Artificial Intelligence in Medicine; July 9-12, 2024; Salt Lake City, Utah, USA.&lt;br /&gt;
* Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://pubmed.ncbi.nlm.nih.gov/38313584/ PMID: 38313584].&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;Current and Former Contributors&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;The George Washington University &amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt; Raja Mazumder &amp;lt;br /&amp;gt; &lt;br /&gt;
Pat McNeely &amp;lt;br /&amp;gt;  &lt;br /&gt;
Urnisha Bhuiyan &amp;lt;br /&amp;gt;&lt;br /&gt;
Lori Krammer &amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;External Collaborators&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
Sabyasachi Sen, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Jorge Sepulveda, &amp;lt;em&amp;gt;Medical Faculty Associates&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Atin Basu Choudhary, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
John David, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Vinod Aggarwal, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Former Contributors&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Miguel Mazumder&amp;lt;br /&amp;gt;&lt;br /&gt;
Abel Argaw&amp;lt;br /&amp;gt;&lt;br /&gt;
Stephanie Singleton&amp;lt;br /&amp;gt;&lt;br /&gt;
Sangeeta Agarwal&amp;lt;br /&amp;gt;&lt;br /&gt;
Zacharie Savarie&amp;lt;br /&amp;gt;&lt;br /&gt;
Janet Chrosniak&amp;lt;br /&amp;gt;&lt;br /&gt;
Josh Hakakian&amp;lt;br /&amp;gt;&lt;br /&gt;
Nicole Richmond&amp;lt;br /&amp;gt;&lt;br /&gt;
Wilma Jogunoori&amp;lt;br /&amp;gt;&lt;br /&gt;
Arad Jain&amp;lt;br /&amp;gt;&lt;br /&gt;
Hadley King&amp;lt;br /&amp;gt;&lt;br /&gt;
Robel Kahsay&amp;lt;p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Special thanks to our interns and volunteers.&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_Publications_%26_Multimedia&amp;diff=1180</id>
		<title>PredictMod Publications &amp; Multimedia</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_Publications_%26_Multimedia&amp;diff=1180"/>
		<updated>2026-03-27T17:18:55Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* MultiMedia */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt; Go Back to the [[PredictMod|PredictMod Home Page]]. &amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== PredictMod Publications ==&lt;br /&gt;
&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. NSM. 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007]&lt;br /&gt;
* Krammer L, Aggarwal V, Bhuiyan U, McNeely P, Mazumder R. PredictMod: a machine learning-based platform for predicting and sharing intervention outcomes in patients. Poster presented at: 22nd International Conference on Artificial Intelligence in Medicine; July 9-12, 2024; Salt Lake City, Utah, USA.&lt;br /&gt;
* Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://pubmed.ncbi.nlm.nih.gov/38313584/ PMID: 38313584].&lt;br /&gt;
* Bhuiyan, U. in Biochemistry and Molecular Medicine, Vol. Masters 72 (George Washington University, Washington, DC; 2023).&lt;br /&gt;
* Dahlin M, Singleton S, David J, Basuchoudhary A, Wickstrom, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumor necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. June 2022; vol: 80. [https://linkinghub.elsevier.com/retrieve/pii/S2352396422002420 https://doi.org/10.1016/j.ebiom.2022.104061].&lt;br /&gt;
* Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://pubmed.ncbi.nlm.nih.gov/33814114/ PMID: 33814114].&lt;br /&gt;
* King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://pubmed.ncbi.nlm.nih.gov/31509535/ PMID: 31509535].&lt;br /&gt;
&lt;br /&gt;
== MultiMedia ==&lt;br /&gt;
* &#039;&#039;&#039;Talk Data Podcast | MDClone&#039;&#039;&#039; Featuring Lori Krammer | Published March 4th, 2026 &amp;lt;br/&amp;gt;[https://www.linkedin.com/posts/lori-krammer_syntheticdata-machinelearning-healthcareinnovation-activity-7442231985365770242-xzV0?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACerBVYBiJq4wwQ4cu1WPEc-RZ1z7ZHiMhQ LinkedIn Post]. Listen on [https://open.spotify.com/show/68biApf6cwsE50bAnAdj1R Spotify] or [https://podcasts.apple.com/us/podcast/talk-data/id1653305563 Apple Podcasts]. This video is part of the [[PredictMod|PredictMod Project]].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Microbiome: VA AI Tech Sprint 2021 | Phase 2 Demo&#039;&#039;&#039; Featuring Stephanie Singleton, Edited by James Ziegler | Published December 8th, 2020  &amp;lt;br /&amp;gt;[https://www.youtube.com/embed/K2S7YrIBN_0 Phase 2 Demo]. View our [https://youtu.be/RRm6-kCGegE MATLAB Prototype Demo].  View our [https://tinyurl.com/phase-2-demo-slides Phase 2 Demo Slides].  This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;VA AI Tech Sprint Phase 3 Final Demo | GWU HIVE&#039;&#039;&#039;  Presented by Stephanie Singleton, James Ziegler, Edited by James Ziegler | Published April 20th, 2021  &amp;lt;br /&amp;gt;[https://www.youtube.com/embed/CgIwy_zfn9g Phase 3 Demo]. View our [https://tinyurl.com/Final-Demo-Materials materials].  This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_Publications_%26_Multimedia&amp;diff=1179</id>
		<title>PredictMod Publications &amp; Multimedia</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_Publications_%26_Multimedia&amp;diff=1179"/>
		<updated>2026-03-27T17:18:30Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* MultiMedia */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt; Go Back to the [[PredictMod|PredictMod Home Page]]. &amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== PredictMod Publications ==&lt;br /&gt;
&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. NSM. 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].&lt;br /&gt;
* Krammer L, Aggarwal V, Bhuiyan U, McNeely P, Mazumder R. PredictMod: a machine learning-based platform for predicting and sharing intervention outcomes in patients. Poster presented at: 22nd International Conference on Artificial Intelligence in Medicine; July 9-12, 2024; Salt Lake City, Utah, USA.&lt;br /&gt;
* Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://pubmed.ncbi.nlm.nih.gov/38313584/ PMID: 38313584].&lt;br /&gt;
* Bhuiyan, U. in Biochemistry and Molecular Medicine, Vol. Masters 72 (George Washington University, Washington, DC; 2023).&lt;br /&gt;
* Dahlin M, Singleton S, David J, Basuchoudhary A, Wickstrom, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumor necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. June 2022; vol: 80. [https://linkinghub.elsevier.com/retrieve/pii/S2352396422002420 https://doi.org/10.1016/j.ebiom.2022.104061].&lt;br /&gt;
* Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://pubmed.ncbi.nlm.nih.gov/33814114/ PMID: 33814114].&lt;br /&gt;
* King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://pubmed.ncbi.nlm.nih.gov/31509535/ PMID: 31509535].&lt;br /&gt;
&lt;br /&gt;
== MultiMedia ==&lt;br /&gt;
* &#039;&#039;&#039;Talk Data Podcast | MDClone&#039;&#039;&#039; Featuring Lori Krammer | Published March 4th, 2026 &amp;lt;br/&amp;gt;[https://www.linkedin.com/posts/lori-krammer_syntheticdata-machinelearning-healthcareinnovation-activity-7442231985365770242-xzV0?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACerBVYBiJq4wwQ4cu1WPEc-RZ1z7ZHiMhQ LinkedIn Post]. Listen on [https://open.spotify.com/show/68biApf6cwsE50bAnAdj1R Spotify] or [https://podcasts.apple.com/us/podcast/talk-data/id1653305563 Apple Podcasts]. This video is part of the [[PredictMod|PredictMod Project]].&lt;br /&gt;
* &#039;&#039;&#039;Microbiome: VA AI Tech Sprint 2021 | Phase 2 Demo&#039;&#039;&#039; Featuring Stephanie Singleton, Edited by James Ziegler | Published December 8th, 2020  &amp;lt;br /&amp;gt;[https://www.youtube.com/embed/K2S7YrIBN_0 Phase 2 Demo]. View our [https://youtu.be/RRm6-kCGegE MATLAB Prototype Demo].  View our [https://tinyurl.com/phase-2-demo-slides Phase 2 Demo Slides].  This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;VA AI Tech Sprint Phase 3 Final Demo | GWU HIVE&#039;&#039;&#039;  Presented by Stephanie Singleton, James Ziegler, Edited by James Ziegler | Published April 20th, 2021  &amp;lt;br /&amp;gt;[https://www.youtube.com/embed/CgIwy_zfn9g Phase 3 Demo]. View our [https://tinyurl.com/Final-Demo-Materials materials].  This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_Publications_%26_Multimedia&amp;diff=1178</id>
		<title>PredictMod Publications &amp; Multimedia</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_Publications_%26_Multimedia&amp;diff=1178"/>
		<updated>2026-03-27T17:17:29Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: /* MultiMedia */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt; Go Back to the [[PredictMod|PredictMod Home Page]]. &amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== PredictMod Publications ==&lt;br /&gt;
&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. NSM. 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].&lt;br /&gt;
* Krammer L, Aggarwal V, Bhuiyan U, McNeely P, Mazumder R. PredictMod: a machine learning-based platform for predicting and sharing intervention outcomes in patients. Poster presented at: 22nd International Conference on Artificial Intelligence in Medicine; July 9-12, 2024; Salt Lake City, Utah, USA.&lt;br /&gt;
* Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://pubmed.ncbi.nlm.nih.gov/38313584/ PMID: 38313584].&lt;br /&gt;
* Bhuiyan, U. in Biochemistry and Molecular Medicine, Vol. Masters 72 (George Washington University, Washington, DC; 2023).&lt;br /&gt;
* Dahlin M, Singleton S, David J, Basuchoudhary A, Wickstrom, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumor necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. June 2022; vol: 80. [https://linkinghub.elsevier.com/retrieve/pii/S2352396422002420 https://doi.org/10.1016/j.ebiom.2022.104061].&lt;br /&gt;
* Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://pubmed.ncbi.nlm.nih.gov/33814114/ PMID: 33814114].&lt;br /&gt;
* King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://pubmed.ncbi.nlm.nih.gov/31509535/ PMID: 31509535].&lt;br /&gt;
&lt;br /&gt;
== MultiMedia ==&lt;br /&gt;
* &#039;&#039;&#039;Talk Data Podcast | MDClone&#039;&#039;&#039; Featuring Lori Krammer | Published March 4th, 2026 [https://www.linkedin.com/posts/lori-krammer_syntheticdata-machinelearning-healthcareinnovation-activity-7442231985365770242-xzV0?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACerBVYBiJq4wwQ4cu1WPEc-RZ1z7ZHiMhQ LinkedIn Post]. Listen on [https://open.spotify.com/show/68biApf6cwsE50bAnAdj1R Spotify] or [https://podcasts.apple.com/us/podcast/talk-data/id1653305563 Apple Podcasts]. This video is part of the [[PredictMod|PredictMod Project]].&lt;br /&gt;
* &#039;&#039;&#039;Microbiome: VA AI Tech Sprint 2021 | Phase 2 Demo&#039;&#039;&#039; Featuring Stephanie Singleton, Edited by James Ziegler | Published December 8th, 2020  &amp;lt;br /&amp;gt;[https://www.youtube.com/embed/K2S7YrIBN_0 Phase 2 Demo]. View our [https://youtu.be/RRm6-kCGegE MATLAB Prototype Demo].  View our [https://tinyurl.com/phase-2-demo-slides Phase 2 Demo Slides].  This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;VA AI Tech Sprint Phase 3 Final Demo | GWU HIVE&#039;&#039;&#039;  Presented by Stephanie Singleton, James Ziegler, Edited by James Ziegler | Published April 20th, 2021  &amp;lt;br /&amp;gt;[https://www.youtube.com/embed/CgIwy_zfn9g Phase 3 Demo]. View our [https://tinyurl.com/Final-Demo-Materials materials].  This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_Publications_%26_Multimedia&amp;diff=1176</id>
		<title>PredictMod Publications &amp; Multimedia</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_Publications_%26_Multimedia&amp;diff=1176"/>
		<updated>2026-03-12T17:32:09Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt; Go Back to the [[PredictMod|PredictMod Home Page]]. &amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== PredictMod Publications ==&lt;br /&gt;
&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. NSM. 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].&lt;br /&gt;
* Krammer L, Aggarwal V, Bhuiyan U, McNeely P, Mazumder R. PredictMod: a machine learning-based platform for predicting and sharing intervention outcomes in patients. Poster presented at: 22nd International Conference on Artificial Intelligence in Medicine; July 9-12, 2024; Salt Lake City, Utah, USA.&lt;br /&gt;
* Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://pubmed.ncbi.nlm.nih.gov/38313584/ PMID: 38313584].&lt;br /&gt;
* Bhuiyan, U. in Biochemistry and Molecular Medicine, Vol. Masters 72 (George Washington University, Washington, DC; 2023).&lt;br /&gt;
* Dahlin M, Singleton S, David J, Basuchoudhary A, Wickstrom, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumor necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. June 2022; vol: 80. [https://linkinghub.elsevier.com/retrieve/pii/S2352396422002420 https://doi.org/10.1016/j.ebiom.2022.104061].&lt;br /&gt;
* Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://pubmed.ncbi.nlm.nih.gov/33814114/ PMID: 33814114].&lt;br /&gt;
* King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://pubmed.ncbi.nlm.nih.gov/31509535/ PMID: 31509535].&lt;br /&gt;
&lt;br /&gt;
== MultiMedia ==&lt;br /&gt;
* &#039;&#039;&#039;Microbiome: VA AI Tech Sprint 2021 | Phase 2 Demo&#039;&#039;&#039; Featuring Stephanie Singleton, Edited by James Ziegler | Published December 8th, 2020  &amp;lt;br /&amp;gt;[https://www.youtube.com/embed/K2S7YrIBN_0 Phase 2 Demo]. View our [https://youtu.be/RRm6-kCGegE MATLAB Prototype Demo].  View our [https://tinyurl.com/phase-2-demo-slides Phase 2 Demo Slides].  This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;VA AI Tech Sprint Phase 3 Final Demo | GWU HIVE&#039;&#039;&#039;  Presented by Stephanie Singleton, James Ziegler, Edited by James Ziegler | Published April 20th, 2021  &amp;lt;br /&amp;gt;[https://www.youtube.com/embed/CgIwy_zfn9g Phase 3 Demo]. View our [https://tinyurl.com/Final-Demo-Materials materials].  This video is a part of the [https://hivelab.biochemistry.gwu.edu/gfkb Microbiome Project].&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1174</id>
		<title>PredictMod</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1174"/>
		<updated>2026-03-09T13:58:19Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to PredictMod Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the PredictMod project. This wiki system provides complementary information to the [https://hivelab.biochemistry.gwu.edu/predictmod/ PredictMod Portal].&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[About PredictMod|About]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
PredictMod (https://hivelab.biochemistry.gwu.edu/predictmod) is an application designed to provide clinicians with a powerful decision-making tool that enhances clinical understanding of patient-level data. Through the use of the open-source PredictMod platform, clinicians, patients, and researchers will access predictive ML models based on real-world data. The platform empowers users with limited experience in bioinformatics to leverage the power of predictive modeling, providing a collaborative solution for improving patient outcomes. The PredictMod platform utilizes ML tools and complex datasets based on electronic medical records (EMR), gut microbiome, and other -omics data to forecast patient outcomes, often in response to treatment for a particular condition. &lt;br /&gt;
&lt;br /&gt;
While our primary conditions of interest are prediabetes and cancer, the tool is designed to be used for a variety of conditions, interventions, and data types. The agnostic nature of the platform allows for widespread use and relevance to all fields within the scope of medicine.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod User Guide|User Guide]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
[[PredictMod User Guide|This document]] contains resources for users of the PredictMod Platform.  &lt;br /&gt;
&lt;br /&gt;
=== Quick links for model users: ===&lt;br /&gt;
*[[PredictMod User Guide|User Guide]]&lt;br /&gt;
*[[PredictMod Frequently Asked Questions|Frequently Asked Questions]]&lt;br /&gt;
*[[PredictMod Contact Us|Contact Us]]&lt;br /&gt;
*[[PredictMod BCOs]]&lt;br /&gt;
&lt;br /&gt;
=== Quick links for model submitters: ===&lt;br /&gt;
* [[How to Find and Extract Machine-Usable Data from Scientific Literature]]&lt;br /&gt;
* [[Recommended Publications for Intervention Outcome Prediction Models]]&lt;br /&gt;
* [[Model Training and Validation]]&lt;br /&gt;
* [[Modeling Tutorials|PredictMod Modeling Tutorials]]&lt;br /&gt;
* [[Augmenting real data with synthetic data|Augmenting Real Data with Synthetic Data]]&lt;br /&gt;
* [[PredictMod Model Submission]]&lt;br /&gt;
* [[AI-READI Dataset Overview]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod Publications &amp;amp; Multimedia|Publications &amp;amp; MultiMedia]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recent Publications: ====&lt;br /&gt;
&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. &#039;&#039;NSM.&#039;&#039; 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].&lt;br /&gt;
* Krammer L, Aggarwal V, Bhuiyan U, McNeely P, Mazumder R. PredictMod: a machine learning-based platform for predicting and sharing intervention outcomes in patients. Poster presented at: 22nd International Conference on Artificial Intelligence in Medicine; July 9-12, 2024; Salt Lake City, Utah, USA.&lt;br /&gt;
* Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://pubmed.ncbi.nlm.nih.gov/38313584/ PMID: 38313584].&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;Current and Former Contributors&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;The George Washington University &amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt; Raja Mazumder &amp;lt;br /&amp;gt; &lt;br /&gt;
Pat McNeely &amp;lt;br /&amp;gt;  &lt;br /&gt;
Urnisha Bhuiyan &amp;lt;br /&amp;gt;&lt;br /&gt;
Lori Krammer &amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;External Collaborators&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
Sabyasachi Sen, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Jorge Sepulveda, &amp;lt;em&amp;gt;Medical Faculty Associates&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Atin Basu Choudhary, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
John David, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Vinod Aggarwal, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Former Contributors&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Miguel Mazumder&amp;lt;br /&amp;gt;&lt;br /&gt;
Abel Argaw&amp;lt;br /&amp;gt;&lt;br /&gt;
Stephanie Singleton&amp;lt;br /&amp;gt;&lt;br /&gt;
Sangeeta Agarwal&amp;lt;br /&amp;gt;&lt;br /&gt;
Zacharie Savarie&amp;lt;br /&amp;gt;&lt;br /&gt;
Janet Chrosniak&amp;lt;br /&amp;gt;&lt;br /&gt;
Josh Hakakian&amp;lt;br /&amp;gt;&lt;br /&gt;
Nicole Richmond&amp;lt;br /&amp;gt;&lt;br /&gt;
Wilma Jogunoori&amp;lt;br /&amp;gt;&lt;br /&gt;
Arad Jain&amp;lt;br /&amp;gt;&lt;br /&gt;
Hadley King&amp;lt;br /&amp;gt;&lt;br /&gt;
Robel Kahsay&amp;lt;p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Special thanks to our interns and volunteers.&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=GW-FEAST&amp;diff=1173</id>
		<title>GW-FEAST</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=GW-FEAST&amp;diff=1173"/>
		<updated>2026-03-09T13:54:58Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to GW-FEAST Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the GW-FEAST project.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&#039;&#039;This research was funded, in part, by the Advanced Research Projects Agency for Health (ARPA-H). The views and conclusions contained in this documentation are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.&#039;&#039;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;About GW-FEAST&amp;lt;/h2&amp;gt;&lt;br /&gt;
Federated Ecosystems for Analytics and Standardized Technologies ([https://hivelab.biochemistry.gwu.edu/gw-feast FEAST]) is a cloud-based, agile bioinformatics and data analysis platform under development through the ARPA-H Biomedical Data Fabric (BDF) toolbox program. The project is led by [https://dnahive.com DNA-HIVE]; other funded collaborators include Cornell University, Vanderbilt University, Georgetown University, the European Bioinformatics Institute, and Kaiser Permanente. Our team is responsible for the GW instance of FEAST (GW-FEAST) and for co-leading the project with DNA-HIVE. This project is part of the ARPA-H FEAST performer team initiative to create bridges across data silos and make health data more accessible and usable. &lt;br /&gt;
&lt;br /&gt;
Several hospitals and cancer centers will have a FEAST platform, which enables cross-site data analysis without the need to export or transform the data. Currently, large volumes of data are used by insurance companies, pharmaceutical companies, and others for research and development purposes. The FEAST platform, which is particularly strong with noisy, real-world data, aims to enable more precise data selection for research use while preserving patient privacy. When clinical data is submitted to the suite of tools, submission is handled via the HL7 FHIR standard, so that only authorized parties have access to protected data. Models that provide update mechanisms such as online training will be updated appropriately without retaining any personally identifiable information (PII). Thus, these tools support federated data sets and training without ever retaining clinical PII within the system. All services run as independent microservices, containerized with Docker. &lt;br /&gt;
&lt;br /&gt;
[https://drive.google.com/file/d/1iv9VmFhNbd-5iwSwDMLVumCFnFN84cl8/view?usp=drive_link FEAST Video]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Quick Links ==&lt;br /&gt;
&lt;br /&gt;
* [[GW-FEAST Data|GW-FEAST Data Sources]]&lt;br /&gt;
* [[GW-FEAST Data Access Portal]]&lt;br /&gt;
* [[GW-FEAST Data De-identification|GW-FEAST De-identification]]&lt;br /&gt;
* [[FEAST Knowledgebase APIs|FEASTKB APIs]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;GW-FEAST Project Architecture&amp;lt;/h2&amp;gt;[[File:GW-FEAST_architecture.png|none|thumb|658x658px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;The GW-FEAST architecture diagram showcases the GW environment set up to facilitate FEAST queries through the GW node (or instance of FEAST at GW). While other consortium sites may have slightly different environment configurations, the overall structure and security practices will be similar across all sites. This diagram is subject to change throughout the life of the project.&#039;&#039;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Recent Publications ==&lt;br /&gt;
&lt;br /&gt;
* Mazumder R, Keeney J, Johnson L, Krammer L, McNeely P, Sepulveda J, Hangen D, Martin M, Jyothi D, De Almeida J, McGarvey P, Alaoui A, Cha S, Sedrakyan A, Shoelle E, Matheny M, LeNoue-Newton M, Winter R, Deppen S, Simonyan V, Horvath A. From use cases to infrastructure: a cross-institutional survey of priorities in data-driven biomedical research. J Am Med Inform Assoc. 2026 Jan 20:ocag001. Epub ahead of print. [https://pubmed.ncbi.nlm.nih.gov/41556955/ PMID: 41556955].&amp;lt;/div&amp;gt; &lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Collaborating Institutions ==&lt;br /&gt;
&lt;br /&gt;
DNA-HIVE (prime)  &lt;br /&gt;
&lt;br /&gt;
Department of Biochemistry and Molecular Medicine, The George Washington University (Co-PI)  &lt;br /&gt;
&lt;br /&gt;
Innovation Center for Biomedical Informatics, Georgetown University  &lt;br /&gt;
&lt;br /&gt;
Division of Cancer Epidemiology and Genetics, National Cancer Institute  &lt;br /&gt;
&lt;br /&gt;
Vanderbilt University Medical Center   &lt;br /&gt;
&lt;br /&gt;
Weill Cornell Medical College  &lt;br /&gt;
&lt;br /&gt;
European Bioinformatics Institute, European Molecular Biology Laboratory   &lt;br /&gt;
&lt;br /&gt;
Kaiser Permanente  &lt;br /&gt;
&amp;lt;/div&amp;gt; &lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1172</id>
		<title>PredictMod</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod&amp;diff=1172"/>
		<updated>2026-03-09T13:54:25Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to PredictMod Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the PredictMod project. This wiki system provides complementary information to the [https://hivelab.biochemistry.gwu.edu/predictmod/ PredictMod Portal].&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[About PredictMod|About]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
PredictMod (https://hivelab.biochemistry.gwu.edu/predictmod) is an application designed to provide clinicians with a powerful decision-making tool that enhances clinical understanding of patient-level data. Through the open-source PredictMod platform, clinicians, patients, and researchers can access predictive ML models based on real-world data. The platform empowers users with limited experience in bioinformatics to leverage the power of predictive modeling, providing a collaborative solution for improving patient outcomes. The PredictMod platform uses ML tools and complex datasets based on electronic medical records (EMR), gut microbiome, and other -omics data to forecast patient outcomes, often in response to treatment for a particular condition. &lt;br /&gt;
&lt;br /&gt;
While our primary conditions of interest are prediabetes and cancer, the tool is designed to be used for a variety of conditions, interventions, and data types. Because the platform is agnostic to condition and data type, it can be applied across fields of medicine.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod User Guide|User Guide]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
[[PredictMod User Guide|This document]] contains resources for users of the PredictMod Platform.  &lt;br /&gt;
&lt;br /&gt;
=== Quick links for model users: ===&lt;br /&gt;
*[[PredictMod User Guide|User Guide]]&lt;br /&gt;
*[[PredictMod Frequently Asked Questions|Frequently Asked Questions]]&lt;br /&gt;
*[[PredictMod Contact Us|Contact Us]]&lt;br /&gt;
*[[PredictMod BCOs]]&lt;br /&gt;
&lt;br /&gt;
=== Quick links for model submitters: ===&lt;br /&gt;
* [[How to Find and Extract Machine-Usable Data from Scientific Literature]]&lt;br /&gt;
* [[Recommended Publications for Intervention Outcome Prediction Models]]&lt;br /&gt;
* [[Model Training and Validation]]&lt;br /&gt;
* [[Modeling Tutorials|PredictMod Modeling Tutorials]]&lt;br /&gt;
* [[Augmenting real data with synthetic data|Augmenting Real Data with Synthetic Data]]&lt;br /&gt;
* [[PredictMod Model Submission]]&lt;br /&gt;
* [[AI-READI Dataset Overview]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;[[PredictMod Publications &amp;amp; Multimedia|Publications &amp;amp; Multimedia]]&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Recent Publications: ====&lt;br /&gt;
&lt;br /&gt;
* Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&lt;br /&gt;
* Krammer L, McNeely P, Bhuiyan U, et al. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. &#039;&#039;NSM.&#039;&#039; 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].&lt;br /&gt;
* Krammer L, Aggarwal V, Bhuiyan U, McNeely P, Mazumder R. PredictMod: a machine learning-based platform for predicting and sharing intervention outcomes in patients. Poster presented at: 22nd International Conference on Artificial Intelligence in Medicine; July 9-12, 2024; Salt Lake City, Utah, USA.&lt;br /&gt;
* Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://pubmed.ncbi.nlm.nih.gov/38313584/ PMID: 38313584].&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;Current and Former Contributors&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;The George Washington University &amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt; Raja Mazumder &amp;lt;br /&amp;gt; &lt;br /&gt;
Pat McNeely &amp;lt;br /&amp;gt;  &lt;br /&gt;
Urnisha Bhuiyan &amp;lt;br /&amp;gt;&lt;br /&gt;
Lori Krammer &amp;lt;br /&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;External Collaborators&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
Sabyasachi Sen, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Jorge Sepulveda, &amp;lt;em&amp;gt;Medical Faculty Associates&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Atin Basu Choudhary, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
John David, &amp;lt;em&amp;gt;Virginia Military Institute&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
Vinod Aggarwal, &amp;lt;em&amp;gt;Veterans Administration&amp;lt;/em&amp;gt;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Former Contributors&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Miguel Mazumder&amp;lt;br /&amp;gt;&lt;br /&gt;
Abel Argaw&amp;lt;br /&amp;gt;&lt;br /&gt;
Stephanie Singleton&amp;lt;br /&amp;gt;&lt;br /&gt;
Sangeeta Agarwal&amp;lt;br /&amp;gt;&lt;br /&gt;
Zacharie Savarie&amp;lt;br /&amp;gt;&lt;br /&gt;
Janet Chrosniak&amp;lt;br /&amp;gt;&lt;br /&gt;
Josh Hakakian&amp;lt;br /&amp;gt;&lt;br /&gt;
Nicole Richmond&amp;lt;br /&amp;gt;&lt;br /&gt;
Wilma Jogunoori&amp;lt;br /&amp;gt;&lt;br /&gt;
Arad Jain&amp;lt;br /&amp;gt;&lt;br /&gt;
Hadley King&amp;lt;br /&amp;gt;&lt;br /&gt;
Robel Kahsay&amp;lt;p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Special thanks to our interns and volunteers.&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Publications&amp;diff=1171</id>
		<title>Publications</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Publications&amp;diff=1171"/>
		<updated>2026-03-09T13:53:59Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;All publications listed on this page should follow a modified National Library of Medicine (NLM) citation format, adapted for clarity and consistency. Here is the suggested format:&amp;lt;blockquote&amp;gt;&#039;&#039;Author(s). Title of article. Journal Name. Year Month Day;Volume(Issue):Page range. PMID: [if available] DOI: [if no PMID]&#039;&#039;&amp;lt;/blockquote&amp;gt;Some guidelines:&lt;br /&gt;
&lt;br /&gt;
* If a PubMed ID (PMID) is available, include it and omit the DOI.&lt;br /&gt;
* If no PMID is available, include the DOI instead.&lt;br /&gt;
* Journal names should be spelled out in full unless the journal is widely recognized by its acronym (e.g., &#039;&#039;PLoS&#039;&#039;).&lt;br /&gt;
* Use full publication dates when available (e.g., 2025 Mar 28); if only the year is known, include the year alone.&lt;br /&gt;
* Include all author names in the order listed in the publication.&lt;br /&gt;
&amp;lt;h2&amp;gt;HIVE Platform Publications&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Please cite use of HIVE with&amp;lt;/p&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Simonyan V, Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes. 2014 Sep 30;5(4):957-981. [https://www.ncbi.nlm.nih.gov/pubmed/25271953 PMID: 25271953]&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Simonyan V, Chumakov K, Dingerdissen H, et al. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis. Database (Oxford). 2016; 2016:baw022. [https://www.ncbi.nlm.nih.gov/pubmed/26989153 PMID: 26989153]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;HIVE Team Publications&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mazumder R, Keeney J, Johnson L, Krammer L, McNeely P, Sepulveda J, Hangen D, Martin M, Jyothi D, De Almeida J, McGarvey P, Alaoui A, Cha S, Sedrakyan A, Shoelle E, Matheny M, LeNoue-Newton M, Winter R, Deppen S, Simonyan V, Horvath A. From use cases to infrastructure: a cross-institutional survey of priorities in data-driven biomedical research. J Am Med Inform Assoc. 2026 Jan 20:ocag001. Epub ahead of print. [https://pubmed.ncbi.nlm.nih.gov/41556955/ PMID: 41556955].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Krammer L, McNeely PM, Bhuiyan U, Singleton SS, Arethiya N, Argaw A, Aggarwal V, Basuchoudhary A, Mazumder M, David J, Agrawal S, Sen S, Mazumder R. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. Network and Systems Medicine. 2025. Vol. 1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kahsay R, Bhuiyan U, Au CCH, Edwards N, Johnson L, Kulkarni S, Martinez K, Ranzinger R, Vijay-Shanker K, Vora J, Warner K, Tiemeyer M, Mazumder R. GlycoSiteMiner: an ML/AI-assisted literature mining-based pipeline for extracting glycosylation sites from PubMed abstracts. Glycobiology. 2025 May 22. [https://pubmed.ncbi.nlm.nih.gov/40401984/ PMID: 40401984].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Aoki-Kinoshita KF, Lisacek F, Mazumder R, Ranzinger R, Tiemeyer M, Yamada I, Packer NH. Meeting report of the GlySpace alliance and GaLSIC symposium. Glycobiology. 2025 Mar 28:cwaf019. [https://pubmed.ncbi.nlm.nih.gov/40156285/ PMID: 40156285].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Clarke DJB, Evangelista JE, Xie Z, Marino GB, Byrd AI, Maurya MR, Srinivasan S, Yu K, Petrosyan V, Roth ME, Milinkov M, King CH, Vora JK, Keeney J, Nemarich C, Khan W, Lachmann A, Ahmed N, Agris A, Pan J, Ramachandran S, Fahy E, Esquivel E, Mihajlovic A, Jevtic B, Milinovic V, Kim S, McNeely P, Wang T, Wenger E, Brown MA, Sickler A, Zhu Y, Jenkins SL, Blood PD, Taylor DM, Resnick AC, Mazumder R, Milosavljevic A, Subramaniam S, Ma&#039;ayan A. Playbook workflow builder: Interactive construction of bioinformatics workflows. PLoS Comput Biol. 2025 Apr 3;21(4):e1012901. [https://pubmed.ncbi.nlm.nih.gov/40179105/ PMID: 40179105].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Keeney &#039;&#039;et al&#039;&#039;. Olduvai domain expression downregulates mitochondrial pathways: implications for human brain evolution and neoteny. bioRxiv. 2024 Oct 22. https://doi.org/10.1101/2024.10.21.619278&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Martinez K, Agirre J, Akune Y, Aoki-Kinoshita KF, Arighi C, Axelsen KB, Bolton E, Bordeleau E, Edwards NJ, Fadda E, Feizi T, Hayes C, Ives CM, Joshi HJ, Krishna Prasad K, Kossida S, Lisacek F, Liu Y, Lütteke T, Ma J, Malik A, Martin M, Mehta AY, Neelamegham S, Panneerselvam K, Ranzinger R, Ricard-Blum S, Sanou G, Shanker V, Thomas PD, Tiemeyer M, Urban J, Vita R, Vora J, Yamamoto Y, Mazumder R. Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy. Database (Oxford). 2024 Aug 13;2024:baae073. [https://pubmed.ncbi.nlm.nih.gov/39137905/ PMID: 39137905].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kim S, Mazumder R. Enhancing scientific reproducibility through automated BioCompute Object creation using Retrieval-Augmented Generation from publications. Computer Science,  Computation and Language. https://doi.org/10.48550/arXiv.2409.15076&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front. Mol. Biosci. 19 January 2024; Sec. Molecular Diagnostics and Therapeutics. Volume 10 – 2023. [https://www.ncbi.nlm.nih.gov/pubmed/38313584 PMID: 38313584].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Keeney JG, Gulzar N, Baker JB, Klempir O, Hannigan GD, Bitton DA, Maritz JM, King CHS 4th, Patel JA, Duncan P, Mazumder R. Communicating computational workflows in a regulatory environment. Drug Discov Today. 2024 Jan 12; 103884. [https://www.ncbi.nlm.nih.gov/pubmed/38219969 PMID: 38219969].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Sylvetsky AC, Clement RA, Stearrett N, Issa NT, Dore FJ, Mazumder R, King CH, Hubal MJ, Walter PJ, Cai H, Sen S, Rother KI, Crandall KA. Consumption of sucralose and acesulfame-potassium containing diet soda alters the relative abundance of microbial taxa at the species level: findings of two pilot studies. Appl Physiol Nutr Metab. 2024 Jan 1; 49(1):125-134. [https://www.ncbi.nlm.nih.gov/pubmed/37902107 PMID: 37902107].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Vora J, Navelkar R, Vijay-Shanker K, Edwards N, Martinez K, Ding X, Wang T, Su P, Ross K, Lisacek F, Hayes C, Kahsay R, Ranzinger R, Tiemeyer M, Mazumder R. The glycan structure dictionary-a dictionary describing commonly used glycan structure terms. Glycobiology. 2023 Feb 17; cwad014 [https://www.ncbi.nlm.nih.gov/pubmed/36799723 PMID: 36799723].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Lisacek F, Tiemeyer M, Mazumder R, Aoki-Kinoshita KF. Worldwide Glycoscience Informatics Infrastructure: The GlySpace Alliance. JACS Au. eCollection 2023 Jan 23; [https://www.ncbi.nlm.nih.gov/pubmed/36711080 PMID: 36711080].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Datta Chaudhuri R, Datta R, Rana S, Kar A, Vinh Nguyen Lam P, Mazumder R, Mohanty S, Sarkar S. Cardiomyocyte-specific regression of nitrosative stress-mediated S-Nitrosylation of IKKγ alleviates pathological cardiac hypertrophy. Cell Signal. 2022 Oct; 98:110403 [https://www.ncbi.nlm.nih.gov/pubmed/35835332 PMID: 35835332].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dahlin M, Singleton SS, David JA, Basuchoudhary A, Wickström R, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumour necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. 2022 Jun; 80. [https://www.ncbi.nlm.nih.gov/pubmed/35598439 PMID: 35598439].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Lyman DF, Bell A, Black A, Dingerdissen H, Cauley E, Gogate N, Liu D, Joseph A, Kahsay R, Crichton DJ, Mehta A, Mazumder R. Modeling and integration of N-glycan biomarkers in a comprehensive biomarker data model. Glycobiology. 2022 Aug; [https://academic.oup.com/glycob/article/32/10/855/6655823?login=false PMID: 35925813].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Torcivia J, Abdilleh K, Seidl F, Shahzada O, Rodriguez R, Pot D, Mazumder R. Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers. Onco (Basel). June 2022; 2(2):129-144. [https://www.ncbi.nlm.nih.gov/pubmed/37841494 PMID: 37841494].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dahlin M, Singleton S, David J, Basuchoudhary A, Wickström R, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumor necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. June 2022; vol: 80. [https://doi.org/10.1016/j.ebiom.2022.104061 https://doi.org/10.1016/j.ebiom.2022.104061].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;King CH, Keeney J, Guimera N, Das S, Weber M, Fochtman B, Walderhaug MO, Talwar S, Patel JA, Mazumder R, Donaldson EF. Communicating regulatory high-throughput sequencing data using BioCompute Objects. Drug Discov Today. 2022 Jan 22; [https://www.ncbi.nlm.nih.gov/pubmed/35077912 PMID: 35077912].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wang Z, Hopson L, Singleton S, Yang X, Jogunoori W, Mazumder R, Obias V, Lin P, Nguyen BN, Yao M, Miller L, White J, Rao S, Mishra L. Mice with dysfunctional TGF-β signaling develop altered intestinal microbiome and colorectal cancer resistant to 5FU. Biochim Biophys Acta Mol Basis Dis. 2021 Oct 1; 1867(10):166179. [https://www.ncbi.nlm.nih.gov/pubmed/34082069 PMID: 34082069].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Lyman D, Natale D, Schriml L, Anton K, Crichton DC, Mazumder R. Analysis of Biomarker Data Towards Development of a Molecular Biomarker Ontology. Proceedings of the International Conference on Biomedical Ontologies 2021 (ICBO 2021) co-located with the Workshop on Ontologies for the Behavioural and Social Sciences (OntoBess 2021) as part of the Bolzano Summer of Knowledge (BOSK 2021) Bozen-Bolzano, Italy. 2021 Sep 16-18; [https://ceur-ws.org/Vol-3073/paper13.pdf https://ceur-ws.org/Vol-3073/paper13.pdf].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Patel JA, Dean DA, King CH, Xiao N, Koc S, Minina E, Golikov A, Brooks P, Kahsay R, Navelkar R, Ray M, Roberson D, Armstrong C, Mazumder R, Keeney J. Bioinformatics tools developed to support BioCompute Objects. Database (Oxford). 2021 Mar 31; [https://www.ncbi.nlm.nih.gov/pubmed/33784373 PMID: 33784373].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Hora B, Gulzar N, Chen Y, Karagiannis K, Cai F, Su C, Smith K, Simonyan V, Shah SA, Ahmed M, Sanchez AM, Stone M, Cohen MS, Denny TN, Mazumder R, Gao F. Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing. mSphere. 2020 Oct 14; [https://www.ncbi.nlm.nih.gov/pubmed/33055255 PMID: 33055255].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://www.ncbi.nlm.nih.gov/pubmed/33814114 PMID: 33814114].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Torcivia J, Mazumder R. Scanning window analysis of non-coding regions within normal-tumor whole-genome sequence samples. Briefings in Bioinformatics. 2020 Sep 17; [https://www.ncbi.nlm.nih.gov/pubmed/32940334 PMID: 32940334].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Gogate N, Lyman D, Bell A, Cauley E, Crandall KA, Joseph A, Kahsay R, Natale DA, Schriml LM, Sen S, Mazumder R. COVID-19 biomarkers and their overlap with comorbidities in a disease biomarker data model. Brief Bioinform. 2021 May 20; bbab191. doi: 10.1093/bib/bbab191. [https://www.ncbi.nlm.nih.gov/pubmed/34015823 PMID: 34015823].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kahsay R, Vora J, Navelkar R, Mousavi R, Fochtman BC, Holmes X, Pattabiraman N, Ranzinger R, Mahadik R, Williamson T, Kulkarni S, Agarwal G, Martin M, Vasudev P, Garcia L, Edwards N, Zhang W, Natale DA, Ross K, Aoki-Kinoshita KF, Campbell MP, York WS, Mazumder R. GlyGen data model and processing workflow. Bioinformatics. 2020; [https://www.ncbi.nlm.nih.gov/pubmed/32324859 PMID: 32324859].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kurnat-Thoma E, Baranova A, Baird P, Brodsky E, Butte AJ, Cheema AK, Cheng F, Dutta S, Grant C, Giordano J, Maitland-van der Zee AH, Fridsma DB, Jarrin R, Kann MG, Keeney J, Loscalzo J, Madhavan G, Maron BA, McBride DK, McKean M, Mun SK, Palmer JC, Patel B, Parakh K, Pariser AR, Pristipino C, Radstake TRDJ, Rajasimha HK, Rouse WB, Rozman D, Saleh A, Schmidt HHHW, Schultz N, Sethi T, Silverman EK, Skopac J, Svab I, Trujillo S, Valentine JE, Verma D, West BJ, Vasudevan S. Recent Advances in Systems and Network Medicine: Meeting Report from the First International Conference in Systems and Network Medicine. Syst Med (New Rochelle). 2020; 3(1):22-35. [https://www.ncbi.nlm.nih.gov/pubmed/32226924 PMID: 32226924].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dingerdissen HM, Bastian F, Vijay-Shanker K, Robinson-Rechavi M, Bell A, Gogate N, Gupta S, Holmes E, Kahsay R, Keeney J, Kincaid H, King CH, Liu D, Crichton DJ, Mazumder R. OncoMX: A Knowledgebase for Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data. JCO Clin Cancer Inform. 2020; 4:210-220. [https://www.ncbi.nlm.nih.gov/pubmed/32142370 PMID: 32142370].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Aoki-Kinoshita KF, Lisacek F, Mazumder R, York WS, Packer NH. The GlySpace Alliance: toward a collaborative global glycoinformatics community. Glycobiology. 2020; 30(2):70-71. [https://www.ncbi.nlm.nih.gov/pubmed/31573039 PMID: 31573039].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;York WS, Mazumder R, Ranzinger R, et al. GlyGen: Computational and Informatics Resources for Glycoscience. Glycobiology. 2019. https://doi.org/10.1093/glycob/cwz080 [https://www.ncbi.nlm.nih.gov/pubmed/31616925 PMID: 31616925].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://www.ncbi.nlm.nih.gov/pubmed/31509535 PMID: 31509535].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Fan Y, Hu Y, Yan C, Goldman R, Pan Y, Mazumder R, Dingerdissen H. Loss and gain of N-linked glycosylation sequons due to single-nucleotide variation in cancer. Scientific Reports. 2018; 8:4322. [https://www.ncbi.nlm.nih.gov/pubmed/29531238 PMID: 29531238].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kim B, Ali T, Dong C, Lijeron C, Mazumder R, Wultsch C, Krampis K. miCloud: A Plug-n-Play, Extensible, On-Premises Bioinformatics Cloud for Seamless Execution of Complex Next-Generation Sequencing Data Analysis Pipelines. Journal of Computational Biology. 2018. http://doi.org/10.1089/cmb.2018.0218&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Alterovitz G, Dean DA, Goble C, Crusoe MR, Soiland-Reyes S, Bell A, Hayes A, King CHS, Taylor D, Johanson E, Thompson EE, Donaldson E, Morizono H, Tsang HS, Goecks J, Yao J, Almeida JS, Krampis K, Guo L, Walderhaug M, Walsh P, Kahsay R, Gottipati S, Bloom T, Lai Y, Simonyan V, Mazumder R. Enabling Precision Medicine via standard communication of HTS provenance, analysis, and results. PLOS Biology. 2018; 16(12):e3000099. https://doi.org/10.1371/journal.pbio.3000099&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Hu Y, Dingerdissen H, Gupta S, Kahsay R, Shanker V, Wan Q, Yan C, Mazumder R. Identification of key differentially expressed MicroRNAs in cancer patients through pan-cancer analysis. Computers in Biology and Medicine 2018; vol: 103 pp: 183-197. [https://www.ncbi.nlm.nih.gov/pubmed/30384176 PMID: 30384176].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Dingerdissen H, Torcivia-Rodriguez J, Hu Y, Chang T-C, Mazumder R, Kahsay R. BioMuta and BioXpress: mutation and expression knowledgebases for cancer biomarker discovery. Nucleic Acids Research. 2017. [https://pubmed.ncbi.nlm.nih.gov/30053270/ PMID: 30053270].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Karagiannis K, Simonyan V, Chumakov K, Mazumder R. Separation and assembly of deep sequencing data into discrete sub-population genomes. Nucleic Acids Research. 2017; 45(19):10989-11003. [https://www.ncbi.nlm.nih.gov/pubmed/28977510 PMID: 28977510].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Chen J, Zaidi S, Rao S, Chen J-S, Phan L, Farci P, Su X, Shetty K, White J, Zamboni F, Wu X, Rashid A, Pattabiraman N, Mazumder R, Horvath A, Wu R-C, Li S, Xiao C, Deng C-X, Wheeler D A, Mishra B, Akbani R, Mishra L. Analysis of Genomes and Transcriptomes of Hepatocellular Carcinomas Identifies Mutations and Gene Expression Changes in the Transforming Growth Factor beta Pathway. Gastroenterology. 2017; S0016-5085(17)36144-9. [https://www.ncbi.nlm.nih.gov/pubmed/28918914 PMID: 28918914].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinformatics. 2017; 18(1):391. [https://www.ncbi.nlm.nih.gov/pubmed/28865429 PMID: 28865429].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Gannavaram S, Torcivia J, Gasparyan L, Kaul A, Ismail N, Simonyan V, Nakhasi HL. Whole genome sequencing of live attenuated Leishmania donovani parasites reveals novel biomarkers of attenuation and enables product characterization. Sci Rep. 2017; 7(1):4718. [https://www.ncbi.nlm.nih.gov/pubmed/28680050 PMID: 28680050].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Simonyan V, Chumakov K, Donaldson E, Karagiannis K, Lam PV, Dingerdissen H, Voskanian A. HIVE-heptagon: A sensible variant-calling algorithm with post-alignment quality controls. Genomics. 2017; 109(3-4):131-140. [https://www.ncbi.nlm.nih.gov/pubmed/28188908 PMID: 28188908].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Pan Y, Yan C, Fan Y, Pan Q, Wan Q, Torcivia-Rodriguez J, Mazumder R. Distribution bias analysis of germline and somatic single-nucleotide variations that impact protein functional site and neighboring amino acids. Scientific Reports. 2017; 7:42169 [https://www.ncbi.nlm.nih.gov/pubmed/28176830 PMID: 28176830].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Gulzar N, Dingerdissen H, Yan C, Mazumder R. Impact of Nonsynonymous Single-Nucleotide Variations on Post-Translational Modification Sites in Human Proteins. Methods Mol Biol. 2017; 1558:159-190. [https://www.ncbi.nlm.nih.gov/pubmed/28150238 PMID: 28150238].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Simonyan V, Goecks J, Mazumder R. BioCompute objects - a step towards evaluation and validation of bio-medical scientific computations. PDA J Pharm Sci Technol. 2017; 71(2):136-146 [https://www.ncbi.nlm.nih.gov/pubmed/27974626 PMID: 27974626].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Yan C, Pattabiraman N, Goecks J, Lam P, Nayak A, Pan Y, Torcivia-Rodriguez J, Voskanian A, Wan Q, Mazumder R. Impact of germline and somatic missense variations on drug binding sites. Pharmacogenomics J. 2017; 17(2):128-136 [https://www.ncbi.nlm.nih.gov/pubmed/26810135 PMID: 26810135].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Novatt H, Theisen TC, Massie T, Simonyan V, Voskanian-Kordi A, Renn LA, Rabin RL. Distinct Patterns of Expression of Transcription Factors in Response to Interferon Beta and Interferon lambda-1. J Interferon Cytokine Res. 2016; 36(10):589-598 [https://www.ncbi.nlm.nih.gov/pubmed/27447339 PMID: 27447339].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Chen C, Huang H, Mazumder R, Natale DA, McGarvey PB, Zhang J, Polson SW, Wang Y, Wu CH, UniProt Consortium. Computational clustering for viral reference proteomes. Bioinformatics. 2016; 32(13):2041-3 [https://www.ncbi.nlm.nih.gov/pubmed/27153712 PMID: 27153712].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mahmood AS, Wu TJ, Mazumder R, Vijay-Shanker K. DiMeX: A text-mining system for mutation-disease association extraction. PLoS One. 2016; 11(4):e0152725 [https://www.ncbi.nlm.nih.gov/pubmed/27073839 PMID: 27073839].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Goldweber S, Theodore J, Torcivia-Rodriguez J, Simonyan V, Mazumder R. Pubcast and Genecast: Browsing and exploring publications and associated curated content in biology through mobile devices. IEEE/ACM Trans Comput Biol Bioinform. 2016; 14(2):498-500 [https://www.ncbi.nlm.nih.gov/pubmed/28113865 PMID: 28113865].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Laassri M, Zagorodnyaya T, Plant EP, Petrovskaya S, Bidzhieva B, Ye Z, Simonyan V, Chumakov K. Deep Sequencing for Evaluation of Genetic Stability of Influenza A/California/07/2009 (H1N1) Vaccine Viruses. PLoS One. 2015; 10(9):e0138650. [https://www.ncbi.nlm.nih.gov/pubmed/26407068 PMID: 26407068].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Sauder CJ, Ngo L, Simonyan V, Cong Y, Zhang C, Link M, Malik T, Rubin SA. Generation and propagation of recombinant mumps viruses exhibiting an additional U residue in the homopolymeric U tract of the F gene-end signal. Virus Genes. 2015; 51(1):12-24. [https://www.ncbi.nlm.nih.gov/pubmed/25962759 PMID: 25962759].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wu T-J, Schriml LM, Chen Q-R, Colbert M, Crichton DJ, Finney R, Hu Y, Kibbe WA, Kincaid H, Meerzaman D, Mitraka E, Pan Y, Smith KM, Srivastava S, Ward S, Yan C, Mazumder R. Generating a focused view of Disease Ontology cancer terms for pan-cancer data integration and analysis. Database (Oxford). 2015; 2015:bav032. [https://www.ncbi.nlm.nih.gov/pubmed/25841438 PMID: 25841438].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wan Q, Dingerdissen H, Fan Y, Gulzar N, Pan Y, Wu T-J, Yang C, Zhang H, Mazumder R. BioXpress: An integrated RNA-seq derived gene expression database for pan-cancer analysis. Database (Oxford). 2015; 2015. pii: bav019 [https://www.ncbi.nlm.nih.gov/pubmed/25819073 PMID: 25819073].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Kumari P, Mazumder R, Simonyan V, Krampis K. Advantages of distributed and parallel algorithms that leverage Cloud Computing platforms for large-scale genome assembly. F1000Research. 2015; 4(20). [https://hsrc.himmelfarb.gwu.edu/cgi/viewcontent.cgi?article=1167&amp;amp;context=smhs_biochem_facpubs https://hsrc.himmelfarb.gwu.edu/cgi/viewcontent.cgi?article=1167&amp;amp;context=smhs_biochem_facpubs].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Abunimer A, Dingerdissen H, Torcivia-Rodriguez J, Vinh Nguyen Lam P, Mazumder R. Non-synonymous Single-Nucleotide Variations as Cardiovascular System Disease Biomarkers and Their Roles in Bridging Genomic and Proteomic Technologies. Biomarkers in Cardiovascular Disease. 2015. [https://link.springer.com/referenceworkentry/10.1007/978-94-007-7741-5_40-1 Springer Nature link].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Adhikari S, Chetram MA, Woodrick J, Mitra PS, Manthena PV, Khatkar P, Dakshanamurthy S, Dixon M, Karmahapatra SK, Nuthalapati NK, Gupta S, Narasimhan G, Mazumder R, Loffredo CA, Uren A, Roy R. Germ-line variants of human N-methylpurine DNA glycosylase show impaired DNA repair activity and facilitate 1,N6 ethenoadenine induced mutations. J Biol Chem. 2014; 290(8):4966-80. [https://www.ncbi.nlm.nih.gov/pubmed/25538240 PMID: 25538240].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wilson CA and Simonyan V. FDA&#039;s Activities Supporting Regulatory Application of &amp;quot;Next Gen&amp;quot; Sequencing Technologies. PDA J Pharm Sci Technol. 2014; 68(6):626-630. [https://www.ncbi.nlm.nih.gov/pubmed/25475637 PMID: 25475637].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014; 15(1):918. [https://www.ncbi.nlm.nih.gov/pubmed/25336203 PMID: 25336203].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Pan Y, Karagiannis K, Zhang H, Dingerdissen H, Shamsaddini A, Wan Q, Simonyan V, Mazumder R. Human germline and pan-cancer variomes and their distinct functional profiles. Nucleic Acids Research. 2014; 42(18):11570-88. [https://www.ncbi.nlm.nih.gov/pubmed/25232094 PMID: 25232094].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Nayak A, Pattabiraman N, Fadra N, Goldman R, Pond S, Mazumder R. Structure-function analysis of hepatitis C virus envelope glycoproteins E1 and E2. J Biomol Struct Dyn. 2014; 33(8):1682-94. [https://www.ncbi.nlm.nih.gov/pubmed/25245635 PMID: 25245635].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Faison WJ, Rostovtsev A, Castro-Nallar E, Crandall KA, Chumakov K, Simonyan V, Mazumder R. Whole genome single-nucleotide variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes. Genomics. 2014; 104(1):1-7. [https://www.ncbi.nlm.nih.gov/pubmed/24930720 PMID: 24930720].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLoS One. 2014; 9(6):e99033. [https://www.ncbi.nlm.nih.gov/pubmed/24918764 PMID: 24918764].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dingerdissen H, Weaver DS, Karp PD, Pan Y, Simonyan V, Mazumder R. A framework for application of metabolic modeling in yeast to predict the effects of nsSNV in human orthologs. Biol Direct. 2014; 9:9. [https://www.ncbi.nlm.nih.gov/pubmed/24894379 PMID: 24894379].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Bidzhieva B, Zagorodnyaya T, Karagiannis K, Simonyan V, Laassri M, Chumakov K. Deep sequencing approach for genetic stability evaluation of influenza A viruses. J Virol Methods. 2014; 199:68-75. [https://www.ncbi.nlm.nih.gov/pubmed/24406624 PMID: 24406624].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Abunimer A, Smith K, Wu T-J, Lam P, Simonyan V, Mazumder R. Single-nucleotide variations in cardiac arrhythmias: prospects for genomics and proteomics based variation detection. Genes. 2014; 5(2):254-69. [https://www.ncbi.nlm.nih.gov/pubmed/24705329 PMID: 24705329].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wu T-J, Shamsaddini A, Pan Y, Smith K, Crichton DJ, Simonyan V, Mazumder R. A framework for organizing cancer related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE). Database. 2014; 2014:bau022. [https://www.ncbi.nlm.nih.gov/pubmed/24667251 PMID: 24667251].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dabrazhynetskaya A, Soika V, Volokhov D, Simonyan V, Chizhikov V. Genome Sequence of Mycoplasma hyorhinis Strain DBS 1050. Genome Announce. 2014; 2(2):pii: e00127-14. [https://www.ncbi.nlm.nih.gov/pubmed/24604646 PMID: 24604646].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Cole C, Krampis K, Karagiannis K, Almeida J, Faison JW, Motwani M, Wan Q, Golikov A, Pan Y, Simonyan V, Mazumder R. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data. BMC Bioinformatics. 2014; 15:28. [https://www.ncbi.nlm.nih.gov/pubmed/24467687 PMID: 24467687].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mudvari P, Kowsari K, Cole C, Mazumder R, Horvath A. Extraction of molecular features through exome to transcriptome alignment. J Metabol Sys Biol. 2013; 1(1):7. [https://www.ncbi.nlm.nih.gov/pubmed/24791251 PMID: 24791251].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Basuchoudhary A, Simonyan V, Mazumder R. Community annotation and the evolution of cooperation: How patience matters. Open Bioinformatics Journal. 2013; 7:9-18.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Karagiannis K, Simonyan V, Mazumder R. SNVDis: A Proteome-wide Analysis Service for Evaluating nsSNVs in Protein Functional Sites and Pathways. Genomics Proteomics Bioinformatics. 2013; 11(2):122-126. [https://www.ncbi.nlm.nih.gov/pubmed/23618375 PMID: 23618375].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Lam PV, Goldman R, Karagiannis K, Narsule T, Simonyan V, Soika V, Mazumder R. Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes. Genomics Proteomics Bioinformatics. 2013; 11(2):96-104. [https://www.ncbi.nlm.nih.gov/pubmed/23459159 PMID: 23459159].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dingerdissen H, Motwani M, Karagiannis K, Simonyan V, Mazumder R. Proteome-wide analysis of nonsynonymous single-nucleotide variations in active sites of human proteins. FEBS J. 2013; 280(6):1542-1562. [https://www.ncbi.nlm.nih.gov/pubmed/23350563 PMID: 23350563].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Gaudet P, Arighi C, Bastian F, Bateman A, Blake JA, Cherry MJ, D&#039;Eustachio P, Finn R, Giglio M, Hirschman L, Kania R, Klimke W, Martin MJ, Karsch-Mizrachi I, Munoz-Torres M, Natale D, O&#039;Donovan C, Ouellette F, Pruitt KD, Robinson-Rechavi M, Sansone SA, Schofield P, Sutton G, Van Auken K, Vasudevan S, Wu C, Young J, Mazumder R. Recent advances in biocuration: meeting report from the Fifth International Biocuration Conference. Database (Oxford). 2012; 2012:bas036. [https://www.ncbi.nlm.nih.gov/pubmed/23110974 PMID: 23110974].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Volokhov DV, Simonyan V, Davidson MK, Chizhikov VE. RNA polymerase beta subunit (rpoB) gene and the 16S-23S rRNA intergenic transcribed spacer region (ITS) as complementary molecular markers in addition to the 16S rRNA gene phylogenetic analysis and identification of the species of the family Mycoplasmataceae. Mol Phylogenet Evol. 2012; 62(1):515-28. [https://www.ncbi.nlm.nih.gov/pubmed/22115576 PMID: 22115576].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mazumder R, Morampudi KS, Motwani M, Vasudevan S, Goldman R. Proteome-wide analysis of single-nucleotide variations in the N-glycosylation sequon of human genes. PLoS One. 2012; 7(5):e36212. [https://www.ncbi.nlm.nih.gov/pubmed/22586465 PMID: 22586465].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=GW-FEAST&amp;diff=1170</id>
		<title>GW-FEAST</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=GW-FEAST&amp;diff=1170"/>
		<updated>2026-03-09T13:50:15Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to GW-FEAST Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the GW-FEAST project.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&#039;&#039;This research was funded, in part, by the Advanced Research Projects Agency for Health (ARPA-H). The views and conclusions contained in this documentation are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.&#039;&#039;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;About GW-FEAST&amp;lt;/h2&amp;gt;&lt;br /&gt;
Federated Ecosystems for Analytics and Standardized Technologies ([https://hivelab.biochemistry.gwu.edu/gw-feast FEAST]) is a cloud-based, agile bioinformatics and data analysis platform under development through the ARPA-H Biomedical Data Fabric (BDF) toolbox program. The project is led by [https://dnahive.com DNA-HIVE]; other funded collaborators include Cornell University, Vanderbilt University, Georgetown University, European Bioinformatics Institute, and Kaiser Permanente. Our team is responsible for the GW instance of FEAST (GW-FEAST) and for co-leading the project with DNA-HIVE. This project is part of the ARPA-H FEAST performer team initiative to create bridges across data silos and make health data more accessible and usable. &lt;br /&gt;
&lt;br /&gt;
Several hospitals and cancer centers will have a FEAST platform, which enables cross-site data analysis without the need to export or transform the data. Currently, large chunks of data are used by insurance companies, pharmaceutical companies, and others for research and development purposes. The FEAST platform, which is particularly strong with noisy, real-world data, aims to enable more precise data selection for research use while preserving patient privacy. When clinical data is submitted to the suite of tools, submission is handled via the HL7 FHIR protocol, ensuring only authorized parties ever have access to protected data. Models that provide update mechanisms such as online training will be updated appropriately without retaining any personally identifiable information (PII). Thus, these tools support federated data sets and training without ever retaining clinical PII within the system. All services run as independent microservices, each packaged in a Docker container. &lt;br /&gt;
&lt;br /&gt;
[https://drive.google.com/file/d/1iv9VmFhNbd-5iwSwDMLVumCFnFN84cl8/view?usp=drive_link FEAST Video]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Quick Links ==&lt;br /&gt;
&lt;br /&gt;
* [[GW-FEAST Data|GW-FEAST Data Sources]]&lt;br /&gt;
* [[GW-FEAST Data Access Portal]]&lt;br /&gt;
* [[GW-FEAST Data De-identification|GW-FEAST De-identification]]&lt;br /&gt;
* [[FEAST Knowledgebase APIs|FEASTKB APIs]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;GW-FEAST Project Architecture&amp;lt;/h2&amp;gt;[[File:GW-FEAST_architecture.png|none|thumb|658x658px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;The GW-FEAST architecture diagram showcases the GW environment set up to facilitate FEAST queries through the GW node (or instance of FEAST at GW). While other consortium sites may have slightly different environment configurations, the overall structure and security practices will be similar across all sites. This diagram is subject to change throughout the life of the project.&#039;&#039;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
* Mazumder R, Keeney J, Johnson L, Krammer L, McNeely P, Sepulveda J, Hangen D, Martin M, Jyothi D, De Almeida J, McGarvey P, Alaoui A, Cha S, Sedrakyan A, Shoelle E, Matheny M, LeNoue-Newton M, Winter R, Deppen S, Simonyan V, Horvath A. From use cases to infrastructure: a cross-institutional survey of priorities in data-driven biomedical research. J Am Med Inform Assoc. 2026 Jan 20:ocag001. Epub ahead of print. [https://pubmed.ncbi.nlm.nih.gov/41556955/ PMID: 41556955].&amp;lt;/div&amp;gt; &lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Collaborating Institutions ==&lt;br /&gt;
&lt;br /&gt;
DNA-HIVE (prime)  &lt;br /&gt;
&lt;br /&gt;
Department of Biochemistry and Molecular Medicine, The George Washington University (Co-PI)  &lt;br /&gt;
&lt;br /&gt;
Innovation Center for Biomedical Informatics, Georgetown University  &lt;br /&gt;
&lt;br /&gt;
Division of Cancer Epidemiology and Genetics, National Cancer Institute  &lt;br /&gt;
&lt;br /&gt;
Vanderbilt University Medical Center   &lt;br /&gt;
&lt;br /&gt;
Weill Cornell Medical College  &lt;br /&gt;
&lt;br /&gt;
European Bioinformatics Institute, European Molecular Biology Laboratory   &lt;br /&gt;
&lt;br /&gt;
Kaiser Permanente  &lt;br /&gt;
&amp;lt;/div&amp;gt; &lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=GW-FEAST&amp;diff=1169</id>
		<title>GW-FEAST</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=GW-FEAST&amp;diff=1169"/>
		<updated>2026-03-09T13:49:04Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;margin:0.4em; text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:160%; padding:.1em;&amp;quot;&amp;gt;Welcome to GW-FEAST Wiki!&amp;lt;/div&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] for the GW-FEAST project.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&#039;&#039;This research was funded, in part, by the Advanced Research Projects Agency for Health (ARPA-H). The views and conclusions contained in this documentation are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.&#039;&#039;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;About GW-FEAST&amp;lt;/h2&amp;gt;&lt;br /&gt;
Federated Ecosystems for Analytics and Standardized Technologies ([https://hivelab.biochemistry.gwu.edu/gw-feast FEAST]) is a cloud-based, agile bioinformatics and data analysis platform under development through the ARPA-H Biomedical Data Fabric (BDF) toolbox program. The project is led by [https://dnahive.com DNA-HIVE]; other funded collaborators include Cornell University, Vanderbilt University, Georgetown University, European Bioinformatics Institute, and Kaiser Permanente. Our team is responsible for the GW instance of FEAST (GW-FEAST) and for co-leading the project with DNA-HIVE. This project is part of the ARPA-H FEAST performer team initiative to create bridges across data silos and make health data more accessible and usable. &lt;br /&gt;
&lt;br /&gt;
Several hospitals and cancer centers will have a FEAST platform, which enables cross-site data analysis without the need to export or transform the data. Currently, large chunks of data are used by insurance companies, pharmaceutical companies, and others for research and development purposes. The FEAST platform, which is particularly strong with noisy, real-world data, aims to enable more precise data selection for research use while preserving patient privacy. When clinical data is submitted to the suite of tools, submission is handled via the HL7 FHIR protocol, ensuring only authorized parties ever have access to protected data. Models that provide update mechanisms such as online training will be updated appropriately without retaining any personally identifiable information (PII). Thus, these tools support federated data sets and training without ever retaining clinical PII within the system. All services run as independent microservices, each packaged in a Docker container. &lt;br /&gt;
&lt;br /&gt;
[https://drive.google.com/file/d/1iv9VmFhNbd-5iwSwDMLVumCFnFN84cl8/view?usp=drive_link FEAST Video]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Quick Links ==&lt;br /&gt;
&lt;br /&gt;
* [[GW-FEAST Data|GW-FEAST Data Sources]]&lt;br /&gt;
* [[GW-FEAST Data Access Portal]]&lt;br /&gt;
* [[GW-FEAST Data De-identification|GW-FEAST De-identification]]&lt;br /&gt;
* [[FEAST Knowledgebase APIs|FEASTKB APIs]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h2&amp;gt;GW-FEAST Project Architecture&amp;lt;/h2&amp;gt;[[File:GW-FEAST_architecture.png|none|thumb|658x658px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;The GW-FEAST architecture diagram showcases the GW environment set up to facilitate FEAST queries through the GW node (or instance of FEAST at GW). While other consortium sites may have slightly different environment configurations, the overall structure and security practices will be similar across all sites. This diagram is subject to change throughout the life of the project.&#039;&#039;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
Mazumder R, Keeney J, Johnson L, Krammer L, McNeely P, Sepulveda J, Hangen D, Martin M, Jyothi D, De Almeida J, McGarvey P, Alaoui A, Cha S, Sedrakyan A, Shoelle E, Matheny M, LeNoue-Newton M, Winter R, Deppen S, Simonyan V, Horvath A. From use cases to infrastructure: a cross-institutional survey of priorities in data-driven biomedical research. J Am Med Inform Assoc. 2026 Jan 20:ocag001. doi: 10.1093/jamia/ocag001. Epub ahead of print. PMID: 41556955.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt; &lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
== Collaborating Institutions ==&lt;br /&gt;
&lt;br /&gt;
DNA-HIVE (prime)  &lt;br /&gt;
&lt;br /&gt;
Department of Biochemistry and Molecular Medicine, The George Washington University (Co-PI)  &lt;br /&gt;
&lt;br /&gt;
Innovation Center for Biomedical Informatics, Georgetown University  &lt;br /&gt;
&lt;br /&gt;
Division of Cancer Epidemiology and Genetics, National Cancer Institute  &lt;br /&gt;
&lt;br /&gt;
Vanderbilt University Medical Center   &lt;br /&gt;
&lt;br /&gt;
Weill Cornell Medical College  &lt;br /&gt;
&lt;br /&gt;
European Bioinformatics Institute, European Molecular Biology Laboratory   &lt;br /&gt;
&lt;br /&gt;
Kaiser Permanente  &lt;br /&gt;
&amp;lt;/div&amp;gt; &lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_ML_Pipeline_Tutorial&amp;diff=1167</id>
		<title>PredictMod ML Pipeline Tutorial</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=PredictMod_ML_Pipeline_Tutorial&amp;diff=1167"/>
		<updated>2026-02-24T16:42:18Z</updated>

		<summary type="html">&lt;p&gt;Lorikrammer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;Go Back to [[Modeling Tutorials]].&amp;lt;/small&amp;gt;&amp;lt;h1&amp;gt;Integration of a Machine Learning-based Approach for Predictive Clinical Decision-making using Python&amp;lt;/h1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h2&amp;gt;Summary&amp;lt;/h2&amp;gt;&lt;br /&gt;
    &amp;lt;h3&amp;gt;Part I. Machine Learning Using Python&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;ol&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;What is Python?&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Objectives&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Methodology of the Machine Learning Algorithms&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Software Installation&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Downloading the Input Files for Synthetic Data Generation&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Downloading the Input Files for Model Training&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ol&amp;gt;&lt;br /&gt;
    &lt;br /&gt;
    &amp;lt;h3&amp;gt;Part II. Using Python scripts to detect signal difference in the Electronic Healthcare Records of responsive and unresponsive patients&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;ol&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Process&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Interpreting the Results&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Further Analysis and Next Steps&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h2&amp;gt;Part I. Machine Learning Using Python&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h3&amp;gt;1. What is Python?&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;p&amp;gt;Python is a versatile programming language that supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It is widely used for tasks such as data manipulation, web development, scientific computing, and automation. Python’s extensive standard library and external packages make it particularly useful for data analysis, machine learning, and visualization. Through libraries like NumPy, pandas, matplotlib, and scikit-learn, Python excels at handling large datasets, building models, and visualizing results. Additionally, Python can easily interface with programs written in other languages and supports the integration of a wide range of toolkits to extend its functionality.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h3&amp;gt;2. Objectives&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;p&amp;gt;This protocol provides a proof-of-concept Python workflow for building predictive machine learning models from a dataset. In this tutorial, patient data serves as the example input, and the output indicates whether the patient is a responder or a non-responder to the assigned treatment. The concepts apply to most binary classification datasets and can be reused for future model development.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;p&amp;gt;Two major machine learning concepts will be applied to this system as follows:&amp;lt;/p&amp;gt;&lt;br /&gt;
    &amp;lt;ol&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Create a synthetic data set to be used as an input to a machine learning model to ensure consistency during the model training steps&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Pass patient data through a series of machine learning classification models to predict, before a dietary or medical intervention begins, whether the treatment will be effective (i.e., responder vs. non-responder)&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;p&amp;gt;This tutorial uses patient data provided by &amp;lt;a href=&amp;quot;https://synthea.mitre.org/downloads&amp;quot;&amp;gt;Synthea&amp;lt;/a&amp;gt;. The process by which the Synthea data was retrieved and filtered using MATLAB is described at this &amp;lt;a href=&amp;quot;https://docs.google.com/document/d/1yfUjoaU0lfTx8blTCgZehR7Qdn0C0iTU3VTPAag9ITI/edit?usp=sharing&amp;quot;&amp;gt;link&amp;lt;/a&amp;gt;. This tutorial uses its own retrieval and filtering process, written in Python. If interested, the full synthetic data generation process can be found at this &amp;lt;a href=&amp;quot;https://github.com/GW-HIVE/PredictMod/tree/main/flask_backend/models/Diabetes_EHR_v1&amp;quot;&amp;gt;link&amp;lt;/a&amp;gt;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h3&amp;gt;3. Methodology of the Machine Learning Algorithms&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;p&amp;gt;1. Generating Synthetic Data: To generate synthetic data, the mean, standard deviation, and covariance are calculated for each variable (BMI, glucose, etc.) in the patient dataset. The algorithm also designates each variable as continuous or discrete. A “noise” data set is then generated from these statistics. The “noise” data is refined as two classifier neural networks compete to label it appropriately based on the training set. Once the synthetic data is labeled appropriately, it is stored in a matrix, and the process is repeated until a sufficient number of values has been generated. This algorithm is similar to a Generative Adversarial Network (GAN). More information on how GANs work can be found at the following &amp;lt;a href=&amp;quot;https://en.wikipedia.org/wiki/Generative_adversarial_network&amp;quot;&amp;gt;link&amp;lt;/a&amp;gt;. The purpose of the synthetic data generation step in this tutorial is to ensure that the multiple models we will apply to the dataset can handle the input data. Some traditional machine learning models cannot handle NaN/NULL values because of the mathematical operations involved. Languages such as MATLAB have built-in toolkits that account for this, but such toolkits can take years to develop. Rather than depend on equivalent handling in Python, we avoid the issue by generating synthetic data that contains no missing values.&amp;lt;/p&amp;gt;&lt;br /&gt;
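&lt;br /&gt;
    &amp;lt;p&amp;gt;The statistical first stage described above can be sketched as follows. This is a minimal illustration, not the PredictMod implementation: the column names and values are hypothetical, the noise is drawn from a multivariate normal distribution as an assumed stand-in, and the adversarial refinement by the two competing networks is not shown.&amp;lt;/p&amp;gt;&lt;br /&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical toy patient table; column names and values are illustrative only.
patients = pd.DataFrame({
    "bmi": [27.1, 31.4, 29.8, 33.0, 25.6, 30.2],
    "glucose": [101.0, 118.0, 110.0, 123.0, 99.0, 114.0],
})

# Summary statistics named in the text: mean, standard deviation, covariance.
means = patients.mean().to_numpy()
stds = patients.std().to_numpy()
cov = patients.cov().to_numpy()

# Sanity check: each standard deviation is the square root of the
# matching diagonal entry of the covariance matrix.
assert np.allclose(stds, np.sqrt(np.diag(cov)))

# Draw "noise" rows that share the means and covariance of the real data.
# (Assumption: a multivariate normal is used here purely for illustration.)
rng = np.random.default_rng(seed=0)
noise = rng.multivariate_normal(means, cov, size=100)
synthetic = pd.DataFrame(noise, columns=patients.columns)

# The generated table has no missing values by construction.
print(synthetic.shape)
print(int(synthetic.isna().sum().sum()))
```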
&lt;br /&gt;
    &amp;lt;p&amp;gt;2. Classifying Training Data: The classification system uses a neural network or decision tree to build a binary classifier that predicts whether a patient (a row of the dataset) will respond (the label) to the standard Type II diabetes intervention plan. This plan involves non-invasive lifestyle changes such as diet and exercise. Prediabetic individuals who follow the intervention plan fall into two classes: responders and non-responders. Responders remain at prediabetic levels or return to normal levels, while non-responders develop diabetic levels after following the intervention plan. The machine learning algorithm is trained on a fraction of the original patient dataset, known as the “training set”, so that it can then predict the label of new patient data (the “test set”) without being told it.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h3&amp;gt;4. Software Installation&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;p&amp;gt;This guide provides step-by-step instructions for setting up your environment to run the machine learning algorithms. Follow the instructions below to ensure that the necessary software and libraries are installed correctly.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;NOTE:&amp;lt;/strong&amp;gt; Newer versions of the software and libraries may have been released since this tutorial was written. If you find that functions or methods are no longer usable, troubleshoot on forums such as Stack Overflow or install the older versions.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;ol&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Install Python: First, ensure that Python is installed on your system. This tutorial uses Python 3.11. You can download Python from the official &amp;lt;a href=&amp;quot;https://www.python.org/downloads/&amp;quot;&amp;gt;website&amp;lt;/a&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Install VSCode from the official &amp;lt;a href=&amp;quot;https://code.visualstudio.com/download&amp;quot;&amp;gt;website&amp;lt;/a&amp;gt;. Ensure that you download the correct version for your operating system (OS).&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Configure VSCode to run Python by following the instructions on the official website: &amp;lt;a href=&amp;quot;https://code.visualstudio.com/docs/python/python-tutorial&amp;quot;&amp;gt;https://code.visualstudio.com/docs/python/python-tutorial&amp;lt;/a&amp;gt;. It is highly recommended to install Pylance, an extension that works alongside Python in VSCode to provide performant language support. To add Pylance, open VSCode, click the Extensions icon on the left-hand side, search for Pylance, and install it.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Install the Python libraries required for this tutorial with pip, the package installer for Python, which installs libraries that are not part of the standard Python distribution. Run the following command:&lt;br /&gt;
            &amp;lt;pre&amp;gt;pip install tensorflow imageio matplotlib numpy pandas scikit-learn&amp;lt;/pre&amp;gt;&#039;&#039;If Pylance cannot resolve an import of one of the required libraries, check which libraries are installed by typing pip list in the VSCode terminal.&#039;&#039; &amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ol&amp;gt;&lt;br /&gt;
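&lt;br /&gt;
    &amp;lt;p&amp;gt;As a quick alternative to reading the output of &amp;lt;code&amp;gt;pip list&amp;lt;/code&amp;gt;, the following sketch reports which of the required libraries Python cannot find. The helper name &amp;lt;code&amp;gt;find_missing&amp;lt;/code&amp;gt; is hypothetical; note that scikit-learn is imported under the name &amp;lt;code&amp;gt;sklearn&amp;lt;/code&amp;gt;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
```python
import importlib.util

# Import names for the packages installed above; the scikit-learn
# package is imported as "sklearn".
REQUIRED = ["tensorflow", "imageio", "matplotlib", "numpy", "pandas", "sklearn"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```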
&lt;br /&gt;
    &amp;lt;h3&amp;gt;5. Downloading the Input Files for Synthetic Data Generation&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;p&amp;gt;Download the required material for this tutorial from this &amp;lt;a href=&amp;quot;https://drive.google.com/drive/folders/1U-TIZe-Iqmziijiiw-1VHZNaGhIXUerQ?usp=drive_link&amp;quot;&amp;gt;link&amp;lt;/a&amp;gt;. The Synthetic Generation folder contains the Python file and the input Excel files required to generate the dataset that we will use.&amp;lt;/p&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Python Project Materials List:&amp;lt;/strong&amp;gt;&lt;br /&gt;
            &amp;lt;ul&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Synthetic_EHR_data_diabetes.py:&amp;lt;/strong&amp;gt; A Python script that generates synthetic Electronic Health Record (EHR) data for diabetes-related studies, using GAN techniques&amp;lt;/li&amp;gt;&lt;br /&gt;
            &amp;lt;/ul&amp;gt;&lt;br /&gt;
        &amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Excel Data Files:&amp;lt;/strong&amp;gt;&lt;br /&gt;
            &amp;lt;ul&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;label_non_responsive.xlsx:&amp;lt;/strong&amp;gt; Contains labels for non-responsive patient data&amp;lt;/li&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;label_responsive.xlsx:&amp;lt;/strong&amp;gt; Contains labels for responsive patient data&amp;lt;/li&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;data_non_responsive.xlsx:&amp;lt;/strong&amp;gt; Contains observational data related to patients non-responsive to the treatment&amp;lt;/li&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;data_responsive.xlsx:&amp;lt;/strong&amp;gt; Contains observational data related to patients responsive to the treatment&amp;lt;/li&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;var_list_.xlsx:&amp;lt;/strong&amp;gt; A list of variables or features that are present in the dataset. This can be used to identify key variables of interest during analysis.&amp;lt;/li&amp;gt;&lt;br /&gt;
            &amp;lt;/ul&amp;gt;&lt;br /&gt;
        &amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Documentation:&amp;lt;/strong&amp;gt;&lt;br /&gt;
            &amp;lt;ul&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;README.md:&amp;lt;/strong&amp;gt; A markdown file providing an overview of the project, explaining the purpose of the scripts, and instructions on how to use the code and data.&amp;lt;/li&amp;gt;&lt;br /&gt;
            &amp;lt;/ul&amp;gt;&lt;br /&gt;
        &amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h3&amp;gt;6. Downloading the Input Files for Model Training&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;p&amp;gt;Download the required material for this tutorial from this link. The model training folder contains the Python file and the input Excel files required to test multiple models and report their performance metrics.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h2&amp;gt;Part II. Using Python scripts to detect signal difference in the Electronic Healthcare Records of responsive and unresponsive patients&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h3&amp;gt;Process&amp;lt;/h3&amp;gt;&lt;br /&gt;
    &amp;lt;ol&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Create a GAN model that takes the input data and generates synthetic data that follows the same distribution as the initial data but contains no NaN/NULL values, to avoid errors in later steps.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Run the multi-model analysis Python script, which tests a wide variety of machine learning models and outputs the accuracy and RMSE (Root Mean Squared Error) for each.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Step 1: Creating Synthetic Data&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;ol&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Open Visual Studio Code (VSCode)&amp;lt;/strong&amp;gt; and go to the top-left corner, click on &amp;lt;strong&amp;gt;File → Open Folder&amp;lt;/strong&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Navigate to the folder where you’ve downloaded the &amp;lt;strong&amp;gt;Synthetic_EHR_data_diabetes.py&amp;lt;/strong&amp;gt; script.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Once the folder is open, you should see the file explorer in VSCode.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Double-click on the &amp;lt;strong&amp;gt;Synthetic_EHR_data_diabetes.py&amp;lt;/strong&amp;gt; file to open it in the center of the page.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;In the top-right of VSCode, click the &amp;lt;strong&amp;gt;Run Python File&amp;lt;/strong&amp;gt; button to execute the script.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Ensure that you generate synthetic data for both the responsive and non-responsive datasets. To do this, run the script once for each dataset, changing the input files it reads:&lt;br /&gt;
            &amp;lt;ul&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;Change &amp;lt;code&amp;gt;data_responsive.xlsx&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;data_non_responsive.xlsx&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;Likewise, change &amp;lt;code&amp;gt;label_responsive.xlsx&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;label_non_responsive.xlsx&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
            &amp;lt;/ul&amp;gt;&lt;br /&gt;
        &amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Adjust the output file name at line 167 of the script to differentiate between the responsive and non-responsive datasets:&lt;br /&gt;
            &amp;lt;ul&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;For the &amp;lt;strong&amp;gt;responsive dataset&amp;lt;/strong&amp;gt;: &amp;lt;code&amp;gt;df.to_excel(&#039;EHR_responsive_at_epoch_{:04d}.xlsx&#039;.format(epoch))&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;For the &amp;lt;strong&amp;gt;non-responsive dataset&amp;lt;/strong&amp;gt;: &amp;lt;code&amp;gt;df.to_excel(&#039;EHR_non_responsive_at_epoch_{:04d}.xlsx&#039;.format(epoch))&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
            &amp;lt;/ul&amp;gt;&lt;br /&gt;
        &amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;After running the script, concatenate the two synthetic files (responsive and non-responsive) and make sure the &amp;lt;strong&amp;gt;response column&amp;lt;/strong&amp;gt; is populated correctly:&lt;br /&gt;
            &amp;lt;ul&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;code&amp;gt;1&amp;lt;/code&amp;gt; for the responsive dataset.&amp;lt;/li&amp;gt;&lt;br /&gt;
                &amp;lt;li&amp;gt;&amp;lt;code&amp;gt;0&amp;lt;/code&amp;gt; for the non-responsive dataset.&amp;lt;/li&amp;gt;&lt;br /&gt;
            &amp;lt;/ul&amp;gt;&lt;br /&gt;
        &amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ol&amp;gt;&lt;br /&gt;
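&lt;br /&gt;
    &amp;lt;p&amp;gt;The concatenation step above can be sketched with pandas as follows. The two small tables stand in for the generated files, and the column names (BMI, glucose, response) are assumptions; in practice the tables would be loaded from the files written by the script.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
```python
import pandas as pd

# Hypothetical stand-ins for the two generated files. In practice they
# would be loaded with pd.read_excel(...) from the script output.
responsive = pd.DataFrame({"BMI": [26.4, 28.1], "glucose": [101.0, 108.0]})
non_responsive = pd.DataFrame({"BMI": [32.7, 34.0], "glucose": [124.0, 131.0]})

# Populate the response column: 1 = responsive, 0 = non-responsive.
responsive["response"] = 1
non_responsive["response"] = 0

# Stack the two tables and renumber the rows from zero.
combined = pd.concat([responsive, non_responsive], ignore_index=True)
print(combined)
```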
&lt;br /&gt;
    &amp;lt;h3&amp;gt;Step 2: Running the Multi-Model Analysis&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;p&amp;gt;The multi-model analysis consists of the following steps:&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;ol&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Load the dataset&amp;lt;/strong&amp;gt; and extract &amp;lt;strong&amp;gt;X (features)&amp;lt;/strong&amp;gt; and &amp;lt;strong&amp;gt;y (labels)&amp;lt;/strong&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Split the data&amp;lt;/strong&amp;gt; into training and testing sets.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Define a list of machine learning models to test.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Train each model and evaluate its performance using accuracy and RMSE.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;Print the results for each model.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;p&amp;gt;Once you run the script, the performance metrics (accuracy and RMSE) will be printed in the terminal of VSCode. These metrics will provide a baseline to identify the most effective model for further analysis, including parameter and hyperparameter tuning.&amp;lt;/p&amp;gt;&lt;br /&gt;
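&lt;br /&gt;
    &amp;lt;p&amp;gt;The five steps above can be sketched with scikit-learn as follows. This is not the tutorial script: the placeholder data from &amp;lt;code&amp;gt;make_classification&amp;lt;/code&amp;gt;, the 80/20 split, and the four models shown are assumptions chosen to keep the example short.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Load the dataset. Placeholder data is used here; the tutorial
#    would load the combined synthetic EHR table instead.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 2. Split the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 3. Define the models to test.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# 4. Train each model and evaluate accuracy and RMSE on the test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
    results[name] = (accuracy, rmse)
    # 5. Print the results for each model.
    print(f"{name}: accuracy={accuracy:.3f}, RMSE={rmse:.4f}")
```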
&lt;br /&gt;
    &amp;lt;h3&amp;gt;Interpreting the Results:&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h4&amp;gt;1. Logistic Regression:&amp;lt;/h4&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Accuracy:&amp;lt;/strong&amp;gt; ~89.5%&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;RMSE:&amp;lt;/strong&amp;gt; Moderate&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Misclassification:&amp;lt;/strong&amp;gt; 21 incorrect predictions (11 false positives and 10 false negatives).&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Next Steps:&amp;lt;/strong&amp;gt; Consider feature scaling, regularization, and hyperparameter tuning.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h4&amp;gt;2. Decision Tree:&amp;lt;/h4&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Accuracy:&amp;lt;/strong&amp;gt; 99%&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;RMSE:&amp;lt;/strong&amp;gt; Low&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Misclassification:&amp;lt;/strong&amp;gt; 2 false negatives.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Next Steps:&amp;lt;/strong&amp;gt; Prune the tree to avoid overfitting. Hyperparameter tuning and ensemble techniques like boosting or bagging could help further.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h4&amp;gt;3. Random Forest:&amp;lt;/h4&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Accuracy:&amp;lt;/strong&amp;gt; 98%&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;RMSE:&amp;lt;/strong&amp;gt; Low&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Misclassification:&amp;lt;/strong&amp;gt; 4 samples misclassified.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Next Steps:&amp;lt;/strong&amp;gt; Hyperparameter tuning (number of trees, maximum depth, etc.) and feature importance analysis.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h4&amp;gt;4. Gradient Boosting:&amp;lt;/h4&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Accuracy:&amp;lt;/strong&amp;gt; 99.5%&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;RMSE:&amp;lt;/strong&amp;gt; Lowest among the models.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Misclassification:&amp;lt;/strong&amp;gt; 1 false negative.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Next Steps:&amp;lt;/strong&amp;gt; Tune learning rate, number of boosting stages, and depth. Dimensionality reduction may also help.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h4&amp;gt;5. K-Nearest Neighbors (KNN):&amp;lt;/h4&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Accuracy:&amp;lt;/strong&amp;gt; 94.5%&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;RMSE:&amp;lt;/strong&amp;gt; 0.2345 (slightly higher compared to tree-based methods).&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Misclassification:&amp;lt;/strong&amp;gt; 11 false negatives.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Next Steps:&amp;lt;/strong&amp;gt; Scale the data, tune the number of neighbors (k), and explore distance metrics.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h4&amp;gt;6. Support Vector Machine (SVM):&amp;lt;/h4&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Accuracy:&amp;lt;/strong&amp;gt; 81.5%&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;RMSE:&amp;lt;/strong&amp;gt; 0.4301 (highest RMSE).&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Misclassification:&amp;lt;/strong&amp;gt; Significant number of false negatives (37).&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Next Steps:&amp;lt;/strong&amp;gt; Tune kernel type, C parameter, and gamma. Try dimensionality reduction techniques.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h4&amp;gt;7. Extra Trees:&amp;lt;/h4&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Accuracy:&amp;lt;/strong&amp;gt; 97.5%&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;RMSE:&amp;lt;/strong&amp;gt; 0.1581.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Misclassification:&amp;lt;/strong&amp;gt; 5 false negatives.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Next Steps:&amp;lt;/strong&amp;gt; Hyperparameter tuning (number of trees, depth), ensemble techniques, or feature selection.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;h4&amp;gt;8. AdaBoost:&amp;lt;/h4&amp;gt;&lt;br /&gt;
    &amp;lt;ul&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Accuracy:&amp;lt;/strong&amp;gt; 98.5%&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;RMSE:&amp;lt;/strong&amp;gt; 0.1225 (low RMSE).&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Misclassification:&amp;lt;/strong&amp;gt; Only 3 misclassified samples.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Next Steps:&amp;lt;/strong&amp;gt; Tune learning rate and number of estimators. Consider ensemble techniques or cross-validation.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ul&amp;gt;&lt;br /&gt;
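&lt;br /&gt;
    &amp;lt;p&amp;gt;When the labels are coded as 0 and 1, as in the response column, every misclassified sample contributes a squared error of exactly 1, so RMSE equals the square root of the misclassification rate, that is, √(1 − accuracy). The sketch below checks this against the four models above that report a numeric RMSE; the helper name &amp;lt;code&amp;gt;rmse_from_accuracy&amp;lt;/code&amp;gt; is hypothetical.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
```python
import math

# (accuracy, reported RMSE) for the models above with a numeric RMSE.
reported = {
    "KNN": (0.945, 0.2345),
    "SVM": (0.815, 0.4301),
    "Extra Trees": (0.975, 0.1581),
    "AdaBoost": (0.985, 0.1225),
}


def rmse_from_accuracy(accuracy):
    """RMSE of 0/1 predictions: each error adds a squared error of 1."""
    return math.sqrt(1.0 - accuracy)


for name, (accuracy, reported_rmse) in reported.items():
    derived = rmse_from_accuracy(accuracy)
    print(f"{name}: derived RMSE={derived:.4f}, reported RMSE={reported_rmse}")
```
&lt;br /&gt;
    &amp;lt;p&amp;gt;Because RMSE is a function of accuracy alone in this setting, ranking the models by either metric gives the same order.&amp;lt;/p&amp;gt;&lt;br /&gt;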
&lt;br /&gt;
    &amp;lt;h3&amp;gt;Further Analysis and Next Steps:&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;p&amp;gt;While most models performed exceptionally well, additional techniques could further enhance results:&amp;lt;/p&amp;gt;&lt;br /&gt;
    &amp;lt;ol&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Hyperparameter Tuning:&amp;lt;/strong&amp;gt; Fine-tuning model parameters using techniques like &amp;lt;em&amp;gt;Grid Search&amp;lt;/em&amp;gt; or &amp;lt;em&amp;gt;Random Search&amp;lt;/em&amp;gt; will likely yield improvements. Parameters like learning rate, depth, number of estimators, and regularization strength could be optimized.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Cross-Validation:&amp;lt;/strong&amp;gt; Apply k-fold cross-validation for a more reliable estimate of model performance and to avoid overfitting on the test set.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Feature Selection and Dimensionality Reduction:&amp;lt;/strong&amp;gt; Implement Principal Component Analysis (PCA) or feature selection methods to reduce noise, improve computation efficiency, and enhance predictive power, especially for models like &amp;lt;strong&amp;gt;KNN&amp;lt;/strong&amp;gt; and &amp;lt;strong&amp;gt;SVM&amp;lt;/strong&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
        &amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Handling Class Imbalance:&amp;lt;/strong&amp;gt; Techniques like &amp;lt;em&amp;gt;SMOTE&amp;lt;/em&amp;gt; or class weighting could be useful if class imbalance is present in the dataset.&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;/ol&amp;gt;&lt;br /&gt;
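&lt;br /&gt;
    &amp;lt;p&amp;gt;As an illustration of the first two items, the sketch below runs 5-fold cross-validation on an untuned decision tree and then a grid search over two of its hyperparameters with scikit-learn. The placeholder data and the parameter grid are assumptions, not values from the tutorial.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the synthetic EHR dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold cross-validation of an untuned decision tree.
baseline_scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=5
)

# Grid search over tree depth and leaf size, also scored with 5-fold CV.
param_grid = {"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(f"baseline CV accuracy: {baseline_scores.mean():.3f}")
print(f"best parameters: {search.best_params_}")
print(f"best CV accuracy: {search.best_score_:.3f}")
```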
&lt;br /&gt;
    &amp;lt;p&amp;gt;By applying these further techniques, we can continue to refine the model performance, increase predictive accuracy, and reduce error metrics across the board.&amp;lt;/p&amp;gt;&lt;/div&gt;</summary>
		<author><name>Lorikrammer</name></author>
	</entry>
</feed>