<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://hivelab.biochemistry.gwu.edu/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Hivelabwikiadmin</id>
	<title>HIVE Lab - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://hivelab.biochemistry.gwu.edu/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Hivelabwikiadmin"/>
	<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/Special:Contributions/Hivelabwikiadmin"/>
	<updated>2026-06-01T18:10:55Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1302</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1302"/>
		<updated>2026-05-29T14:06:21Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and your preferred projects in order of preference. An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date: June 1, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that the mapping produced is reliable and accurate.&lt;br /&gt;
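&lt;br /&gt;
The comparison of model output with manually curated data described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the GlyGen pipeline itself: the names CURATED, normalize, and compare_mappings and the sample model output are hypothetical, and the LLM call is left out entirely. The NCBI Taxonomy IDs used (9606 for Homo sapiens, 10090 for Mus musculus, 10116 for Rattus norvegicus) are real accessions.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter

# Hypothetical curated reference: normalized free-text label to NCBI Taxonomy ID.
# 9606 is Homo sapiens and 10090 is Mus musculus in NCBI Taxonomy.
CURATED = {
    "human": "9606",
    "man": "9606",
    "h. sapiens": "9606",
    "mouse": "10090",
}


def normalize(label):
    """Lowercase and collapse whitespace so spelling variants compare equal."""
    return " ".join(label.lower().split())


def compare_mappings(predicted, curated):
    """Compare model-proposed accessions against curated ones.

    Returns a dict with counts of matches, mismatches, and labels the
    curated set does not cover, plus the sorted list of mismatching labels.
    """
    counts = Counter()
    mismatches = []
    for raw_label, accession in predicted.items():
        key = normalize(raw_label)
        if key not in curated:
            counts["uncurated"] += 1
        elif curated[key] == accession:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
            mismatches.append(key)
    return {"counts": dict(counts), "mismatches": sorted(mismatches)}


# Hypothetical model output; in practice this would come from the LLM step.
predicted = {
    "Human": "9606",
    "H.  sapiens": "9606",
    "mouse": "9606",
    "rat": "10116",
}
report = compare_mappings(predicted, CURATED)
print(report)
```
&lt;br /&gt;
Labels missing from the curated set are counted separately rather than treated as errors, so they can be routed to manual validation.&lt;br /&gt;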
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors&lt;br /&gt;
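&lt;br /&gt;
The first step above, keyword filtering through the PubMed web API, can be sketched with the NCBI E-utilities ESearch endpoint. This is a minimal Python sketch under stated assumptions rather than the project&#039;s actual code: the function names build_search_url, parse_pmids, and search_pubmed are hypothetical, as are the example keywords and the choice to require every keyword in the title or abstract.&lt;br /&gt;
&lt;br /&gt;
```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(keywords, retmax=20):
    """Build an ESearch URL that requires every keyword in title or abstract."""
    term = " AND ".join(f"{kw}[Title/Abstract]" for kw in keywords)
    query = urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    )
    return f"{ESEARCH}?{query}"


def parse_pmids(payload):
    """Extract the PMID list from an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


def search_pubmed(keywords, retmax=20):
    """Run the search over the network and return the matching PMIDs."""
    with urlopen(build_search_url(keywords, retmax), timeout=30) as response:
        return parse_pmids(response.read().decode("utf-8"))


# Building the URL needs no network access; search_pubmed() performs the request.
url = build_search_url(["GlyGen", "glycan"], retmax=5)
print(url)
```
&lt;br /&gt;
Keeping URL construction and response parsing separate from the network call lets both be checked offline before any requests are sent to NCBI.&lt;br /&gt;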
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Sujeet Kulkarni &lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Neha Rao&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Navya Sinha&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|John McCaffrey*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au-Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;Master&#039;s degree student&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1301</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1301"/>
		<updated>2026-05-29T14:05:48Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and your preferred projects in order of preference. An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date: June 1, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 – July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that the mapping produced is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
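&lt;br /&gt;
As a rough illustration of the quality-control step described above, the sketch below compares LLM-proposed accessions with manually curated ones and lists the disagreements for curator review. It is a minimal sketch under stated assumptions, not the GlyGen pipeline: the function name, the dictionary layout, and the sample records are hypothetical, and the NCBI Taxonomy IDs (9606 for Homo sapiens, 10090 for Mus musculus, 10116 for Rattus norvegicus) serve only as sample values.&lt;br /&gt;

```python
def compare_mappings(llm_map, curated_map):
    """Compare LLM-proposed accessions with manually curated ones.

    Both arguments map a free-text term to an accession string.
    Terms are matched case-insensitively after trimming whitespace.
    Returns (accuracy, mismatches), where mismatches lists
    (term, llm_accession, curated_accession) tuples for curator review.
    """
    def normalize(mapping):
        return {term.strip().lower(): acc for term, acc in mapping.items()}

    llm_norm = normalize(llm_map)
    curated_norm = normalize(curated_map)

    mismatches = []
    correct = 0
    for term in sorted(curated_norm):
        expected = curated_norm[term]
        proposed = llm_norm.get(term)  # None if the LLM gave no mapping
        if proposed == expected:
            correct += 1
        else:
            mismatches.append((term, proposed, expected))

    accuracy = correct / len(curated_norm) if curated_norm else 0.0
    return accuracy, mismatches


# Hypothetical sample data: free-text species labels mapped to NCBI Taxonomy IDs.
curated = {"human": "9606", "h. sapiens": "9606", "mouse": "10090", "rat": "10116"}
llm_output = {"Human": "9606", "H. sapiens ": "9606", "mouse": "10090", "rat": "10090"}

accuracy, mismatches = compare_mappings(llm_output, curated)
print(accuracy)    # 0.75
print(mismatches)  # [('rat', '10090', '10116')]
```

Returning the mismatches alongside the score keeps the manual review focused on the entries where the model and the curator disagree.&lt;br /&gt;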
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors&lt;br /&gt;
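&lt;br /&gt;
The first step above can be sketched with the NCBI E-utilities esearch endpoint, which returns PubMed IDs for a keyword query. This is a minimal sketch rather than the project's actual code: the helper names and the example search term are hypothetical, and the sample response is a hand-written stand-in with a placeholder ID that mirrors the esearchresult / idlist layout of the JSON output, so the example runs without network access.&lt;br /&gt;

```python
import json
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term, retmax=20):
    """Return an esearch URL that asks PubMed for IDs matching a keyword query."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def extract_pmids(response_text):
    """Pull the list of PubMed IDs out of an esearch JSON response."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])


# Hypothetical keyword query; the real keyword list is up to the project.
url = build_pubmed_query("GlyGen[Title/Abstract]", retmax=5)
print(url)

# Hand-written stand-in for a response (placeholder ID), so no network call is needed.
sample_response = '{"esearchresult": {"count": "1", "idlist": ["12345678"]}}'
print(extract_pmids(sample_response))  # ['12345678']
```

To fetch real results, the URL could be passed to urllib.request.urlopen. NCBI limits request rates, so check the E-utilities usage guidelines before running bulk queries.&lt;br /&gt;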
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Rene Ranzinger &lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan, Sujeet Kulkarni &lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Neha Rao&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Navya Sinha&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|John McCaffrey&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Kate Warner, Robel Kahsay &lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au-Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;†&amp;lt;/sup&amp;gt;Master&#039;s degree student.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sup&amp;gt;‡&amp;lt;/sup&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1293</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1293"/>
		<updated>2026-05-26T14:25:56Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: Reverted edit by Maria.kim (talk) to last revision by Urnisha.bhuiyan&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using table maker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area and organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Swapnaneel Chatterjee&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|New Project: GlycoChatbot Project&lt;br /&gt;
|-&lt;br /&gt;
|Caleb Hailu&lt;br /&gt;
|Pending&lt;br /&gt;
|Pending&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Pat McNeely&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Cynthia Li&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran**&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Cyrus Au Yeung, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Dia Jhaveri&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, BCO, PredictMod, GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Arthur Issler&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim, Jeet Vora&lt;br /&gt;
|BiomarkerKB, GlycoSiteMiner, GlyGen&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1284</id>
		<title>Volunteership Summer 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Summer_2026&amp;diff=1284"/>
		<updated>2026-05-19T20:30:32Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Dates */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2026 Summer Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
May 15, 2026 | 12:00 PM ET&lt;br /&gt;
&lt;br /&gt;
Please email your updated resume and a list of projects in order of preference. An acceptance letter or email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Date TBD | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: June 1, 2026 –  July 31, 2026&#039;&#039;&#039; (9 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Spring 2026|Spring 2026 Volunteership]]&lt;br /&gt;
&lt;br /&gt;
Presentation slides from the Spring 2026 volunteership symposium are publicly available on [https://zenodo.org/records/20072087 Zenodo] to highlight student research contributions from the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 20 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Volunteers should be responsive to email/Slack communications.&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Volunteers are expected to attend volunteership events such as a symposium.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
# This volunteership does not allow for vacation time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; &#039;&#039;&#039;If the scrum is not updated for 2 consecutive working days,&#039;&#039;&#039; &#039;&#039;&#039;the candidate will be automatically dropped from the program.&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteership Support ===&lt;br /&gt;
Each group has dedicated Points of Contact (PoCs) who are your main resource for questions and guidance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to Get Help&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Slack Group Channel&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use your group Slack channel as the primary place to ask questions and share ideas. This is strongly encouraged so everyone can learn together. Direct messages to PoCs are discouraged.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Office Hours&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
PoCs will host group office hours every two weeks once the program begins. These sessions are a space to ask questions, discuss ideas, and collaborate live.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;u&amp;gt;How to get support&amp;lt;/u&amp;gt;&lt;br /&gt;
&lt;br /&gt;
- Use the Slack channel as your first point of contact (if you are not yet in the Slack channel, then email your PoC at mazumder_lab AT gwu.edu)&lt;br /&gt;
&lt;br /&gt;
- Follow up with your PoCs in the group channel&lt;br /&gt;
&lt;br /&gt;
- Come prepared with questions for office hours&lt;br /&gt;
&lt;br /&gt;
- Participate in discussions and support your peers&lt;br /&gt;
&lt;br /&gt;
Our goal is to create an open, collaborative environment where everyone can learn and contribute.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email &#039;&#039;mazumder_lab@gwu.edu&#039;&#039; your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. GlycoSiteMiner Curation Project ====&lt;br /&gt;
POC: Kate Warner  &lt;br /&gt;
&lt;br /&gt;
GlycoSiteMiner (PMID: [https://pubmed.ncbi.nlm.nih.gov/40401984/ 40401984]) is a large language model (LLM)-based tool developed by the GlyGen team to automate a literature-mining pipeline that extracts experimentally validated, protein sequence–specific glycosylation sites from PubMed abstracts. By leveraging natural language processing, GlycoSiteMiner accelerates the identification of glycosylation evidence that would otherwise require extensive manual review.&lt;br /&gt;
&lt;br /&gt;
The objective of this project is to validate these text-mined entries and curate them into structured datasets using GlyTableMaker (https://glygen.ccrc.uga.edu/tablemaker), a companion tool designed to support the deposition of glycans and glycoproteins, assignment of standardized metadata, and generation of high-quality Excel/CSV tables. This process ensures that extracted information is accurate, consistent, and suitable for integration into GlyGen’s knowledgebase.&lt;br /&gt;
&lt;br /&gt;
This opportunity provides hands-on experience in biocuration workflows, including data validation, standardization, and quality control. Participants will deepen their understanding of glycobiology concepts, gain practical experience working with biological databases, and develop skills in evaluating and refining LLM-generated outputs for scientific applications.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a ChatGPT prompt to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
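&lt;br /&gt;
The quality-control step could be sketched as follows. This is a minimal illustration, not the GlyGen pipeline: the function names and the sample terms are hypothetical, and the predicted mappings are assumed to arrive as a plain dictionary of free-text term to accession. The accessions shown are NCBI Taxonomy IDs (9606 for Homo sapiens, 10090 for Mus musculus, 10116 for Rattus norvegicus).&lt;br /&gt;
&lt;br /&gt;

```python
def normalize(text):
    """Lowercase and collapse whitespace so spelling variants compare equal."""
    return " ".join(text.lower().split())


def agreement(predicted, curated):
    """Compare predicted accessions against manually curated ones.

    Returns the fraction of curated terms that match and the list of
    terms that disagree and should go to manual review.
    """
    pred = {normalize(term): acc for term, acc in predicted.items()}
    mismatches = [
        term for term, acc in curated.items()
        if pred.get(normalize(term)) != acc
    ]
    matched = len(curated) - len(mismatches)
    rate = matched / len(curated) if curated else 0.0
    return rate, mismatches


# Illustrative data only: hypothetical curated and predicted mappings.
curated = {"human": "9606", "man": "9606", "mouse": "10090"}
predicted = {"Human": "9606", "MAN ": "9606", "mouse": "10116"}

rate, to_review = agreement(predicted, curated)
print(round(rate, 2), to_review)
```

Terms returned in the second value are the candidates for manual validation or prompt refinement.&lt;br /&gt;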
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
Two projects:&lt;br /&gt;
&lt;br /&gt;
# Taking predicted sites and curating them using GlyTableMaker&lt;br /&gt;
# Website testing (all volunteers)&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
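&lt;br /&gt;
The first step above might start from a sketch like the one below. It assumes the NCBI E-utilities ESearch endpoint with JSON output; the helper names, the keywords, and the sample response are illustrative assumptions and not part of the project code. No network request is made here, so the sketch only shows how a query URL is built and how PMIDs are read from a response.&lt;br /&gt;
&lt;br /&gt;

```python
import json
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint, used here to query PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(keywords, retmax=20):
    """Build an ESearch URL that matches any of the given keywords."""
    term = " OR ".join('"{}"'.format(k) for k in keywords)
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_pmids(response_text):
    """Extract the list of PMIDs from an ESearch JSON response."""
    return json.loads(response_text)["esearchresult"]["idlist"]


# Hypothetical keywords and a hand-written sample payload for illustration.
url = build_search_url(["GlyGen", "glycoinformatics"])
sample = '{"esearchresult": {"idlist": ["40401984"]}}'
pmids = parse_pmids(sample)
print(url)
print(pmids)
```

The returned PMIDs could then be passed to a follow-up request for abstracts and affiliations, which is where the institution and co-author filtering would take place.&lt;br /&gt;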
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see our [[Recommended Publications for Intervention Outcome Prediction Models|Recommended Publications for IOPMs]] page). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets &amp;amp; trained model scripts pushed to GitHub&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. BioCompute Objects User Research Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with the project include:&lt;br /&gt;
&lt;br /&gt;
# Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.&lt;br /&gt;
# Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.&lt;br /&gt;
# Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.&lt;br /&gt;
&lt;br /&gt;
Deliverables will include:&lt;br /&gt;
&lt;br /&gt;
# User research report with user story maps&lt;br /&gt;
# BCO documentation improvement plan&lt;br /&gt;
# Volunteership documentation (final report, progress updates, symposium presentation)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 6. BiomarkerKB Biocuration Project ====&lt;br /&gt;
POC: Jeet Vora (primary), Maria Kim, Cyrus Au-Yeung&lt;br /&gt;
&lt;br /&gt;
[https://biomarkerkb.org/about/ BiomarkerKB] is a biomedical knowledgebase project focused on harmonizing and structuring biomarker knowledge from scientific literature and public resources. We are currently recruiting individuals with experience working with LLMs (e.g. Claude, ChatGPT) to support the following tasks:&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;&#039;Validation of existing published biomarkers from scientific literature (JV, MK, CA)&#039;&#039;&#039;&lt;br /&gt;
#* Review and validate previously reported biomarkers by checking the original literature, confirming evidence support, and standardizing biomarker annotations&lt;br /&gt;
#* Assess the evidence strength of biomarkers and identify additional literature to strengthen the support for biomarker claims&lt;br /&gt;
# &#039;&#039;&#039;Curation of novel biomarkers from scientific literature (MK)&#039;&#039;&#039;&lt;br /&gt;
#* Curate high-quality biomarkers for a selected disease area, organize the findings into a structured dataset&lt;br /&gt;
#* Standardize biomarker representations using controlled vocabularies and ontologies and classify biomarkers by their biomarker types&lt;br /&gt;
#* Construct and test-query a disease-specific biomarker knowledge graph (optional)&lt;br /&gt;
# &#039;&#039;&#039;Electronic Health Records Normal Entity Data Integration (JV)&#039;&#039;&#039;&lt;br /&gt;
#* Identify relevant EHR data elements (lab tests, diagnoses, procedures)&lt;br /&gt;
#* Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)&lt;br /&gt;
#* Resolve ambiguities and inconsistencies in mapping and clinical terminology&lt;br /&gt;
# &#039;&#039;&#039;Front-end testing for BiomarkerKB.org (MK, JV)&#039;&#039;&#039;&lt;br /&gt;
#* Test the BiomarkerKB web interface for functionality and data presentation, and document issues / improvement suggestions for the development team&lt;br /&gt;
# &#039;&#039;&#039;Benchmarking and LLM-based biomarker extraction (optional*) (CA)&#039;&#039;&#039;&lt;br /&gt;
#* Construct manually curated biomarker reference sets in the glycobiology domain to support benchmarking of LLM-based knowledge extraction pipelines.&lt;br /&gt;
#* Apply an LLM workflow to extract disease-specific biomarkers from literature and compare model outputs against the manually curated benchmark sets&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; Participation in the benchmarking and LLM-based biomarker extraction subproject depends on sufficient progress in either task 1 or task 2. Volunteers are expected to first complete either validation of an LLM-extracted glycobiology subset or comprehensive curation of a disease-specific biomarker set before beginning this component. Because this volunteership is structured around a 20-hour-per-week commitment, participation in this part is not guaranteed.&lt;br /&gt;
&lt;br /&gt;
Individuals interested in this opportunity may reach out to Jeet Vora ([mailto:jeetvora@gwu.edu jeetvora@gwu.edu]) for project details.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Sahana Adusumilli&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora&lt;br /&gt;
|Review EHR Normal Ranges&lt;br /&gt;
|-&lt;br /&gt;
|Abhirama Chillara&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Jeet Vora/Maria&lt;br /&gt;
|TBD&lt;br /&gt;
|-&lt;br /&gt;
|Rhea Charles&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|Sri Piramanayagam&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlyGen, BCO&lt;br /&gt;
|-&lt;br /&gt;
|Taylor Dimenna&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Biocuration Project&lt;br /&gt;
|-&lt;br /&gt;
|Daniel Auerbach&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan&lt;br /&gt;
|GlyGen Publication Analysis Project&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel&lt;br /&gt;
|GlyGen &lt;br /&gt;
|Kate Warner&lt;br /&gt;
|GlycoSiteMiner Curation Project&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay&lt;br /&gt;
|BCO User Research&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|BCO, GlycoSiteMiner, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arjun Agnihothram&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod ML, BiomarkerKB, GlycoSiteMiner&lt;br /&gt;
|-&lt;br /&gt;
|Aryan Jagani&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod ML&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Summer 2026 Symposium ==&lt;br /&gt;
The Summer symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; TBD&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1269</id>
		<title>AI ML Bootcamp 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1269"/>
		<updated>2026-05-08T14:54:42Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This full-day bootcamp provides an intensive introduction to artificial intelligence applications in biomedical research. Participants will gain practical experience with modern AI methods and learn how these tools can be applied across research and development workflows.&lt;br /&gt;
&lt;br /&gt;
The program is designed to move beyond basic usage. Attendees will learn how to run and evaluate their own models, understand the underlying principles of AI systems, and begin integrating these approaches into real biomedical research projects. By the end of the session, participants will have a clear pathway for transitioning from basic users of AI tools to advanced practitioners, and potentially developers, capable of building and adapting AI-driven solutions within the biomedical research ecosystem.&lt;br /&gt;
&lt;br /&gt;
== Program Information ==&lt;br /&gt;
Date: Friday, August 21&lt;br /&gt;
&lt;br /&gt;
Time: 9:00 AM – 5:00 PM&lt;br /&gt;
&lt;br /&gt;
Location: Gelman Library, Rooms 323–324&lt;br /&gt;
&lt;br /&gt;
Registration: &lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;br /&gt;
All times are in Eastern Time.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Time (ET)&lt;br /&gt;
!Title&lt;br /&gt;
!Presenter&lt;br /&gt;
|-&lt;br /&gt;
|9:00 am&lt;br /&gt;
|Introduction and Welcome&lt;br /&gt;
|Raja Mazumder (20 mins)&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1268</id>
		<title>AI ML Bootcamp 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1268"/>
		<updated>2026-05-08T14:41:04Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This full-day bootcamp provides an intensive introduction to artificial intelligence applications in biomedical research. Participants will gain practical experience with modern AI methods and learn how these tools can be applied across research and development workflows.&lt;br /&gt;
&lt;br /&gt;
The program is designed to move beyond basic usage. Attendees will learn how to run and evaluate their own models, understand the underlying principles of AI systems, and begin integrating these approaches into real biomedical research projects. By the end of the session, participants will have a clear pathway for transitioning from basic users of AI tools to advanced practitioners, and potentially developers, capable of building and adapting AI-driven solutions within the biomedical research ecosystem.&lt;br /&gt;
&lt;br /&gt;
== Program Information ==&lt;br /&gt;
Date: Friday, August 21&lt;br /&gt;
&lt;br /&gt;
Time: 9:00 AM – 5:00 PM&lt;br /&gt;
&lt;br /&gt;
Location: Gelman Library, Rooms 323–324&lt;br /&gt;
&lt;br /&gt;
Registration: &lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1267</id>
		<title>AI ML Bootcamp 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1267"/>
		<updated>2026-05-08T14:40:37Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This full-day bootcamp provides an intensive introduction to artificial intelligence applications in biomedical research. Participants will gain practical experience with modern AI methods and learn how these tools can be applied across research and development workflows.&lt;br /&gt;
&lt;br /&gt;
The program is designed to move beyond basic usage. Attendees will learn how to run and evaluate their own models, understand the underlying principles of AI systems, and begin integrating these approaches into real biomedical research projects. By the end of the session, participants will have a clear pathway for transitioning from basic users of AI tools to advanced practitioners, and potentially developers, capable of building and adapting AI-driven solutions within the biomedical research ecosystem.&lt;br /&gt;
&lt;br /&gt;
=== Program Information ===&lt;br /&gt;
Date: Friday, August 21&lt;br /&gt;
&lt;br /&gt;
Time: 9:00 AM – 5:00 PM&lt;br /&gt;
&lt;br /&gt;
Location: Gelman Library, Rooms 323–324&lt;br /&gt;
&lt;br /&gt;
Registration: &lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1266</id>
		<title>AI ML Bootcamp 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1266"/>
		<updated>2026-05-08T14:39:36Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This full-day bootcamp provides an intensive introduction to artificial intelligence applications in biomedical research. Participants will gain practical experience with modern AI methods and learn how these tools can be applied across research and development workflows.&lt;br /&gt;
&lt;br /&gt;
The program is designed to move beyond basic usage. Attendees will learn how to run and evaluate their own models, understand the underlying principles of AI systems, and begin integrating these approaches into real biomedical research projects. By the end of the session, participants will have a clear pathway for transitioning from basic users of AI tools to advanced practitioners, and potentially developers, capable of building and adapting AI-driven solutions within the biomedical research ecosystem.&lt;br /&gt;
&lt;br /&gt;
== Program Information ==&lt;br /&gt;
Date: Friday, August 21&lt;br /&gt;
&lt;br /&gt;
Time: 9:00 AM – 5:00 PM&lt;br /&gt;
&lt;br /&gt;
Location: Gelman Library, Rooms 323–324&lt;br /&gt;
&lt;br /&gt;
Registration: &lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1265</id>
		<title>AI ML Bootcamp 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=AI_ML_Bootcamp_2026&amp;diff=1265"/>
		<updated>2026-05-08T14:37:58Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: Created page with &amp;quot;This full-day bootcamp provides an intensive introduction to artificial intelligence applications in biomedical research. Participants will gain practical experience with modern AI methods and learn how these tools can be applied across research and development workflows.  The program is designed to move beyond basic usage. Attendees will learn how to run and evaluate their own models, understand the underlying principles of AI systems, and begin integrating these approa...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This full-day bootcamp provides an intensive introduction to artificial intelligence applications in biomedical research. Participants will gain practical experience with modern AI methods and learn how these tools can be applied across research and development workflows.&lt;br /&gt;
&lt;br /&gt;
The program is designed to move beyond basic usage. Attendees will learn how to run and evaluate their own models, understand the underlying principles of AI systems, and begin integrating these approaches into real biomedical research projects. By the end of the session, participants will have a clear pathway for transitioning from basic users of AI tools to advanced practitioners, and potentially developers, capable of building and adapting AI-driven solutions within the biomedical research ecosystem.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Publications&amp;diff=1264</id>
		<title>Publications</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Publications&amp;diff=1264"/>
		<updated>2026-05-07T16:01:09Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;All publications listed on this page should follow a modified National Library of Medicine (NLM) citation format, adapted for clarity and consistency. Here is the suggested format:&amp;lt;blockquote&amp;gt;&#039;&#039;Author(s). Title of article. Journal Name. Year Month Day;Volume(Issue):Page range. PMID: [if available] DOI: [if no PMID]&#039;&#039;&amp;lt;/blockquote&amp;gt;Some guidelines:&lt;br /&gt;
&lt;br /&gt;
* If a PubMed ID (PMID) is available, include it and omit the DOI.&lt;br /&gt;
* If no PMID is available, include the DOI instead.&lt;br /&gt;
* Journal names should be spelled out in full unless the journal is widely recognized by its acronym (e.g., &#039;&#039;PLoS&#039;&#039;).&lt;br /&gt;
* Use full publication dates when available (e.g., 2025 Mar 28); if only the year is known, include the year alone.&lt;br /&gt;
* Include all author names in the order listed in the publication.&lt;br /&gt;
&amp;lt;h2&amp;gt;HIVE Platform Publications&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Please cite use of HIVE with&amp;lt;/p&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Simonyan V, Mazumder R. High-performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes. 2014 Sep 30;5(4):957-981. [https://www.ncbi.nlm.nih.gov/pubmed/25271953 PMID: 25271953]&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Simonyan V, Chumakov K, Dingerdissen H, et al. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis. Database (Oxford). 2016; 2016:baw022. [https://www.ncbi.nlm.nih.gov/pubmed/26989153 PMID: 26989153]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;HIVE Team Publications&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mazumder R, Krammer L, Vora JK, Au Yeung CCH, Kim M, Bhuiyan U, Ranzinger R, Woodside CR, Kamalabharathy D, Kim I, Gulati V, Muthusekaran V, Cognata C, Chakravorty S, Tien A, Bakshi V. 2026 Spring Volunteership Symposium Presentations – Mazumder Lab, The George Washington University. Zenodo. 2026 May 7. &amp;lt;nowiki&amp;gt;https://doi.org/10.5281/zenodo.20072087&amp;lt;/nowiki&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Arethiya NJ, Krammer L, David J, Bakshi V, BasuChoudhary A, Bhuiyan U, Sen S, Mazumder R, McNeely P. Enhancing prediabetes diagnosis from continuous glucose monitoring data via iterative label cleaning and deep learning of Bridge2AI AI-READI data. medRxiv. 2026 Mar 4. Preprint. [https://www.medrxiv.org/content/10.64898/2026.03.04.26347604v1 DOI: 10.64898/2026.03.04.26347604].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mazumder R, Keeney J, Johnson L, Krammer L, McNeely P, Sepulveda J, Hangen D, Martin M, Jyothi D, De Almeida J, McGarvey P, Alaoui A, Cha S, Sedrakyan A, Shoelle E, Matheny M, LeNoue-Newton M, Winter R, Deppen S, Simonyan V, Horvath A. From use cases to infrastructure: a cross-institutional survey of priorities in data-driven biomedical research. J Am Med Inform Assoc. 2026 Jan 20:ocag001. Epub ahead of print. [https://pubmed.ncbi.nlm.nih.gov/41556955/ PMID: 41556955].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Krammer L, McNeely PM, Bhuiyan U, Singleton SS, Arethiya N, Argaw A, Aggarwal V, Basuchoudhary A, Mazumder M, David J, Agrawal S, Sen S, Mazumder R. PredictMod: A Platform for Predicting Medical Intervention Outcomes and Sharing Custom ML/AI Models. Network and Systems Medicine. 2025;1(1):57-66. [https://drugrepocentral.scienceopen.com/hosted-document?doi=10.14293/NSM.25.1.0007 DOI: 10.14293/NSM.25.1.0007].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kahsay R, Bhuiyan U, Au CCH, Edwards N, Johnson L, Kulkarni S, Martinez K, Ranzinger R, Vijay-Shanker K, Vora J, Warner K, Tiemeyer M, Mazumder R. GlycoSiteMiner: an ML/AI-assisted literature mining-based pipeline for extracting glycosylation sites from PubMed abstracts. Glycobiology. 2025 May 22. [https://pubmed.ncbi.nlm.nih.gov/40401984/ PMID: 40401984].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Aoki-Kinoshita KF, Lisacek F, Mazumder R, Ranzinger R, Tiemeyer M, Yamada I, Packer NH. Meeting report of the GlySpace alliance and GaLSIC symposium. Glycobiology. 2025 Mar 28:cwaf019. [https://pubmed.ncbi.nlm.nih.gov/40156285/ PMID: 40156285].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Clarke DJB, Evangelista JE, Xie Z, Marino GB, Byrd AI, Maurya MR, Srinivasan S, Yu K, Petrosyan V, Roth ME, Milinkov M, King CH, Vora JK, Keeney J, Nemarich C, Khan W, Lachmann A, Ahmed N, Agris A, Pan J, Ramachandran S, Fahy E, Esquivel E, Mihajlovic A, Jevtic B, Milinovic V, Kim S, McNeely P, Wang T, Wenger E, Brown MA, Sickler A, Zhu Y, Jenkins SL, Blood PD, Taylor DM, Resnick AC, Mazumder R, Milosavljevic A, Subramaniam S, Ma&#039;ayan A. Playbook workflow builder: Interactive construction of bioinformatics workflows. PLoS Comput Biol. 2025 Apr 3;21(4):e1012901. [https://pubmed.ncbi.nlm.nih.gov/40179105/ PMID: 40179105].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Keeney &#039;&#039;et al&#039;&#039;. Olduvai domain expression downregulates mitochondrial pathways: implications for human brain evolution and neoteny. bioRxiv. 2024 Oct 22. Preprint. [https://doi.org/10.1101/2024.10.21.619278 DOI: 10.1101/2024.10.21.619278].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Martinez K, Agirre J, Akune Y, Aoki-Kinoshita KF, Arighi C, Axelsen KB, Bolton E, Bordeleau E, Edwards NJ, Fadda E, Feizi T, Hayes C, Ives CM, Joshi HJ, Krishna Prasad K, Kossida S, Lisacek F, Liu Y, Lütteke T, Ma J, Malik A, Martin M, Mehta AY, Neelamegham S, Panneerselvam K, Ranzinger R, Ricard-Blum S, Sanou G, Shanker V, Thomas PD, Tiemeyer M, Urban J, Vita R, Vora J, Yamamoto Y, Mazumder R. Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy. Database (Oxford). 2024 Aug 13;2024:baae073. [https://pubmed.ncbi.nlm.nih.gov/39137905/ PMID: 39137905].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kim S, Mazumder R. Enhancing scientific reproducibility through automated BioCompute Object creation using Retrieval-Augmented Generation from publications. arXiv (Computer Science, Computation and Language). 2024. Preprint. [https://doi.org/10.48550/arXiv.2409.15076 DOI: 10.48550/arXiv.2409.15076].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wu J, Singleton SS, Bhuiyan U, Krammer L, Mazumder R. Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning. Front Mol Biosci. 2024 Jan 19; Volume 10 (2023), Sec. Molecular Diagnostics and Therapeutics. [https://www.ncbi.nlm.nih.gov/pubmed/38313584 PMID: 38313584].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Keeney JG, Gulzar N, Baker JB, Klempir O, Hannigan GD, Bitton DA, Maritz JM, King CHS 4th, Patel JA, Duncan P, Mazumder R. Communicating computational workflows in a regulatory environment. Drug Discov Today. 2024 Jan 12; 103884. [https://www.ncbi.nlm.nih.gov/pubmed/38219969 PMID: 38219969].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Sylvetsky AC, Clement RA, Stearrett N, Issa NT, Dore FJ, Mazumder R, King CH, Hubal MJ, Walter PJ, Cai H, Sen S, Rother KI, Crandall KA. Consumption of sucralose and acesulfame-potassium containing diet soda alters the relative abundance of microbial taxa at the species level: findings of two pilot studies. Appl Physiol Nutr Metab. 2024 Jan 1; 49(1):125-134. [https://www.ncbi.nlm.nih.gov/pubmed/37902107 PMID: 37902107].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Vora J, Navelkar R, Vijay-Shanker K, Edwards N, Martinez K, Ding X, Wang T, Su P, Ross K, Lisacek F, Hayes C, Kahsay R, Ranzinger R, Tiemeyer M, Mazumder R. The glycan structure dictionary-a dictionary describing commonly used glycan structure terms. Glycobiology. 2023 Feb 17; cwad014 [https://www.ncbi.nlm.nih.gov/pubmed/36799723 PMID: 36799723].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Lisacek F, Tiemeyer M, Mazumder R, Aoki-Kinoshita KF. Worldwide Glycoscience Informatics Infrastructure: The GlySpace Alliance. JACS Au. eCollection 2023 Jan 23; [https://www.ncbi.nlm.nih.gov/pubmed/36711080 PMID: 36711080].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Datta Chaudhuri R, Datta R, Rana S, Kar A, Vinh Nguyen Lam P, Mazumder R, Mohanty S, Sarkar S. Cardiomyocyte-specific regression of nitrosative stress-mediated S-Nitrosylation of IKKγ alleviates pathological cardiac hypertrophy. Cell Signal. 2022 Oct; 98:110403 [https://www.ncbi.nlm.nih.gov/pubmed/35835332 PMID: 35835332].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dahlin M, Singleton SS, David JA, Basuchoudhary A, Wickström R, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumour necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. 2022 Jun;80:104061. [https://www.ncbi.nlm.nih.gov/pubmed/35598439 PMID: 35598439].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Lyman DF, Bell A, Black A, Dingerdissen H, Cauley E, Gogate N, Liu D, Joseph A, Kahsay R, Crichton DJ, Mehta A, Mazumder R. Modeling and integration of N-glycan biomarkers in a comprehensive biomarker data model. Glycobiology. 2022 Aug; [https://academic.oup.com/glycob/article/32/10/855/6655823?login=false PMID: 35925813].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Torcivia J, Abdilleh K, Seidl F, Shahzada O, Rodriguez R, Pot D, Mazumder R. Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers. Onco (Basel). 2022 Jun; 2(2):129-144. [https://www.ncbi.nlm.nih.gov/pubmed/37841494 PMID: 37841494].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dahlin M, Singleton S, David J, Basuchoudhary A, Wickström R, Mazumder R, Prast-Nielsen S. Higher levels of Bifidobacteria and tumor necrosis factor in children with drug-resistant epilepsy are associated with anti-seizure response to the ketogenic diet. eBioMedicine. 2022 Jun; 80. [https://doi.org/10.1016/j.ebiom.2022.104061 DOI: 10.1016/j.ebiom.2022.104061].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;King CH, Keeney J, Guimera N, Das S, Weber M, Fochtman B, Walderhaug MO, Talwar S, Patel JA, Mazumder R, Donaldson EF. Communicating regulatory high-throughput sequencing data using BioCompute Objects. Drug Discov Today. 2022 Jan 22; [https://www.ncbi.nlm.nih.gov/pubmed/35077912 PMID: 35077912].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wang Z, Hopson L, Singleton S, Yang X, Jogunoori W, Mazumder R, Obias V, Lin P, Nguyen BN, Yao M, Miller L, White J, Rao S, Mishra L. Mice with dysfunctional TGF-β signaling develop altered intestinal microbiome and colorectal cancer resistant to 5FU. Biochim Biophys Acta Mol Basis Dis. 2021 Oct 1; 1867(10):166179. [https://www.ncbi.nlm.nih.gov/pubmed/34082069 PMID: 34082069].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Lyman D, Natale D, Schriml L, Anton K, Crichton DC, Mazumder R. Analysis of Biomarker Data Towards Development of a Molecular Biomarker Ontology. Proceedings of the International Conference on Biomedical Ontologies 2021 (ICBO 2021) co-located with the Workshop on Ontologies for the Behavioural and Social Sciences (OntoBess 2021) as part of the Bolzano Summer of Knowledge (BOSK 2021) Bozen-Bolzano, Italy. 2021 Sep 16-18; [https://ceur-ws.org/Vol-3073/paper13.pdf https://ceur-ws.org/Vol-3073/paper13.pdf].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Patel JA, Dean DA, King CH, Xiao N, Koc S, Minina E, Golikov A, Brooks P, Kahsay R, Navelkar R, Ray M, Roberson D, Armstrong C, Mazumder R, Keeney J. Bioinformatics tools developed to support BioCompute Objects. Database (Oxford). 2021 Mar 31; [https://www.ncbi.nlm.nih.gov/pubmed/33784373 PMID: 33784373].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Hora B, Gulzar N, Chen Y, Karagiannis K, Cai F, Su C, Smith K, Simonyan V, Shah SA, Ahmed M, Sanchez AM, Stone M, Cohen MS, Denny TN, Mazumder R, Gao F. Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing. mSphere. 2020 Oct 14; [https://www.ncbi.nlm.nih.gov/pubmed/33055255 PMID: 33055255].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Hopson L, Singleton S, David J, Basuchoudhary A, Prast-Nielsen S, Klein P, Sen S, Mazumder R. Bioinformatics and machine learning in gastrointestinal microbiome research and clinical application. Prog Mol Biol Transl Sci. 2020 Sep 30; 176:141-178. [https://www.ncbi.nlm.nih.gov/pubmed/33814114 PMID: 33814114].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Torcivia J, Mazumder R. Scanning window analysis of non-coding regions within normal-tumor whole-genome sequence samples. Briefings in Bioinformatics. 2020 Sep 17; [https://www.ncbi.nlm.nih.gov/pubmed/32940334 PMID: 32940334].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Gogate N, Lyman D, Bell A, Cauley E, Crandall KA, Joseph A, Kahsay R, Natale DA, Schriml LM, Sen S, Mazumder R. COVID-19 biomarkers and their overlap with comorbidities in a disease biomarker data model. Brief Bioinform. 2021 May 20; bbab191. [https://www.ncbi.nlm.nih.gov/pubmed/34015823 PMID: 34015823].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kahsay R, Vora J, Navelkar R, Mousavi R, Fochtman BC, Holmes X, Pattabiraman N, Ranzinger R, Mahadik R, Williamson T, Kulkarni S, Agarwal G, Martin M, Vasudev P, Garcia L, Edwards N, Zhang W, Natale DA, Ross K, Aoki-Kinoshita KF, Campbell MP, York WS, Mazumder R. GlyGen data model and processing workflow. Bioinformatics. 2020; [https://www.ncbi.nlm.nih.gov/pubmed/32324859 PMID: 32324859].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kurnat-Thoma E, Baranova A, Baird P, Brodsky E, Butte AJ, Cheema AK, Cheng F, Dutta S, Grant C, Giordano J, Maitland-van der Zee AH, Fridsma DB, Jarrin R, Kann MG, Keeney J, Loscalzo J, Madhavan G, Maron BA, McBride DK, McKean M, Mun SK, Palmer JC, Patel B, Parakh K, Pariser AR, Pristipino C, Radstake TRDJ, Rajasimha HK, Rouse WB, Rozman D, Saleh A, Schmidt HHHW, Schultz N, Sethi T, Silverman EK, Skopac J, Svab I, Trujillo S, Valentine JE, Verma D, West BJ, Vasudevan S. Recent Advances in Systems and Network Medicine: Meeting Report from the First International Conference in Systems and Network Medicine. Syst Med (New Rochelle). 2020; 3(1):22-35. [https://www.ncbi.nlm.nih.gov/pubmed/32226924 PMID: 32226924].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dingerdissen HM, Bastian F, Vijay-Shanker K, Robinson-Rechavi M, Bell A, Gogate N, Gupta S, Holmes E, Kahsay R, Keeney J, Kincaid H, King CH, Liu D, Crichton DJ, Mazumder R. OncoMX: A Knowledgebase for Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data. JCO Clin Cancer Inform. 2020; 4:210-220. [https://www.ncbi.nlm.nih.gov/pubmed/32142370 PMID: 32142370].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Aoki-Kinoshita KF, Lisacek F, Mazumder R, York WS, Packer NH. The GlySpace Alliance: toward a collaborative global glycoinformatics community. Glycobiology. 2020; 30(2):70-71. [https://www.ncbi.nlm.nih.gov/pubmed/31573039 PMID: 31573039].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;York WS, Mazumder R, Ranzinger R, et al. GlyGen: Computational and Informatics Resources for Glycoscience. Glycobiology. 2019. [https://www.ncbi.nlm.nih.gov/pubmed/31616925 PMID: 31616925].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;King CH, Desai H, Sylvetsky AC, LoTempio J, Ayanyan S, Carrie J, Crandall K, Fochtman B, Gasparyan L, Gulzar N, Howell P, Issa N, Krampis K, Mishra L, Morizono H, Pisegna JR, Rao S, Ren Y, Simonyan V, Smith K, VedBrat S, Yao M, Mazumder R. Baseline human gut microbiota profile in healthy people and standard reporting template. PLOS ONE. 2019. [https://www.ncbi.nlm.nih.gov/pubmed/31509535 PMID: 31509535].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Fan Y, Hu Y, Yan C, Goldman R, Pan Y, Mazumder R, Dingerdissen H. Loss and gain of N-linked glycosylation sequons due to single-nucleotide variation in cancer. Scientific Reports. 2018; 8:4322. [https://www.ncbi.nlm.nih.gov/pubmed/29531238 PMID: 29531238].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Kim B, Ali T, Dong C, Lijeron C, Mazumder R, Wultsch C, Krampis K. miCloud: A Plug-n-Play, Extensible, On-Premises Bioinformatics Cloud for Seamless Execution of Complex Next-Generation Sequencing Data Analysis Pipelines. Journal of Computational Biology. 2018. [http://doi.org/10.1089/cmb.2018.0218 DOI: 10.1089/cmb.2018.0218].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Alterovitz G, Dean DA, Goble C, Crusoe MR, Soiland-Reyes S, Bell A, Hayes A, King CHS, Taylor D, Johanson E, Thompson EE, Donaldson E, Morizono H, Tsang HS, Goecks J, Yao J, Almeida JS, Krampis K, Guo L, Walderhaug M, Walsh P, Kahsay R, Gottipati S, Bloom T, Lai Y, Simonyan V, Mazumder R. Enabling Precision Medicine via standard communication of HTS provenance, analysis, and results. PLOS Biology. 2018; 16(12):e3000099. [https://doi.org/10.1371/journal.pbio.3000099 DOI: 10.1371/journal.pbio.3000099].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Hu Y, Dingerdissen H, Gupta S, Kahsay R, Shanker V, Wan Q, Yan C, Mazumder R. Identification of key differentially expressed MicroRNAs in cancer patients through pan-cancer analysis. Computers in Biology and Medicine. 2018; 103:183-197. [https://www.ncbi.nlm.nih.gov/pubmed/30384176 PMID: 30384176].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Dingerdissen H, Torcivia-Rodriguez J, Hu Y, Chang T-C, Mazumder R, Kahsay R. BioMuta and BioXpress: mutation and expression knowledgebases for cancer biomarker discovery. Nucleic Acids Research. 2017. [https://pubmed.ncbi.nlm.nih.gov/30053270/ PMID: 30053270].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Karagiannis K, Simonyan V, Chumakov K, Mazumder R. Separation and assembly of deep sequencing data into discrete sub-population genomes. Nucleic Acids Research. 2017; 45(19):10989-11003. [https://www.ncbi.nlm.nih.gov/pubmed/28977510 PMID: 28977510].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Chen J, Zaidi S, Rao S, Chen J-S, Phan L, Farci P, Su X, Shetty K, White J, Zamboni F, Wu X, Rashid A, Pattabiraman N, Mazumder R, Horvath A, Wu R-C, Li S, Xiao C, Deng C-X, Wheeler DA, Mishra B, Akbani R, Mishra L. Analysis of Genomes and Transcriptomes of Hepatocellular Carcinomas Identifies Mutations and Gene Expression Changes in the Transforming Growth Factor beta Pathway. Gastroenterology. 2017; S0016-5085(17)36144-9. [https://www.ncbi.nlm.nih.gov/pubmed/28918914 PMID: 28918914].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinformatics. 2017; 18(1):391. [https://www.ncbi.nlm.nih.gov/pubmed/28865429 PMID: 28865429].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Gannavaram S, Torcivia J, Gasparyan L, Kaul A, Ismail N, Simonyan V, Nakhasi HL. Whole genome sequencing of live attenuated Leishmania donovani parasites reveals novel biomarkers of attenuation and enables product characterization. Sci Rep. 2017; 7(1):4718. [https://www.ncbi.nlm.nih.gov/pubmed/28680050 PMID: 28680050].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Simonyan V, Chumakov K, Donaldson E, Karagiannis K, Lam PV, Dingerdissen H, Voskanian A. HIVE-heptagon: A sensible variant-calling algorithm with post-alignment quality controls. Genomics. 2017; 109(3-4):131-140. [https://www.ncbi.nlm.nih.gov/pubmed/28188908 PMID: 28188908].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Pan Y, Yan C, Fan Y, Pan Q, Wan Q, Torcivia-Rodriguez J, Mazumder R. Distribution bias analysis of germline and somatic single-nucleotide variations that impact protein functional site and neighboring amino acids. Scientific Reports. 2017; 7:42169 [https://www.ncbi.nlm.nih.gov/pubmed/28176830 PMID: 28176830].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Gulzar N, Dingerdissen H, Yan C, Mazumder R. Impact of Nonsynonymous Single-Nucleotide Variations on Post-Translational Modification Sites in Human Proteins. Methods Mol Biol. 2017; 1558:159-190. [https://www.ncbi.nlm.nih.gov/pubmed/28150238 PMID: 28150238].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Simonyan V, Goecks J, Mazumder R. BioCompute objects - a step towards evaluation and validation of bio-medical scientific computations. PDA J Pharm Sci Technol. 2017; 71(2):136-146 [https://www.ncbi.nlm.nih.gov/pubmed/27974626 PMID: 27974626].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Yan C, Pattabiraman N, Goecks J, Lam P, Nayak A, Pan Y, Torcivia-Rodriguez J, Voskanian A, Wan Q, Mazumder R. Impact of germline and somatic missense variations on drug binding sites. Pharmacogenomics J. 2017; 17(2):128-136 [https://www.ncbi.nlm.nih.gov/pubmed/26810135 PMID: 26810135].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Novatt H, Theisen TC, Massie T, Simonyan V, Voskanian-Kordi A, Renn LA, Rabin RL. Distinct Patterns of Expression of Transcription Factors in Response to Interferon Beta and Interferon lambda-1. J Interferon Cytokine Res. 2016; 36(10):589-598 [https://www.ncbi.nlm.nih.gov/pubmed/27447339 PMID: 27447339].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Chen C, Huang H, Mazumder R, Natale DA, McGarvey PB, Zhang J, Polson SW, Wang Y, Wu CH, UniProt Consortium. Computational clustering for viral reference proteomes. Bioinformatics. 2016; 32(13):2041-3 [https://www.ncbi.nlm.nih.gov/pubmed/27153712 PMID: 27153712].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mahmood AS, Wu TJ, Mazumder R, Vijay-Shanker K. DiMeX: A text-mining system for mutation-disease association extraction. PLoS One. 2016; 11(4):e0152725 [https://www.ncbi.nlm.nih.gov/pubmed/27073839 PMID: 27073839].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Goldweber S, Theodore J, Torcivia-Rodriguez J, Simonyan V, Mazumder R. Pubcast and Genecast: Browsing and exploring publications and associated curated content in biology through mobile devices. IEEE/ACM Trans Comput Biol Bioinform. 2016; 14(2):498-500 [https://www.ncbi.nlm.nih.gov/pubmed/28113865 PMID: 28113865].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Laassri M, Zagorodnyaya T, Plant EP, Petrovskaya S, Bidzhieva B, Ye Z, Simonyan V, Chumakov K. Deep Sequencing for Evaluation of Genetic Stability of Influenza A/California/07/2009 (H1N1) Vaccine Viruses. PLoS One. 2015; 10(9):e0138650. [https://www.ncbi.nlm.nih.gov/pubmed/26407068 PMID: 26407068].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Sauder CJ, Ngo L, Simonyan V, Cong Y, Zhang C, Link M, Malik T, Rubin SA. Generation and propagation of recombinant mumps viruses exhibiting an additional U residue in the homopolymeric U tract of the F gene-end signal. Virus Genes. 2015; 51(1):12-24. [https://www.ncbi.nlm.nih.gov/pubmed/25962759 PMID: 25962759].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wu T-J, Schriml LM, Chen Q-R, Colbert M, Crichton DJ, Finney R, Hu Y, Kibbe WA, Kincaid H, Meerzaman D, Mitraka E, Pan Y, Smith KM, Srivastava S, Ward S, Yan C, Mazumder R. Generating a focused view of Disease Ontology cancer terms for pan-cancer data integration and analysis. Database (Oxford). 2015; 2015:bav032. [https://www.ncbi.nlm.nih.gov/pubmed/25841438 PMID: 25841438].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wan Q, Dingerdissen H, Fan Y, Gulzar N, Pan Y, Wu T-J, Yang C, Zhang H, Mazumder R. BioXpress: An integrated RNA-seq derived gene expression database for pan-cancer analysis. Database (Oxford). 2015; 2015. pii: bav019 [https://www.ncbi.nlm.nih.gov/pubmed/25819073 PMID: 25819073].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Kumari P, Mazumder R, Simonyan V, Krampis K. Advantages of distributed and parallel algorithms that leverage Cloud Computing platforms for large-scale genome assembly. F1000Research. 2015; 4(20). [https://hsrc.himmelfarb.gwu.edu/cgi/viewcontent.cgi?article=1167&amp;amp;context=smhs_biochem_facpubs https://hsrc.himmelfarb.gwu.edu/cgi/viewcontent.cgi?article=1167&amp;amp;context=smhs_biochem_facpubs].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Abunimer A, Dingerdissen H, Torcivia-Rodriguez J, Vinh Nguyen Lam P, Mazumder R. Non-synonymous Single-Nucleotide Variations as Cardiovascular System Disease Biomarkers and Their Roles in Bridging Genomic and Proteomic Technologies. Biomarkers in Cardiovascular Disease. 2015. [https://link.springer.com/referenceworkentry/10.1007/978-94-007-7741-5_40-1 Springer Nature link].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Adhikari S, Chetram MA, Woodrick J, Mitra PS, Manthena PV, Khatkar P, Dakshanamurthy S, Dixon M, Karmahapatra SK, Nuthalapati NK, Gupta S, Narasimhan G, Mazumder R, Loffredo CA, Uren A, Roy R. Germ-line variants of human N-methylpurine DNA glycosylase show impaired DNA repair activity and facilitate 1,N6 ethenoadenine induced mutations. J Biol Chem. 2014; 290(8):4966-80. [https://www.ncbi.nlm.nih.gov/pubmed/25538240 PMID: 25538240].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wilson CA and Simonyan V. FDA&#039;s Activities Supporting Regulatory Application of &amp;quot;Next Gen&amp;quot; Sequencing Technologies. PDA J Pharm Sci Technol. 2014; 68(6):626-630. [https://www.ncbi.nlm.nih.gov/pubmed/25475637 PMID: 25475637].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Shamsaddini A, Pan Y, Johnson WE, Krampis K, Shcheglovitova M, Simonyan V, Zanne A, Mazumder R. Census-based rapid and accurate metagenome taxonomic profiling. BMC Genomics. 2014; 15(1):918. [https://www.ncbi.nlm.nih.gov/pubmed/25336203 PMID: 25336203].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Pan Y, Karagiannis K, Zhang H, Dingerdissen H, Shamsaddini A, Wan Q, Simonyan V, Mazumder R. Human germline and pan-cancer variomes and their distinct functional profiles. Nucleic Acids Research. 2014; 42(18):11570-88. [https://www.ncbi.nlm.nih.gov/pubmed/25232094 PMID: 25232094].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Nayak A, Pattabiraman N, Fadra N, Goldman R, Pond S, Mazumder R. Structure-function analysis of hepatitis C virus envelope glycoproteins E1 and E2. J Biomol Struct Dyn. 2014; 33(8):1682-94. [https://www.ncbi.nlm.nih.gov/pubmed/25245635 PMID: 25245635].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Faison WJ, Rostovtsev A, Castro-Nallar E, Crandall KA, Chumakov K, Simonyan V, Mazumder R. Whole genome single-nucleotide variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes. Genomics. 2014; 104(1):1-7. [https://www.ncbi.nlm.nih.gov/pubmed/24930720 PMID: 24930720].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Santana-Quintero L, Dingerdissen H, Thierry-Mieg J, Mazumder R, Simonyan V. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis. PLOS One. 2014; 9(6):e99033. [https://www.ncbi.nlm.nih.gov/pubmed/24918764 PMID: 24918764].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dingerdissen H, Weaver DS, Karp PD, Pan Y, Simonyan V, Mazumder R. A framework for application of metabolic modeling in yeast to predict the effects of nsSNV in human orthologs. Biol Direct. 2014; 9:9. [https://www.ncbi.nlm.nih.gov/pubmed/24894379 PMID: 24894379].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Bidzhieva B, Zagorodnyaya T, Karagiannis K, Simonyan V, Laassri M, Chumakov K. Deep sequencing approach for genetic stability evaluation of influenza A viruses. J Virol Methods. 2014; 199:68-75. [https://www.ncbi.nlm.nih.gov/pubmed/24406624 PMID: 24406624].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Abunimer A, Smith K, Wu T-J, Lam P, Simonyan V, Mazumder R. Single-nucleotide variations in cardiac arrhythmias: prospects for genomics and proteomics based variation detection. Genes. 2014; 5(2):254-69. [https://www.ncbi.nlm.nih.gov/pubmed/24705329 PMID: 24705329].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Wu T-J, Shamsaddini A, Pan Y, Smith K, Crichton DJ, Simonyan V, Mazumder R. A framework for organizing cancer related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE). Database (Oxford). 2014; 2014:bau022. [https://www.ncbi.nlm.nih.gov/pubmed/24667251 PMID: 24667251].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dabrazhynetskaya A, Soika V, Volokhov D, Simonyan V, Chizhikov V. Genome Sequence of Mycoplasma hyorhinis Strain DBS 1050. Genome Announc. 2014; 2(2):e00127-14. [https://www.ncbi.nlm.nih.gov/pubmed/24604646 PMID: 24604646].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Cole C, Krampis K, Karagiannis K, Almeida J, Faison JW, Motwani M, Wan Q, Golikov A, Pan Y, Simonyan V, Mazumder R. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data. BMC Bioinformatics. 2014; 15:28. [https://www.ncbi.nlm.nih.gov/pubmed/24467687 PMID: 24467687].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mudvari P, Kowsari K, Cole C, Mazumder R, Horvath A. Extraction of molecular features through exome to transcriptome alignment. J Metabol Sys Biol. 2013; 1(1):7. [https://www.ncbi.nlm.nih.gov/pubmed/24791251 PMID: 24791251].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Basuchoudhary A, Simonyan V, Mazumder R. Community annotation and the evolution of cooperation: How patience matters. Open Bioinformatics Journal. 2013; 7:9-18.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Karagiannis K, Simonyan V, Mazumder R. SNVDis: A Proteome-wide Analysis Service for Evaluating nsSNVs in Protein Functional Sites and Pathways. Genomics Proteomics Bioinformatics. 2013; 11(2):122-126. [https://www.ncbi.nlm.nih.gov/pubmed/23618375 PMID: 23618375].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Lam PV, Goldman R, Karagiannis K, Narsule T, Simonyan V, Soika V, Mazumder R. Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes. Genomics Proteomics Bioinformatics. 2013; 11(2):96-104. [https://www.ncbi.nlm.nih.gov/pubmed/23459159 PMID: 23459159].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Dingerdissen H, Motwani M, Karagiannis K, Simonyan V, Mazumder R. Proteome-wide analysis of nonsynonymous single-nucleotide variations in active sites of human proteins. FEBS J. 2013; 280(6):1542-1562. [https://www.ncbi.nlm.nih.gov/pubmed/23350563 PMID: 23350563].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Gaudet P, Arighi C, Bastian F, Bateman A, Blake JA, Cherry MJ, D&#039;Eustachio P, Finn R, Giglio M, Hirschman L, Kania R, Klimke W, Martin MJ, Karsch-Mizrachi I, Munoz-Torres M, Natale D, O&#039;Donovan C, Ouellette F, Pruitt KD, Robinson-Rechavi M, Sansone SA, Schofield P, Sutton G, Van Auken K, Vasudevan S, Wu C, Young J, Mazumder R. Recent advances in biocuration: meeting report from the Fifth International Biocuration Conference. Database (Oxford). 2012; 2012:bas036. [https://www.ncbi.nlm.nih.gov/pubmed/23110974 PMID: 23110974].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Volokhov DV, Simonyan V, Davidson MK, Chizhikov VE. RNA polymerase beta subunit (rpoB) gene and the 16S-23S rRNA intergenic transcribed spacer region (ITS) as complementary molecular markers in addition to the 16S rRNA gene phylogenetic analysis and identification of the species of the family Mycoplasmataceae. Mol Phylogenet Evol. 2012; 62(1):515-28. [https://www.ncbi.nlm.nih.gov/pubmed/22115576 PMID: 22115576].&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Mazumder R, Morampudi KS, Motwani M, Vasudevan S, Goldman R. Proteome-wide analysis of single-nucleotide variations in the N-glycosylation sequon of human genes. PLoS One. 2012; 7(5):e36212. [https://www.ncbi.nlm.nih.gov/pubmed/22586465 PMID: 22586465].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Synthea_Data&amp;diff=1258</id>
		<title>Synthea Data</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Synthea_Data&amp;diff=1258"/>
		<updated>2026-04-27T21:03:31Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://synthea.mitre.org/downloads Synthea] is an open-source patient population simulation.&lt;br /&gt;
&lt;br /&gt;
This article is a stub. You can help HIVE Wiki by adding missing information.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Main_Page&amp;diff=1257</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Main_Page&amp;diff=1257"/>
		<updated>2026-04-27T21:00:22Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;font-size:290%; padding:.1em; text-align: center;&amp;quot;&amp;gt;[https://hivelab.biochemistry.gwu.edu/ Welcome] to the Mazumder Research Group (HIVE Lab) Wiki&amp;lt;/div&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center; width: 100%;&amp;quot;&amp;gt;&lt;br /&gt;
  [[File:Grouppic.jpg|center|950px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
{| style=&amp;quot;width:100%;&amp;quot;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;The Mazumder Research Group (HIVE Lab) is involved in developing the High-performance Integrated Virtual Environment (HIVE), which is a collaborative project between the CBER Food and Drug Administration (FDA) and Dr. Raja Mazumder&#039;s team at The George Washington University (GW). In addition to the HIVE platform, the group is involved in developing standards for bioinformatics communication via BioCompute Objects, knowledgebases for glycoinformatics (GlyGen), cancer research (BiomarkerKB, OncoMX, BioMuta, BioXpress), and microbiome analysis via the GutFeeling Knowledge Base (GFKB). The lab leverages knowledge graphs and advanced ML/AI technologies, including large language models, to harmonize, map, and uncover valuable insights from clinical data, omics datasets, knowledgebases, and scientific publications.&amp;lt;/div&amp;gt;The lab manages several social media accounts dedicated to various projects.&amp;lt;br&amp;gt;GlyGen: [https://www.linkedin.com/company/glygen/ LinkedIn], [https://mstdn.science/@glygen Mastodon]; [https://bsky.app/profile/glygen.bsky.social BlueSky] | BioCompute: [https://www.linkedin.com/company/biocompute-partnership LinkedIn] | GW Bioinfo &amp;amp; Biochem students and alumni: [https://www.linkedin.com/groups/8313079/ LinkedIn]&amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This wiki system provides complementary information to the [https://hivelab.biochemistry.gwu.edu/ HIVE Lab] and is divided into the following main sections:&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Projects]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Projects]]&#039;&#039;&amp;lt;br&amp;gt;HIVE team projects fall into two major categories:&amp;lt;br&amp;gt;&lt;br /&gt;
1) Developing infrastructure for biomedical data analysis.&amp;lt;br&amp;gt;&lt;br /&gt;
2) Using that infrastructure to integrate and mine the data for knowledge.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Publications]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Publications]]&#039;&#039;&amp;lt;br&amp;gt;The HIVE team has a variety of peer-reviewed publications, book chapters, posters, brochures, and multimedia available for the public.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[People]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[People]]&#039;&#039;&amp;lt;br&amp;gt;This page contains details about the HIVE Lab team.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Collaborators]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Collaborators]]&#039;&#039;&amp;lt;br&amp;gt;This page contains details about the HIVE Lab Collaborators.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Contact]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Contact]]&#039;&#039;&amp;lt;br&amp;gt;To contact us for more information, please visit this page.&lt;br /&gt;
&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== FAQs ===&lt;br /&gt;
&lt;br /&gt;
==== How can I access the HIVE platform? ====&lt;br /&gt;
At this time, we can only provide access to the HIVE platform to a limited number of researchers. If interested in using the HIVE platform, please tell us a little bit about your research needs and complete the [https://docs.google.com/forms/d/1cjaQ1AT_nMHQ2stJmOQ0XesGqoXE0cgLNViTXjZtUJ8/edit HIVE access request form]. Please wait to create an account until we have reviewed your research goals and are sure that we can support your research needs. &lt;br /&gt;
&lt;br /&gt;
==== What do I do once I&#039;m approved to use the HIVE platform? ====&lt;br /&gt;
If approved, please create an account. You will receive a confirmation email stating that your account was successfully created. Please verify your email by clicking the link included in the confirmation email. &lt;br /&gt;
&lt;br /&gt;
==== What do I do if I created an account but did not receive a confirmation email? ====&lt;br /&gt;
Make sure you entered a valid email address. If you are sure that you used a valid email and have not received a confirmation email, please wait 24 hours and check again. &lt;br /&gt;
&lt;br /&gt;
Be sure to fill out the HIVE access request form if you have not already done so. We can only grant access after ensuring that our platform can support your research needs. &lt;br /&gt;
&lt;br /&gt;
==== Account activation ====&lt;br /&gt;
We can only activate your account once you have submitted the HIVE access request form and verified your email. If approved, please allow up to 24 hours for account activation. You will receive an email confirming that your account has been activated. &lt;br /&gt;
&lt;br /&gt;
==== Requesting a demonstration ====&lt;br /&gt;
Once your account has been activated, someone from our team will contact you to schedule a 30-minute demonstration. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=External links=&lt;br /&gt;
&lt;br /&gt;
;Official&lt;br /&gt;
&lt;br /&gt;
*[https://hivelab.biochemistry.gwu.edu/ HIVE Lab official website]&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Main_Page&amp;diff=1256</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Main_Page&amp;diff=1256"/>
		<updated>2026-04-27T21:00:08Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&amp;lt;!-- BANNER ACROSS TOP OF PAGE --&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;font-size:290%; padding:.1em; text-align: center;&amp;quot;&amp;gt;[https://hivelab.biochemistry.gwu.edu/ Welcome] to the Mazumder Research Group (HIVE Lab) Wiki&amp;lt;/div&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center; width: 100%;&amp;quot;&amp;gt;&lt;br /&gt;
  [[File:Grouppic.jpg|center|950px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
{| style=&amp;quot;width:100%;&amp;quot;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;div id=&amp;quot;ggw-topbanner&amp;quot; style=&amp;quot;clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;margin:0.4em; text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;The Mazumder Research Group (HIVE Lab) is involved in developing the High-performance Integrated Virtual Environment (HIVE), which is a collaborative project between the CBER Food and Drug Administration (FDA) and Dr. Raja Mazumder&#039;s team at The George Washington University (GW). In addition to the HIVE platform, the group is involved in developing standards for bioinformatics communication via BioCompute Objects, knowledgebases for glycoinformatics (GlyGen), cancer research (BiomarkerKB, OncoMX, BioMuta, BioXpress), and microbiome analysis via the GutFeeling Knowledge Base (GFKB). The lab leverages knowledge graphs and advanced ML/AI technologies, including large language models, to harmonize, map, and uncover valuable insights from clinical data, omics datasets, knowledgebases, and scientific publications.&amp;lt;/div&amp;gt;The lab manages several social media accounts dedicated to various projects.&amp;lt;br&amp;gt;GlyGen: [https://www.linkedin.com/company/glygen/ LinkedIn], [https://mstdn.science/@glygen Mastodon]; [https://bsky.app/profile/glygen.bsky.social BlueSky] | BioCompute: [https://www.linkedin.com/company/biocompute-partnership LinkedIn] | GW Bioinfo &amp;amp; Biochem students and alumni: [https://www.linkedin.com/groups/8313079/ LinkedIn]&amp;lt;div style=&amp;quot;font-size:100%;&amp;quot;&amp;gt;This wiki system provides complementary information to the [https://hivelab.biochemistry.gwu.edu/ HIVE Lab] and is divided into the following main sections:&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Projects]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Projects]]&#039;&#039;&amp;lt;br&amp;gt;HIVE team projects fall into two major categories:&amp;lt;br&amp;gt;&lt;br /&gt;
1) Developing infrastructure for biomedical data analysis.&amp;lt;br&amp;gt;&lt;br /&gt;
2) Using that infrastructure to integrate and mine the data for knowledge.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Publications]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Publications]]&#039;&#039;&amp;lt;br&amp;gt;The HIVE team has a variety of peer-reviewed publications, book chapters, posters, brochures, and multimedia available for the public.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[People]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[People]]&#039;&#039;&amp;lt;br&amp;gt;This page contains details about the HIVE Lab team.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Collaborators]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Collaborators]]&#039;&#039;&amp;lt;br&amp;gt;This page contains details about the HIVE Lab Collaborators.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&amp;lt;div id=&amp;quot;ggw_row3&amp;quot; style=&amp;quot;display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;div style=&amp;quot;flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC;	padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;&amp;quot;&amp;gt;&lt;br /&gt;
        &amp;lt;h3&amp;gt;[[Contact]]&amp;lt;/h3&amp;gt;&lt;br /&gt;
        &amp;lt;div style=&amp;quot;border-top: 1px solid #CCC; padding-top: 0.5em;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&#039;&#039;&#039;&#039;&#039;Main article:&#039;&#039;&#039; [[Contact]]&#039;&#039;&amp;lt;br&amp;gt;To contact us for more information, please visit this page.&lt;br /&gt;
&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
    &amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== FAQs ===&lt;br /&gt;
&lt;br /&gt;
==== How can I access the HIVE platform? ====&lt;br /&gt;
At this time, we can only provide access to the HIVE platform to a limited number of researchers. If interested in using the HIVE platform, please tell us a little bit about your research needs and complete the [https://docs.google.com/forms/d/1cjaQ1AT_nMHQ2stJmOQ0XesGqoXE0cgLNViTXjZtUJ8/edit HIVE access request form]. Please wait to create an account until we have reviewed your research goals and are sure that we can support your research needs. &lt;br /&gt;
&lt;br /&gt;
==== What do I do once I&#039;m approved to use the HIVE platform? ====&lt;br /&gt;
If approved, please create an account. You will receive a confirmation email stating that your account was successfully created. Please verify your email by clicking the link included in the confirmation email. &lt;br /&gt;
&lt;br /&gt;
==== What do I do if I created an account but did not receive a confirmation email? ====&lt;br /&gt;
Make sure you entered a valid email address. If you are sure that you used a valid email and have not received a confirmation email, please wait 24 hours and check again. &lt;br /&gt;
&lt;br /&gt;
Be sure to fill out the HIVE access request form if you have not already done so. We can only grant access after ensuring that our platform can support your research needs. &lt;br /&gt;
&lt;br /&gt;
==== Account activation ====&lt;br /&gt;
We can only activate your account once you have submitted the HIVE access request form and verified your email. If approved, please allow up to 24 hours for account activation. You will receive an email confirming that your account has been activated. &lt;br /&gt;
&lt;br /&gt;
==== Requesting a demonstration ====&lt;br /&gt;
Once your account has been activated, someone from our team will contact you to schedule a 30-minute demonstration. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= External links =&lt;br /&gt;
&lt;br /&gt;
;Official&lt;br /&gt;
&lt;br /&gt;
*[https://hivelab.biochemistry.gwu.edu/ HIVE Lab official website]&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1251</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1251"/>
		<updated>2026-04-14T16:52:31Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Spring 2026 Symposium */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, noon (email your updated resume and your project choices in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January, 2026 –  April, 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student will do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, Urnisha Bhuiyan &lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that the mapping produced is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; For anyone interested in ARGOS, you may be assigned to another project of your choice. This project is contingent on a contract extension. Please complete your project selection in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected to be added to the database. Explain their importance and what value they would hold to the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the program.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod; GlyGen&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang**&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside; Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi**&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Spring 2026 Symposium ==&lt;br /&gt;
The Spring symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; April 15th, 2026 (Wednesday)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link:&#039;&#039;&#039; [https://gwu-edu.zoom.us/j/93790551366?pwd=C0aN4b95CUbxahO9By6pTj35D9lFIx.1&amp;amp;jst=2#success https://gwu-edu.zoom.us/j/93790551366?pwd=C0aN4b95CUbxahO9By6pTj35D9lFIx.1&amp;amp;jst=2]&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|4:00-4:05 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Welcome &amp;amp; Introduction&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|-&lt;br /&gt;
|4:05-4:30 PM&lt;br /&gt;
|GlyGen&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Diya Kamalabharathy; Isaac Kim&lt;br /&gt;
|-&lt;br /&gt;
|4:30-4:45 PM&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|-&lt;br /&gt;
|4:45-5:10 PM&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Muthusekaran; Conner Cognata&lt;br /&gt;
|-&lt;br /&gt;
|5:10-5:30 PM&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 15 + 5 mins QA - group presentation &lt;br /&gt;
|Diya Kamalabharathy; Sampurna Chakravorty; Ashley Tien&lt;br /&gt;
|-&lt;br /&gt;
|5:30-5:45 PM&lt;br /&gt;
| PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation&lt;br /&gt;
|Vishal Bakshi&lt;br /&gt;
|-&lt;br /&gt;
|5:45-6:00 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Remarks&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1250</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1250"/>
		<updated>2026-04-14T16:51:38Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Agenda (All times are in Eastern Standard Time) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you want to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student would be doing manual curation for about 4 weeks, with regular check-ins with me to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data was not provided in the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, Urnisha Bhuiyan &lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that the mapping produced is reliable and accurate.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools&lt;br /&gt;
# Documentation of the ML pipeline and testing results&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; For anyone interested in ARGOS, you may be assigned to another project of your choice. This project is contingent on a contract extension. Please complete your project selection in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected to be added to the database. Explain their importance and what value they would hold to the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod; GlyGen&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang**&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside; Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi**&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Spring 2026 Symposium ==&lt;br /&gt;
The Spring symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; April 15th, 2026 (Wednesday)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link:&#039;&#039;&#039; https://gwu-edu.zoom.us/j/93790551366?pwd=C0aN4b95CUbxahO9By6pTj35D9lFIx.1&amp;amp;jst=2#success&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|4:00-4:05 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Welcome &amp;amp; Introduction&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|-&lt;br /&gt;
|4:05-4:30 PM&lt;br /&gt;
|GlyGen&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Diya Kamalabharathy; Isaac Kim&lt;br /&gt;
|-&lt;br /&gt;
|4:30-4:45 PM&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|-&lt;br /&gt;
|4:45-5:10 PM&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Muthusekaran; Conner Cognata&lt;br /&gt;
|-&lt;br /&gt;
|5:10-5:30 PM&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 15 + 5 mins QA - group presentation &lt;br /&gt;
|Diya Kamalabharathy; Sampurna Chakravorty; Ashley Tien&lt;br /&gt;
|-&lt;br /&gt;
|5:30-5:45 PM&lt;br /&gt;
| PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation&lt;br /&gt;
|Vishal Bakshi&lt;br /&gt;
|-&lt;br /&gt;
|5:45-6:00 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Remarks&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1175</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1175"/>
		<updated>2026-03-10T17:24:25Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student will perform manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, Urnisha Bhuiyan &lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu). &lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and familiarity with basic bioinformatics platforms and techniques.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student will manually curate circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would bring to the scientific community.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod; GlyGen&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang**&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside; Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi**&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Spring 2026 Symposium ==&lt;br /&gt;
The Spring symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; April 15th, 2026 (Wednesday)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|4:00-4:05 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Welcome &amp;amp; Introduction&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|-&lt;br /&gt;
|4:05-4:30 PM&lt;br /&gt;
|GlyGen&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Diya Kamalabharathy; Isaac Kim&lt;br /&gt;
|-&lt;br /&gt;
|4:30-4:45 PM&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|-&lt;br /&gt;
|4:45-5:10 PM&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Muthusekaran; Conner Cognata&lt;br /&gt;
|-&lt;br /&gt;
|5:10-5:30 PM&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 15 + 5 mins QA - group presentation &lt;br /&gt;
|Diya Kamalabharathy; Sampurna Chakravorty; Ashley Tien&lt;br /&gt;
|-&lt;br /&gt;
|5:30-5:55 PM&lt;br /&gt;
| -&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Bakshi; Miao Wang&lt;br /&gt;
|-&lt;br /&gt;
|5:55-6:00 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Remarks&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1168</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1168"/>
		<updated>2026-03-04T19:36:52Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, noon (email your updated resume and a list of projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student will perform manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, Urnisha Bhuiyan &lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves the development of Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is the development and fine-tuning of a prompt for ChatGPT to ensure that reliable and accurate mappings are produced.&lt;br /&gt;
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu). &lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and familiarity with basic bioinformatics platforms and techniques.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student will manually curate circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would bring to the scientific community.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod; GlyGen&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang**&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside; Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi**&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;**&amp;lt;/nowiki&amp;gt;Not directly involved in the semester curriculum; long-term volunteer.&lt;br /&gt;
&lt;br /&gt;
== Spring 2026 Symposium ==&lt;br /&gt;
The Spring symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; April 8th, 2026 (Wednesday)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 4 - 6 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - TBA&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|4:00-4:05 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Welcome &amp;amp; Introduction&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|-&lt;br /&gt;
|4:05-4:30 PM&lt;br /&gt;
|GlyGen&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Diya Kamalabharathy; Isaac Kim&lt;br /&gt;
|-&lt;br /&gt;
|4:30-4:45 PM&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|-&lt;br /&gt;
|4:45-5:10 PM&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Muthusekaran; Conner Cognata&lt;br /&gt;
|-&lt;br /&gt;
|5:10-5:30 PM&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 15 + 5 mins QA - group presentation &lt;br /&gt;
|Diya Kamalabharathy; Sampurna Chakravorty; Ashley Tien&lt;br /&gt;
|-&lt;br /&gt;
|5:30-5:55 PM&lt;br /&gt;
| -&lt;br /&gt;
|&lt;br /&gt;
* 8 + 5 mins QA - presentation #1&lt;br /&gt;
* 8 + 5 mins QA - presentation #2&lt;br /&gt;
|Vishal Bakshi; Miao Wang&lt;br /&gt;
|-&lt;br /&gt;
|5:55-6:00 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Remarks&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1145</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1145"/>
		<updated>2026-02-10T19:16:59Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and projects in order of preference). An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you want to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers; the data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Kate Warner, Urnisha Bhuiyan &lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.&lt;br /&gt;
&lt;br /&gt;
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a prompt for ChatGPT so that the mapping it produces is reliable and accurate.&lt;br /&gt;
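&lt;br /&gt;
The comparison against manually curated data described above can be sketched as follows. This is a minimal illustration, not the GlyGen pipeline: the synonym table stands in for the LLM output, and the function names (normalize, map_species, agreement) are hypothetical. The NCBI Taxonomy IDs used are 9606 (Homo sapiens), 10090 (Mus musculus), and 10116 (Rattus norvegicus).&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical synonym table; a real run would use the LLM output instead.
SYNONYMS = {
    "human": "9606",
    "man": "9606",
    "h. sapiens": "9606",
    "homo sapiens": "9606",
    "mouse": "10090",
    "mus musculus": "10090",
}


def normalize(term):
    """Lowercase and collapse whitespace so spelling variants compare equal."""
    return " ".join(term.lower().split())


def map_species(term):
    """Return an NCBI Taxonomy ID for a free-text species term, or None."""
    return SYNONYMS.get(normalize(term))


def agreement(predicted, curated):
    """Fraction of curated terms whose predicted accession matches."""
    if not curated:
        return 0.0
    hits = sum(1 for term, acc in curated.items() if predicted.get(term) == acc)
    return hits / len(curated)


# Manually curated reference mappings (illustrative values only).
curated = {"Human": "9606", "H. sapiens": "9606", "Mouse": "10090", "rat": "10116"}
predicted = {term: map_species(term) for term in curated}
print(agreement(predicted, curated))
```
&lt;br /&gt;
Terms with no mapping (here "rat") come back as None and count as disagreements, which makes them easy to route to manual validation.&lt;br /&gt;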
&lt;br /&gt;
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.&lt;br /&gt;
&lt;br /&gt;
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.&lt;br /&gt;
&lt;br /&gt;
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu). &lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
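&lt;br /&gt;
The first step in the list above can be sketched as follows, assuming the PubMed web API refers to the NCBI E-utilities ESearch endpoint. The function names and keywords are hypothetical, and the sample response is hand-written with placeholder PMIDs; no network request is made.&lt;br /&gt;
&lt;br /&gt;
```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(keywords, retmax=20):
    """Build an ESearch URL that requires every keyword in title or abstract."""
    term = " AND ".join(f"{kw}[Title/Abstract]" for kw in keywords)
    query = urlencode({"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax})
    return f"{ESEARCH}?{query}"


def extract_pmids(payload):
    """Pull the PMID list out of an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


url = build_search_url(["GlyGen", "glycan"])
# Hand-written sample shaped like an ESearch reply; the PMIDs are placeholders.
sample = '{"esearchresult": {"count": "2", "idlist": ["00000001", "00000002"]}}'
print(url)
print(extract_pmids(sample))
```
&lt;br /&gt;
Keeping URL construction separate from response parsing lets both pieces be tested offline before any live queries are sent.&lt;br /&gt;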
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and working knowledge of basic bioinformatics platforms.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would hold for the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Tradewinds_Solutions_Marketplace_Awardable_Status&amp;diff=1141</id>
		<title>Tradewinds Solutions Marketplace Awardable Status</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Tradewinds_Solutions_Marketplace_Awardable_Status&amp;diff=1141"/>
		<updated>2026-01-14T19:27:56Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:Awardable logo.png|thumb|227x227px|Mazumder Research Group at the George Washington University Designated “Awardable” Vendor for Department of Defense Chief Digital and Artificial Intelligence Office’s Tradewinds Solutions Marketplace]]&lt;br /&gt;
&#039;&#039;&#039;Washington, D.C. – July 11, 2025 –&#039;&#039;&#039; Mazumder Research Group at the George Washington University (GWU), a leading provider of scalable ML/AI technology for biomedical data analysis and intervention outcome prediction&#039;&#039;&#039;,&#039;&#039;&#039; today announced that it has achieved &#039;&#039;&#039;“Awardable” status&#039;&#039;&#039; through the Chief Digital and Artificial Intelligence Office’s (CDAO) Tradewinds Solutions Marketplace.&lt;br /&gt;
&lt;br /&gt;
The Tradewinds Solutions Marketplace is the premier offering of Tradewinds, the Department of Defense’s (DoD’s) suite of tools and services designed to accelerate the procurement and adoption of Artificial Intelligence (AI)/Machine Learning (ML), data, and analytics capabilities.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;&#039;&#039;We are excited that [[GW-FEAST|Federated Ecosystems for Analytics and Standardized Technologies (FEAST)]] has achieved awardable status on the Tradewinds Marketplace&#039;&#039;,&amp;quot; said Raja Mazumder, Principal Investigator of FEAST and Professor at GWU. &amp;quot;&#039;&#039;This recognition highlights the potential of our ML/AI platform to transform how healthcare interventions are guided through predictive analytics. It also increases our visibility within the government ecosystem. Being part of the Tradewinds Marketplace also creates valuable opportunities to connect with teams across agencies, academia, and industry. We look forward to working with DoD and collaborating with other innovators in the ML and AI space&#039;&#039;.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Mazumder Group’s solutions are designed to enable secure, scalable intervention outcome modeling and the creation of federated data ecosystems for the DoD and beyond. They are used by a wide range of businesses, including academic institutions, biomedical researchers and clinicians.&lt;br /&gt;
&lt;br /&gt;
[https://cdao.appiancloud.us/suite/sites/tsm-submission-portal/record/lQBHZiMy1aRWfjCLrPMB2tlYAIndwC-PEG_POB-JBeM3S5M81ZNZcoZdN1Mdglj-3XfZ_xPqB2yw6QbyXedKBVNAtvS-o5iqAm5Ruky826PflWM0Ws/view/summary. Mazumder Group&#039;s video], &#039;&#039;&#039;AI/ML forecasting and federated data ecosystems: machine learning for improving warfighter readiness, health, and recovery&#039;&#039;&#039;, accessible only by government customers on the Tradewinds Solutions Marketplace, presents an actual use case in which the group demonstrates a cloud-based federated data ecosystem designed to predict clinical outcomes and generate interpretable machine learning models across secure, siloed datasets.&lt;br /&gt;
&lt;br /&gt;
Mazumder Group was recognized among a competitive field of applicants to the Tradewinds Solutions Marketplace whose solutions demonstrated innovation, scalability, and potential impact on DoD missions. Government customers interested in viewing the video solution can create a Tradewinds Solutions Marketplace account at [http://tradewindAI.com tradewindAI.com].&lt;br /&gt;
&lt;br /&gt;
== About the Tradewinds Solutions Marketplace ==&lt;br /&gt;
The Tradewinds Solutions Marketplace is a digital repository of post-competition, readily awardable pitch videos that address the Department of Defense’s (DoD) most significant challenges in the Artificial Intelligence/Machine Learning (AI/ML), data, and analytics space. All awardable solutions have been assessed through complex scoring rubrics and competitive procedures and are available to Government customers with a Marketplace account. Government customers can create an account at www.tradewindai.com. Tradewinds is housed in the DoD’s Chief Digital and Artificial Intelligence Office.&lt;br /&gt;
&lt;br /&gt;
== About Mazumder Group ==&lt;br /&gt;
The Mazumder Research Group at The George Washington University (GWU) is involved in developing the High‑performance Integrated Virtual Environment (HIVE) which is a cloud‑based bioinformatics platform co‑created with the FDA for analyzing large omics and clinical datasets. The team also leads efforts in defining bioinformatics communication standards, and builds knowledgebases focused on glycoinformatics (GlyGen), cancer biomarkers (BiomarkerKB, OncoMX, BioMuta, BioXpress), and microbiome analysis (GutFeeling KB). The group uses knowledge graphs and advanced AI/ML methods to analyze data and uncover valuable insights from clinical records, omics research, and scientific literature. Additional information is available at [[Main Page|HIVE Lab]].&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Tradewinds_Solutions_Marketplace_Awardable_Status&amp;diff=1140</id>
		<title>Tradewinds Solutions Marketplace Awardable Status</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Tradewinds_Solutions_Marketplace_Awardable_Status&amp;diff=1140"/>
		<updated>2026-01-14T19:26:45Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:Awardable logo.png|thumb|227x227px|Mazumder Research Group at the George Washington University Designated “Awardable” Vendor for Department of Defense Chief Digital and Artificial Intelligence Office’s Tradewinds Solutions Marketplace]]&lt;br /&gt;
&#039;&#039;&#039;Washington, D.C. – July 11, 2025 –&#039;&#039;&#039; Mazumder Research Group at the George Washington University (GWU), a leading provider of scalable ML/AI technology for biomedical data analysis and intervention outcome prediction&#039;&#039;&#039;,&#039;&#039;&#039; today announced that it has achieved &#039;&#039;&#039;“Awardable” status&#039;&#039;&#039; through the Chief Digital and Artificial Intelligence Office’s (CDAO) Tradewinds Solutions Marketplace.&lt;br /&gt;
&lt;br /&gt;
The Tradewinds Solutions Marketplace is the premier offering of Tradewinds, the Department of Defense’s (DoD’s) suite of tools and services designed to accelerate the procurement and adoption of Artificial Intelligence (AI)/Machine Learning (ML), data, and analytics capabilities.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;&#039;&#039;We are excited that Federated Ecosystems for Analytics and Standardized Technologies (FEAST) has achieved awardable status on the Tradewinds Marketplace&#039;&#039;,&amp;quot; said Raja Mazumder, Principal Investigator of FEAST and Professor at GWU. &amp;quot;&#039;&#039;This recognition highlights the potential of our ML/AI platform to transform how healthcare interventions are guided through predictive analytics. It also increases our visibility within the government ecosystem. Being part of the Tradewinds Marketplace also creates valuable opportunities to connect with teams across agencies, academia, and industry. We look forward to working with DoD and collaborating with other innovators in the ML and AI space&#039;&#039;.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Mazumder Group’s solutions are designed to enable secure, scalable intervention outcome modeling and the creation of federated data ecosystems for the DoD and beyond. They are used by a wide range of businesses, including academic institutions, biomedical researchers and clinicians.&lt;br /&gt;
&lt;br /&gt;
[https://cdao.appiancloud.us/suite/sites/tsm-submission-portal/record/lQBHZiMy1aRWfjCLrPMB2tlYAIndwC-PEG_POB-JBeM3S5M81ZNZcoZdN1Mdglj-3XfZ_xPqB2yw6QbyXedKBVNAtvS-o5iqAm5Ruky826PflWM0Ws/view/summary. Mazumder Group&#039;s video], &#039;&#039;&#039;AI/ML forecasting and federated data ecosystems: machine learning for improving warfighter readiness, health, and recovery&#039;&#039;&#039;, accessible only by government customers on the Tradewinds Solutions Marketplace, presents an actual use case in which the group demonstrates a cloud-based federated data ecosystem designed to predict clinical outcomes and generate interpretable machine learning models across secure, siloed datasets.&lt;br /&gt;
&lt;br /&gt;
Mazumder Group was recognized among a competitive field of applicants to the Tradewinds Solutions Marketplace whose solutions demonstrated innovation, scalability, and potential impact on DoD missions. Government customers interested in viewing the video solution can create a Tradewinds Solutions Marketplace account at [http://tradewindAI.com tradewindAI.com].&lt;br /&gt;
&lt;br /&gt;
== About the Tradewinds Solutions Marketplace ==&lt;br /&gt;
The Tradewinds Solutions Marketplace is a digital repository of post-competition, readily awardable pitch videos that address the Department of Defense’s (DoD) most significant challenges in the Artificial Intelligence/Machine Learning (AI/ML), data, and analytics space. All awardable solutions have been assessed through complex scoring rubrics and competitive procedures and are available to Government customers with a Marketplace account. Government customers can create an account at www.tradewindai.com. Tradewinds is housed in the DoD’s Chief Digital and Artificial Intelligence Office.&lt;br /&gt;
&lt;br /&gt;
== About Mazumder Group ==&lt;br /&gt;
The Mazumder Research Group at The George Washington University (GWU) is involved in developing the High‑performance Integrated Virtual Environment (HIVE) which is a cloud‑based bioinformatics platform co‑created with the FDA for analyzing large omics and clinical datasets. The team also leads efforts in defining bioinformatics communication standards, and builds knowledgebases focused on glycoinformatics (GlyGen), cancer biomarkers (BiomarkerKB, OncoMX, BioMuta, BioXpress), and microbiome analysis (GutFeeling KB). The group uses knowledge graphs and advanced AI/ML methods to analyze data and uncover valuable insights from clinical records, omics research, and scientific literature. Additional information is available at [[Main Page|HIVE Lab]].&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Tradewinds_Solutions_Marketplace_Awardable_Status&amp;diff=1139</id>
		<title>Tradewinds Solutions Marketplace Awardable Status</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Tradewinds_Solutions_Marketplace_Awardable_Status&amp;diff=1139"/>
		<updated>2026-01-14T19:21:53Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:Awardable logo.png|thumb|227x227px|Mazumder Research Group at the George Washington University Designated “Awardable” Vendor for Department of Defense Chief Digital and Artificial Intelligence Office’s Tradewinds Solutions Marketplace]]&lt;br /&gt;
Washington, D.C. – July 11, 2025 – &#039;&#039;&#039;Mazumder Research Group at the George Washington University (GWU)&#039;&#039;&#039;, a &#039;&#039;&#039;leading provider&#039;&#039;&#039; of &#039;&#039;&#039;scalable ML/AI technology for biomedical data analysis and intervention outcome prediction,&#039;&#039;&#039; today announced that it has achieved “Awardable” status through the Chief Digital and Artificial Intelligence Office’s (CDAO) Tradewinds Solutions Marketplace.&lt;br /&gt;
&lt;br /&gt;
The Tradewinds Solutions Marketplace is the premier offering of Tradewinds, the Department of Defense’s (DoD’s) suite of tools and services designed to accelerate the procurement and adoption of Artificial Intelligence (AI)/Machine Learning (ML), data, and analytics capabilities.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&amp;quot;We are excited that Federated Ecosystems for Analytics and Standardized Technologies (FEAST) has achieved awardable status on the Tradewinds Marketplace,&amp;quot; said Raja Mazumder, Principal Investigator of FEAST and Professor at GWU. &amp;quot;This recognition highlights the potential of our ML/AI platform to transform how healthcare interventions are guided through predictive analytics. It also increases our visibility within the government ecosystem. Being part of the Tradewinds Marketplace also creates valuable opportunities to connect with teams across agencies, academia, and industry. We look forward to working with DoD and collaborating with other innovators in the ML and AI space.&amp;quot;&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mazumder Group’s&#039;&#039;&#039; solutions are designed to &#039;&#039;&#039;enable secure, scalable intervention outcome modeling and the creation of federated data ecosystems for the DoD and beyond&#039;&#039;&#039;. They are used by a wide range of businesses, including academic institutions, biomedical researchers and clinicians.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mazumder Group&#039;s video&#039;&#039;&#039;, &#039;&#039;&#039;AI/ML forecasting and federated data ecosystems: machine learning for improving warfighter readiness, health, and recovery&#039;&#039;&#039;, accessible only by government customers on the Tradewinds Solutions Marketplace, presents an actual use case in which the group &#039;&#039;&#039;demonstrates a cloud-based federated data ecosystem designed to predict clinical outcomes and generate interpretable machine learning models across secure, siloed datasets&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mazumder Group&#039;&#039;&#039; was recognized among a competitive field of applicants to the Tradewinds Solutions Marketplace whose solutions demonstrated innovation, scalability, and potential impact on DoD missions. Government customers interested in viewing the video solution can create a Tradewinds Solutions Marketplace account at tradewindAI.com.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=File:Awardable_logo.png&amp;diff=1138</id>
		<title>File:Awardable logo.png</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=File:Awardable_logo.png&amp;diff=1138"/>
		<updated>2026-01-14T19:18:12Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Awardable logo&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Tradewinds_Solutions_Marketplace_Awardable_Status&amp;diff=1137</id>
		<title>Tradewinds Solutions Marketplace Awardable Status</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Tradewinds_Solutions_Marketplace_Awardable_Status&amp;diff=1137"/>
		<updated>2026-01-14T19:15:23Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: Created blank page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1136</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1136"/>
		<updated>2026-01-14T19:11:55Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, noon (email your updated resume and your project choices in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January, 2026 –  April, 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student would be doing manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills; working knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would bring to the scientific community if added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Yashitha Pobbareddy&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|ARGOS; GlyGen biocuration; BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1135</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1135"/>
		<updated>2026-01-14T19:10:52Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, noon (email your updated resume and your project choices in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January, 2026 –  April, 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curating PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student would be doing manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills; working knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would bring to the scientific community if added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isaac Kim&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod; GlyGen biocuration; ARGOS&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1133</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1133"/>
		<updated>2026-01-13T16:37:58Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, noon (email your updated resume and your project choices in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January, 2026 –  April, 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data was not provided in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would hold for the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|-&lt;br /&gt;
|Venya Gulati&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside&lt;br /&gt;
|ARGOS; PredictMod; BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1132</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1132"/>
		<updated>2026-01-13T16:36:35Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and your projects in order of preference). An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January, 2026 –  April, 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data was not provided in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would hold for the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/conner-cognata/ Conner Cognata]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB; PredictMod; GlyGen biocuration&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1129</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1129"/>
		<updated>2026-01-12T14:04:41Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and your projects in order of preference). An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 11:00 AM to 12:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January, 2026 –  April, 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for intervention outcome prediction dataset LLM recommendation training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student will do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate the biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning (ML) Modeling Project ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Volunteers will conduct ML modeling using publicly-available -omics datasets that were previously identified (see [[Recommended Publications for Intervention Outcome Prediction Models|https://hivelab.biochemistry.gwu.edu/wiki/Recommended_Publications_for_Intervention_Outcome_Prediction_Models]]). This volunteership will involve data harmonization, model training, and pipeline documentation.&lt;br /&gt;
&lt;br /&gt;
Tasks associated with this project include:&lt;br /&gt;
&lt;br /&gt;
# Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.&lt;br /&gt;
# Preparing the data for model training and model performance evaluation.&lt;br /&gt;
# Testing the modeling tutorial, PredictMod platform, and associated project tools.&lt;br /&gt;
# Documenting the ML pipeline and testing results.&lt;br /&gt;
Deliverables for this project include:&lt;br /&gt;
&lt;br /&gt;
# ML-ready datasets&lt;br /&gt;
# Trained model scripts&lt;br /&gt;
# Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports&lt;br /&gt;
# Volunteership documentation (final report or weekly progress reports)&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and familiarity with basic bioinformatics platforms and methods.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student will manually curate circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would bring to the scientific community.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the program.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Maria Kim; Cyrus Yeung; Jeet Vora&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1111</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1111"/>
		<updated>2026-01-05T15:11:56Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* 1. BiomarkerKB Biocuration Project Ideas */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student will do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate the biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
::: The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and weekly 1-2 paragraph reports.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so anyone interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and familiarity with basic bioinformatics platforms and methods.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student will manually curate circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected for addition to the database. Explain their importance and the value they would bring to the scientific community.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1110</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1110"/>
		<updated>2026-01-05T15:09:53Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* 1. BiomarkerKB Biocuration Project Ideas */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email your resume and a ranked list of the projects that interest you most to mazumder_lab@gwu.edu. You can also indicate any specific areas you would like to focus on.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease or for a treatment&lt;br /&gt;
## The student will do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate the biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on NLP/LLM methods.&lt;br /&gt;
# Continue working on LLM methods started by volunteers in Fall 2025.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly-available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly-available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and weekly 1-2 paragraph reports.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; For anyone interested in ARGOS, you may be assigned to another project of your choice. This project is contingent on a contract extension. Please complete your project selection in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills; knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected to be added to the database. Explain their importance and what value they would hold for the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the summer.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer; Urnisha Bhuiyan; Rene Ranzinger&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1107</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1107"/>
		<updated>2025-12-09T15:31:33Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Maria Kim, Cyrus Yeung, Jeet Vora&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would be doing manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data was not provided in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the spring.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly-available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly-available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and weekly 1-2 paragraph reports.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; For anyone interested in ARGOS, you may be assigned to another project of your choice. This project is contingent on a contract extension. Please complete your project selection in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills; knowledge of basic bioinformatics platforms and skills.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected to be added to the database. Explain their importance and what value they would hold for the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1106</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1106"/>
		<updated>2025-12-05T19:21:56Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would be doing manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data was not provided in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the spring.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly-available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly-available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and weekly 1-2 paragraph reports.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so applicants interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and familiarity with basic bioinformatics platforms and techniques.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected to be added to the database. Explain their importance and the value they would hold for the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the summer.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the program.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1105</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1105"/>
		<updated>2025-12-05T19:15:51Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. &amp;lt;u&amp;gt;We are also looking for individuals who have previously worked with us to take on a coordinator role&amp;lt;/u&amp;gt;.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project for the Spring.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly-available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly-available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identify potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curate indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Review peer curations and resolve annotation conflicts.&lt;br /&gt;
# Prepare a Wikipage to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and weekly 1-2 paragraph reports.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note:&#039;&#039; This project is contingent on a contract extension, so applicants interested in ARGOS may be assigned to another project of their choice. Please list your project selections in order of preference.&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
Qualifications: basic to intermediate programming skills and familiarity with basic bioinformatics platforms and techniques.&lt;br /&gt;
&lt;br /&gt;
# Curate and report on currently circulating pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
# Report Results&lt;br /&gt;
## Defend the pathogens you have selected to be added to the database. Explain their importance and the value they would hold for the scientific community if they were added.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Bakshi&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Sampurna Chakravorty&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer&lt;br /&gt;
|PredictMod; ARGOS; BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1093</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1093"/>
		<updated>2025-11-25T18:09:07Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Agenda (All times are in Eastern Standard Time) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. GlyGen Publication Analysis Project Ideas&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly-available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly-available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identify potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curate indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Review peer curations and resolve annotation conflicts.&lt;br /&gt;
# Prepare a Wikipage to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5. FDA-ARGOS Computation and Pathogen Curation Project&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## The student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## The student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/farah-kamila/ Farah Kamila]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|ARGOS, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Namrata Oruganti&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
== Fall Symposium ==&lt;br /&gt;
The Fall symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; Nov 26th, 2025 (Wednesday)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 3 - 5 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - https://gwu-edu.zoom.us/j/96518488501?jst=2&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Standard Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|3 - 3:10 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Welcome &amp;amp; Introduction&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|-&lt;br /&gt;
|3:10 - 3:35 PM&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 5 min POC (Tianyi &amp;amp; Lori) intro&lt;br /&gt;
* 15 mins - PredictMod: PMID Curation for Intervention Outcome Prediction Models (IOPMs)&lt;br /&gt;
* 5 min Q&amp;amp;A&lt;br /&gt;
|Diya Kamalabharathy; Anika Sikka; Ashley Tien; Farah Kamila&lt;br /&gt;
|-&lt;br /&gt;
|3:35 - 4:00 PM&lt;br /&gt;
|GlyGen&lt;br /&gt;
|&lt;br /&gt;
* 5 min POC intro&lt;br /&gt;
* 15 mins - Curation of species metadata using LLM &amp;amp; Visualizing glycomics databases and their features&lt;br /&gt;
* 5 min Q&amp;amp;A&lt;br /&gt;
|Diya Kamalabharathy; Harivinay P. Gujjula&lt;br /&gt;
|-&lt;br /&gt;
|4:00 - 4:25 PM&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
* 5 min POC intro&lt;br /&gt;
* 15 mins - Curation of Pathogens and QC Analysis for the Argos Project: QC analysis, representative genome selection, and curation of genomes 1 &amp;amp; 2&lt;br /&gt;
* 5 min Q&amp;amp;A&lt;br /&gt;
|Miao Wang; Arhamur Rauf&lt;br /&gt;
|-&lt;br /&gt;
|4:25 - 4:50 PM&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
* 5 min POC (Daniall and Maria) intro &lt;br /&gt;
* 15 mins - Leveraging Large Language Models to collect Biomarker data from PubMed Abstracts&lt;br /&gt;
* 5 min Q&amp;amp;A&lt;br /&gt;
|Namrata Oruganti; Vishal Muthusekaran; Sparsh Gupta&lt;br /&gt;
|-&lt;br /&gt;
|4:50 - 5 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Remarks&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1087</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1087"/>
		<updated>2025-11-13T17:56:03Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and your projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/farah-kamila/ Farah Kamila]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|ARGOS, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Namrata Oruganti&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
== Fall Symposium ==&lt;br /&gt;
The Fall symposium will be held virtually.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Date:&#039;&#039;&#039; Nov 26th, 2025 (Wednesday)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; 3 - 5 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Zoom Link&#039;&#039;&#039; - https://gwu-edu.zoom.us/j/96518488501?jst=2&lt;br /&gt;
&lt;br /&gt;
=== Agenda (All times are in Eastern Standard Time) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Time&lt;br /&gt;
!Project&lt;br /&gt;
!Presentation Title&lt;br /&gt;
!Presenter(s)&lt;br /&gt;
|-&lt;br /&gt;
|3 - 3:10 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Welcome &amp;amp; Introduction&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|-&lt;br /&gt;
|3:10 - 3:35 PM&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
* 5 min POC (Tianyi &amp;amp; Lori) intro&lt;br /&gt;
* 15 mins - PredictMod: PubMed Curation for Training an LLM for Recommendation&lt;br /&gt;
* 5 min Q&amp;amp;A&lt;br /&gt;
|Diya Kamalabharathy; Anika Sikka; Ashley Tien; Farah Kamila&lt;br /&gt;
|-&lt;br /&gt;
|3:35 - 4:00 PM&lt;br /&gt;
|GlyGen&lt;br /&gt;
|&lt;br /&gt;
* 5 min POC intro&lt;br /&gt;
* 15 mins - Curation of species metadata using LLM &amp;amp; Visualizing glycomics databases and their features&lt;br /&gt;
* 5 min Q&amp;amp;A&lt;br /&gt;
|Diya Kamalabharathy; Harivinay P. Gujjula&lt;br /&gt;
|-&lt;br /&gt;
|4:00 - 4:25 PM&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
* 5 min POC intro&lt;br /&gt;
* 15 mins - Curation of Pathogens and QC Analysis for the Argos Project: QC analysis, representative genome selection, and curation of genomes 1 &amp;amp; 2&lt;br /&gt;
* 5 min Q&amp;amp;A&lt;br /&gt;
|Miao Wang; Arhamur Rauf; Linford&lt;br /&gt;
|-&lt;br /&gt;
|4:25 - 4:50 PM&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
* 5 min POC (Daniall and Maria) intro &lt;br /&gt;
* 15 mins - Leveraging Large Language Models to collect Biomarker data from PubMed Abstracts&lt;br /&gt;
* 5 min Q&amp;amp;A&lt;br /&gt;
|Namrata Oruganti; Vishal Muthusekaran&lt;br /&gt;
|-&lt;br /&gt;
|4:50 - 5 PM&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; |Remarks&lt;br /&gt;
|Raja Mazumder&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1084</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1084"/>
		<updated>2025-11-11T19:54:28Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== 2026 Spring Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 9, 2026, Noon (email your updated resume and your projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
January 12, 2026 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: January 2026 – April 2026&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[[Volunteership Fall 2025|Fall 2025 Volunteership]] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Spring 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project for the Spring.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the Spring.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 9-week program period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;br /&gt;
&lt;br /&gt;
== Spring Symposium (TBD) ==&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1083</id>
		<title>Volunteership Spring 2026</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Spring_2026&amp;diff=1083"/>
		<updated>2025-11-11T19:47:12Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: Created page with &amp;quot;vdv&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;vdv&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1080</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1080"/>
		<updated>2025-11-03T19:04:45Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets useful for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project for the fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identify potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curate indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Review peer curations and resolve annotation conflicts.&lt;br /&gt;
# Prepare a Wikipage to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but very important work that requires high attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/isil-erbasol-serbes/ Isil Erbasol Serbes]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/farah-kamila/ Farah Kamila]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|ARGOS, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Namrata Oruganti&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1079</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1079"/>
		<updated>2025-10-31T19:58:26Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets useful for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project for the fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identify potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curate indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Review peer curations and resolve annotation conflicts.&lt;br /&gt;
# Prepare a Wikipage to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but very important work that requires high attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/farah-kamila/ Farah Kamila]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|ARGOS, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Namrata Oruganti&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1078</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=1078"/>
		<updated>2025-10-31T13:17:19Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and your projects in order of preference). An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to build a corpus for training an LLM that recommends publications with datasets useful for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate the biomarkers; the data provided is not formatted according to the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the Fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/farah-kamila/ Farah Kamila]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Robert Ziebich&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ashley-tien/ Ashley Tien]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|ARGOS, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Namrata Oruganti&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=984</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=984"/>
		<updated>2025-09-05T14:10:11Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and your projects in order of preference). An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to build a corpus for training an LLM that recommends publications with datasets useful for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate the biomarkers; the data provided is not formatted according to the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the Fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy*]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Ramtin Mashhoon&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Adonay Awet&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Farah Kamila&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Robert Ziebich&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Ashley Tien&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|ARGOS, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Namrata Oruganti&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=981</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=981"/>
		<updated>2025-09-02T17:34:07Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and your projects in order of preference). An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets useful for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POCs to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the Fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly-available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly-available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identify potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curate indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Review peer curations and resolve annotation conflicts.&lt;br /&gt;
# Prepare a Wikipage to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## The student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional work (requires a Python/shell coding background): The student would run scripts that prepare and format the data tables pushed to data.argosdb.org. Coding knowledge is needed to handle errors or bugs in the code. This work is ongoing as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## The student would manually curate circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the Fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Diya Kamalabharathy*&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Akale Kinfe*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Ramtin Mashhoon&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Adonay Awet&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Farah Kamila&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Robert Ziebich&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Ashley Tien&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|ARGOS, PredictMod&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_2025&amp;diff=980</id>
		<title>Volunteership 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_2025&amp;diff=980"/>
		<updated>2025-08-29T15:34:23Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;For Fall opportunities, [[Volunteership Fall 2025|click here to view our Fall 2025 Volunteership Program]].&amp;lt;h2&amp;gt;2025 Volunteer Program Details&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Dates&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;strong&amp;gt;Volunteer Zoom Kick-Off Meeting&amp;lt;/strong&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
May 27, 2025 | 3:30 to 4:30 PM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Program Dates: June 2nd, 2025 – July 25th, 2025&amp;lt;/strong&amp;gt; (8 weeks)&amp;lt;br&amp;gt;&lt;br /&gt;
Monday to Friday | Remote | No breaks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Volunteer Expectations&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Daily progress updates via Slack (scrum).&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;Regular Zoom meetings with the assigned project point of contact.&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;Expected to dedicate 5–6 hours per day to project work, with the remaining time focused on skill development or reading. &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;p style=&amp;quot;color: red;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Important:&amp;lt;/strong&amp;gt; If the scrum is not updated for 2 consecutive days, the candidate will be &amp;lt;u&amp;gt;automatically dropped&amp;lt;/u&amp;gt; from the program.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Potential Projects&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;BiomarkerKB ([https://biomarkerkb.org biomarkerkb.org]) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;li&amp;gt;GlyGen ([https://glygen.org glygen.org]) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information. &amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;ARGOS ([https://argosdb.org argosdb.org]) project: Analyze genomics data using HIVE to identify reference genome assemblies. &amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;PredictMod ([https://hivelab.biochemistry.gwu.edu/predictmod hivelab.biochemistry.gwu.edu/predictmod]) project. Identifying datasets and harmonizing them so that they can be used to generate ML models.  &amp;lt;/li&amp;gt;&amp;lt;/ol&amp;gt;&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&amp;lt;hr&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;1. BiomarkerKB Biocuration Project Ideas&amp;lt;/h4&amp;gt;POC: Daniall Masood, Maria Kim&lt;br /&gt;
# Curate biomarkers for a specific disease (Alzheimer&#039;s)&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POCs to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the summer.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer&lt;br /&gt;
&lt;br /&gt;
Data Identification &amp;amp; Curation: &lt;br /&gt;
&lt;br /&gt;
# Identify publicly-available datasets from scientific literature that can be used for intervention outcome prediction models.&lt;br /&gt;
# Curate indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
&lt;br /&gt;
Modeling &amp;amp; Integration (for those with experience in programming/ML):&lt;br /&gt;
&lt;br /&gt;
# Conduct data harmonization and pre-processing following established project pipelines to produce an ML-ready dataset and data dictionary.&lt;br /&gt;
# Perform model training and document the ML pipeline in a BioCompute Object (BCO).&lt;br /&gt;
# Integrate the model into the PredictMod platform.&lt;br /&gt;
&lt;br /&gt;
Individuals with a background or interest in machine learning should reach out to lorikrammer@gwu.edu with a potential dataset to determine if it is a feasible project for the summer.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## The student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail. ~1 week&#039;s worth of work.&lt;br /&gt;
## Requires a Python/shell coding background. The student would run scripts that prepare and format the data tables pushed to data.argosdb.org. Coding knowledge is needed to handle errors or bugs in the code. This work is ongoing as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## The student would manually curate circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found. ~4-10 weeks&#039; worth of work.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the summer.&amp;lt;hr&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;Requirements for Completion&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Note:&amp;lt;/strong&amp;gt; The following are &amp;lt;u&amp;gt;mandatory&amp;lt;/u&amp;gt;. Failure to complete any will result in an incomplete volunteer record.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;Documentation&amp;lt;/h4&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;Written Report&amp;lt;/h4&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;Presentation &amp;amp; Slide Submission&amp;lt;/h4&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Present your work during the last week of the 8-week period.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Slides must be submitted to the Admin Team and should follow the Symposium Slide Guidelines below.&amp;lt;/p&amp;gt;&lt;br /&gt;
Contact the Admin Team to access previously submitted slides.&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
=== Volunteers ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
|-&lt;br /&gt;
! Name&lt;br /&gt;
!Project&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.linkedin.com/in/gracesjchong/ Grace Chong]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
# PredictMod&lt;br /&gt;
# BiomarkerKB Biocuration&lt;br /&gt;
# GlyGen Biocuration&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/alma-ogunsina-4959072b1/ Alma Ogunsina]&lt;br /&gt;
|Biomarker curation&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB&lt;br /&gt;
# ARGOS&lt;br /&gt;
# PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/diya-kamalabharathy-62557935a/ Diya Kamalabharathy]&lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB Biocuration&lt;br /&gt;
# PredictMod Machine Learning&lt;br /&gt;
# GlyGen Biocuration&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/harivinay-prasad-reddy-gujjula-a06ba71bb/ Harivinay P. Gujjula]&lt;br /&gt;
|GlyGen curation&lt;br /&gt;
|&lt;br /&gt;
# GlyGen Biocuration&lt;br /&gt;
# BiomarkerKB Biocuration&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/miao-wang-88b602290/ Miao Wang]&lt;br /&gt;
|ARGOS&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB Biocuration Project Ideas&lt;br /&gt;
# FDA-ARGOS Computation and Pathogen Curation Project&lt;br /&gt;
# PredictMod Machine Learning Project Ideas&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/nahom-gebreselassie-1545ab336/ Nahom Abel]&lt;br /&gt;
|GlyGen curation&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB Biocuration&lt;br /&gt;
# GlyGen Biocuration&lt;br /&gt;
# PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/kajal-patel-cs/ Kajal Sanjaykumar Patel]&lt;br /&gt;
|GlyGen and PubMed project&lt;br /&gt;
|&lt;br /&gt;
#PredictMod&lt;br /&gt;
#BiomarkerKB&lt;br /&gt;
#GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/john-mccaffrey-b8850930a/ John McCaffrey]&lt;br /&gt;
|Biomarker curation&lt;br /&gt;
|&lt;br /&gt;
# PredictMod&lt;br /&gt;
# BiomarkerKB&lt;br /&gt;
# GlyGen Biocuration &lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/nathan-ressom/ Nathan Ressom]&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|&lt;br /&gt;
# PredictMod &lt;br /&gt;
# GlyGen Biocuration&lt;br /&gt;
# BiomarkerKB Biocuration&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/aaron-ressom/ Aaron Ressom] &lt;br /&gt;
|PredictMod&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB &lt;br /&gt;
# PredictMod &lt;br /&gt;
# GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/akale-kinfe/ Akale Kinfe]&lt;br /&gt;
|Biomarker curation&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB Biocuration&lt;br /&gt;
# GlyGen Biocuration&lt;br /&gt;
# ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/aise-arpinar-a8bb9b373/?original_referer= Aise Arpinar]&lt;br /&gt;
|GlyGen curation&lt;br /&gt;
|&lt;br /&gt;
# GlyGen Biocuration&lt;br /&gt;
# BiomarkerKB Biocuration&lt;br /&gt;
# GlyGen Publication Analysis&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/piyush-pandey-906b582b5/ Piyush Pandey]&lt;br /&gt;
|Biomarker curation&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB Biocuration &lt;br /&gt;
# PredictMod &lt;br /&gt;
# GlyGen Biocuration &lt;br /&gt;
|-&lt;br /&gt;
|[http://www.linkedin.com/in/filmawit-zeru-203272363 Filmawit Zeru]&lt;br /&gt;
|GlycoSiteMiner project&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB&lt;br /&gt;
# GlyGen&lt;br /&gt;
# ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/mathias-belay-03b51a2a3/ Mathias Belay]&lt;br /&gt;
|Biomarker curation&lt;br /&gt;
|&lt;br /&gt;
# GlyGen&lt;br /&gt;
# PredictMod&lt;br /&gt;
# BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/isaac-kim-b644bb231/ Isaac Kim]&lt;br /&gt;
|Biomarker curation&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB&lt;br /&gt;
# PredictMod&lt;br /&gt;
# GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/sohana-bahl-6549a2376/ Sohana Bahl]&lt;br /&gt;
|Biomarker curation&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|[https://www.linkedin.com/in/ana-vohralikova-794a4433a?utm_source=share&amp;amp;utm_campaign=share_via&amp;amp;utm_content=profile&amp;amp;utm_medium=ios_app Ana Vohralikova]&lt;br /&gt;
|Biomarker curation&lt;br /&gt;
|&lt;br /&gt;
# BiomarkerKB Biocuration Project&lt;br /&gt;
# GlyGen Biocuration Project&lt;br /&gt;
# FDA-ARGOS Computation and Pathogen&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Symposium Slide Guidelines ===&lt;br /&gt;
&#039;&#039;&#039;Content Clarity&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
   •       &#039;&#039;&#039;Keep It Simple:&#039;&#039;&#039; Use concise bullet points instead of long paragraphs. Aim for no more than 6 bullet points per slide. &lt;br /&gt;
&lt;br /&gt;
  •        &#039;&#039;&#039;Focus on Key Points:&#039;&#039;&#039; Highlight the main ideas or data you want your audience to remember. &lt;br /&gt;
&lt;br /&gt;
  •        &#039;&#039;&#039;Consistent Layout:&#039;&#039;&#039; Use a consistent layout for each slide, including fonts, colors, and background. This helps maintain a professional look. &lt;br /&gt;
&lt;br /&gt;
  •        &#039;&#039;&#039;High-Quality Images:&#039;&#039;&#039; Use high-resolution images and graphics to illustrate your points. Avoid using clip art. &lt;br /&gt;
&lt;br /&gt;
  •        &#039;&#039;&#039;Readable Fonts:&#039;&#039;&#039; Use easy-to-read fonts (e.g., Arial, Calibri) and ensure font sizes are large enough to be seen from a distance (24 pt or larger for main text). &lt;br /&gt;
&lt;br /&gt;
  •        &#039;&#039;&#039;Contrast:&#039;&#039;&#039; Ensure there is high contrast between text and background (e.g., dark text on a light background). &lt;br /&gt;
&lt;br /&gt;
  •        &#039;&#039;&#039;Citation:&#039;&#039;&#039; Cite publications that support the information presented, using a proper citation format. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Outline for Symposium presentation&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1.       Introduction &lt;br /&gt;
&lt;br /&gt;
2.       Project Descriptions &lt;br /&gt;
&lt;br /&gt;
3.       Project Objectives and Goals &lt;br /&gt;
&lt;br /&gt;
4.       Methods, Results, Achievements and Contributions &lt;br /&gt;
&lt;br /&gt;
5.       Future Plans &lt;br /&gt;
&lt;br /&gt;
6.       Skills and Knowledge Gained &lt;br /&gt;
&lt;br /&gt;
7.       Acknowledgments &lt;br /&gt;
&lt;br /&gt;
8.       Q&amp;amp;A Session &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Outline&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1. Introduction:&#039;&#039;&#039;  (1 slide)&lt;br /&gt;
&lt;br /&gt;
  - Briefly introduce yourself.  &lt;br /&gt;
&lt;br /&gt;
  - Add your picture and name on the introduction slide. If it is a group project, add the group picture.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2. Project Descriptions:&#039;&#039;&#039;  (1 slide)&lt;br /&gt;
&lt;br /&gt;
  - Provide context and background information about the project.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. Project Objectives and Goals:&#039;&#039;&#039;  (1 slide)&lt;br /&gt;
&lt;br /&gt;
  - Describe the main objectives of the project or initiative.  &lt;br /&gt;
&lt;br /&gt;
  - Discuss any additional goals or desired outcomes.  &lt;br /&gt;
&lt;br /&gt;
  - Explain why these objectives and goals are important.  &lt;br /&gt;
&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;4. Methods, Results, Achievements and Contributions:&#039;&#039;&#039;  &lt;br /&gt;
&lt;br /&gt;
  -  Highlight the methods/tools used in the project.  &lt;br /&gt;
&lt;br /&gt;
  - Highlight the key results and outcomes of the project.  &lt;br /&gt;
&lt;br /&gt;
  - Discuss the most significant achievements and milestones reached.  &lt;br /&gt;
&lt;br /&gt;
  - Explain how each team member contributed to the project (for group projects). &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5. Future Plans:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
  - Describe the next steps or future plans for the project.&lt;br /&gt;
&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;6. Skills and Knowledge Gained:&#039;&#039;&#039;  (1 slide)&lt;br /&gt;
&lt;br /&gt;
  -   Detail any technical skills acquired or improved.  &lt;br /&gt;
&lt;br /&gt;
  - Highlight any soft skills, such as communication or teamwork, that were developed.  &lt;br /&gt;
&lt;br /&gt;
  - Discuss new knowledge gained in specific areas or subjects.  &lt;br /&gt;
&lt;br /&gt;
  -  Share any personal reflections on the experience and what was learned.  &lt;br /&gt;
&lt;br /&gt;
  - Discuss any challenges or obstacles encountered and how they were overcome.  &lt;br /&gt;
&lt;br /&gt;
  - Provide key insights or lessons learned from the project.  &lt;br /&gt;
&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;7. Acknowledgments:&#039;&#039;&#039;  (1 slide)&lt;br /&gt;
&lt;br /&gt;
  - Acknowledge the contributions of team members and collaborators.  &lt;br /&gt;
&lt;br /&gt;
  - Recognize the guidance and support of mentors and advisors.  &lt;br /&gt;
&lt;br /&gt;
  - Acknowledge the project funding (e.g., CFDE).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;8. Q&amp;amp;A Session:&#039;&#039;&#039;  &lt;br /&gt;
&lt;br /&gt;
  - Invite the audience to ask questions and engage in discussion.  &lt;br /&gt;
&lt;br /&gt;
  - Provide clear and thoughtful responses to audience questions.  &lt;br /&gt;
&lt;br /&gt;
  - Offer closing remarks and thank the audience for their participation. &lt;br /&gt;
&lt;br /&gt;
Note: If you have limited presentation time, you can merge a few topics into one.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=979</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=979"/>
		<updated>2025-08-29T15:23:10Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and projects in order of preference). An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Diya Kamalabharathy*&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Akale Kinfe*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Ramtin Mashhoon&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Adonay Awet&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Farah Kamila&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Robert Ziebich&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|ARGOS&lt;br /&gt;
|Christie Woodside, Jonathon Keeney&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=951</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=951"/>
		<updated>2025-08-26T19:00:36Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and projects in order of preference). An acceptance letter/email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
# Preparing a wiki page to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org, with regular check-ins and reports on what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the program.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Diya Kamalabharathy*&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Akale Kinfe*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Ramtin Mashhoon&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Adonay Awet&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Farah Kamila&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Robert Ziebich&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=950</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=950"/>
		<updated>2025-08-25T20:43:15Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM to recommend publications with datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identify potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curate indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Review peer curations and resolve annotation conflicts.&lt;br /&gt;
# Prepare a Wikipage to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires high attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Diya Kamalabharathy*&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Akale Kinfe*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Ramtin Mashhoon&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Adonay Awet&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Farah Kamila&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Robert Ziebich&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod, BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Arhamur Rauf&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|ARGOS, GlyGen, PredictMod&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=946</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=946"/>
		<updated>2025-08-22T16:12:28Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM to recommend publications with datasets for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data provided is not in the biomarker data model format.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, and breast cancers, biomarkers, and glycans, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identify potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curate indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Review peer curations and resolve annotation conflicts.&lt;br /&gt;
# Prepare a Wikipage to showcase the validated PMIDs.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires high attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they are feasible as a project for the Fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Diya Kamalabharathy*&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Akale Kinfe*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula*&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Ramtin Mashhoon&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Anagha Kalle&lt;br /&gt;
|PredictMod&lt;br /&gt;
|Lori Krammer, Tianyi Wang&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Adonay Awet&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|GlyGen&lt;br /&gt;
|Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Farah Kamila&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod, ARGOS, BiomarkerKB&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=939</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=939"/>
		<updated>2025-08-20T18:19:17Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, noon (email your updated resume and your projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to build a training corpus for an LLM that recommends publications with datasets useful for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data was not provided in the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they would be feasible as a project for the fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, breast, and liver cancer, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they would be feasible as a project for the fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Diya Kamalabharathy*&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Akale Kinfe*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay*&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Ramtin Mashhoon&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Anagha Kalle&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Adonay Awet&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|Daniall Masood, Maria Kim&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Miao Wang*&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|ARGOS&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Returning volunteer.&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=935</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=935"/>
		<updated>2025-08-20T18:15:15Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: /* Volunteers (TBD) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, noon (email your updated resume and your projects in order of preference). Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to build a training corpus for an LLM that recommends publications with datasets useful for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate biomarkers. The data was not provided in the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss them and check whether they would be feasible as a project for the fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with a GlyGen project member, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, breast, and liver cancer, and will focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss them and check whether they would be feasible as a project for the fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!POC Assigned&lt;br /&gt;
!Projects Interested&lt;br /&gt;
|-&lt;br /&gt;
|Diya Kamalabharathy&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Akale Kinfe&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Isil Erbasol Serbes&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod, BiomarkerKB, ARGOS&lt;br /&gt;
|-&lt;br /&gt;
|Ramtin Mashhoon&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Anagha Kalle&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|PredictMod&lt;br /&gt;
|-&lt;br /&gt;
|Vishal Muthusekaran&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Adonay Awet&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
	<entry>
		<id>https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=933</id>
		<title>Volunteership Fall 2025</title>
		<link rel="alternate" type="text/html" href="https://hivelab.biochemistry.gwu.edu/wiki/index.php?title=Volunteership_Fall_2025&amp;diff=933"/>
		<updated>2025-08-18T15:08:58Z</updated>

		<summary type="html">&lt;p&gt;Hivelabwikiadmin: Added Mathias&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2025 Volunteer Program Details ==&lt;br /&gt;
&lt;br /&gt;
=== Dates ===&lt;br /&gt;
&#039;&#039;&#039;Application Deadline&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 22, 2025, Noon (email your updated resume and projects in order of preference). An acceptance letter or email will be sent to candidates no later than the day after the kick-off meeting.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Volunteer Zoom Kick-Off Meeting&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
August 25, 2025 | 4:00 to 5:00 PM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program Dates: September 1st, 2025 – November 30th, 2025&#039;&#039;&#039; (13 weeks)&lt;br /&gt;
&lt;br /&gt;
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)&lt;br /&gt;
&lt;br /&gt;
[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteer Expectations ===&lt;br /&gt;
&lt;br /&gt;
# Minimum commitment of 10 hours per week.&lt;br /&gt;
# Progress updates via Slack at least 3 days per week (scrum).&lt;br /&gt;
# Regular Zoom meetings with the assigned project point of contact.&lt;br /&gt;
# Attend some lectures or seminars remotely (max 4-5).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;Important:&#039;&#039;&#039; If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Potential Projects ===&lt;br /&gt;
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.&lt;br /&gt;
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.&lt;br /&gt;
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.&lt;br /&gt;
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.&lt;br /&gt;
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets useful for intervention outcome prediction.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.&#039;&#039;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== 1. BiomarkerKB Biocuration Project Ideas ====&lt;br /&gt;
POC: Daniall Masood, Maria Kim&lt;br /&gt;
&lt;br /&gt;
# Curate biomarkers for a specific disease&lt;br /&gt;
## The student would do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.&lt;br /&gt;
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details with data collected in the first 4 weeks as training data/example data.&lt;br /&gt;
# Top 50 biomarkers&lt;br /&gt;
## Curate the top 50 biomarkers for biomarkerkb.org.&lt;br /&gt;
## Define what constitutes a top 50 biomarker.&lt;br /&gt;
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.&lt;br /&gt;
# Biocuration of biomarkers from NLP/LLM work&lt;br /&gt;
## Use the biomarkers collected from NLP work.&lt;br /&gt;
## Curate the biomarkers. The data provided does not follow the biomarker data model.&lt;br /&gt;
## While curating the biomarkers, check if data collected from NLP is correct.&lt;br /&gt;
## After completion, the student can start using curated data to work on the NLP/LLM method.&lt;br /&gt;
# Curate biomarkers for a treatment&lt;br /&gt;
## See #1 above.&lt;br /&gt;
# Continue working on LLM methods started by volunteers over the summer.&lt;br /&gt;
## The data is available as well as some preliminary research and work done by previous volunteers in this area.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the fall.&lt;br /&gt;
&lt;br /&gt;
==== 2. GlyGen Biocuration Project Ideas ====&lt;br /&gt;
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner&lt;br /&gt;
&lt;br /&gt;
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because much of the valuable metadata (e.g., species, tissue, disease, cell line) annotations are free-text terms that do not align with established standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, &amp;quot;human,&amp;quot; &amp;quot;man,&amp;quot; and &amp;quot;h. sapiens&amp;quot; all map to the scientific species name &amp;quot;Homo sapiens.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.&lt;br /&gt;
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.&lt;br /&gt;
# Finding papers based on titles and author lists that may contain spelling errors.&lt;br /&gt;
# Interacting and discussing with other curators in case terms are mapped differently.&lt;br /&gt;
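&lt;br /&gt;
The synonym problem described above can be sketched in Python. This is a minimal illustration, not the GlyGen pipeline: the synonym table, the function names, and the choice to return None for unknown terms are all assumptions made for the example. Terms that cannot be matched are set aside for manual curation.&lt;br /&gt;

```python
import re

# Hypothetical synonym table for illustration only. A real mapping would be
# built by curators against a standard dictionary or ontology.
SPECIES_SYNONYMS = {
    "human": "Homo sapiens",
    "man": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
}


def normalize_term(raw):
    """Lowercase, trim, and collapse whitespace so lookups are consistent."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def map_species(raw, synonyms=SPECIES_SYNONYMS):
    """Return the standard species name, or None if the term is unknown."""
    return synonyms.get(normalize_term(raw))


def split_mapped(terms):
    """Separate terms that map automatically from those needing a curator."""
    mapped = {}
    unmapped = []
    for term in terms:
        standard = map_species(term)
        if standard is None:
            unmapped.append(term)
        else:
            mapped[term] = standard
    return mapped, unmapped
```

Exact lookup after normalization deliberately leaves misspellings unmapped, so a curator rather than the script decides how to resolve them.&lt;br /&gt;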
&lt;br /&gt;
If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 3. GlyGen Publication Analysis Project Ideas ====&lt;br /&gt;
&lt;br /&gt;
POC: Rene Ranzinger and Urnisha Bhuiyan&lt;br /&gt;
&lt;br /&gt;
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.&lt;br /&gt;
&lt;br /&gt;
The project involves:&lt;br /&gt;
&lt;br /&gt;
# Using the PubMed web API to filter publications based on keywords.&lt;br /&gt;
# Analyzing paper abstracts to identify research institutions and groups that form the community.&lt;br /&gt;
# Filtering the community list to exclude unrelated co-authors.&lt;br /&gt;
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.&lt;br /&gt;
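&lt;br /&gt;
As a rough sketch of the first step, the snippet below builds a keyword query for the NCBI E-utilities ESearch endpoint and reads PMIDs from its JSON response. The function names, the Title/Abstract field restriction, and the default result limit are assumptions for illustration and are not taken from the GlyGen project code.&lt;br /&gt;

```python
import json
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(keywords, retmax=20):
    """Build an ESearch URL matching any keyword in the title or abstract."""
    term = " OR ".join(f'"{kw}"[Title/Abstract]' for kw in keywords)
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,  # assumed default; tune to the size of the analysis
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_pmids(response_text):
    """Return the list of PMIDs from an ESearch JSON response body."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])
```

The URL can be fetched with any HTTP client, and the response body passed to extract_pmids. Keeping URL construction and parsing separate from the network call makes both easy to test offline.&lt;br /&gt;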
&lt;br /&gt;
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.&lt;br /&gt;
&lt;br /&gt;
==== 4. PredictMod Machine Learning Project Ideas ====&lt;br /&gt;
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)&lt;br /&gt;
&lt;br /&gt;
Identifying relevant and useful publicly-available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly-available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet + annotated PDFs for PubMed articles relevant to prostate, lung, breast, and liver cancer, and focus on indicators such as condition, intervention, and response.&lt;br /&gt;
&lt;br /&gt;
PMID curation involves:&lt;br /&gt;
&lt;br /&gt;
# Identifying potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.&lt;br /&gt;
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.&lt;br /&gt;
# Reviewing peer curations and resolving annotation conflicts.&lt;br /&gt;
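&lt;br /&gt;
The conflict-review step could be supported by a small script that compares two curators' annotations. The sketch below is only an assumption about how such a check might look: the dictionary layout and the function name are hypothetical, and the three fields are the indicators named above (condition, intervention, response).&lt;br /&gt;

```python
# Indicator fields named in the project description.
FIELDS = ("condition", "intervention", "response")


def find_conflicts(curator_a, curator_b, fields=FIELDS):
    """Map each PMID to the fields on which the two curators disagree.

    Each argument is a dict of the form {pmid: {field: value}}. A PMID
    annotated by only one curator is reported as a conflict on every field
    that curator filled in.
    """
    conflicts = {}
    for pmid in sorted(set(curator_a) | set(curator_b)):
        a = curator_a.get(pmid, {})
        b = curator_b.get(pmid, {})
        differing = [f for f in fields if a.get(f) != b.get(f)]
        if differing:
            conflicts[pmid] = differing
    return conflicts
```

The output lists only the PMIDs and fields that need discussion, so curators can focus review time on actual disagreements.&lt;br /&gt;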
&lt;br /&gt;
Interested individuals should reach out to lorikrammer@gwu.edu.&lt;br /&gt;
&lt;br /&gt;
==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====&lt;br /&gt;
&lt;br /&gt;
POC: Christie Woodside, Jonathon Keeney&lt;br /&gt;
&lt;br /&gt;
# Update data tables for more efficient computations&lt;br /&gt;
## The student would review and enter additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.&lt;br /&gt;
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.&lt;br /&gt;
# Curate and report on current pathogens to upload to ARGOS&lt;br /&gt;
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.&lt;br /&gt;
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.&lt;br /&gt;
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.&lt;br /&gt;
# QC Analysis using HIVE&lt;br /&gt;
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.&lt;br /&gt;
## The results will be added to our ARGOS database.&lt;br /&gt;
&lt;br /&gt;
If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the fall.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Requirements for Completion ===&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The following are mandatory. Failure to complete any will result in an incomplete volunteer record.&lt;br /&gt;
&lt;br /&gt;
==== Documentation ====&lt;br /&gt;
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.&lt;br /&gt;
&lt;br /&gt;
==== Written Report ====&lt;br /&gt;
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.&lt;br /&gt;
&lt;br /&gt;
==== Presentation &amp;amp; Slide Submission ====&lt;br /&gt;
Present your work during the last week of the 13-week period.&lt;br /&gt;
&lt;br /&gt;
Slides must be submitted to the POCs.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Completion Certificate ===&lt;br /&gt;
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Contact ===&lt;br /&gt;
mazumder_lab@gwu.edu.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Volunteers (TBD) ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
!Name&lt;br /&gt;
!Project Assigned&lt;br /&gt;
!Projects of Interest&lt;br /&gt;
|-&lt;br /&gt;
|Diya Kamalabharathy&lt;br /&gt;
|&lt;br /&gt;
|PredictMod; Glyco web development&lt;br /&gt;
|-&lt;br /&gt;
|Anika Sikka&lt;br /&gt;
|&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Akale Kinfe&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Nahom Abel&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Harivinay P. Gujjula&lt;br /&gt;
|&lt;br /&gt;
|GlyGen&lt;br /&gt;
|-&lt;br /&gt;
|Sparsh Gupta&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|-&lt;br /&gt;
|Mathias Belay&lt;br /&gt;
|&lt;br /&gt;
|BiomarkerKB&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Hivelabwikiadmin</name></author>
	</entry>
</feed>