Symposium 2025
The HIVE Lab summer symposium is scheduled for Thursday, July 31, 2025. It is an opportunity for lab volunteers and interns to present findings from the projects they worked on over 8 weeks.

Program, Information, and Registration
Symposium Venue
The HIVE Lab symposium will be held in person at The George Washington University, Washington DC, with an option to join virtually.
In Person - Ross 647, Ross Hall, School of Medicine and Health Sciences, The George Washington University, Washington DC (MAP)
Virtual - Zoom (register using the link below)
Zoom Registration
Registration Link - https://gwu-edu.zoom.us/meeting/register/nziV1ZJQRt6V4ZOEfjpU_g
Add to - Google Calendar | Outlook Calendar | Yahoo Calendar
Agenda
All times are in Eastern Time (ET)
Time (ET) | Project | Title | Presenter |
10:00am | Welcome and Introduction | Michael Tiemeyer (10 min) | |
Group 1 Moderator : Nathan Edwards | |||
10:10am | CFDE | Integrating Biocuration and Data Standardization to Generate Machine Learning-Ready Glycan Datasets | Ana Jaramillo and Yuxin Zou (20 min) |
10:30am | CFDE | Machine Learning Models for Linkage Prediction in Glycan Images | Campbell Ross (15 min) |
10:45am | CFDE | A Graph-Based AI Workflow for Mining Glycan Biomarkers and Related Annotations from Publications | Cyrus Chun Hong Au Yeung (15 min) |
11:00am | BiomarkerKB | TBA | Sohana Bahl, Isaac Kim, Sparsh Gupta (15 min) |
11:15am | BiomarkerKB | TBA | Nathan Ressom, Ana Vohralikova, Mathias Belay (15 min) |
11:30am | BiomarkerKB | TBA | John McCaffery, Alma Ogunsina, Akale Kinfe (15 min) |
11:45am | Open Q and A | All (30 min) | |
12:30pm | LUNCH (90 mins) | ||
Group 2 Moderator : Rene Ranzinger | |||
2:00pm | Predictmod AI-READI | Robust Classification of Glycemic Health States from Continuous Glucose | Nikhil Arethiya (15 min) |
2:15pm | Predictmod Curation | PredictMod: PubMed Curation for Training an LLM for Recommendation | Grace Chong, Aaron Ressom, Diya Kamalabharathy (15 min) |
2:30pm | Glycobiology Web Development | A Resource Drill Down and Visualization for the Glyspace Alliance | Diya Kamalabharathy (5 min) |
2:35pm | Argos | Curation of Emerging Pathogen Genomes for FDA-ARGOS Database Expansion | Miao Wang (15 min) |
2:50pm | GlyGen | GlyGen Biocuration Project | Aise Arpinar, Haravinay P. Gujjulla, Nahom Abel (20 min) |
3:10pm | GlycoSiteMiner | TBA | (15 min) |
3:25pm | Open Q and A | Closing Remarks | All (20 min) | Raja Mazumder |
Project Description
CFDE Project
The CFDE project focuses on integrating biocuration and data standardization to generate machine learning-ready glycan datasets. It brings together curated information and structured metadata to ensure that glycan-related data is both interoperable and computationally accessible. As part of this effort, the project supports the development of machine learning models for linkage prediction in glycan images, enabling automated interpretation of glycan structures from visual representations. In addition, a graph-based AI workflow is being implemented to mine glycan biomarkers and related annotations from scientific publications, helping to uncover novel insights and associations. These approaches collectively advance the integration of glycobiology into broader biomedical research by making glycan data more usable for downstream AI applications.
GlyGen Project
The GlyGen Biocuration project focuses on integrating legacy, yet valuable, data from the CarbBank and CFG databases into the GlyGen infrastructure. A key challenge is mapping metadata, such as species names and publication references, to standardized dictionaries and ontologies. While most entries have been automatically matched using custom scripts, remaining inconsistencies, including outdated, misspelled, or abbreviated terms, require manual curation using resources such as Google, PubMed, and domain-specific dictionaries and ontologies.
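The matching step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual scripts: the dictionary contents, the `match_species` function name, and the 0.85 similarity cutoff are all assumptions chosen for the example. It tries an exact match first, falls back to a fuzzy match from the standard library, and flags everything else for manual curation.

```python
from difflib import get_close_matches

# Hypothetical standardized dictionary: normalized species name -> NCBI Taxonomy ID.
# A real dictionary would be far larger and loaded from a curated resource.
SPECIES_DICTIONARY = {
    "homo sapiens": "9606",
    "mus musculus": "10090",
    "bos taurus": "9913",
}


def match_species(raw_name, dictionary=SPECIES_DICTIONARY, cutoff=0.85):
    """Map one legacy species string to a dictionary entry.

    Returns (status, canonical_name, identifier), where status is
    "exact", "fuzzy", or "manual" (no confident match found).
    """
    # Normalize case and collapse repeated whitespace before comparing.
    key = " ".join(raw_name.lower().split())
    if key in dictionary:
        return ("exact", key, dictionary[key])

    # Fuzzy fallback catches simple misspellings; the cutoff is an assumed value.
    candidates = get_close_matches(key, list(dictionary), n=1, cutoff=cutoff)
    if candidates:
        best = candidates[0]
        return ("fuzzy", best, dictionary[best])

    # Abbreviated or outdated terms usually end up here for a curator to review.
    return ("manual", None, None)


if __name__ == "__main__":
    for legacy_name in ["Homo  Sapiens", "Homo sapeins", "unknown organism"]:
        print(legacy_name, "->", match_species(legacy_name))
```

In a workflow like this, entries marked "fuzzy" would typically still be spot-checked by a curator, and only the "manual" entries would need a full lookup in resources such as PubMed or domain ontologies.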
BiomarkerKB Biocuration Project
The Biomarker Biocuration project focuses on curating biomarkers from abstracts and publications into the BiomarkerKB data model. A key challenge is the vast amount of biomarker data spread across many publications. Manual curation requires reading and understanding the key elements of biomarker data, inferring missing context, and mapping them to the defined biomarker data model. LLM methodologies are expected to help considerably by recognizing biomarker and condition data, mapping the extracted information into the data model, and automatically mapping other contextual and standardized data to the model, so that the data is AI and machine learning ready.
ArgosDB Curation Project
This project focuses on evaluating and curating high-quality genomes of emerging and clinically relevant pathogens, with an emphasis on fungal species. Using public genomic repositories and FDA-ARGOS inclusion criteria, I identify candidate organisms for database expansion to support diagnostic assay development and public health surveillance.