Symposium 2025

Latest revision as of 13:39, 31 July 2025

The HIVE Lab summer symposium is scheduled for Thursday, July 31, 2025. It is an exciting opportunity for lab volunteers and interns to present their findings from the projects they worked on over 8 weeks.

Program and Information

Symposium Venue

The HIVE Lab symposium will be held in person at The George Washington University, Washington, DC, with an option to join virtually.

In Person - Ross 643, Ross Hall, School of Health and Medical Sciences, The George Washington University, Washington, DC (MAP: https://maps.app.goo.gl/PHQmZacA4hWDvTCh6)

Virtual - Zoom

Zoom Link - https://gwu-edu.zoom.us/j/98841344003


Agenda

All times are in Eastern Time (ET).

Time (ET)	Project	Title	Presenter
10:00am	Welcome and Introduction		Michael Tiemeyer (10 min)
Group 1 Moderator : Nathan Edwards
10:10am	CFDE	Integrating Biocuration and Data Standardization to Generate Machine Learning-Ready Glycan Datasets + 5 min Q/A	Ana Jaramillo and Yuxin Zou (20 min)
10:30am	CFDE	Machine Learning Models for Linkage Prediction in Glycan Images + 5 min Q/A	Campbell Ross (15 min)
10:45am	CFDE	A Graph-Based AI Workflow for Mining Glycan Biomarkers and Related Annotations from Publications + 5 min Q/A	Cyrus Chun Hong Au Yeung (15 min)
11:00am	BiomarkerKB	Comprehensive Identification and LLM Based Curation of the Top 50 Clinically Relevant Disease Biomarkers + 5 min Q/A	Sohana Bahl, Isaac Kim, Sparsh Gupta (15 min)
11:15am	BiomarkerKB	Lupus Discovery Project + 5 min Q/A	Nathan Ressom, Ana Vohralikova, Mathias Belay (15 min)
11:30am	BiomarkerKB	Systematic Curation and Large Language Model-Based Extraction of Alzheimer’s Disease Biomarkers	John McCaffery, Alma Ogunsina, Akale Kinfe (15 min)
11:45am	Open Q and A		All (30 min)
12:30pm	LUNCH Ross 505 (90 mins)
Group 2 Moderator : Rene Ranzinger
1:55pm	Introduction		Raja Mazumder
2:00pm	GlyGen	GlyGen Biocuration Project + 5 min Q/A	Aise Arpinar, Haravinay P. Gujjulla, Nahom Abel (20 min)
2:20pm	PredictMod Curation	PredictMod: PubMed Curation for Training an LLM for Recommendation + 5 min Q/A	Grace Chong, Aaron Ressom, Diya Kamalabharathy (15 min)
2:35pm	Glycobiology Web Development	A Resource Drill Down and Visualization for the Glyspace Alliance + 5 min Q/A	Diya Kamalabharathy (5 min)
2:40pm	PredictMod AI-READI	Robust Classification of Glycemic Health States from Continuous Glucose + 5 min Q/A	Nikhil Arethiya (15 min)
2:55pm	Argos	Curation of Emerging Pathogen Genomes for FDA-ARGOS Database Expansion + 5 min Q/A	Miao Wang (15 min)
3:10pm	GlycoSiteMiner	Categorization of glycan names	Filmawit Zeru (15 min)
3:25pm	Open Q and A | Closing Remarks		All (20 min) | Raja Mazumder
3:45pm	Certificate Distribution | Photo | Break (15 min)
4:00pm	Guest Lecture: Crafting a Strong LinkedIn Profile and Resume		Sara Orrick, Senior Career Consultant (45 min)

Project Description

CFDE Project

The CFDE project focuses on integrating biocuration and data standardization to generate machine learning-ready glycan datasets. It brings together curated information and structured metadata to ensure that glycan-related data is both interoperable and computationally accessible. As part of this effort, the project supports the development of machine learning models for linkage prediction in glycan images, enabling automated interpretation of glycan structures from visual representations. In addition, a graph-based AI workflow is being implemented to mine glycan biomarkers and related annotations from scientific publications, helping to uncover novel insights and associations. These approaches collectively advance the integration of glycobiology into broader biomedical research by making glycan data more usable for downstream AI applications.

GlyGen Project

The GlyGen Biocuration project focuses on integrating legacy, yet valuable, data from the CarbBank and CFG databases into the GlyGen infrastructure. A key challenge is mapping metadata, such as species names and publication references, to standardized dictionaries and ontologies. While most entries have been automatically matched using custom scripts, remaining inconsistencies, including outdated, misspelled, or abbreviated terms, require manual curation using resources such as Google, PubMed, and domain-specific dictionaries and ontologies.
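The matching step described above can be illustrated with a minimal sketch. It is not the project's actual script: the dictionary contents, the synonym table, the function name `map_species`, and the similarity cutoff are all assumptions chosen for illustration. The idea is to try an exact match first, then a synonym lookup, then a fuzzy match for misspellings, and to route anything left over to manual curation.

```python
from difflib import get_close_matches

# Hypothetical standardized dictionary: canonical species name -> NCBI Taxonomy ID.
STANDARD_SPECIES = {
    "Homo sapiens": 9606,
    "Mus musculus": 10090,
    "Bos taurus": 9913,
}

# Hypothetical synonym table covering common and abbreviated names.
SYNONYMS = {
    "human": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "mouse": "Mus musculus",
    "bovine": "Bos taurus",
}


def map_species(raw_name, cutoff=0.85):
    """Return (canonical_name, tax_id, method).

    method is "exact", "synonym", or "fuzzy"; unmatched names return
    (None, None, "manual") so they can be queued for manual curation.
    """
    # Normalize whitespace and case before any lookup.
    lowered = " ".join(raw_name.split()).lower()
    by_lower = {name.lower(): name for name in STANDARD_SPECIES}

    if lowered in by_lower:
        name = by_lower[lowered]
        return name, STANDARD_SPECIES[name], "exact"

    if lowered in SYNONYMS:
        name = SYNONYMS[lowered]
        return name, STANDARD_SPECIES[name], "synonym"

    # Fuzzy match catches simple misspellings; the cutoff is an assumed value.
    close = get_close_matches(lowered, list(by_lower), n=1, cutoff=cutoff)
    if close:
        name = by_lower[close[0]]
        return name, STANDARD_SPECIES[name], "fuzzy"

    return None, None, "manual"


if __name__ == "__main__":
    for entry in ["Homo sapiens", "mouse", "Homo sapeins", "Xenopus laevis"]:
        print(entry, "->", map_species(entry))
```

Returning the match method alongside the result lets a curator review fuzzy matches separately from exact ones, since a fuzzy match is a suggestion rather than a confirmed mapping.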

BiomarkerKB Biocuration Project

The BiomarkerKB Biocuration project focuses on curating biomarkers from abstracts and publications into the BiomarkerKB data model. A key challenge is the vast amount of data spread across many publications. Manual curation requires reading, inferring, and understanding key elements of biomarker data and mapping them to the defined biomarker data model. LLM methodologies are expected to help substantially by recognizing biomarker and condition data and mapping it into the data model, while also automatically mapping other contextual and standardized data to the model so that the data is AI- and machine-learning-ready.

ArgosDB Curation Project

This project focuses on evaluating and curating high-quality genomes of emerging and clinically relevant pathogens, with an emphasis on fungal species. Using public genomic repositories and FDA-ARGOS inclusion criteria, the project identifies candidate organisms for database expansion to support diagnostic assay development and public health surveillance.

PredictMod Curation Project

Identifying relevant and useful publicly available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, breast, and liver cancer, and will focus on indicators such as condition, intervention, and response.

PredictMod AI-READI Project

This project creates a data-driven pipeline that uses continuous glucose monitoring (CGM) data to distinguish truly healthy individuals from those with underlying glycemic dysregulation, even if they're mislabeled. Using the AI-READI dataset funded by the NIH Bridge2AI program, the pipeline combines unsupervised clustering, handcrafted feature engineering, and LSTM-based deep learning to identify metabolic health states and extract insights into glycemic variability, with potential real-time applications in personalized health monitoring.

Glycobiology Web Development

This project involves creating an EDAM ontology of glycobiology and glycoinformatics resources to support web development for the Glyspace Alliance. It compiles and organizes the resources associated with the Glyspace Alliance into a more user-friendly tool for accessing them, sorting the resources by type, topic, tool operation, and data.

GlycoSiteMiner Project

This project establishes a set of rules to broadly classify glycan names into structure-based and function-based categories, and applies these rules to organize entries in the glycan dictionary. These classification rules will evolve over time, enabling the creation of more refined hierarchical categories. The ultimate goal of this rule-based glycan name categorization is to support automated literature mining, specifically for identifying glycan names in PubMed articles.
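As a rough illustration of rule-based name categorization, the sketch below applies an ordered list of pattern rules to glycan names and groups the names by category. The patterns, the rule order, and the function names `categorize` and `organize` are hypothetical and are not the project's actual rule set; real rules would be defined and refined by curators.

```python
import re

# Hypothetical rules, checked in order; the first matching rule wins.
RULES = [
    # Names that describe a role (assumed keywords).
    ("function-based", re.compile(r"\b(antigen|epitope|ligand)\b", re.IGNORECASE)),
    # Names that spell out composition (assumed residue abbreviations and terms).
    ("structure-based", re.compile(
        r"GlcNAc|GalNAc|Neu5Ac|\bMan\d|(?i:\bhigh[- ]mannose\b)"
    )),
]


def categorize(name):
    """Return the category of the first rule that matches, else 'uncategorized'."""
    for category, pattern in RULES:
        if pattern.search(name):
            return category
    return "uncategorized"


def organize(names):
    """Group dictionary entries by category, preserving input order."""
    grouped = {}
    for name in names:
        grouped.setdefault(categorize(name), []).append(name)
    return grouped


if __name__ == "__main__":
    entries = ["Man5GlcNAc2", "blood group A antigen", "High-mannose glycan", "heparin"]
    for category, members in organize(entries).items():
        print(category, members)
```

Keeping the rules in an ordered list makes it straightforward to add or reorder rules as the categories become more refined, and the "uncategorized" bucket shows which names still need a rule.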

