HIVE Lab - User contributions [en]

Volunteership Fall 2025

2025-08-15T16:31:57Z

Daniallmasood: /* 1. BiomarkerKB Biocuration Project Ideas */

== 2025 Volunteer Program Details ==

=== Dates ===
'''Application Deadline'''

August 22, 2025, Noon (email your updated resume and projects in order of preference)

'''Volunteer Zoom Kick-Off Meeting'''

August 25, 2025 | 4:00 to 5:00 PM

'''Program Dates: September 1st, 2025 – November 30th, 2025''' (13 weeks)

Remote | Hybrid for GW employees and students (Ross Hall 5th floor)

[https://hivelab.biochemistry.gwu.edu/wiki/Volunteership_2025 Summer 2025 Volunteership] (Closed)
----

=== Volunteer Expectations ===

# Minimum commitment of 10 hours per week.
# Progress updates via Slack at least 3 days per week (scrum).
# Regular Zoom meetings with the assigned project point of contact.
# Remote attendance at a few lectures or seminars (4-5 at most).

'''''Important:''' If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.''
----

=== Potential Projects ===
We are excited to continue our bioinformatics volunteership program in Fall 2025. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.
# BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.
# GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.
# ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.
# PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs to train an LLM that recommends publications with datasets useful for intervention outcome prediction.

''Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen.''
----

==== 1. BiomarkerKB Biocuration Project Ideas ====
POC: Daniall Masood, Maria Kim

# Curate biomarkers for a specific disease
## The student will do manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.
# Top 50 biomarkers
## Curate the top 50 biomarkers for biomarkerkb.org.
## Define what constitutes a top 50 biomarker.
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.
# Biocuration of biomarkers from NLP/LLM work
## Use the biomarkers collected from NLP work.
## Curate the biomarkers. The data from the NLP work was not provided in the biomarker data model.
## While curating the biomarkers, check if data collected from NLP is correct.
## After completion, the student can start using curated data to work on the NLP/LLM method.
# Curate biomarkers for a treatment
## See #1 above.
# Continue working on LLM methods started by volunteers over the summer.
## The data is available as well as some preliminary research and work done by previous volunteers in this area.

If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the fall program.

==== 2. GlyGen Biocuration Project Ideas ====
POC: Rene Ranzinger, Urnisha Bhuiyan, Kate Warner

Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding. However, the data within these databases remains highly valuable to the community. Integrating these datasets into modern databases or knowledgebases, such as GlyGen, presents a challenge because many of the valuable metadata annotations (e.g., species, tissue, disease, cell line) are free-text terms that do not align with the standard dictionaries and ontologies used in modern resources. Automated matching of this information with dictionaries or ontologies is often not possible due to the use of synonyms, spelling errors, or abbreviations. For example, "human," "man," and "h. sapiens" all map to the scientific species name "Homo sapiens."

The GlyGen project aims to make datasets from two older databases (CarbBank, CFG) accessible by migrating the data and metadata into our database. For this project, we are seeking curators with a medical or biology background who are interested in helping map metadata terms from these old databases to standard dictionaries and ontologies.

The project involves:

# Using internet resources (e.g., Google, Wikipedia) to identify terms used in the old database.
# Mapping identified terms to corresponding dictionaries and ontologies using the webpages and search interfaces of these projects.
# Finding papers based on titles and author lists that may contain spelling errors.
# Interacting and discussing with other curators in case terms are mapped differently.
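
As a rough illustration of where automated matching stops and manual curation takes over, the sketch below looks up free-text species terms in a small synonym table and falls back to fuzzy matching for misspellings. The synonym table, the <code>map_species</code> function, and the 0.8 cutoff are assumptions made for this example and are not part of the GlyGen pipeline.

```python
import difflib

# Hypothetical synonym table; a real curation effort would draw on a standard
# resource such as the NCBI Taxonomy rather than a hand-written dictionary.
SPECIES_SYNONYMS = {
    "human": "Homo sapiens",
    "man": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "mouse": "Mus musculus",
    "m. musculus": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def map_species(term, cutoff=0.8):
    """Return (canonical_name, method) for a free-text species term.

    method is "exact", "fuzzy", or "manual" (no confident match found).
    """
    key = " ".join(term.lower().split())  # lowercase, collapse whitespace
    if key in SPECIES_SYNONYMS:
        return SPECIES_SYNONYMS[key], "exact"
    # Fuzzy matching catches simple spelling errors such as "homo sapeins".
    close = difflib.get_close_matches(key, list(SPECIES_SYNONYMS), n=1, cutoff=cutoff)
    if close:
        return SPECIES_SYNONYMS[close[0]], "fuzzy"
    return None, "manual"  # leave the term for a human curator


for raw in ["Human", "h. sapiens", "homo sapeins", "bovine milk"]:
    print(raw, "->", map_species(raw))
```

Terms that come back as "manual" are the ones a curator would research and map by hand; fuzzy matches would still deserve a second look before being accepted.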

If you have any other ideas or methods you would like to focus on, please reach out to rene@ccrc.uga.edu to discuss them.

==== 3. GlyGen Publication Analysis Project Ideas ====

POC: Rene Ranzinger and Urnisha Bhuiyan

One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.

The project involves:

# Using the PubMed web API to filter publications based on keywords.
# Analyzing paper abstracts to identify research institutions and groups that form the community.
# Filtering the community list to exclude unrelated co-authors.
# Prioritizing papers identified by GlycoSiteMiner for curation via TableMaker.
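
A minimal sketch of the first step, assuming the NCBI E-utilities <code>esearch</code> endpoint is the PubMed web API in question (the project description does not name one). The helper names and keywords are illustrative, the sample PMIDs are made up, and the snippet makes no network call.

```python
import json
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(keywords, retmax=20):
    """Build an esearch URL that ANDs the keywords in title/abstract."""
    term = " AND ".join(f"{kw}[Title/Abstract]" for kw in keywords)
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(response_text):
    """Extract the PMID list from an esearch JSON response."""
    payload = json.loads(response_text)
    return payload["esearchresult"]["idlist"]


# Hypothetical keywords; fetch the URL (e.g. with urllib.request.urlopen)
# to run a real query.
print(build_esearch_url(["GlyGen", "glycan"]))

# Made-up response in the shape esearch returns with retmode=json.
sample = '{"esearchresult": {"count": "2", "idlist": ["11111111", "22222222"]}}'
print(parse_pmids(sample))
```

Keeping URL construction and response parsing separate makes both easy to test without contacting NCBI.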

A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.

==== 4. PredictMod Machine Learning Project Ideas ====
POC: Lori Krammer, Tianyi Wang, Pat McNeely (optional)

Identifying relevant and useful publicly-available datasets for machine learning is currently a resource-intensive task. This curation project aims to develop a corpus for training an AI model to recommend PMIDs with publicly-available datasets useful for intervention outcome prediction models. The corpus will include an annotation spreadsheet and annotated PDFs for PubMed articles relevant to prostate, lung, breast, and liver cancer, and will focus on indicators such as condition, intervention, and response.

PMID curation involves:

# Identifying potentially relevant PMIDs that may have publicly-available datasets for training intervention outcome prediction models.
# Curating indicators of useful ML publications that could be used to train an LLM to recommend relevant publications for cancer modeling.
# Reviewing peer curations and resolving annotation conflicts.
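
One possible way to approach the third step, sketched under assumptions: each curator's annotations are held as a mapping from PMID to the indicator fields named above (condition, intervention, response), and any field where two curators disagree is flagged for discussion. The data layout, the function name, and the placeholder PMIDs and values are hypothetical.

```python
FIELDS = ("condition", "intervention", "response")


def find_conflicts(curator_a, curator_b, fields=FIELDS):
    """Return {pmid: {field: (a_value, b_value)}} for disagreeing fields.

    Only PMIDs annotated by both curators are compared; values are
    compared case-insensitively after trimming whitespace.
    """
    conflicts = {}
    for pmid in sorted(curator_a.keys() & curator_b.keys()):
        diffs = {}
        for field in fields:
            a_val = curator_a[pmid].get(field, "")
            b_val = curator_b[pmid].get(field, "")
            if a_val.strip().lower() != b_val.strip().lower():
                diffs[field] = (a_val, b_val)
        if diffs:
            conflicts[pmid] = diffs
    return conflicts


# Made-up annotations for two placeholder PMIDs.
a = {
    "00000001": {"condition": "Prostate cancer", "intervention": "docetaxel", "response": "PSA decline"},
    "00000002": {"condition": "Lung cancer", "intervention": "pembrolizumab", "response": "overall survival"},
}
b = {
    "00000001": {"condition": "prostate cancer", "intervention": "docetaxel", "response": "PSA decline"},
    "00000002": {"condition": "Lung cancer", "intervention": "nivolumab", "response": "overall survival"},
}
print(find_conflicts(a, b))
```

The output lists only the fields that need discussion, so agreement on everything else does not clutter the review.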

Interested individuals should reach out to lorikrammer@gwu.edu.

==== 5. FDA-ARGOS Computation and Pathogen Curation Project ====

POC: Christie Woodside, Jonathon Keeney

# Update data tables for more efficient computations
## Student would review and input additional data and IDs in the tables/sheets used to perform computations. This is manual but important work that requires close attention to detail.
## Additional Work: Requires Python/shell coding background. Student would run scripts that prepare and format data tables that are pushed to data.argosdb.org. Coding knowledge is needed in case of errors, bugs, or other mishaps in the code. Ongoing work as computations are performed.
# Curate and report on current pathogens to upload to ARGOS
## Student would work on manual curation of circulating pathogens to be added to data.argosdb.org. Regular check-ins and reports of what was found.
## Locate assembly IDs, reads, and metagenomic information for these pathogens to be used in computations and deposited into data.argosdb.org.
## Provide documentation on why they were curated, why they are important, how they were selected, and how data was collected.
# QC Analysis using HIVE
## Analyze the curated pathogens using our QC ARGOS one-click pipeline.
## The results will be added to our ARGOS database.
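
Because the tables above depend on correct IDs, a small format check can catch typos before data is pushed. The sketch below assumes the IDs are NCBI assembly accessions (a GCA_ or GCF_ prefix, nine digits, a dot, and a version number); the function name and sample values are illustrative, and the check confirms the format only, not that an accession exists.

```python
import re

# NCBI assembly accession: GCA_ (GenBank) or GCF_ (RefSeq), 9 digits, ".version".
ASSEMBLY_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def split_valid_invalid(ids):
    """Partition IDs into (valid, invalid) lists by accession format."""
    valid, invalid = [], []
    for raw in ids:
        cleaned = raw.strip()  # tolerate stray whitespace from spreadsheets
        (valid if ASSEMBLY_RE.match(cleaned) else invalid).append(cleaned)
    return valid, invalid


# Sample values for illustration only.
valid, invalid = split_valid_invalid(
    ["GCF_000001405.40", " GCA_000001405.29", "GCF_12345.1", "GCA_000001405"]
)
print(valid)
print(invalid)
```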

If you have any other ideas or methods you want to focus on, please reach out to christie.woodside@email.gwu.edu to discuss your idea and check whether it is feasible as a project for the fall program.
----

=== Requirements for Completion ===
'''Note:''' The following are mandatory. Failure to complete any will result in an incomplete volunteer record.

==== Documentation ====
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.

==== Written Report ====
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.

==== Presentation & Slide Submission ====
Present your work during the last week of the 13-week period.

Slides must be submitted to the POCs.
----

=== Completion Certificate ===
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.
----

=== Contact ===
mazumder_lab@gwu.edu.
----

=== Volunteers (TBD) ===
{| class="wikitable"
|+
!Name
!Project Assigned
!Projects of Interest
|-
|Diya Kamalabharathy
|
|PredictMod; Glyco web development
|-
|Anika Sikka
|
|GlyGen
|-
|Akale Kinfe
|
|BiomarkerKB
|-
|Nahom Abel
|
|BiomarkerKB
|-
|Harivinay P. Gujjula
|
|GlyGen
|-
|Sparsh Gupta
|
|BiomarkerKB
|}

Symposium 2025

2025-07-21T15:14:32Z

Daniallmasood: /* Agenda */

The HIVE Lab summer symposium is scheduled for Thursday, July 31, 2025. It is an exciting time for the lab volunteers and interns to present their findings on the projects they worked on for 8 weeks.

[[File:DC.png|center|frame]]

== '''Program and Information''' ==

=== '''Symposium Venue''' ===
The HIVE Lab symposium will be held in person at The George Washington University, Washington DC, with an option to join virtually.

In Person - Ross 637, Ross Hall, School of Medicine and Health Sciences, The George Washington University, Washington DC ([https://maps.app.goo.gl/PHQmZacA4hWDvTCh6 MAP])

Virtual - Zoom

'''Zoom Link -''' https://gwu-edu.zoom.us/j/98841344003

'''Add to - [https://gwu-edu.zoom.us/meeting/tJwlc-irqj8qGtdu0zteQ__E2Fqo0fbPF6P7/calendar/google/add Google Calendar] | [https://gwu-edu.zoom.us/meeting/tJwlc-irqj8qGtdu0zteQ__E2Fqo0fbPF6P7/ics Outlook Calendar] | [https://calendar.yahoo.com/?v=60&VIEW=d&TITLE=2025%20CFDE-GlyGen-HIVE%20Lab%20Summer%20Symposium&in_loc=https%3A%2F%2Fgwu-edu.zoom.us%2Fj%2F98841344003&URL=https%3A%2F%2Fgwu-edu.zoom.us%2Fj%2F98841344003&ST=20250731T140000Z&DUR=0700&DESC=Jeet%20Vora%20%28GlyGen%20-%20GW%29%20is%20inviting%20you%20to%20a%20scheduled%20Zoom%20meeting.%0D%0AJoin%20Zoom%20Meeting%0D%0Ahttps%3A%2F%2Fgwu-edu.zoom.us%2Fj%2F98841344003%0D%0A%0D%0AMeeting%20ID%3A%20988%204134%204003%0D%0A%0D%0A---%0D%0A%0D%0AOne%20tap%20mobile%0D%0A%2B14042012656%2C%2C98841344003%23%20US%0D%0A%2B13097403221%2C%2C98841344003%23%20US%0D%0A%0D%0A---%0D%0A%0D%0ADial%20by%20your%20location%0D%0A%E2%80%A2%20%2B14042012656%20US%0D%0A%E2%80%A2%20%2B1%20309%20740%203221%20US%0D%0A%E2%80%A2%20%2B12122258997%20US%0D%0A%E2%80%A2%20%2B16462551997%20US%0D%0A%E2%80%A2%20%2B14805624901%20US%0D%0A%E2%80%A2%20%2B81345107597%20Japan%0D%0A%E2%80%A2%20%2B44%20201%20151%208517%20United%20Kingdom%0D%0A%E2%80%A2%20%2B442079791833%20United%20Kingdom%0D%0A%E2%80%A2%20%2B61280318153%20Australia%0D%0A%E2%80%A2%20%2B41434569439%20Switzerland%0D%0A%0D%0AMeeting%20ID%3A%20988%204134%204003%0D%0A%0D%0AFind%20your%20local%20number%3A%20https%3A%2F%2Fgwu-edu.zoom.us%2Fu%2FadB0kGNOyd%0D%0A%0D%0A---%0D%0A%0D%0AJoin%20by%20SIP%0D%0A%E2%80%A2%2098841344003%40zoomcrc.com%0D%0A%0D%0A---%0D%0A%0D%0AJoin%20by%20H.323%0D%0A%E2%80%A2%20144.195.19.161%20%28US%20West%29%0D%0A%E2%80%A2%20206.247.11.121%20%28US%20East%29%0D%0A%E2%80%A2%20115.114.131.7%20%28India%20Mumbai%29%0D%0A%E2%80%A2%20115.114.115.7%20%28India%20Hyderabad%29%0D%0A%E2%80%A2%20159.124.15.191%20%28Amsterdam%20Netherlands%29%0D%0A%E2%80%A2%20159.124.47.249%20%28Germany%29%0D%0A%E2%80%A2%20159.124.104.213%20%28Australia%20Sydney%29%0D%0A%E2%80%A2%20159.124.74.212%20%28Australi
a%20Melbourne%29%0D%0A%E2%80%A2%20170.114.180.219%20%28Singapore%29%0D%0A%E2%80%A2%2064.211.144.160%20%28Brazil%29%0D%0A%E2%80%A2%20159.124.132.243%20%28Mexico%29%0D%0A%E2%80%A2%20159.124.168.213%20%28Canada%20Toronto%29%0D%0A%E2%80%A2%20159.124.196.25%20%28Canada%20Vancouver%29%0D%0A%E2%80%A2%20170.114.194.163%20%28Japan%20Tokyo%29%0D%0A%E2%80%A2%20147.124.100.25%20%28Japan%20Osaka%29%0D%0A%0D%0AMeeting%20ID%3A%20988%204134%204003%0D%0A%0D%0A Yahoo Calendar]'''

== '''Agenda''' ==
All times are in Eastern Time (ET).
{| class="wikitable"
|'''Time (ET)'''
|'''Project'''
|'''Title'''
|'''Presenter'''
|-
|'''10:00am'''
| colspan="2" | '''Welcome and Introduction'''
|'''Michael Tiemeyer (10 min)'''
|-
| colspan="4" | ''Group 1 Moderator : Nathan Edwards''
|-
|10:10am
|CFDE
|Integrating Biocuration and Data Standardization to Generate Machine Learning-Ready Glycan Datasets
|Ana Jaramillo and Yuxin Zou (20 min)
|-
|10:30am
|CFDE
|Machine Learning Models for Linkage Prediction in Glycan Images
|Campbell Ross (15 min)
|-
|10:45am
|CFDE
|A Graph-Based AI Workflow for Mining Glycan Biomarkers and Related Annotations from Publications
|Cyrus Chun Hong Au Yeung (15 min)
|-
|11:00am
|BiomarkerKB
|Comprehensive Identification and LLM Based Curation of the Top 50 Clinically Relevant Disease Biomarkers
|Sohana Bahl, Isaac Kim, Sparsh Gupta (15 min)
|-
|11:15am
|BiomarkerKB
|TBA
|Nathan Ressom, Ana Vohralikova, Mathias Belay (15 min)
|-
|11:30am
|BiomarkerKB
|Systematic Curation and Large Language Model-Based Extraction of Alzheimer’s Disease Biomarkers
|John McCaffery, Alma Ogunsina, Akale Kinfe (15 min)
|-
|'''11:45am'''
| colspan="2" |'''Open Q and A'''
|'''All (30 min)'''
|-
|12:30pm
| colspan="3" | '''LUNCH (90 mins)'''
|-
| colspan="4" | ''Group 2 Moderator : Rene Ranzinger''
|-
|2:00pm
|GlyGen
|GlyGen Biocuration Project
|Aise Arpinar, Haravinay P. Gujjulla, Nahom Abel (20 min)
|-
|2:20pm
|Glycobiology Web Development
|A Resource Drill Down and Visualization for the Glyspace Alliance
|Diya Kamalabharathy (5 min)
|-
|2:25pm
|PredictMod Curation
|PredictMod: PubMed Curation for Training an LLM for Recommendation
|Grace Chong, Aaron Ressom, Diya Kamalabharathy (15 min)
|-
|2:40pm
|PredictMod AI-READI
|Robust Classification of Glycemic Health States from Continuous Glucose
|Nikhil Arethiya (15 min)
|-
|2:55pm
|ARGOS
|Curation of Emerging Pathogen Genomes for FDA-ARGOS Database Expansion
|Miao Wang (15 min)
|-
|3:10pm
|GlycoSiteMiner
|TBA
|(15 min)
|-
|'''3:25pm'''
| colspan="2" |'''<nowiki>Open Q and A | Closing Remarks</nowiki>'''
|'''<nowiki>All (20 min) | Raja Mazumder</nowiki>'''
|}

== '''Project Description''' ==

=== CFDE Project ===
The CFDE project focuses on integrating biocuration and data standardization to generate machine learning-ready glycan datasets. It brings together curated information and structured metadata to ensure that glycan-related data is both interoperable and computationally accessible. As part of this effort, the project supports the development of machine learning models for linkage prediction in glycan images, enabling automated interpretation of glycan structures from visual representations. In addition, a graph-based AI workflow is being implemented to mine glycan biomarkers and related annotations from scientific publications, helping to uncover novel insights and associations. These approaches collectively advance the integration of glycobiology into broader biomedical research by making glycan data more usable for downstream AI applications.

=== GlyGen Project ===
The GlyGen Biocuration project focuses on integrating legacy, yet valuable, data from the CarbBank and CFG databases into the GlyGen infrastructure. A key challenge is mapping metadata, such as species names and publication references, to standardized dictionaries and ontologies. While most entries have been automatically matched using custom scripts, remaining inconsistencies, including outdated, misspelled, or abbreviated terms, require manual curation using resources such as Google, PubMed, and domain-specific dictionaries and ontologies.

=== BiomarkerKB Biocuration Project ===
The BiomarkerKB Biocuration project focuses on curating biomarkers from abstracts and publications into the BiomarkerKB data model. A key challenge is the vast amount of data spread across publications. Manual curation requires reading, inferring, and understanding the key elements of biomarker data and mapping them to the defined biomarker data model. LLM methodologies can help by recognizing biomarker and condition data, mapping the information found into the data model, and automatically mapping other contextual and standardized data to the model so that the data is AI and machine learning ready.

=== ArgosDB Curation Project ===
This project focuses on evaluating and curating high-quality genomes of emerging and clinically relevant pathogens, with an emphasis on fungal species. Using public genomic repositories and FDA-ARGOS inclusion criteria, the project identifies candidate organisms for database expansion to support diagnostic assay development and public health surveillance.

Symposium 2025

2025-07-18T20:08:08Z

Daniallmasood: /* Agenda */


Symposium 2025

2025-07-17T17:45:57Z

Daniallmasood: /* Project Description */

Volunteership 2025

2025-04-03T20:05:30Z

Daniallmasood: Add biomarker curation project ideas

<html>
<h2>2025 Volunteer Program Details</h2>

<h3>Dates</h3>
<p><strong>June 2nd, 2025 – July 25th, 2025</strong> (8 weeks)<br>
Monday to Friday | Remote | No breaks</p>

<hr>

<h3>Volunteer Expectations</h3>
<ol>
<li>Daily progress updates via Slack (scrum).</li>
<li>Regular Zoom meetings with the assigned project point of contact.</li>
<li>Expected to dedicate 5–6 hours per day to project work, with the remaining time focused on skill development or reading.</li>
</ol>
<p style="color: red;"><strong>Important:</strong> If the scrum is not updated for 2 consecutive days, the candidate will be <u>automatically dropped</u> from the program.</p>
<hr>

<h3>Potential Projects</h3>
<ol>
<li>BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.</li>
<li>GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.</li>
<li>ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.</li>
<li>PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project. Identifying datasets and harmonizing them so that they can be used to generate ML models.</li>
</ol>
<p>Individuals with a background in programming and/or machine learning may take on additional tasks that contribute to the development of ML models, which can be integrated into PredictMod (<nowiki>https://hivelab.biochemistry.gwu.edu/predictmod</nowiki>).</p>
<hr>

<h4>BiomarkerKB Biocuration Project Ideas</h4>

# Curate biomarkers for a specific disease (Alzheimer's)
## The student would do manual curation for about 4 weeks, with regular check-ins with the project point of contact to ensure it is being done correctly.
## The next 4 weeks can be spent developing an LLM or automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.
# Top 50 biomarkers
## Curate the top 50 biomarkers for biomarkerkb.org.
## Define what constitutes a top 50 biomarker.
## Begin curating biomarkers from different sources and papers by collecting the fields mentioned in the data model, as well as cross-references.
# Biocuration of biomarkers from NLP/LLM work
## Use the biomarkers collected from NLP work.
## Curate the biomarkers. The data from the NLP work was not provided in the biomarker data model.
## While curating the biomarkers, check whether the data collected from NLP is correct.
## After completion, the student can start using the curated data to work on the NLP/LLM method.
# Curate biomarkers for a treatment
## Same as number 1 above.

If you have any other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the summer.<hr>
<h3>Requirements for Completion</h3>
<p><strong>Note:</strong> The following are <u>mandatory</u>. Failure to complete any will result in an incomplete volunteer record.</p>

<h4>Documentation</h4>
<p>All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.</p>

<h4>Written Report</h4>
<p>Submit a 1–2 page summary of your tasks and accomplishments to the Admin Team during the final week of your program.</p>

<h4>Presentation & Slide Submission</h4>
<p>Present your work during the last week of the 8-week period.</p>
<p>Slides must be submitted to the Admin Team and should include:</p>
<ul>
<li>A title slide with your name, date, and mentor</li>
<li>At least 3 content slides</li>
<li>A final slide with acknowledgements or references</li>
</ul>
Contact the Admin Team to access previously submitted slides.
<hr>

=== Completion Certificate ===
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.
<hr>
=== Contact ===
mazumder_lab AT gwu.edu.

Volunteership 2025

2025-04-03T20:01:24Z

Daniallmasood: