Volunteership Summer 2026: Difference between revisions
Revision as of 21:01, 2 April 2026
2026 Summer Volunteer Program Details
Dates
Application Deadline
Date TBD | 12:00 PM ET
Please email your updated resume and a list of projects in order of preference. Acceptance letters/emails will be sent to candidates no later than the day after the kick-off meeting.
Volunteer Zoom Kick-Off Meeting
Date TBD | 11:00 AM to 12:00 PM
Program Dates: June 1, 2026 – July 31, 2026 (9 weeks)
Remote | Hybrid for GW employees and students (Ross Hall 5th floor)
Fall 2025 Volunteership (Closed)
Volunteer Expectations
- Minimum commitment of 10 hours per week.
- Progress updates via Slack at least 3 days per week (scrum).
- 30-minute Zoom meetings (during regular work hours) once a week or every other week with the assigned project point of contact (POC).
- Attend some lectures or seminars remotely (max 4-5).
Important: If the scrum is not updated for 2 consecutive working days, the candidate will be automatically dropped from the program.
Potential Projects
We are excited to continue our bioinformatics volunteership program in Summer 2026. This program offers students the opportunity to work on bioinformatics projects supported by agencies such as the NIH, ARPA-H, and FDA. Participants will gain exposure to a variety of activities within a bioinformatics lab, including data analysis, computational biology, and genomics. If you are interested, please email mazumder_lab@gwu.edu your resume and a ranked list of the projects that interest you most. You can also indicate if you want to focus on specific areas that are of interest to you.
- BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.
- GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.
- ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.
- PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Curate PMIDs for the intervention outcome prediction dataset used in LLM recommendation training.
Note: Individuals involved in the above projects with a background in programming and/or machine learning may also undertake additional tasks to support the development of ML models, which can be integrated into PredictMod or used to enhance AI/ML-ready datasets within GlyGen. We are also looking for individuals who have previously worked with us to take on a coordinator role.
1. BiomarkerKB Biocuration Project Ideas
- Review existing published biomarkers for correctness and validity
  - Validate biomarker–disease associations using primary literature
  - Assess evidence strength
  - Identify outdated, conflicting, or unsupported biomarker claims
- Biocurate biomarkers from publications based on disease and entity type
  - Identify and curate novel biomarkers from recent publications
  - Standardize biomarker representation using controlled vocabularies and ontologies
  - Classify biomarkers by type and disease context
- Review and Map Electronic Health Records Normal Entity Data
  - Identify relevant EHR data elements (lab tests, diagnoses, procedures)
  - Map entities to standard terminologies (e.g., SNOMED CT, LOINC, ICD codes)
  - Resolve ambiguities and inconsistencies in mappings and clinical terminology
- Continue working on LLM methods started by previous volunteers
  - The data are available, along with preliminary research and work done by previous volunteers in this area.
If you have other ideas or would like to know more, please reach out to jeetvora@gwu.edu.
2. GlyGen Biocuration Project Ideas
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan
Over the last three decades, numerous glycomics database projects have been initiated to collect valuable information about glycans, proteins, and their interactions. Some of these databases have been discontinued due to the end of project funding; however, the data contained within them remains highly valuable to the research community. Integrating these legacy datasets into modern databases or knowledgebases, such as GlyGen, presents a significant challenge because much of the associated metadata (e.g., species, tissue, disease, cell line) is recorded as free-text that does not conform to the standardized dictionaries and ontologies used by current resources.
To address this challenge, this project will leverage large language models (LLMs) to automate the mapping of free-text metadata from legacy databases, specifically CarbBank and CFG, to standardized accessions in authoritative resources such as NCBI Taxonomy, Disease Ontology, and Cellosaurus. The LLM-based workflow will identify and normalize synonyms, abbreviations, and spelling variants (e.g., “human,” “man,” or “h. sapiens” mapped to Homo sapiens), enabling scalable and reproducible metadata harmonization that would otherwise require extensive manual curation. The LLM tasks will be performed using OpenAI resources integrated into the GlyGen curation pipeline. The project involves developing Python scripts to read and write data, invoke the OpenAI API, and compare results with manually curated data. Another aspect of the work is developing and fine-tuning a prompt for ChatGPT to ensure that the mapping it produces is reliable and accurate.
While the mapping process will be largely automated, manual validation will be incorporated as a quality-control step to assess model performance, verify correctness, and identify edge cases requiring refinement. This hybrid approach significantly reduces curator burden while ensuring high-quality, ontology-aligned annotations.
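The quality-control step described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: `stub_mapper` is a hypothetical stand-in for the LLM call (a small synonym table instead of a request to the OpenAI API), and the curated entries are illustrative rather than real CarbBank or CFG records. The only external facts used are the NCBI Taxonomy IDs 9606 (Homo sapiens), 10090 (Mus musculus), and 10116 (Rattus norvegicus).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the LLM call: a tiny synonym table keyed by
# lower-cased free text. Values are NCBI Taxonomy IDs
# (9606 = Homo sapiens, 10090 = Mus musculus).
_SYNONYMS: Dict[str, str] = {
    "human": "9606",
    "man": "9606",
    "h. sapiens": "9606",
    "homo sapiens": "9606",
    "mouse": "10090",
}


def stub_mapper(free_text: str) -> Optional[str]:
    """Return a taxonomy ID for a free-text label, or None if unknown."""
    return _SYNONYMS.get(free_text.strip().lower())


def evaluate(
    mapper: Callable[[str], Optional[str]],
    curated: Dict[str, str],
) -> Tuple[float, List[Tuple[str, Optional[str], str]]]:
    """Compare mapper output with manually curated accessions.

    Returns the fraction of terms on which mapper and curator agree,
    plus (free_text, predicted, expected) tuples for every disagreement
    so that a curator can review them.
    """
    mismatches: List[Tuple[str, Optional[str], str]] = []
    for free_text, expected in curated.items():
        predicted = mapper(free_text)
        if predicted != expected:
            mismatches.append((free_text, predicted, expected))
    total = len(curated)
    accuracy = (total - len(mismatches)) / total if total else 0.0
    return accuracy, mismatches


# Illustrative curated set (not real project data); 10116 = Rattus norvegicus.
curated_example = {
    "Human": "9606",
    " man ": "9606",
    "H. sapiens": "9606",
    "rat": "10116",
}
accuracy, mismatches = evaluate(stub_mapper, curated_example)
print(f"accuracy={accuracy:.2f}")
for row in mismatches:
    print("needs review:", row)
```

Because `evaluate` accepts any callable, the stub could later be swapped for a function that wraps the real LLM request without changing the comparison logic.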
The goal of this effort is to migrate and modernize datasets from CarbBank and CFG, making them interoperable with GlyGen and other contemporary glycoinformatics resources through a scalable, AI-assisted curation strategy.
For any questions, please contact Rene Ranzinger (rene@ccrc.uga.edu) or Kate Warner (k.warner1@email.gwu.edu).
3. GlyGen Publication Analysis Project Ideas
POC: Rene Ranzinger, Kate Warner, and Urnisha Bhuiyan
One of the challenges for any bioinformatics project is understanding the size of its community, how well the project serves this community, and how widely its software/database is used. A potential solution is to analyze PubMed publication data. We are seeking applicants with programming skills (in Python or Java) to perform this analysis.
The project involves:
- Using the PubMed web API to filter publications based on keywords.
- Analyzing paper abstracts to identify research institutions and groups that form the community.
- Filtering the community list to exclude unrelated co-authors.
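As a rough illustration of the first step, the sketch below builds a keyword query for the NCBI E-utilities `esearch` endpoint and extracts PMIDs from its JSON response. It assumes that E-utilities is the "PubMed web API" meant here. The helper names `build_search_url` and `parse_pmids` are hypothetical, and the sample response uses placeholder IDs rather than real PMIDs. No network request is made, so fetching the URL is left to the reader.

```python
import json
from typing import List
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed to be the intended PubMed API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(keywords: List[str], retmax: int = 20) -> str:
    """Build an esearch URL matching any keyword in the title or abstract."""
    term = " OR ".join(f'"{kw}"[Title/Abstract]' for kw in keywords)
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(response_text: str) -> List[str]:
    """Extract the list of PMIDs from an esearch JSON response body."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])


# Example: the URL can be opened with any HTTP client.
url = build_search_url(["GlyGen", "glycoinformatics"], retmax=5)
print(url)

# Placeholder response body with made-up IDs, shaped like esearch JSON output.
sample_response = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
print(parse_pmids(sample_response))
```

Keeping URL construction and response parsing separate from the HTTP call makes both pieces easy to test offline and to reuse once rate limiting or an API key is added.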
A subproject will involve analyzing the full text of papers (when available) for keywords or resource and database names. The results of the analysis will be discussed with GlyGen project members, who will suggest changes and improvements to the analysis and data presentation. Source code developed as part of this project will be documented and shared in a public GitHub repository. If you have any other ideas or methods you would like to explore, please reach out to rene@ccrc.uga.edu to discuss them.
4. PredictMod Machine Learning (ML) Modeling Project
POC: Lori Krammer, Pat McNeely (optional)
Volunteers will conduct ML modeling using publicly available -omics datasets that were previously identified (see our Recommended Publications for IOPMs page). This volunteership will involve data harmonization, model training, and pipeline documentation.
Tasks associated with this project include:
- Exploring and understanding the data found in relevant PMIDs that can be used to train intervention outcome prediction models.
- Preparing the data for model training and model performance evaluation
- Testing the modeling tutorial, PredictMod platform, and associated project tools
- Documentation of the ML pipeline and testing results
Deliverables for this project include:
- ML-ready datasets
- Trained model scripts
- Pipeline documentation captured in BioCompute Objects (BCOs) and testing reports
- Volunteership documentation (final report, progress updates, symposium presentation)
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.
5. BioCompute Objects User Research Project
POC: Lori Krammer, Pat McNeely
Volunteers will conduct individual audits and user research to improve the human readability of BioCompute Objects (BCOs) and the project documentation. This volunteership will involve user research, prototyping, and documentation.
Tasks associated with the project include:
- Reviewing existing documentation to gain a comprehensive understanding of BioCompute Objects, their relevance to bioinformatics, and key user personas. The volunteer will identify and report gaps in the current documentation.
- Conducting user research to understand pain points and desired outcomes. The volunteer will develop user stories based on interviews with BCO users.
- Prototyping improvements to the BCO documentation and/or portal based on user stories. This could involve visual diagrams, wiki restructuring, or decision logs.
Deliverables will include:
- User research report with user story maps
- BCO documentation improvement plan
- Volunteership documentation (final report, progress updates, symposium presentation)
Interested individuals should reach out to lorikrammer@gwu.edu. Please note that this project requires attendance at biweekly meetings and a final presentation of your work.
6. FDA-ARGOS Computation and Pathogen Curation Project
Requirements for Completion
Note: The following are mandatory. Failure to complete any will result in an incomplete volunteer record.
Documentation
All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.
Written Report
Submit a 1–2 page summary of your tasks and accomplishments to the Admin during the final week of your program.
Presentation & Slide Submission
Present your work during the last week of the 9-week period.
Slides must be submitted to the POCs.
Completion Certificate
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program. Additional recognition will be given to the top three volunteers with exceptional presentations at the end of the program.
Contact
mazumder_lab@gwu.edu.
Volunteers (TBD)
| Name | Project Assigned | POC Assigned | Projects Interested |
|---|---|---|---|
| Sahana Adusumilli | BiomarkerKB | Jeet Vora | Review EHR Normal Ranges |
| Abhirama Chillara | BiomarkerKB | Jeet Vora/Maria | TBD |
*Returning volunteer.
**Not directly involved in the semester curriculum; long-term volunteer.
Summer 2026 Symposium
The Summer symposium will be held virtually.
Date: TBD
Time: 4 - 6 PM
Zoom Link - TBA
Agenda (All times are in Eastern Time)
| Time | Project | Presentation Title | Presenter(s) |
|---|---|---|---|