Volunteership 2025: Difference between revisions

From HIVE Lab
Summer Volunteer-ship 2025
Maria.kim
m Formatting
(22 intermediate revisions by 5 users not shown)
<h2>2025 Volunteer Program Details</h2>


<h3>Dates</h3>
<p><strong>June 2nd, 2025 – July 25th, 2025</strong> (8 weeks)<br>
Monday to Friday | Remote | No breaks</p>


<hr>


<h3>Volunteer Expectations</h3>
<ol>
  <li>Daily progress updates via Slack (scrum).</li>
  <li>Regular Zoom meetings with the assigned project point of contact.</li>
  <li>Expected to dedicate 5–6 hours per day to project work, with the remaining time focused on skill development or reading.</li>
</ol>
<p style="color: red;"><strong>Important:</strong> If the scrum is not updated for 2 consecutive days, the candidate will be <u>automatically dropped</u> from the program.</p>
<hr>


<h3>Potential Projects</h3>
<ol>
  <li>BiomarkerKB ([https://biomarkerkb.org biomarkerkb.org]) project: Biomarker curation project. Involves reading papers and collecting biomarkers.</li>
  <li>GlyGen ([https://glygen.org glygen.org]) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.</li>
  <li>ARGOS ([https://argosdb.org argosdb.org]) project: Analyze genomics data using HIVE to identify reference genome assemblies.</li>
  <li>PredictMod ([https://hivelab.biochemistry.gwu.edu/predictmod hivelab.biochemistry.gwu.edu/predictmod]) project: Identify datasets and harmonize them so that they can be used to generate ML models. Individuals with a background in programming and/or machine learning may take on additional tasks that contribute to the development of ML models, which can be integrated into PredictMod.</li>
</ol>
<hr>


<h4>BiomarkerKB Biocuration Project Ideas</h4>
POC: Daniall Masood
# Curate biomarkers for a specific disease (Alzheimer's disease)
## The student performs manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.
## The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.
# Top 50 biomarkers
## Curate the top 50 biomarkers for biomarkerkb.org.
## Define what constitutes a top 50 biomarker.
## Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.
# Biocuration of biomarkers from NLP/LLM work
## Use the biomarkers collected from NLP work.
## Curate the biomarkers. The data from the NLP work was not provided in the biomarker data model format.
## While curating the biomarkers, check whether the data collected from NLP is correct.
## After completion, the student can start using curated data to work on the NLP/LLM method.
# Curate biomarkers for a treatment
## See #1 above.


If you have other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the summer.


==== GlyGen Biocuration Project ====
POC: Rene Ranzinger and Urnisha Bhuiyan


Using TableMaker in GlyGen, individuals will curate glycomics and glycoproteomics data from database resources that are now defunct. There might also be biocuration projects that involve curating papers.


==== PredictMod Machine Learning Project Ideas ====
POC: Lori Krammer


Data Identification & Harmonization:


# Identify publicly available datasets from scientific literature that can be used for intervention outcome prediction models.
# Conduct data harmonization and pre-processing following established project pipelines to produce an ML-ready dataset and data dictionary.


Modeling & Integration (for those with experience in programming/ML)


# Perform model training and document the ML pipeline in a BioCompute Object (BCO).
# Integrate the model into the PredictMod platform.


Individuals with a background or interest in machine learning should reach out to lorikrammer@gwu.edu with a potential dataset to determine whether it is feasible as a project for the summer.
<hr>
<h3>Requirements for Completion</h3>
<p><strong>Note:</strong> The following are <u>mandatory</u>. Failure to complete any will result in an incomplete volunteer record.</p>


<h4>Documentation</h4>
<p>All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.</p>


<h4>Written Report</h4>
<p>Submit a 1–2 page summary of your tasks and accomplishments to the Admin Team during the final week of your program.</p>


<h4>Presentation & Slide Submission</h4>
<p>Present your work during the last week of the 8-week period.</p>
<p>Slides must be submitted to the Admin Team and should include:</p>
<ul>
  <li>A title slide with your name, date, and mentor</li>
  <li>At least 3 content slides</li>
  <li>A final slide with acknowledgements or references</li>
</ul>
Contact the Admin Team to access previously submitted slides.
<hr>


=== Completion Certificate ===
A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.
<hr>
=== Contact ===
mazumder_lab AT gwu.edu.

Latest revision as of 21:43, 3 April 2025

2025 Volunteer Program Details

Dates

June 2nd, 2025 – July 25th, 2025 (8 weeks)
Monday to Friday | Remote | No breaks


Volunteer Expectations

  1. Daily progress updates via Slack (scrum).
  2. Regular Zoom meetings with the assigned project point of contact.
  3. Expected to dedicate 5–6 hours per day to project work, with the remaining time focused on skill development or reading.

Important: If the scrum is not updated for 2 consecutive days, the candidate will be automatically dropped from the program.


Potential Projects

  1. BiomarkerKB (biomarkerkb.org) project: Biomarker curation project. Involves reading papers and collecting biomarkers.
  2. GlyGen (glygen.org) project: Review glycomics and glycoproteomics data and curate tissue, disease, and other related information.
  3. ARGOS (argosdb.org) project: Analyze genomics data using HIVE to identify reference genome assemblies.
  4. PredictMod (hivelab.biochemistry.gwu.edu/predictmod) project: Identify datasets and harmonize them so that they can be used to generate ML models. Individuals with a background in programming and/or machine learning may take on additional tasks that contribute to the development of ML models, which can be integrated into PredictMod.

BiomarkerKB Biocuration Project Ideas

POC: Daniall Masood

  1. Curate biomarkers for a specific disease (Alzheimer's disease)
    1. The student performs manual curation for about 4 weeks, with regular check-ins with the POC to ensure it is being done correctly.
    2. The next 4 weeks can be dedicated to developing an LLM or an automated process to extract biomarker details, using the data collected in the first 4 weeks as training or example data.
  2. Top 50 biomarkers
    1. Curate the top 50 biomarkers for biomarkerkb.org.
    2. Define what constitutes a top 50 biomarker.
    3. Begin curating biomarkers from different sources and papers by collecting fields mentioned in the data model, as well as collecting cross-references.
  3. Biocuration of biomarkers from NLP/LLM work
    1. Use the biomarkers collected from NLP work.
    2. Curate the biomarkers. The data from the NLP work was not provided in the biomarker data model format.
    3. While curating the biomarkers, check whether the data collected from NLP is correct.
    4. After completion, the student can start using curated data to work on the NLP/LLM method.
  4. Curate biomarkers for a treatment
    1. See #1 above.

If you have other ideas, diseases, treatments, or methods you want to focus on, please reach out to daniallmasood@gwu.edu to discuss your idea and check whether it is feasible as a project for the summer.

GlyGen Biocuration Project

POC: Rene Ranzinger and Urnisha Bhuiyan

Using TableMaker in GlyGen, individuals will curate glycomics and glycoproteomics data from database resources that are now defunct. There might also be biocuration projects that involve curating papers.

PredictMod Machine Learning Project Ideas

POC: Lori Krammer

Data Identification & Harmonization:

  1. Identify publicly available datasets from scientific literature that can be used for intervention outcome prediction models.
  2. Conduct data harmonization and pre-processing following established project pipelines to produce an ML-ready dataset and data dictionary.

Modeling & Integration (for those with experience in programming/ML)

  1. Perform model training and document the ML pipeline in a BioCompute Object (BCO).
  2. Integrate the model into the PredictMod platform.

Individuals with a background or interest in machine learning should reach out to lorikrammer@gwu.edu with a potential dataset to determine whether it is feasible as a project for the summer.


Requirements for Completion

Note: The following are mandatory. Failure to complete any will result in an incomplete volunteer record.

Documentation

All volunteers must maintain adequate documentation of their work, including written protocols and scripts submitted to GitHub.

Written Report

Submit a 1–2 page summary of your tasks and accomplishments to the Admin Team during the final week of your program.

Presentation & Slide Submission

Present your work during the last week of the 8-week period.

Slides must be submitted to the Admin Team and should include:

  • A title slide with your name, date, and mentor
  • At least 3 content slides
  • A final slide with acknowledgements or references

Contact the Admin Team to access previously submitted slides.


Completion Certificate

A certificate of completion and a letter of recommendation will be provided to all participants who successfully complete the program.


Contact

mazumder_lab AT gwu.edu.