GW-FEAST: Difference between revisions

From HIVE Lab
{{DISPLAYTITLE:<span style="position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);">{{FULLPAGENAME}}</span>}}
__NOTOC__
<!-- BANNER ACROSS TOP OF PAGE -->
<div id="ggw-topbanner" style="clear:both; position:relative; box-sizing:border-box; width:100%; margin:1.2em 0 6px; min-width:47em; border:1px solid #ddd; background-color:#f9f9f9; color:#000;"><div style="margin:0.4em; text-align:center;">
        <div style="font-size:160%; padding:.1em;">Welcome to the GW-FEAST Wiki!</div>
        <div style="font-size:100%;">This is the [https://www.mediawiki.org/wiki/MediaWiki MediaWiki] site for the GW-FEAST project.
</div>
</div>
<div style="clear: both;"></div>
</div> 
 
<div id="ggw_row3" style="display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;">
    <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;">
        <h2>About GW-FEAST</h2>
Federated Ecosystems for Analytics and Standardized Technologies ([https://hivelab.biochemistry.gwu.edu/gw-feast FEAST]) is a cloud-based, agile bioinformatics and data analysis platform under development through the ARPA-H Biomedical Data Fabric (BDF) toolbox program. The project is led by [https://dnahive.com DNA-HIVE]; other funded collaborators include Cornell University, Vanderbilt University, Georgetown University, the European Bioinformatics Institute, and Kaiser Permanente. Our team is responsible for the GW instance of FEAST (GW-FEAST) and co-leads the project with DNA-HIVE. The project is part of the ARPA-H FEAST performer team's effort to bridge data silos and make health data more accessible and usable.


Several hospitals and cancer centers will each host a FEAST platform, enabling cross-site data analysis without exporting or transforming the data. Today, large volumes of health data are used by insurance companies, pharmaceutical companies, and others for research and development. FEAST, which is well suited to noisy, real-world data, aims to enable more precise selection of data for research while preserving patient privacy. Clinical data is submitted to the suite of tools via the HL7 FHIR standard, so that only authorized parties have access to protected data. Models that support incremental updates, such as online training, are updated without retaining any personally identifiable information (PII). The tools therefore support federated datasets and federated training without ever storing clinical PII in the system. Each service runs as an independent microservice in its own Docker container.


[https://drive.google.com/file/d/1iv9VmFhNbd-5iwSwDMLVumCFnFN84cl8/view?usp=drive_link FEAST Video]
</div>
    </div>


<div id="ggw_row3" style="display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;">
    <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;">
 
== Quick Links ==
* [[GW-FEAST Data|GW-FEAST Data Sources]]
* [[GW-FEAST Data Access Portal]]
* [[GW-FEAST Data De-identification|GW-FEAST De-identification]]
 
== Data Sources ==
The GW-FEAST Book of Data documents the data types, access restrictions, and de-identification protocols for each GW-FEAST data source. All datasets are documented with BioCompute Objects (BCOs), an Institute of Electrical and Electronics Engineers (IEEE) standard currently in use at the United States Food and Drug Administration. BCOs foster transparency and reproducibility in bioinformatics pipelines. All BCOs for the GW-FEAST project are available on the [https://biocomputeobject.org/bcodbs BioCompute Portal].
 
=== Overview of Current Data Sources for GW-FEAST ===
The overview below summarizes the current data sources. The DNA and health-record icons indicate whether a source contains -omics data or electronic medical records (EMR), and the padlock icon marks access-restricted sources. Greyed-out sources are pending ingestion into the GW-FEAST secure data environment.
 
[[File:GW-FEAST Dataset Overview.png|frameless|722x722px]]
 
==== [[National Breast Cancer Coalition (NBCC) Data]] ====

==== [[GW Data Commons (GWDC) Data]] ====
</div>
    </div>


<div id="ggw_row3" style="display: flex; flex-flow: row wrap; justify-content: space-between; padding: 0; margin: 0 -5px 0 -5px;">
    <div style="flex: 1; margin: 5px; min-width: 210px; border: 1px solid #CCC; padding: 0 10px 10px 10px; box-shadow: 0 2px 2px rgba(0,0,0,0.1); background: #f5faff;">
        <h2>GW-FEAST Project Architecture</h2>[[File:GW-FEAST_architecture.png|none|thumb|658x658px]]
''The GW-FEAST architecture diagram shows the GW environment configured to run FEAST queries through the GW node (the GW instance of FEAST). Other consortium sites may configure their environments slightly differently, but the overall structure and security practices will be similar across all sites. This diagram is subject to change throughout the life of the project.''

== Data Access ==
==== [[GW-FEAST Data Access Portal]] ====
GW-FEAST data is access-controlled. To request access, email mazumder_lab@gwu.edu.

Users who already have access should connect to the GW VPN, then open the [https://feast.mgpc.biochemistry.gwu.edu/dsviewer '''data access portal'''] and log in.
</div>
    </div>

Revision as of 16:28, 26 March 2025


Welcome to GW-FEAST Wiki!
This is the MediaWiki for the GW-FEAST project.

About GW-FEAST

Federated Ecosystems for Analytics and Standardized Technologies (FEAST) is a cloud-based, agile bioinformatics and data analysis platform under development through the ARPA-H Biomedical Data Fabric (BDF) toolbox program. The project is led by DNA-HIVE and other funded collaborators include Cornell University, Vanderbilt University, Georgetown University, European Bioinformatics Institute, and Kaiser Permanente. Our team is responsible for the GW instance of FEAST (GW-FEAST) and for co-leading the project with DNA-HIVE. This project is part of the ARPA-H FEAST performer team initiative to create bridges across data silos and make health data more accessible and usable.

Several hospitals and cancer centers will have a FEAST platform, which enables cross-site data analysis without the need to export or transform the data. Currently, large chunks of data are used by insurance companies, pharmaceutical companies, and others for research and development purposes. The FEAST platform, which is particularly strong with noisy, real-world data, aims to enable more precise data selection for research use while preserving patient privacy. When clinical data is submitted to the suite of tools, submission is handled via the HL7 FHIR protocol, ensuring only authorized parties ever have access to protected data. Models that provide update mechanisms such as online training will be updated appropriately without retaining any personally identifiable information (PII). Thus, these tools support federated data sets and training without ever retaining clinical PII within the system. All services are treated as independent microservices through containerization within docker containers.

FEAST Video

GW-FEAST Project Architecture

The GW-FEAST architecture diagram showcases the GW environment set up to facilitate FEAST queries through the GW node (or instance of FEAST at GW). While other consortium sites may have slightly different environment configurations, the overall structure and security practices will be similar across all sites. This diagram is subject to change throughout the life of the project.