Recommended Publications for Intervention Outcome Prediction Models
Latest revision as of 20:44, 9 January 2026
The following publications have been evaluated for their usefulness in providing publicly available datasets for intervention outcome prediction models.
Breast Cancer
PMID 36358741
The data was collected from the breast cancer cohort of The Cancer Genome Atlas (TCGA) database. The cohort size was 399 patients.
All the genomic data was found at this link: https://www.cbioportal.org/study/summary?id=brca_tcga
Curator comments: I traced the link and looked at the summary page, which provided data on the cancer type, data types, and mutations. I focused on where I could find the responder/non-responder status. Since the response indicator for this PMID was “Patients had no disease progression after first-line chemotherapy during the 150 months follow-up period”, I looked for data on progression of the disease. I found this data on the KM plot for overall survival and disease-free months. If we isolated each of the data points on this graph, we could get each patient ID, cancer ID, and other data that correspond with the responder status of that ID. For example, I found that the data point for the patient ID TCGA-B6-A0I5 had the highest overall percentage disease free after 281 months. When I traced this ID on the clinical data tab, I found the specific mutations and genome alterations for that ID, which were two important factors examined in the paper.
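The labeling step described in the curator comments could be sketched as follows. This is a minimal sketch, not the paper's method: it assumes the clinical table has been exported as tab-separated text with columns named PATIENT_ID, DFS_STATUS, and DFS_MONTHS, and that the status strings contain "Recurred" or "Progressed" when the disease progressed. Those column names, the status strings, and the example patient IDs are assumptions for illustration only.

```python
import csv
import io

FOLLOW_UP_MONTHS = 150.0  # follow-up window quoted in the response indicator


def label_responder(dfs_status, dfs_months):
    """Return 'responder', 'non-responder', or None when data is missing."""
    if not dfs_status or not dfs_months:
        return None
    try:
        months = float(dfs_months)
    except ValueError:
        return None  # non-numeric placeholder such as "NA"
    status = dfs_status.lower()
    progressed = "recurred" in status or "progressed" in status
    # Progression inside the follow-up window means non-responder.
    if progressed and months <= FOLLOW_UP_MONTHS:
        return "non-responder"
    return "responder"


def label_table(tsv_text):
    """Map each patient ID in a tab-separated clinical table to a label."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["PATIENT_ID"]: label_responder(row.get("DFS_STATUS"), row.get("DFS_MONTHS"))
        for row in reader
    }


# Hypothetical rows, not real TCGA records.
SAMPLE_TSV = (
    "PATIENT_ID\tDFS_STATUS\tDFS_MONTHS\n"
    "EXAMPLE-0001\t0:DiseaseFree\t200.5\n"
    "EXAMPLE-0002\t1:Recurred/Progressed\t42.0\n"
    "EXAMPLE-0003\t\t\n"
)

if __name__ == "__main__":
    for patient_id, label in label_table(SAMPLE_TSV).items():
        print(patient_id, label)
```

One caveat of this sketch: a patient who is disease free but was followed for less than 150 months is still labeled a responder, so a stricter rule may be needed for censored cases.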
PMID 36257316
The purpose of this study was to explore how ferroptosis (a type of cell death linked to iron and lipid metabolism) works differently across triple-negative breast cancer (TNBC) tumors, and to test whether blocking the enzyme GPX4 could make TNBC cells more sensitive to immunotherapy (anti–PD-1).
Condition: 465 TNBC patients from a large multi-omics dataset, including 360 with transcriptomic data, 279 with whole-exome sequencing (WES), 401 with somatic copy-number alteration (SCNA) data, and 330 with metabolomic data. Additional validation was done in LAR-like TNBC mouse models (TS/A cells in BALB/c mice). Clinical data came from the I-SPY2 cohort and related GEO datasets (GSE173839, GSE124821, GSE176078).
Intervention (drug treatment): Combination of GPX4 inhibitors (RSL3, ML162) and anti–PD-1 immunotherapy, tested both alone and together. Mice received treatments of: GPX4 inhibitor only, anti–PD-1 only, or the combination of both.
# of Patients / Samples: 465 patients in the TNBC multi-omics cohort. 8 mice per treatment group in the in-vivo study.
Primary Endpoint: To see whether GPX4 inhibition could slow down tumor growth, change the tumor microenvironment to be more immune-active, and improve tumor control when combined with anti–PD-1 therapy.
Responder / Non-Responder Definition: Responders: Tumors where GPX4 inhibition caused ferroptosis, reduced growth, and made the tumor environment more inflammatory. Non-Responders: Tumors with high GSH metabolism (glutathione pathway) that resisted ferroptosis and showed poor response to immunotherapy.
Lung Cancer
PMID 27613525
Detailed clinico-pathological information was collected from 6 different cohorts derived from public datasets excluding cases with additional chemotherapy or treatment. The different properties and metadata included in the datasets provided are in similar formats. With some filtering, the general dataset is presented as such:
The two-gene classifier was then calculated for each sample, and the samples were sorted into low, medium, or high classifier groups. 253 genes were selected based on a literature search, and array data was analyzed for patients with early-stage lung cancer.
The results were validated using qRT-PCR in the same sample population and were significantly correlated (p < 0.05) with the microarray data for all 20 genes that were most significantly associated with relapse-free survival (RFS) in univariable Cox regression. Of the 20 genes measured, 7 were significantly associated with RFS. For two of them, DUSP6 and ACTN4, high expression indicated a worse prognosis in the Japan and NCI-MD/Norway cohorts. This was validated in the other four cohorts, and a fixed-effects meta-analysis of the datasets showed no heterogeneity or inconsistency. The two-gene classifier therefore reliably and precisely identified stage I + II and stage I patients at high risk of death.
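To make the fixed-effects meta-analysis step concrete, the following is a minimal sketch of standard inverse-variance pooling with Cochran's Q and I² as heterogeneity measures. The summary above does not state which pooling formula the paper used, so this is an assumption, and the function name and inputs (per-cohort log hazard ratios and their standard errors) are hypothetical.

```python
import math


def fixed_effect_meta(log_hrs, std_errs):
    """Pool per-cohort log hazard ratios with inverse-variance weights.

    Returns the pooled log HR, its standard error, Cochran's Q, and I^2 (%).
    """
    if not log_hrs or len(log_hrs) != len(std_errs):
        raise ValueError("need one standard error per estimate")
    # Each cohort is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    # I^2: share of variation beyond what chance alone would explain.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": pooled_se, "q": q, "i_squared": i_squared}


if __name__ == "__main__":
    # Hypothetical inputs for six cohorts, not values from the paper.
    result = fixed_effect_meta(
        [0.40, 0.55, 0.35, 0.50, 0.45, 0.42],
        [0.20, 0.25, 0.30, 0.22, 0.28, 0.18],
    )
    print(result)
```

An I² close to 0% corresponds to the "no heterogeneity" finding described above; larger values would suggest the cohorts disagree more than chance would explain.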
Curator comments: The article focuses on the predictive capability of two genes, DUSP6 and ACTN4, which can give an accurate indication of the patient’s prognosis. Diagnostic biomarkers are not ideal for our purposes, which concern the results of interventions. However, after looking at the datasets themselves, it would be interesting to consider whether they could be used, because the different datasets contain metadata such as age at surgery and whether the patient had a history of smoking. We could treat the surgery, with its exact details, as our intervention and use the dead-or-alive category as the responder/non-responder property, based on the type of surgery and taking into consideration the severity of the condition.
PMID 38935373
The purpose of this study was to see if a newer type of radiation called intensity-modulated radiotherapy (IMRT) works better and is safer than an older method, three-dimensional conformal radiotherapy (3D-CRT), when both are given with chemotherapy to people with advanced non-small cell lung cancer (NSCLC) that cannot be removed by surgery.
Condition: Patients with locally advanced NSCLC who could not have surgery, enrolled in the NRG Oncology–RTOG 0617 phase 3 clinical trial. 483 patients (average age 64 years; 40% female) received carboplatin and paclitaxel chemotherapy along with radiation.
Intervention (treatment comparison): Compared IMRT vs 3D-CRT, both given with the same chemotherapy. Looked at how much radiation reached the heart and lungs — measured as lung V5 and heart V40 — and how those doses affected survival and side effects.
# of Patients / Samples: Total: 483 (IMRT = 228 (47%); 3D-CRT = 255 (53%)). Follow-up time: about 5 years.
Primary Endpoint: Measured overall survival (OS), progression-free survival (PFS), local failure, new cancers, and severe side effects (grade ≥ 3). Focused on whether heart and lung radiation dose affected long-term outcomes.
Responder / Non-Responder Definition: Responders: Patients treated with IMRT whose heart V40 < 20% had fewer severe lung side effects (pneumonitis) and lived longer (2.5 years vs 1.7 years). Non-Responders: Patients with heart V40 ≥ 20% or who received 3D-CRT, showing more pneumonitis and lower survival (HR 1.34 [95% CI 1.06–1.70]; p = 0.01).
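The responder rule above can be expressed as a small labeling function. This is a sketch under stated assumptions: the Patient record, its field names, and the example IDs are hypothetical, and the only values taken from the text are the IMRT requirement and the 20% heart V40 cutoff.

```python
from dataclasses import dataclass

HEART_V40_CUTOFF = 20.0  # percent, as quoted in the responder definition


@dataclass
class Patient:
    patient_id: str   # hypothetical identifier
    technique: str    # "IMRT" or "3D-CRT"
    heart_v40: float  # percent of heart volume receiving at least 40 Gy


def is_responder(patient: Patient) -> bool:
    """Responder: treated with IMRT and heart V40 below the cutoff."""
    return (
        patient.technique.strip().upper() == "IMRT"
        and patient.heart_v40 < HEART_V40_CUTOFF
    )


def label_cohort(patients):
    """Map each patient ID to 'responder' or 'non-responder'."""
    return {
        p.patient_id: "responder" if is_responder(p) else "non-responder"
        for p in patients
    }


if __name__ == "__main__":
    cohort = [
        Patient("EX-01", "IMRT", 12.5),
        Patient("EX-02", "IMRT", 20.0),
        Patient("EX-03", "3D-CRT", 8.0),
    ]
    print(label_cohort(cohort))
```

Note that a patient at exactly 20% falls on the non-responder side, matching the "≥ 20%" wording of the definition.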
Liver Cancer
PMID 34975338
RNA-seq data from publicly available datasets were used to compute a stemness index (mRNAsi) for hepatocellular carcinoma (HCC) patients. Stemness refers to the stem cell-like properties of a subpopulation of tumor cells known as cancer stem cells (CSCs), which are responsible for initiating and sustaining tumor growth. Using the index, HCC patients were categorized into two stemness subtypes, which would essentially determine the effectiveness of, and sensitivity to, therapies such as immunotherapy.
The data itself is publicly available. The National Cancer Institute GDC Data Portal provides extensive metadata, such as demographic information, race, age (Days to Birth), Days to Death, Vital Status, Primary Diagnosis, Disease Response, and Therapeutic Agents. The formatting is hard to judge because I would need to fully download the data before seeing it in JSON or CSV format, but by compiling the variables above and pulling the relevant data, a table could be presented as such:
With some formatting, the data would be usable for PredictMod by selecting the records for a specific treatment; the article notes that Erlotinib was analyzed. This relies on the assumption that the patients’ overall survival (OS) in HCC is equivalent to the Disease Response recorded in the actual data. The data also seems to have gone through multiple bioinformatics pipelines, so the disease response could have been calculated using different biomarkers, and that data is not accessible.
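Compiling the variables listed above into a flat table and keeping only one treatment could look like the sketch below. It assumes the downloaded clinical file is JSON with nested demographic, diagnoses, treatments, and follow_ups sections; that layout and every field name are assumptions that would need checking against the actual download, and the sample cases are invented for illustration.

```python
import json


def flatten_case(case):
    """Flatten one nested clinical record into a single table row."""
    demo = case.get("demographic") or {}
    diagnoses = case.get("diagnoses") or []
    first_diagnosis = diagnoses[0] if diagnoses else {}
    # Collect every distinct therapeutic agent across all diagnoses.
    agents = sorted({
        t["therapeutic_agents"]
        for d in diagnoses
        for t in (d.get("treatments") or [])
        if t.get("therapeutic_agents")
    })
    responses = [
        f["disease_response"]
        for f in (case.get("follow_ups") or [])
        if f.get("disease_response")
    ]
    return {
        "case_id": case.get("submitter_id"),
        "race": demo.get("race"),
        "days_to_birth": demo.get("days_to_birth"),
        "days_to_death": demo.get("days_to_death"),
        "vital_status": demo.get("vital_status"),
        "primary_diagnosis": first_diagnosis.get("primary_diagnosis"),
        "disease_response": responses[-1] if responses else None,
        "therapeutic_agents": agents,
    }


def rows_for_agent(cases, agent):
    """Keep only the rows whose treatments include the given agent."""
    wanted = agent.lower()
    rows = [flatten_case(c) for c in cases]
    return [r for r in rows if wanted in (a.lower() for a in r["therapeutic_agents"])]


# Hypothetical records, not real patient data.
SAMPLE_JSON = """
[
  {"submitter_id": "CASE-A",
   "demographic": {"race": "not reported", "days_to_birth": -20000,
                   "days_to_death": null, "vital_status": "Alive"},
   "diagnoses": [{"primary_diagnosis": "Hepatocellular carcinoma, NOS",
                  "treatments": [{"therapeutic_agents": "Erlotinib"}]}],
   "follow_ups": [{"disease_response": "Tumor Free"}]},
  {"submitter_id": "CASE-B",
   "demographic": {"vital_status": "Dead", "days_to_death": 400},
   "diagnoses": [{"primary_diagnosis": "Hepatocellular carcinoma, NOS",
                  "treatments": [{"therapeutic_agents": "Sorafenib"}]}]}
]
"""

if __name__ == "__main__":
    for row in rows_for_agent(json.loads(SAMPLE_JSON), "Erlotinib"):
        print(row)
```

Missing sections are tolerated and surface as None in the row, which makes gaps in the metadata visible before any modelling is attempted.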
Ovarian Cancer
PMID 35671108
THIS DATA HAS BEEN USED TO TRAIN MODELS FOR THE PREDICTMOD PROJECT.
See https://hivelab.biochemistry.gwu.edu/predictmod/models/Ovarian-Cancer-Methylation and https://hivelab.biochemistry.gwu.edu/predictmod/models/Ovarian-Cancer-RNAseq.
This study investigates the combination of epigenetic priming (guadecitabine) and immunotherapy (pembrolizumab) in patients with platinum-resistant ovarian, fallopian tube, or primary peritoneal cancer (N=35). The study defines clear response criteria, classifying patients as Responders (durable clinical benefit, CBR; receiving ≥6 cycles) and Non-Responders. The primary endpoint is objective response rate (ORR by RECIST 1.1). All high-throughput sequencing data (methylomic and transcriptomic) is publicly available in GEO (GSE186825, GSE188250), making it suitable for intervention outcome prediction modelling.
Esophageal Cancer
PMID 37313409
Single Cell RNA Sequencing Data: This paper selects transcriptomic data from the GSE78220 cohort, which included melanoma patients who received anti-PD-1 checkpoint inhibition therapy before treatment.