Metagenomic ML Tutorial

From HIVE Lab
Revision as of 16:18, 24 February 2026 by Lorikrammer

This tutorial provides step-by-step instructions for training and testing a Random Forest classifier on gut microbiome relative abundance data. The metagenomic files were retrieved from SRA accession PRJNA454826 and comprise paired-end whole genome sequencing (WGS) reads. The data were processed on the HIVE platform, where HIVE-Hexagon performed sequence alignment and Censuscope performed taxonomic profiling.

NOTE: Run the Python code for each step, by clicking the arrow next to it, before moving on to the next step, so that all necessary information is loaded.

Step 1: Import all the packages needed for the ML training and testing. Run the first Python box and make sure you see "Ready to Go" as your output.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

print("Ready to Go")
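
To show roughly how the imported pieces fit together, here is a minimal sketch run on synthetic data. It does not use the tutorial's dataset: the table of 60 samples by 5 taxa, the column names (taxon_0 to taxon_4), the arbitrary binary labels, and the parameter values (test_size, n_estimators, random_state) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a relative abundance table (NOT the tutorial's data):
# 60 hypothetical samples x 5 hypothetical taxa, each row normalized to sum to 1.
rng = np.random.default_rng(0)
counts = rng.integers(1, 100, size=(60, 5))
abundance = pd.DataFrame(
    counts / counts.sum(axis=1, keepdims=True),
    columns=[f"taxon_{i}" for i in range(5)],
)

# Hypothetical binary class labels, assigned arbitrarily for illustration.
labels = np.array([0, 1] * 30)

# Hold out 25% of the samples for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    abundance, labels, test_size=0.25, random_state=0, stratify=labels
)

# Fit the classifier on the training split and predict the held-out samples.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Summarize performance on the held-out samples.
cm = confusion_matrix(y_test, predictions)
print(cm)
print(classification_report(y_test, predictions, zero_division=0))
```

Because the labels here are unrelated to the synthetic abundances, the scores are expected to be near chance; the sketch only illustrates the order of operations (split, fit, predict, evaluate).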