Metagenomic ML Tutorial
The following tutorial aims to provide step-by-step instructions for training and testing a Random Forest Classifier with gut microbiome relative abundance data. The metagenomic files were retrieved from SRA accession PRJNA454826 and comprise paired-end whole genome sequencing (WGS) reads. This data was analyzed and processed in the HIVE platform where both HIVE-Hexagon and Censuscope were utilized to perform sequence alignment and taxonomic profiling, respectively.
NOTE: Make sure to run, by clicking the arrow key next to the Python code, for each step prior to moving on to the next step to ensure all necessary information is loaded.
Step 1: The first step is to import all necessary packages that are needed to conduct this ML training and testing. Run the first python box, and make sure you see "Ready to Go" as your output.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
print("Ready to Go")