Disease Discrimination
Metabolic risk scores (MetRS) were developed separately for prevalent and incident diseases. The MetRS for each disease is expressed as a risk probability ranging from 0 to 1 and was developed using a standard machine-learning pipeline. First, all 313 metabolites were used to train a basic classifier with a Light Gradient Boosting Machine (LightGBM) [1].
Next, we used LightGBM's built-in information-gain feature importance to compute an importance score for each metabolite and ranked all metabolites by that score. The top 30 metabolites were selected as predictors for the MetRS.
Subsequently, a fine-tuned LightGBM classifier was fitted on these predictors, and an isotonic regression was then applied to calibrate the predicted risks to the observed disease prevalence, yielding the final MetRS. On this website, users can view the following evaluation metrics: area under the ROC curve, accuracy, sensitivity, specificity, precision, Youden index, and F1 score.
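For readers unfamiliar with the threshold-based metrics listed above, the sketch below shows how they follow from the confusion matrix once risk probabilities are dichotomized. The 0.5 threshold, the function names, and the toy data are assumptions for illustration; the threshold used on this website is not stated here. The area under the ROC curve is omitted because it is computed across all thresholds rather than at a single one.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_counts(y_true, risk, threshold=0.5):
    """Count TP, FP, TN, FN after labelling risk >= threshold as positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, risk):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive the threshold-based metrics from confusion matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "youden_index": sensitivity + specificity - 1,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy example (made-up labels and risk probabilities).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
risk = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8]
counts = confusion_counts(y_true, risk, threshold=0.5)
metrics = classification_metrics(*counts)
```

With these toy values the counts are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so sensitivity, specificity, precision, accuracy, and F1 are all 0.75 and the Youden index is 0.5.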
In addition, the top 30 metabolites are shown in a bar plot. Model development and evaluation were implemented with lightgbm (v3.3.2) and scikit-learn (v1.0.2) under Python (v3.9.16).