Prediction of metabolic and pre-metabolic syndromes using machine learning models with anthropometric, lifestyle, and biochemical factors from a middle-aged population in Korea

Kim, Junho; Mun, Sujeong; Lee, Siwoo; Jeong, Kyoungsik; Baek, Younghwa

doi:10.1186/s12889-022-13131-x

Table 2 Model performance with 95% confidence intervals, by number of features used

Model	F1-score (Original)	F1-score (SMOTE)	Accuracy (Original)	Accuracy (SMOTE)	Sensitivity (Original)	Sensitivity (SMOTE)	Specificity (Original)	Specificity (SMOTE)	AUC (Original)	AUC (SMOTE)
*4 Features (Demographic and anthropometric features)*
Decision Tree	0.711 (0.66–0.76)	0.758 (0.71–0.80)	0.711 (0.66–0.76)	0.758 (0.71–0.80)	0.573 (0.52–0.63)	0.758 (0.71–0.80)	0.782 (0.74–0.83)	0.758 (0.71–0.80)	0.677 (0.63–0.73)	0.758 (0.71–0.80)
Gaussian NB	0.789 (0.75–0.83)	0.780 (0.74–0.82)	0.790 (0.75–0.83)	0.780 (0.74–0.82)	0.684 (0.63–0.73)	0.790 (0.75–0.83)	0.844 (0.80–0.88)	0.769 (0.72–0.81)	0.764 (0.72–0.81)	0.780 (0.74–0.82)
KNN	0.774 (0.73–0.82)	0.783 (0.74–0.83)	0.777 (0.73–0.82)	0.783 (0.74–0.83)	0.619 (0.57–0.67)	0.826 (0.79–0.87)	0.859 (0.82–0.90)	0.740 (0.69–0.79)	0.739 (0.69–0.79)	0.783 (0.74–0.83)
XGBoost	0.771 (0.73–0.82)	0.802 (0.76–0.84)	0.773 (0.73–0.82)	0.802 (0.76–0.85)	0.626 (0.57–0.68)	0.812 (0.77–0.85)	0.848 (0.81–0.89)	0.792 (0.75–0.84)	0.737 (0.69–0.78)	0.802 (0.76–0.85)
RF	0.772 (0.73–0.82)	0.813 (0.77–0.86)	0.774 (0.73–0.82)	0.814 (0.77–0.86)	0.628 (0.58–0.68)	0.832 (0.79–0.87)	0.850 (0.81–0.89)	0.795 (0.75–0.84)	0.739 (0.69–0.79)	0.814 (0.77–0.86)
Logistic R	0.777 (0.73–0.82)	0.783 (0.74–0.83)	0.787 (0.74–0.83)	0.784 (0.74–0.83)	0.558 (0.50–0.61)	0.799 (0.76–0.84)	0.904 (0.87–0.94)	0.768 (0.72–0.81)	0.731 (0.68–0.78)	0.784 (0.74–0.83)
SVM	0.787 (0.74–0.83)	0.785 (0.74–0.83)	0.795 (0.75–0.84)	0.785 (0.74–0.83)	0.585 (0.53–0.64)	0.809 (0.77–0.85)	0.903 (0.87–0.93)	0.762 (0.72–0.81)	0.744 (0.70–0.79)	0.786 (0.74–0.83)
MLP	0.785 (0.74–0.83)	0.770 (0.72–0.82)	0.792 (0.75–0.84)	0.772 (0.73–0.82)	0.607 (0.55–0.66)	0.735 (0.69–0.78)	0.887 (0.85–0.92)	0.809 (0.77–0.85)	0.747 (0.70–0.79)	0.772 (0.73–0.82)
1D-CNN	0.779 (0.73–0.82)	0.783 (0.74–0.83)	0.782 (0.74–0.83)	0.784 (0.74–0.83)	0.657 (0.61–0.71)	0.784 (0.74–0.83)	0.846 (0.81–0.88)	0.784 (0.74–0.83)	0.752 (0.71–0.80)	0.784 (0.74–0.83)
*12 Features (Lifestyle-related features added)*
Decision Tree	0.722 (0.67–0.77)	0.765 (0.72–0.81)	0.724 (0.68–0.77)	0.765 (0.72–0.81)	0.570 (0.52–0.62)	0.776 (0.73–0.82)	0.803 (0.76–0.85)	0.755 (0.71–0.80)	0.686 (0.64–0.74)	0.765 (0.72–0.81)
Gaussian NB	0.775 (0.73–0.82)	0.766 (0.72–0.81)	0.774 (0.73–0.82)	0.766 (0.72–0.81)	0.685 (0.64–0.74)	0.773 (0.73–0.82)	0.820 (0.78–0.86)	0.759 (0.71–0.81)	0.753 (0.71–0.80)	0.766 (0.72–0.81)
KNN	0.738 (0.69–0.78)	0.780 (0.73–0.82)	0.743 (0.70–0.79)	0.782 (0.74–0.83)	0.551 (0.50–0.60)	0.879 (0.84–0.91)	0.842 (0.80–0.88)	0.685 (0.63–0.73)	0.696 (0.65–0.75)	0.782 (0.74–0.83)
XGBoost	0.778 (0.73–0.82)	0.834 (0.79–0.87)	0.782 (0.74–0.83)	0.834 (0.79–0.87)	0.622 (0.57–0.67)	0.837 (0.80–0.88)	0.863 (0.83–0.90)	0.832 (0.79–0.87)	0.743 (0.70–0.79)	0.834 (0.79–0.87)
RF	0.791 (0.75–0.83)	0.838 (0.80–0.88)	0.795 (0.75–0.84)	0.838 (0.80–0.88)	0.635 (0.58–0.69)	0.850 (0.81–0.89)	0.876 (0.84–0.91)	0.826 (0.79–0.87)	0.756 (0.71–0.80)	0.838 (0.80–0.88)
Logistic R	0.785 (0.74–0.83)	0.779 (0.73–0.82)	0.792 (0.75–0.84)	0.779 (0.73–0.82)	0.595 (0.54–0.65)	0.791 (0.75–0.83)	0.893 (0.86–0.93)	0.767 (0.72–0.81)	0.744 (0.70–0.79)	0.779 (0.73–0.82)
SVM	0.790 (0.75–0.83)	0.783 (0.74–0.83)	0.797 (0.75–0.84)	0.783 (0.74–0.83)	0.605 (0.55–0.66)	0.796 (0.75–0.84)	0.894 (0.86–0.93)	0.770 (0.72–0.82)	0.750 (0.70–0.80)	0.783 (0.74–0.83)
MLP	0.772 (0.73–0.82)	0.797 (0.75–0.84)	0.778 (0.73–0.82)	0.798 (0.75–0.84)	0.619 (0.57–0.67)	0.790 (0.75–0.83)	0.859 (0.82–0.90)	0.806 (0.76–0.85)	0.739 (0.69–0.79)	0.798 (0.75–0.84)
1D-CNN	0.771 (0.73–0.82)	0.770 (0.72–0.82)	0.776 (0.73–0.82)	0.774 (0.73–0.82)	0.635 (0.58–0.69)	0.861 (0.82–0.90)	0.848 (0.81–0.89)	0.688 (0.64–0.74)	0.742 (0.69–0.79)	0.775 (0.73–0.82)
*20 Features (Biochemical measurements added)*
Decision Tree	0.743 (0.70–0.79)	0.777 (0.73–0.82)	0.743 (0.70–0.79)	0.778 (0.73–0.82)	0.631 (0.58–0.68)	0.797 (0.75–0.84)	0.801 (0.76–0.84)	0.758 (0.71–0.80)	0.716 (0.67–0.76)	0.778 (0.73–0.82)
Gaussian NB	0.786 (0.74–0.83)	0.759 (0.71–0.81)	0.795 (0.75–0.84)	0.762 (0.72–0.81)	0.577 (0.52–0.63)	0.646 (0.59–0.70)	0.906 (0.87–0.94)	0.878 (0.84–0.91)	0.741 (0.69–0.79)	0.762 (0.72–0.81)
KNN	0.748 (0.70–0.79)	0.787 (0.74–0.83)	0.756 (0.71–0.80)	0.788 (0.74–0.83)	0.540 (0.49–0.59)	0.871 (0.83–0.91)	0.866 (0.83–0.90)	0.705 (0.66–0.75)	0.703 (0.65–0.75)	0.788 (0.74–0.83)
XGBoost	0.801 (0.76–0.84)	0.851 (0.81–0.89)	0.804 (0.76–0.85)	0.851 (0.81–0.89)	0.662 (0.61–0.71)	0.859 (0.82–0.90)	0.877 (0.84–0.91)	0.843 (0.80–0.88)	0.769 (0.72–0.81)	0.851 (0.81–0.89)
RF	0.815 (0.77–0.86)	0.843 (0.80–0.88)	0.818 (0.78–0.86)	0.844 (0.80–0.88)	0.690 (0.64–0.74)	0.857 (0.82–0.89)	0.883 (0.85–0.92)	0.831 (0.79–0.87)	0.786 (0.74–0.83)	0.844 (0.80–0.88)
Logistic R	0.812 (0.77–0.85)	0.804 (0.76–0.85)	0.818 (0.78–0.86)	0.804 (0.76–0.85)	0.638 (0.59–0.69)	0.812 (0.77–0.85)	0.910 (0.88–0.94)	0.796 (0.75–0.84)	0.774 (0.73–0.82)	0.804 (0.76–0.85)
SVM	0.811 (0.77–0.85)	0.810 (0.77–0.85)	0.817 (0.78–0.86)	0.810 (0.77–0.85)	0.636 (0.58–0.69)	0.831 (0.79–0.87)	0.909 (0.88–0.94)	0.790 (0.75–0.83)	0.773 (0.73–0.82)	0.810 (0.77–0.85)
MLP	0.807 (0.76–0.85)	0.811 (0.77–0.85)	0.812 (0.77–0.85)	0.812 (0.77–0.85)	0.638 (0.59–0.69)	0.836 (0.80–0.88)	0.901 (0.87–0.93)	0.787 (0.74–0.83)	0.770 (0.72–0.81)	0.812 (0.77–0.85)
1D-CNN	0.799 (0.76–0.84)	0.814 (0.77–0.86)	0.803 (0.76–0.85)	0.815 (0.77–0.86)	0.662 (0.61–0.71)	0.807 (0.76–0.85)	0.875 (0.84–0.91)	0.822 (0.78–0.86)	0.768 (0.72–0.81)	0.815 (0.77–0.86)

Results are presented before (Original) and after (SMOTE) applying the synthetic minority oversampling technique. Values in parentheses are 95% confidence intervals.
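
The oversampling step named in this note can be illustrated with a simplified sketch of the SMOTE idea: each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote_sketch`, the toy data, and the parameter values below are assumptions made for illustration; this is not the authors' implementation, and the table does not state which library or settings were used.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Simplified illustration of SMOTE; needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority samples (not data from the study)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority_count = 10  # hypothetical size of the majority class

new_points = smote_sketch(minority, n_new=majority_count - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is an interpolation between two real minority samples, it always falls inside the range spanned by the original minority data. Oversampling of this kind is normally applied to the training split only, so that evaluation data stay free of synthetic samples.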
Abbreviations: AUC, area under the receiver operating characteristic curve; Gaussian NB, Gaussian naïve Bayes classifier; KNN, k-nearest neighbor; XGBoost, extreme gradient boosting; Logistic R, logistic regression; RF, random forest; SVM, support vector machine; MLP, multilayer perceptron; 1D-CNN, 1-dimensional convolutional neural network
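
As a reading aid for the metric columns, the sketch below shows how accuracy, sensitivity, specificity, and F1-score follow from confusion-matrix counts, and how a 95% confidence interval for a proportion can be obtained with the normal approximation. The counts are hypothetical and the interval method is an assumption: the table does not state how its intervals were computed, nor whether the F1-score is the binary or a class-averaged variant.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Binary F1 for the positive class (an assumption, see the note above)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def normal_approx_interval(p, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion p observed on n cases."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, not taken from the study
tp, fp, tn, fn = 60, 20, 180, 40
metrics = classification_metrics(tp, fp, tn, fn)
low, high = normal_approx_interval(metrics["accuracy"], tp + fp + tn + fn)

print(f"accuracy {metrics['accuracy']:.3f} ({low:.2f}-{high:.2f})")
```

Sensitivity and specificity are each computed on one class only, so an interval for either would use that class's count as `n` rather than the full sample size.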

ISSN: 1471-2458

BMC Public Health
