A routine biomarker-based risk prediction model for metabolic syndrome in urban Han Chinese population

Background Many MetS-related biomarkers have been discovered, which makes it possible to build a MetS prediction model. In this paper we aimed to develop a novel routine biomarker-based risk prediction model for MetS in an urban Han Chinese population. Methods Exploratory factor analysis (EFA) was first conducted separately in 13,345 males and 3,212 females with MetS to extract synthetic latent predictors (SLPs) from 11 routine biomarkers. Then, in a cohort of 1,565 subjects (1,020 males and 545 females) with 5 years of follow-up, a Cox model for predicting 5-year MetS was built using the SLPs as predictors; the area under the ROC curve (AUC) with 10-fold cross-validation was used to evaluate its performance. Absolute risk (AR) and relative absolute risk (RAR) were calculated to develop a risk matrix for visualizing the risk assessment. Results Six SLPs were extracted by EFA from 11 routine health check-up biomarkers. Each reflected a specific pathogenesis of MetS: the inflammatory factor (IF) contributed by WBC, LC and NGC; the erythrocyte parameter factor (EPF) by Hb and HCT; the blood pressure factor (BPF) by SBP and DBP; the lipid metabolism factor (LMF) by TG and HDL-C; the obesity condition factor (OCF) by BMI; and the glucose metabolism factor (GMF) by FBG. Together they explained 81.55% and 79.65% of the total variance in males and females respectively. The proposed prediction model based on the metabolic syndrome synthetic predictor (MSP) performed well in predicting 5-year MetS, with an AUC of 0.802 (95% CI 0.776-0.826) in males and 0.902 (95% CI 0.874-0.925) in females; after 10-fold cross-validation the AUC remained high, at 0.796 (95% CI 0.770-0.821) in males and 0.897 (95% CI 0.868-0.921) in females. More importantly, the MSP-based risk matrix, with its series of risk warning indexes, provided a feasible and practical tool for visualizing risk assessment in the prediction of MetS.
Conclusions MetS could be explained by six SLPs in the urban Han Chinese population. The proposed MSP-based prediction model performed well in predicting 5-year MetS, and the MSP-based risk matrix provided a feasible and practical tool. Electronic supplementary material The online version of this article (doi:10.1186/s12889-015-1424-z) contains supplementary material, which is available to authorized users.


Background
Metabolic syndrome (MetS) is a disorder characterized by the co-occurrence of several known cardiovascular risk factors, including insulin resistance, obesity, atherogenic dyslipidemia and hypertension [1]. With economic development and changing lifestyles in China, the prevalence of MetS is increasing rapidly. Compared with Europeans and Americans, Asians are more likely to have MetS [2]. Data from the China Health and Nutrition Survey conducted in 2009 suggested that the prevalence of MetS had reached 21.3% among Chinese adults [3]. Many studies have indicated that MetS increases the risk of type 2 diabetes [4], cardiovascular disease [5][6][7][8][9], renal damage [10,11], and other conditions. Prediction of MetS is therefore essential for the early prevention of these diseases.
Some risk scores based on cross-sectional studies have been constructed for screening undiagnosed MetS [12][13][14]; these depended on questionnaire surveys of participants' lifestyles and medical histories. Although the areas under the ROC curves for detecting MetS in these studies were acceptable, ranging from 72.4% to 80.1%, a cross-sectional study captures subjects at only a single point in time. A cohort study is preferable for risk assessment. Hsiao and Yang conducted a two-year (from 2003 to 2005) [15] and a 5-year follow-up study (during 1997-2006) [16] respectively in the Chinese population. Both confirmed that routine check-up biomarkers such as serum cholesterol, triglycerides, blood glucose, body height and weight, and blood pressure could serve as effective predictors of MetS using multivariate logistic regression (MLR). However, MLR is not suited to survival data, and the applicability of the model in the first study was further limited by its relatively short follow-up time and small sample. In the second study, stepwise regression excluded many MetS-related biomarkers from the model. Fortunately, many other studies [17][18][19][20][21][22][23][24][25][26][27][28][29] have identified a number of MetS-related biomarkers, which makes it easier to build a risk appraisal model for MetS. After deriving six synthetic latent predictors (SLPs) from 11 routine MetS biomarkers in a MetS-positive population, we developed a novel routine biomarker-based risk prediction model for MetS in an urban Han Chinese population.
Of the subjects without MetS at baseline, 1,565 (1,020 males and 545 females) completed a 5-year follow-up and were included in the cohort study design. The cumulative incidence rate was calculated for the 1,565 subjects who were followed up.

Biomarkers selection and measurements
In the present study, eleven biomarkers were selected from routine health check-up data: body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), fasting blood glucose (FBG), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), hemoglobin (Hb), hematocrit (HCT), white blood cell count (WBC), lymphocyte count (LC) and neutrophil granulocyte count (NGC). Among them, BMI, SBP, DBP, TG, HDL-C and FBG were selected based on the traditional definition of MetS. The others were included on the basis of published studies: Hb [17][18][19], HCT [17,18,28], WBC [20][21][22][23][24][25][26][27], LC [23,29], NGC [23,29]. All measurements were conducted in the Center for Health Management of Shandong Provincial QianFoShan Hospital and the Health Examination Center of Shandong Provincial Hospital, following the same standard procedures. Both institutions are nationally accredited. The study was approved by the Ethics Committee of the School of Public Health, Shandong University, and written informed consent was obtained from all eligible participants.

Statistical analysis
Descriptive statistics were computed for the 16,557 subjects with MetS at baseline. Student's t test was used to test for differences in the 11 biomarkers between males and females, and the χ² test was used to test for differences in the prevalence of the four basic components (obesity, hypertension, dyslipidemia and hyperglycemia) between males and females.
To eliminate multicollinearity among the routine check-up biomarkers and build a better model for MetS prediction, both exploratory factor analysis (EFA) and the Cox proportional hazards regression model were applied in the present study. A MetS synthetic predictor (MSP) was developed through the following four steps. First, EFA with the principal component algorithm and varimax rotation, based on the correlation matrix, was performed to extract independent MetS risk-related factors from the 11 routine check-up biomarkers in the 16,557 subjects with MetS at baseline. The criterion for retaining factors was an eigenvalue > 0.9, chosen so that the retained factors accounted for more than 70% of the total variance. Only variables with a factor loading of at least 0.50 were used for analytical interpretation and for naming the factors. Second, a Cox regression model was built between the hazard function of MetS and the extracted factors in the 1,565 subjects of the cohort study: $h(t) = h_0(t)\exp(\beta_0\,\mathrm{age} + \beta_1 F_1 + \beta_2 F_2 + \dots + \beta_k F_k)$, and the MSP was defined as $\mathrm{MSP} = \beta_1 F_1 + \beta_2 F_2 + \dots + \beta_k F_k$. Third, the risk of MetS for the 1,565 subjects from the cohort was estimated by model (1), where $P(t)$ was the predicted probability of MetS at year $t$ and $B = \beta_0\,\mathrm{age} + \beta_1\,\mathrm{MSP}$. For both the training set and the 10-fold cross-validation, receiver operating characteristic (ROC) curve analysis was conducted, and the area under the ROC curve (AUC), together with the sensitivity, specificity, 95% confidence interval and cut-off value of $P$, was calculated with MedCalc software [31]. The optimal cut-off was estimated using the Youden index criterion [32], which yields a score reflecting the aim of maximizing the overall correct classification rate.
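As a rough illustration of the third step and the cut-off selection, the sketch below computes a predicted probability from a Cox linear predictor and then picks the threshold that maximizes the Youden index J = sensitivity + specificity - 1. It assumes the conventional Cox form P(t) = 1 - S0(t)^exp(B), which is an assumption here rather than a formula confirmed by the text; the baseline survival value, the coefficients and the six toy subjects are hypothetical, and the function names are illustrative and are not part of MedCalc or SAS.

```python
import math

def cox_risk(baseline_survival, linear_predictor):
    """Predicted event probability under the usual Cox form 1 - S0(t)**exp(B).
    Assumption: the study's model (1) follows this conventional form."""
    return 1.0 - baseline_survival ** math.exp(linear_predictor)

def youden_cutoff(probs, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.
    A subject is classified as positive when prob > cutoff.
    Requires at least one positive and one negative label."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p > cutoff and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p <= cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical values only: S0(5), the coefficients and the subjects are made up.
S0_5 = 0.85
beta_age, beta_msp = 0.01, 0.9
subjects = [(35, -0.8, 0), (42, -0.2, 0), (50, 0.1, 0),
            (38, 0.6, 1), (55, 0.9, 1), (61, 1.4, 1)]  # (age, MSP, incident MetS)

probs = [cox_risk(S0_5, beta_age * age + beta_msp * msp) for age, msp, _ in subjects]
labels = [y for _, _, y in subjects]
cutoff, j = youden_cutoff(probs, labels)
```

In this toy data the cases and non-cases separate perfectly, so the selected cut-off is the largest predicted probability among the non-cases; real cohort data would give J below 1.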
Finally, excess absolute risk (EAR) and relative absolute risk (RAR) were calculated for the 1,565 subjects from the five-year follow-up who had completed physical examinations and the 11 biomarker measurements, by $\mathrm{EAR} = P_j(t) - \bar{P}_j(t)$ and $\mathrm{RAR} = P_j(t) / \bar{P}_j(t)$ respectively, where $P_j(t)$ denoted the absolute risk (AR), namely the probability of MetS at year $t$ for a subject of age $j$, and $\bar{P}_j(t)$ denoted the average probability of MetS at year $t$ at age $j$, which can be calculated from model (1) through $B_j = \beta_1\,\mathrm{age}_j + \gamma\,\overline{\mathrm{MSP}}_j$, where $\overline{\mathrm{MSP}}_j$ was the mean MSP at age $j$. All steps were conducted separately in males and females. The risk matrices for AR and RAR were drawn using ArcGIS 9.1, and all statistical analyses were performed using SAS 9.1.3, with P < 0.05 considered statistically significant.

Results
The prevalence of MetS in the study was 17.9% (16,557/92,284) at baseline (22.7% in males and 9.6% in females).
At the end of the follow-up period of the 1,565 subjects, 348 incident MetS cases (286 males and 62 females) had been diagnosed, and the cumulative incidence rate was 22.2% (28% in males and 11.4% in females) (see Additional file 1: Table S1). The prevalence of the four basic components (obesity, hypertension, hyperglycemia and dyslipidemia) differed significantly between males and females (see Additional file 2: Table S2). Table 1 shows the distribution of age and the eleven biomarkers in males and females with MetS at baseline; all variables except BMI and LC differed significantly between the sexes. DBP, TG, Hb, HCT, WBC and NGC were higher in males than in females, while age, SBP, FBG and HDL-C were higher in females than in males. The correlation matrix of the 11 biomarkers is given in Additional file 3: Table S3. The results of EFA are shown in Table 2 with the explained and cumulative variance; they suggest that six synthetic latent predictors (SLPs) explained 81.55% and 79.65% of the total variance for males and females respectively. According to the criteria for analytical interpretation stated in the statistical analysis section, they were named the inflammatory factor (IF), erythrocyte parameter factor (EPF), blood pressure factor (BPF), lipid metabolism factor (LMF), obesity condition factor (OCF) and glucose metabolism factor (GMF) in both males and females. Of the six SLPs ... Figure 1 shows the result of the ROC analysis for predicting the 5-year risk of MetS with the proposed prediction model. The AUC was 80.2% and 90.2% for males and females respectively in the training set (see Figure 1A and 1B), and 79.6% and 89.7% after 10-fold cross-validation. Figure 2 shows the 5-year AR matrix and RAR matrix for MetS by gender in the cohort (n = 1,565), specifically Figure 2A1 and 2A2 for males, and Figure 2B1 and 2B2 for females.
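The baseline prevalence and the cumulative incidence rates reported at the start of this section follow directly from the stated counts. The short sketch below recomputes them; the helper name `rate_percent` is illustrative only.

```python
def rate_percent(cases, total):
    """Return cases/total as a percentage rounded to one decimal place."""
    return round(100.0 * cases / total, 1)

# Counts taken from the text above.
baseline_prevalence = rate_percent(16557, 92284)  # MetS at baseline, both sexes
incidence_all = rate_percent(348, 1565)           # 5-year cumulative incidence
incidence_males = rate_percent(286, 1020)
incidence_females = rate_percent(62, 545)
```

These reproduce the reported 17.9%, 22.2%, 28% and 11.4% to the stated precision.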
These matrices provide a convenient tool for MetS prediction in health management and clinical practice. For example, a 30-year-old man with an AR of 0.233 has an EAR of 0.054 (0.233 − 0.179) and an RAR of 1.301 (0.233/0.179), while a 60-year-old man with the same AR of 0.233 has an EAR of −0.187 (0.233 − 0.420) and an RAR of 0.555 (0.233/0.420). Thus, although their predicted probabilities of MetS over 5 years are the same, the younger man has a higher MetS risk than his peers, about 1.301 times the average risk of the 30-year-old population, indicating that lifestyle changes and social intervention strategies are needed for him. In contrast, the MetS risk of the older man is lower than the average for his age, only 55.5% of the average risk of the 60-year-old population, indicating that he is in good health compared with his peers.
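The worked example above can be reproduced with a few lines of code. The sketch below takes the AR and the age-specific average risks quoted in the text (0.179 at age 30 and 0.420 at age 60) and applies the EAR and RAR definitions; the function names are illustrative.

```python
def excess_absolute_risk(ar, age_average_risk):
    """EAR: a subject's absolute risk minus the average risk at the same age."""
    return ar - age_average_risk

def relative_absolute_risk(ar, age_average_risk):
    """RAR: a subject's absolute risk divided by the average risk at the same age."""
    return ar / age_average_risk

ar = 0.233  # same predicted 5-year probability for both men
average_risk_by_age = {30: 0.179, 60: 0.420}  # values quoted in the text

results = {
    age: (excess_absolute_risk(ar, avg), relative_absolute_risk(ar, avg))
    for age, avg in average_risk_by_age.items()
}
```

With these rounded inputs the ratio for the 30-year-old comes out at roughly 1.302 rather than the reported 1.301; the small gap presumably reflects rounding of the inputs. The remaining three values agree with the text to three decimal places.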
Using the cut-off points shown in Figure 1A for males (0.2749) and Figure 1B for females (0.1181), subjects were classified as high-risk (> the cut-off value) or low-risk (≤ the cut-off value). The proportion of high-risk subjects by age in the general population (n = 92,284) is plotted in Figure 3. In general, the proportion of high-risk subjects increases with age in both males and females. However, the proportion was higher in males than in females before the age of 55, and the reverse after 55.
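The classification rule described above is simple enough to state as code. The following minimal sketch uses the sex-specific cut-offs quoted in the text; the function and dictionary names are illustrative.

```python
# Sex-specific cut-offs on the predicted 5-year probability (Figure 1A and 1B).
CUTOFFS = {"male": 0.2749, "female": 0.1181}

def risk_group(sex, predicted_probability):
    """Return 'high' when the probability exceeds the sex-specific cut-off, else 'low'."""
    return "high" if predicted_probability > CUTOFFS[sex] else "low"

# The same predicted probability can fall in different groups for men and women.
male_group = risk_group("male", 0.233)
female_group = risk_group("female", 0.233)
```

A predicted probability exactly equal to the cut-off is treated as low-risk, following the ≤ convention in the text.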

Metabolic syndrome synthetic predictor and its application in MetS prediction
At the end of the follow-up period, the cumulative incidence rate had reached 22.2% (28% in males and 11.4% in females) (see Additional file 1: Table S1). To date, three risk scores based on cross-sectional designs [12][13][14] and two predictive models based on cohort designs [15,16] have been developed to predict MetS in different ethnic groups. Although these tools achieved acceptable performance, with AUCs ranging from 0.724 to 0.827, their risk algorithms and their visualization of risk assessment left room for improvement in power, feasibility and practicability. More importantly, the MSP was further used to construct the risk matrix with a series of risk warning indexes, including the average risk in the population, the AR and RAR for individual subjects, and the cut-off curve for predicted MetS (see Figure 2). This matrix provided a feasible and practical tool for visualizing risk assessment in the prediction of MetS. For example, for a woman of a given age who receives a health check-up, the risk matrices can provide her AR (Figure 2B1) and her RAR (Figure 2B2) relative to the average hazard among females of the same age group. This may encourage her to address her risk factors and so reduce her risk of MetS.

The risk distribution in urban Han Chinese population
The proportion of high-risk subjects was higher in males than in females before the age of 55, and the reverse after 55 (shown in Figure 3). Similar results have been reported in the Korean population [47], with a crossover point at 60 years of age. In particular, the age patterns of high-risk subjects differed markedly between males and females. The proportion of high-risk subjects increased linearly with age in males, but followed an S-shaped curve in females, with the fastest growth between 40 and 60 years of age. This difference may be associated with menopause. Various studies have indicated that natural menopause is associated with increased central adiposity [48], blood pressure [49][50][51][52][53][54][55], and total cholesterol, LDL cholesterol and triglyceride levels [50,56], which would further increase