Prediction model for the risk of osteoporosis incorporating factors of disease history and living habits in physical examination of population in Chongqing, Southwest China: based on artificial neural network

Background Osteoporosis is an increasingly recognized health problem whose risk is related to disease history and living habits. This study aimed to establish the optimal prediction model by comparing the performance of four prediction models that incorporated disease history and living habits in predicting the risk of osteoporosis in Chongqing adults. Methods We conducted a cross-sectional survey with convenience sampling. From January 2019 to December 2019, we used a questionnaire to collect data on the disease history and living habits of adults who underwent dual-energy X-ray absorptiometry. We established the prediction models of osteoporosis in three steps. First, we performed feature selection to identify risk factors related to osteoporosis. Second, the qualified participants were randomly divided into a training set and a test set in the ratio of 7:3. Third, the prediction models of osteoporosis were established based on Artificial Neural Network (ANN), Deep Belief Network (DBN), Support Vector Machine (SVM), and a combinatorial heuristic method (Genetic Algorithm - Decision Tree (GA-DT)). Finally, we compared the performance of the prediction models through accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC) to select the optimal prediction model. Results The univariate logistic model found that taking calcium tablets (odds ratio [OR] = 0.431), systolic blood pressure (SBP) (OR = 1.010), fracture (OR = 1.796), coronary heart disease (OR = 4.299), drinking alcohol (OR = 1.835), physical exercise (OR = 0.747), and other factors were related to the risk of osteoporosis. The AUCs of the prediction models based on ANN, DBN, SVM, and GA-DT were 0.901, 0.622, 0.698, and 0.744 in the training set and 0.762, 0.618, 0.627, and 0.724 in the test set, respectively.
After evaluating the performance of the four prediction models, we selected a three-layer back propagation neural network (BPNN) with 18, 4, and 1 neurons in the input, hidden, and output layers, respectively, as the optimal prediction model. A participant was predicted to have osteoporosis when the predicted probability was greater than 0.330. Conclusions Compared with DBN, SVM, and GA-DT, the established ANN model had the best prediction ability and can be used to predict the risk of osteoporosis in physical examinations of the Chongqing population. The model needs to be further improved through research with larger samples. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-021-11002-5.


Background
Osteoporosis is defined by the World Health Organization (WHO) as a 'skeletal disease characterized by low bone mass and microarchitectural deterioration of bone tissue' [1]. About 200 million people worldwide suffer from osteoporosis, around 88 million of them in China [2]. The incidence of osteoporotic fractures increases with age, and these fractures significantly reduce quality of life. Moreover, osteoporosis is usually detected only when a fracture occurs. Fortunately, osteoporosis can be prevented. Research by Yu et al. revealed that improving lifestyle can control and prevent osteoporosis [3]. For example, taking calcium tablets, cooking, and doing housework were associated with a lower risk of osteoporosis, while smoking, drinking alcohol, and sedentary behavior were positively correlated with osteoporosis. Research has also revealed that certain diseases are related to osteoporosis [4]. People with a history of hyperthyroidism, hypertension, coronary heart disease (CHD), diabetes mellitus (DM), and other diseases had a higher risk of osteoporosis.
Because the risk factors of osteoporosis interact with each other through non-linear mechanisms, it is difficult for traditional linear regression and logistic regression models to handle the resulting collinearity [5,6]. Therefore, machine learning approaches, combinatorial heuristics, and other specific algorithms may be required [7]. Machine learning approaches are relatively new in the field of public health, and few studies have assessed risk factors for osteoporosis based on these models [8]. Therefore, this study aims to illustrate the potential use of ANN in predicting the risk of osteoporosis by comparing the performance of four prediction models that incorporate disease history and living habits.

Study subjects
Considering the availability of physical examination data and questionnaire data, as well as the operability of the study, we used convenience sampling to distribute a personal health status and lifestyle questionnaire designed by the research team (see Supplementary File 1) to adults who underwent dual-energy X-ray absorptiometry (DXA) in the Medical Examination Center of the First Affiliated Hospital of Chongqing Medical University. Participants who had previously been diagnosed with osteoporosis were excluded. Questionnaires with missing content (especially missing medical card numbers) or obvious logical errors were excluded from the analysis. In the end, 1419 valid questionnaires were returned (all respondents were over 20 years old), and the effective response rate was 86.2%. The inclusion process of participants in this study is shown in Fig. 1.

Data collection
Well-trained staff conducted a questionnaire survey on all subjects to obtain their information. The questionnaire consisted of three parts: general information (including name, gender, and medical card number), personal health (including hyperthyroidism, hypertension, CHD, DM, chronic gastrointestinal disease, chronic renal failure, gout, and malignant tumor), and living habits (taking calcium tablets, smoking status, drinking status, cooking status, the nature of work, the main mode of transportation to work, housework, and physical exercise). Physical examination measurements (weight, height, waist circumference (WC), and blood pressure) were taken with standard medical equipment. The participants' bone mineral density (BMD) was measured using DXA (MEDIX DR 2D-fan beam densitometer; Medilink, France). Phantom scans were performed every morning for quality control, and all BMD scans were conducted by well-trained examiners using standardized procedures, according to the protocols recommended by the manufacturer. Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters (kg/m²).

Definitions
According to the WHO criteria (1994) [9], normal BMD was defined as a T-score greater than −1.0, osteopenia as a T-score less than or equal to −1.0 and greater than −2.5, and osteoporosis as a T-score less than or equal to −2.5. As osteoporosis and osteopenia are similar in nature, and the number of osteoporosis cases was small in this study, osteoporosis and osteopenia were combined into one outcome, called low BMD [10]. A history of hyperthyroidism, hypertension, CHD, DM, chronic gastrointestinal disease, chronic renal failure, gout, or malignant tumor was defined as having received treatment or taking medication for the condition. Drinking was dichotomized (current drinker vs. former or nondrinker). Smoking was also dichotomized (current smoker vs. former or nonsmoker). The questionnaire defined cooking as cooking more than five times a week; housework as never, occasionally, or often; and taking calcium tablets as taking them for more than seven consecutive days. The nature of work was divided into three levels: sedentary, mild labor, and manual labor. Sedentary behaviors are those human activities that result in an energy expenditure of no more than 1.5 times the resting energy expenditure [11]. Manual labor was work mainly based on physical activity, requiring strong physical strength, greater muscle contraction, and physical movement of the limbs in space [12]. Mild labor falls between sedentary work and manual labor. The main modes of transportation to work were divided into four categories: working from home, walking, taking the bus, and driving. Physical exercise was defined as whether the participant had taken part in physical exercise lasting more than 20 min each time in recent years. There were two possible answers: yes and no.
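The T-score cut-offs above, and the merging of osteopenia and osteoporosis into a single low-BMD outcome, can be written down as a short sketch. The function names `classify_bmd` and `is_low_bmd` are hypothetical and are not taken from the study's own code; only the thresholds come from the definitions in this section.

```python
def classify_bmd(t_score: float) -> str:
    """Map a DXA T-score to a WHO (1994) category as defined in the text."""
    if t_score > -1.0:
        return "normal"
    if t_score > -2.5:
        # T-score in (-2.5, -1.0]
        return "osteopenia"
    # T-score <= -2.5
    return "osteoporosis"


def is_low_bmd(t_score: float) -> bool:
    """Binary study outcome: osteopenia and osteoporosis are combined."""
    return classify_bmd(t_score) != "normal"
```

Note that both boundary values fall into the more severe category: a T-score of exactly −1.0 counts as osteopenia, and exactly −2.5 counts as osteoporosis.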

Statistical analysis
Baseline characteristics were described as mean ± standard deviation (SD). Continuous variables were compared by analysis of variance (ANOVA), and categorical variables by the chi-squared test. Univariate logistic regression analysis was used to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for baseline characteristics related to osteoporosis. We used SAS statistical software (version 9.4; SAS Institute Inc., Cary, North Carolina) to analyze the data.
All eligible participants (N = 1419) were randomly divided into two sets, the training set (N1 = 993) and the test set (N2 = 426), in the ratio of 7:3; this proportion followed ANN-based deep learning practice [13,14]. The input variables of the osteoporosis prediction models based on ANN, DBN, and SVM were selected by the univariate logistic model (P < 0.20) [15], and the input variables of the osteoporosis prediction model based on the combinatorial heuristic were selected by GA-DT. These models required that all variable values be standardized to the range of 0 to 1. Binary variables were coded as 0 and 1; non-binary variables were standardized to the range of 0 to 1 by the formula $X'_m = \frac{X_m - X_{\min}}{X_{\max} - X_{\min}}$, where $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the variable. After the four prediction models were established, we compared their performance through accuracy, sensitivity, specificity, and the receiver operating characteristic (ROC) curve to select the optimal prediction model. The ROC curve is a visual tool that shows the performance of a binary classifier by considering the sensitivity, specificity, and accuracy of the model [16,17]. Unless otherwise specified, we used a significance level of 0.05 for all analyses. We used R version 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria) and MATLAB (version R2013b; MathWorks Inc., U.S.A.) to build the prediction models.

Fig. 1 The inclusion process of participants in this study

Table 1 summarizes the baseline characteristics of the normal BMD and low BMD groups in the whole study population.
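The min-max standardization and the 7:3 random split described in the statistical analysis can be illustrated with a minimal Python sketch. This is not the authors' code (the study used R and MATLAB); the function names, the random seed, and the choice to map a constant variable to 0 are assumptions made for illustration.

```python
import random


def min_max_scale(values):
    """Rescale a non-binary variable to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_train_test(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# With the study's sample size, a 7:3 split gives 993 and 426 participants.
train_idx, test_idx = split_train_test(1419)
print(len(train_idx), len(test_idx))
```

In practice it is common to compute the minimum and maximum on the training set only and reuse them for the test set; the text does not state which convention the study followed.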

Characteristics of subjects
In the total study population, 460 of the 1419 subjects had low BMD, while the other 959 had normal BMD. Compared with the normal BMD group, subjects in the low BMD group were older and had lower BMI (P < 0.001). The prevalence of fracture (17.61% vs. 10.64%) and CHD (3.91% vs. 0.94%) was higher in the low BMD group than in the normal BMD group, and the differences were statistically significant (P < 0.001), but other diseases such as hyperthyroidism, DM, chronic gastrointestinal disease, chronic renal failure, gout, and malignant tumor showed no significant difference between the two groups. The two groups differed significantly in drug use (such as calcium tablets and estrogen drugs) and living habits (such as smoking, drinking, cooking, the nature of work, the main mode of transportation to work, doing housework, and physical exercise). There was no significant difference between the two groups in diastolic blood pressure (DBP).
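As a worked check, the percentages reported above can be related to the odds ratios given in the abstract. For a single binary predictor, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2×2 table. The counts below are not reported in this text: they are back-calculated from the group sizes (460 and 959) and the rounded percentages, so they are an assumption and may differ slightly from the true table.

```python
def odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls):
    """Unadjusted odds ratio from a 2x2 table (cross-product ratio)."""
    a = exposed_cases                      # low BMD, with the condition
    b = total_cases - exposed_cases        # low BMD, without the condition
    c = exposed_controls                   # normal BMD, with the condition
    d = total_controls - exposed_controls  # normal BMD, without the condition
    return (a * d) / (b * c)


# Back-calculated counts (assumption): 17.61% of 460 is about 81,
# 10.64% of 959 is about 102, 3.91% of 460 is about 18, 0.94% of 959 is about 9.
fracture_or = odds_ratio(81, 460, 102, 959)
chd_or = odds_ratio(18, 460, 9, 959)
print(round(fracture_or, 3), round(chd_or, 3))
```

Under these assumed counts the two ratios come out close to the ORs of 1.796 (fracture) and 4.299 (CHD) quoted in the abstract, which suggests the percentages and ORs are mutually consistent.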

Risk factors for osteoporosis
As shown in

Establishment of the prediction models for osteoporosis
We established an ANN model based on factors related to osteoporosis derived from univariate logistic regression analysis. Gender, age, BMI, SBP, disease history (including fracture, hypertension, CHD, DM, chronic gastrointestinal disease, and gout), and living habits (including taking calcium tablets, smoking, drinking, cooking, the nature of work, the main mode of transportation to work, doing housework, and physical exercise) were used as the input layer of the model. The output variable was a binary variable indicating whether a person had osteoporosis. The ANN structure consisted of three layers (Fig. 2), and the parameters were selected according to previous related studies [6,18]. Training parameters such as learning rate and momentum were set at their default values.

The performance of four prediction models
Table 3 compares the four prediction models in terms of accuracy, sensitivity, specificity, and AUC. All four indicators of the ANN model were better than those of the DBN model. SVM and GA-DT had higher sensitivity than the ANN model, but their specificity was extremely low. Figure 3 shows the ROC curves obtained from the training set and test set of the ANN model.
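The four indicators used to compare the models can be computed from predicted probabilities and true labels as in the following sketch. The function names and the toy data are made up for illustration and are not study results; the default cut-off of 0.330 is the value reported for the ANN model. The AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def classification_metrics(y_true, scores, cutoff=0.330):
    """Accuracy, sensitivity and specificity at a given probability cut-off."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score > cutoff else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores (made up), not data from the study.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.1, 0.2, 0.5]
toy_metrics = classification_metrics(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise AUC is quadratic in the sample size, which is acceptable for a sample of this order; a sort-based implementation would be preferable for large data sets.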

Discussion
This was a novel study to develop an osteoporosis prediction model based on disease history and living habits.
Our study established osteoporosis prediction models based on ANN with the momentum algorithm, DBN, SVM, and GA-DT. In the comparison of the four prediction models, the accuracy, sensitivity, and specificity, in addition to the AUC, indicated that the ANN model we constructed was reliable. The AUCs of the ANN model were 0.901 for the training set and 0.762 for the test set. For both the training set and the test set, the cut-off probability of osteoporosis was 0.330. That is, when the predicted probability was greater than 0.330, the participant was predicted to have osteoporosis. ANN has great value in clinical diagnosis: the probability of disease can be calculated for each patient from the patient's relevant factors. In 2017, Song et al. [19] used a BP neural network, a decision tree model, and a logistic regression model to predict the risk of individual DM. Judged by the AUC value, the BP neural network had the best performance in predicting the risk of DM. ANN is essentially a mathematical model whose structure is similar to that of a biological neural network. The input layer is the first layer in the ANN. It takes input signals (values) and passes them on to the next layer. A hidden layer in an ANN is a layer between the input layer and the output layer, where artificial neurons take in a set of weighted inputs and produce an output through an activation function. It is a typical part of nearly any neural network in which engineers simulate the types of activity that go on in the human brain. The output layer in an ANN is the last layer of neurons, which produces the outputs of the program [20]. ANN models have been widely applied in research because they can model highly non-linear systems in which the relationship among the variables is unknown or very complicated, and their use in the medical field in particular is an emerging phenomenon [21].
In most reported cases [22-26], the ANN model was of practical value, for example in predicting the 5-year overall survival rate after gastrectomy for gastric cancer, diagnosing congenital heart disease in pregnant women, analyzing the risk of thyroid cancer, monitoring the trends and incidence of AIDS in China, and classifying leukemia.
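The layer-by-layer description above can be made concrete with a minimal forward pass through a network of the size reported in the abstract (18 input, 4 hidden, and 1 output neuron), followed by the 0.330 cut-off. This is a sketch under stated assumptions: the sigmoid activation, the random weight initialization, and all function names are hypothetical, since the text does not report the activation function or the fitted weights.

```python
import math
import random

N_INPUT, N_HIDDEN = 18, 4   # layer sizes reported in the abstract
CUTOFF = 0.330              # cut-off probability reported for the ANN model


def sigmoid(z):
    """Logistic activation (assumed; the text does not name the activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_params(seed=0):
    """Random placeholder weights; a trained model would supply fitted values."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT)]
                for _ in range(N_HIDDEN)]
    b_hidden = [0.0] * N_HIDDEN
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def predict_probability(x, params):
    """Input layer -> hidden layer (weighted sum + activation) -> output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def predict_label(x, params, cutoff=CUTOFF):
    """Return 1 when the predicted probability exceeds the cut-off."""
    return int(predict_probability(x, params) > cutoff)


# One participant with 18 inputs already scaled to [0, 1] (made-up values).
example_x = [0.5] * N_INPUT
example_p = predict_probability(example_x, init_params())
```

Because the cut-off is below 0.5, the classifier labels more participants as positive than a default 0.5 threshold would, trading specificity for sensitivity.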
ANN training includes the BP algorithm and the BP with momentum algorithm. The BP algorithm is widely regarded as a powerful tool for training feedforward neural networks. However, because it uses the steepest descent method to update the weights, its convergence is slow, and it often obtains sub-optimal solutions [27]. In contrast, BP with momentum converges quickly. The significance of this model is that the probability of suffering from osteoporosis can be obtained when age, gender, BMI, SBP, history of fracture, history of hypertension, history of CHD, history of DM, history of chronic gastrointestinal disease, history of gout, taking calcium tablets, cooking, drinking alcohol, smoking, the nature of work, the main mode of transportation to work, doing housework, and physical exercise are given as input to the model. These input variables are easily available. This model can be used as a tool for preliminary judgment of osteoporosis.

Fig. 2 Graphic representation of the basic architecture of the ANN used in the training set. x1 represents age, x2 represents gender, x3 represents BMI, x4 represents SBP, x5 represents history of fracture, x6 represents history of hypertension, x7 represents history of CHD, x8 represents history of DM, x9 represents history of chronic gastrointestinal disease, x10 represents history of gout, x11 represents taking calcium tablets, x12 represents cooking, x13 represents drinking alcohol, x14 represents smoking, x15 represents the nature of work, x16 represents the main mode of transportation to work, x17 represents doing housework, x18 represents physical exercise, and y represents osteoporosis
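The weight update that distinguishes BP with momentum from plain steepest descent can be sketched on a one-parameter toy problem. The learning rate, the momentum coefficient, and the quadratic loss below are illustrative assumptions; the study states only that these training parameters were left at the software defaults, and the function name is hypothetical.

```python
def gradient_descent_momentum(grad, w0, learning_rate=0.1, momentum=0.9,
                              n_steps=500):
    """Minimize a function given its gradient, using a momentum term.

    velocity <- momentum * velocity - learning_rate * grad(w)
    w        <- w + velocity
    Setting momentum=0 recovers plain steepest descent.
    """
    w, velocity = w0, 0.0
    for _ in range(n_steps):
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


# Toy loss (assumption): L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_momentum = gradient_descent_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
w_plain = gradient_descent_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0,
                                    momentum=0.0)
```

The velocity term accumulates past gradients, which is what lets the update keep moving through flat regions and damp oscillations across steep ones. Whether momentum actually speeds up convergence depends on the loss surface and the chosen coefficients, so this toy example should not be read as a benchmark.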
We found that the top five predictors were age, gender, disease history (DM, hypertension), and living habits (smoking). The results of Hiremath [28] showed that the risk increased by 20% with every 5-year increase in age. In addition, Fu et al. [29] suggested that most people aged 45-50 may suffer from osteoporosis, which is almost a common feature of human aging, and that the incidence of fracture depends on the age of the subject. Interestingly, some studies have demonstrated that postmenopausal women are susceptible to this disease, and that low bone density has nothing to do with race [30,31]. The main difference between the sexes is that males produce larger bones than females as they grow, while females have better internal microstructure and less chance of bone reconstruction. Furthermore, through direct or indirect mechanisms, sex hormones play a leading and vital role in the physiology of bone [32]. The reduced estrogen secretion in women during menopause therefore ultimately leads to osteopenia and osteoporosis. Compared with women, men begin to lose bone at a later age. The mechanism linking DM and bone condition is not fully understood clinically. In most studies, there was no consistent relationship between DM control and osteopenia in patients with type II DM [33]. Existing results show that people with DM have lower bone density than people without DM, which suggests an association between DM and low bone density [34]. In addition, the available evidence indicates that smoking is a risk factor for osteoporosis [35,36]. A recent retrospective study using national sample cohort data showed that long-term abstinence from smoking reduced the risk of fracture in men [37].
This study used standard protocols and instruments, and all participants underwent complete health examinations. To ensure the collection of high-quality data, a strict personnel training process was established. These are the strengths of the study, but its limitations should also be noted. First, the convenience sampling method may cause selection bias, so the prediction model should be generalized with caution. Second, the study was cross-sectional, not longitudinal. Some confounding factors cannot be eliminated, and causal associations between low BMD and risk factors cannot be examined. Finally, the measurement of BMD at a single point in time may be affected by survival effects.

Conclusion
In summary, our study found that disease history and living habits were related to the risk of osteoporosis. The performance of the ANN model was better than that of the other three models. In the ANN model, a participant was predicted to have osteoporosis if the predicted probability was greater than 0.330. Further research is needed to validate our model for predicting the risk of osteoporosis in adults.

Fig. 3 The receiver operating characteristic (ROC) curves obtained from the ANN in the training set and test set