- Open Access
How to predict the electronic health literacy of Chinese primary and secondary school students?: establishment of a model and web nomograms
BMC Public Health volume 22, Article number: 1048 (2022)
The internet has become an important resource for the public to obtain health information. Therefore, the ability to obtain and use such resources has become important for health literacy. This study aimed to identify the predictors of electronic health literacy (EHL) in Chinese students using random forest and to establish a corresponding prediction model that helps policymakers and parents determine whether primary and secondary school students have high EHL, thereby guiding government policymaking and parental interventions.
This is a cross-sectional study. From June to August 2021, a cluster sample survey was conducted with 1,300 students from seven primary and secondary schools in Shaanxi Province, China. We evaluated 1,235 primary and secondary school students using the e-health literacy scale. The data were divided into training and testing datasets in a 70:30 ratio for further analysis using random forest. The predictive accuracy of the score was measured using the area under the receiver operating characteristic curve. We also used decision curve analysis to determine the usefulness of the prediction model by quantifying the net benefits at different threshold probabilities in the validation dataset.
We found that 33.6% of students had high EHL. The univariate analysis showed that age (P < 0.001), grade (P < 0.001), employment status (P < 0.001), household location (P < 0.001), parental phubbing behavior (P < 0.001), and general self-efficacy (P < 0.001) were significantly associated with EHL. A random forest classification model was developed with the training dataset (872 students), and seven variables were confirmed as important: age, grade, employment status, father's education level, gaming time, parental phubbing behavior, and general self-efficacy. Validation showed good discrimination, with an area under the curve of 0.975 in the training dataset and 0.738 in the testing dataset. The model was translated into an online risk calculator, which is freely available (https://xietao.shinyapps.io/DynNomapp/).
In this study, an intuitive tool to predict the EHL of Chinese primary and secondary school students was developed and validated.
Health literacy (HL) refers to the ability of individuals to obtain and understand health information and make correct health decisions. Electronic health literacy (EHL), first proposed by the Canadian scholars Norman et al., refers to the ability of individuals to obtain, understand, judge, and use information from electronic resources to solve their health problems. It is a concept that combines HL and electronic health. The e-health literacy scale (eHEALS), developed by Norman et al., is the first and currently the most commonly used EHL assessment tool. It mainly measures the self-perceived skills of internet users when they seek and apply online health knowledge.
With the rapid development of internet technology, an increasing number of government departments, medical institutions, and nonprofit organizations have placed health-related information on the internet. Many people have begun to obtain health information through the internet, and EHL is gaining attention [5, 6]. However, not everyone has the HL to access appropriate health information, especially primary and secondary school students.
The internet is widely used among primary and middle school students, who are familiar with the most popular network applications and rely on network information technology for all kinds of communication, interaction, and access to information related to life and learning. However, previous studies have shown that junior high school students are not able to make good judgments about online health information and cannot use the internet to help solve health problems. Therefore, the ability to obtain and use such resources has become an important component of individual HL. Middle school students are in a critical developmental period in which their world outlook, life outlook, and values form, and their ability to distinguish between good and bad information on the internet is not yet mature.
However, Chinese students in this age group have little basic knowledge of electronic media and low EHL. If this problem is ignored, it will not be conducive to the balanced and healthy development of these students. In China, studies mainly focus on the current situation and influencing factors of EHL [11,12,13,14,15,16], the relationship between EHL and having a healthy lifestyle [17, 18], and the current situation of searching for health information on the internet. For example, to understand the status of EHL among college students in Guangdong province during the COVID-19 pandemic, Pan et al. conducted an online questionnaire survey among college students in the province and found that the level of EHL was low and that female students and those who were more affected by information related to COVID-19 had lower EHL. Liu et al. selected 1157 college students from four higher vocational colleges in Jinan to investigate EHL and disease behavior and found that EHL is an important factor affecting the disease behavior of these students. The above studies mainly focused on college students, and there are few studies on EHL among primary and secondary school students [9, 10, 20]. Linan et al. used the eHEALS to survey the EHL of middle school students, and the results showed that adolescents had a low ability to apply and evaluate online health information and services. Xie et al. studied EHL in high school students and found that they had a certain level of EHL and interactive HL and that the two were positively correlated.
Although some international studies have examined the factors of EHL in adolescents, most of the focus is on recognition and college students. For example, Holch et al. found that eHEALS scores were significantly positively correlated with general self-efficacy and that general self-efficacy was a significant predictor of eHEALS scores. Tariq et al. showed that perceived EHL was not associated with health behaviors such as physical activity and dietary supplement intake. Tümer and Sümen indicated that, in Turkey, the mean digital HL scores were high in students who lived in a nuclear family, understood the importance of good health, had easy access to the internet, and had highly educated parents with high income levels. Tsukahara et al. reported that the EHL of university students in Japan was comparable to that of the general Japanese population and that graduate students, as well as those in medical departments, had higher EHL. These studies suggest that EHL is related to socio-demographic and socio-economic variables.
Unfortunately, no specific studies have predicted EHL among Chinese primary and secondary school students. Therefore, identifying and predicting the EHL of primary and secondary school students is critical. This study aimed to identify the predictors of EHL in Chinese students using random forest and establish a corresponding prediction model to help policymakers and parents determine whether primary and secondary school students have EHL to enable them to implement more targeted interventions.
This study was designed as a cross-sectional study.
Study design and data collection
A total of 1300 students from seven primary and middle schools in Shaanxi Province, China, were surveyed from June to August 2021. Cluster sampling was used to randomly select two primary schools, two middle schools, and three high schools in the main urban areas of Yulin City and Ankang City of Shaanxi Province. Four classes were then randomly selected from each primary school, middle school, and high school. The inclusion criteria were public schools, elementary students in grades 2–5, middle school students in grades 1–2, and high school students in grades 1–2. The exclusion criteria were private schools, first and sixth graders, and junior middle school and senior high school students outside the included grades.
Two to four researchers were responsible for each study. To ensure the quality of the questionnaire, the students were guided by the researchers while completing it. After we explained the study, informed consent was obtained from all participants or, for those below 16 years old, from their legal guardians. Of the 1300 students surveyed, 65 were excluded from the analysis because of the large number of missing values in their questionnaires. We then randomly divided the remaining 1235 students into training and testing datasets at a ratio of approximately 70:30, with 872 students assigned to the training dataset and 363 students assigned to the testing dataset.
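The random split described above can be sketched as follows. This is only an illustration in Python (the study itself used R); the integer student IDs, the function name, and the seed are hypothetical. Because an exact 70% of 1235 is 864.5, the reported training size of 872 is passed in explicitly rather than derived from the ratio.

```python
import random


def split_train_test(ids, n_train, seed=2021):
    """Randomly assign n_train IDs to training and the rest to testing."""
    rng = random.Random(seed)          # arbitrary seed, chosen only for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)              # random permutation of all IDs
    return shuffled[:n_train], shuffled[n_train:]


student_ids = range(1, 1236)           # 1235 hypothetical student IDs
train_ids, test_ids = split_train_test(student_ids, n_train=872)
print(len(train_ids), len(test_ids))
```

Every student ends up in exactly one of the two sets, which is the property that later allows the testing dataset to serve as an independent check of the model.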
All methods were performed in accordance with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guideline.
Potential predictive variables
We conducted a systematic review of HL in Chinese students, identifying all observational studies published between January 2010 and September 2020 in both Chinese (CNKI, Wan Fang, CQVIP) and English databases (PubMed, Embase, Web of Science, Cochrane Library) on factors that affect HL in Chinese students. The significant influencing factors for Chinese students were sex, household location, grade, good academic performance, race, health information concerns, online game time, parental education, whether they were an only child, family monthly income, health education, and whether they were majoring in medicine or attending medical school. Therefore, we identified the following potential predictive variables for this study: sex, age, race, grade, family size, only child, employment status, household location, mother’s education, father’s education, and gaming time. We did not consider health information concerns, majoring in medicine, and medical school attendance because, in the systematic review, these variables applied to college students. Academic performance was not included in the analysis because China’s current policy treats student performance as important and private, which makes it difficult to obtain during data collection. Most primary and middle school students do not know their family income. Therefore, we did not include family income in the analysis. In addition to the factors mentioned above, other studies found that self-efficacy and parental phubbing behavior were closely related to HL [27, 28]. Therefore, these two variables were included. General self-efficacy was measured using the general self-efficacy scale (GSES), and parental phubbing behavior was measured using the parental phubbing scale (PPS).
We used the eHEALS developed by Norman et al. to evaluate whether primary and secondary school students had EHL (with vs. without). Students who scored above 80% of the maximum score were judged to have EHL. We used the 80% cut-off because we adopted the classification method used for Chinese HL. Other studies [15, 32] have also determined EHL using the 80% threshold. See Additional file 1 for more detailed information and the reliability and validity analysis of the scale.
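The 80% rule can be sketched in a few lines of Python. The sketch assumes the standard eHEALS format of 8 items, each rated from 1 to 5 (maximum total 40); the exact scoring used in the study is described in Additional file 1 and may differ. The function name and the `inclusive` parameter are hypothetical: the text says "above 80%" and does not state whether a score of exactly 80% counts, so that choice is left to the caller.

```python
EHEALS_ITEMS = 8        # assumption: standard eHEALS has 8 items
MAX_PER_ITEM = 5        # assumption: each item is rated on a 1-5 scale


def has_ehl(item_scores, cutoff_percent=80, inclusive=False):
    """Return True if the total eHEALS score passes the percentage cut-off."""
    if len(item_scores) != EHEALS_ITEMS:
        raise ValueError("expected one score per eHEALS item")
    if any(not 1 <= s <= MAX_PER_ITEM for s in item_scores):
        raise ValueError("item scores must lie between 1 and 5")
    total = sum(item_scores)
    max_total = EHEALS_ITEMS * MAX_PER_ITEM
    # Integer comparison avoids floating-point issues at the boundary.
    lhs, rhs = total * 100, cutoff_percent * max_total
    return lhs >= rhs if inclusive else lhs > rhs


print(has_ehl([5, 4, 4, 5, 4, 4, 5, 4]))   # total 35 of 40
print(has_ehl([3, 3, 4, 3, 3, 4, 3, 3]))   # total 26 of 40
```

Under these assumptions the cut-off corresponds to a total score of 32 out of 40, and the boundary case is the only place where the two readings of "above 80%" disagree.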
Bivariate analysis was performed using the Mann–Whitney U test for continuous and ordinally distributed variables and the chi-squared test for categorical variables. For further analysis, a nomogram was formulated based on the machine learning results.
Random forest, a classical machine learning algorithm, was selected for learning and prediction. The basis of random forest is the decision tree, a basic classification and regression method. A decision tree model takes the form of a tree; in a classification problem, it represents the process of classifying instances based on their features. Random forest combines the results of multiple decision trees for classification or regression. In this study, 500 decision trees were constructed, and three variables were randomly sampled as candidates at each node of each tree. Random forests select or exclude variables based on the importance of the features. The variables confirmed as important were used to create a simplified model rather than a complete model with all variables. Like other machine learning models, the random forest algorithm consists of training and testing steps: a training set is first used to select the optimal model, and a test set is then used to evaluate it. The area under the curve (AUC) was used as the assessment tool, and AUC values between 0.6 and 0.8 were considered acceptable.
The least absolute shrinkage and selection operator (LASSO) is a regression analysis method that performs feature selection and regularization simultaneously. It adds an L1-norm penalty to the minimization of the residual sum of squares. When the penalty parameter lambda is sufficiently large, some coefficients are shrunk exactly to zero, which gives LASSO excellent feature selection ability. Therefore, we also conducted LASSO regression and compared the results with those of the random forest.
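For reference, the LASSO estimate in its usual linear-regression form can be written as below. The notation is introduced here for illustration and is not taken from the study: n observations, p predictors, response y_i, predictor values x_ij, intercept β_0, coefficients β_j, and penalty parameter λ ≥ 0. For a binary outcome such as EHL, the residual sum of squares is typically replaced by the negative log-likelihood of a logistic model, with the same L1 penalty.

```latex
\hat{\beta}^{\mathrm{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

With λ = 0 the expression reduces to ordinary least squares; as λ grows, more coefficients are driven to zero and fewer variables remain in the model.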
The receiver operating characteristic (ROC) curve is drawn on a two-dimensional plane, with sensitivity as the ordinate and 1 − specificity (the false-positive rate) as the abscissa. Each point on the curve represents the sensitivity and specificity obtained at one classification threshold. The AUC is the area under the ROC curve; it is a standard measure of the quality of a classification model and reflects how well the model discriminates between the two classes. Typically, AUC values range from 0.5 to 1.0, with a larger AUC representing better model performance.
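The AUC can also be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The following Python sketch computes it directly from that definition; the function name and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0        # positive ranked above negative
            elif p == q:
                wins += 0.5        # a tie counts as half
    return wins / (len(pos) * len(neg))


toy_labels = [1, 0, 1, 0, 1, 0]                  # 1 = has EHL (toy data)
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]      # predicted probabilities
print(round(roc_auc(toy_labels, toy_scores), 3))
```

This pairwise form is quadratic in the sample size, which is acceptable for a few hundred students; rank-based implementations give the same value more efficiently.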
Decision curve analysis (DCA) quantifies the net benefit of a prediction model at different threshold probabilities and can be used to evaluate and compare different prediction models. The AUC only measures the accuracy of the prediction model and does not consider the actual utility of a particular model, whereas the DCA integrates the preferences of the subject or decision-maker into the analysis.
To facilitate the application of the prediction model, we developed a web page based on the model using Shinyapps. Statistical analysis was performed using R version 4.0.5 for Mac (R Foundation for Statistical Computing).
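In its standard form, the net benefit at a threshold probability pt is TP/n − (FP/n) × pt/(1 − pt), where TP and FP are the numbers of true and false positives among n individuals when everyone with a predicted probability of at least pt is classified as positive. The Python sketch below implements that formula together with the "treat all" reference strategy; the function names and the toy data are hypothetical, and the study's own analysis was carried out in R.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of classifying as positive everyone with prob >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)   # harm of one false positive
    return tp / n - fp / n * weight


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every individual as positive."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


toy_labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
toy_probs = [0.9, 0.6, 0.4, 0.2, 0.1, 0.35, 0.7, 0.05, 0.15, 0.25]
print(round(net_benefit(toy_labels, toy_probs, 0.3), 4))
print(round(net_benefit_treat_all(toy_labels, 0.3), 4))
```

The "treat none" strategy always has a net benefit of zero, so a model is useful at a given threshold only if its net benefit exceeds both zero and the "treat all" value. A net benefit of 0.15, for example, is conventionally read as 15 net true positives per 100 individuals.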
Characteristics of sample
We found that 415 (33.6%) students were e-health literate, and 820 (66.4%) were not. Table 1 summarizes the characteristics of the total population of Chinese students (N = 1235). The univariate analysis showed that age (P < 0.001), grade (P < 0.001), employment status (P < 0.001), household location (P < 0.001), parental phubbing behavior (P < 0.001), and general self-efficacy (P < 0.001) were significantly associated with EHL.
Table 2 summarizes the characteristics of the Chinese students in the training dataset (n = 872). Moreover, we found that 287 (32.9%) students were e-health literate, and 585 (67.1%) were not. The univariate analysis showed that age (P < 0.001), grade (P < 0.001), household location (P < 0.001), parental phubbing behavior (P < 0.001), and general self-efficacy (P < 0.001) were significantly associated with EHL.
Table 3 summarizes the characteristics of the Chinese students included in the testing dataset (n = 363). In the testing dataset, 128 (35.3%) students were e-health literate, and 235 (64.7%) were not. The univariate analysis showed that general self-efficacy (P < 0.001) was related to EHL.
Predictive variables selection
Thirteen variables measured at school (Tables 2 and 3) were included in the random forest. The process and results of feature selection by random forest are shown in Fig. 1, which identifies seven variables that were confirmed to be important: age, grade, employment status, father’s education level, gaming time, parental phubbing behavior, and general self-efficacy. Five variables were confirmed as unimportant: sex, mother’s education level, being an only child, family size, and race. Furthermore, the variable “household location” was excluded. To consolidate the results of the random forest feature selection, we performed a LASSO regression, as shown in Fig. 2. As expected, only two variables (grade and general self-efficacy) were left in the LASSO regression model, far fewer than in the random forest model, because of the strong shrinkage capability of LASSO regression. Both variables were also among those identified by the random forest.
Validation of CSEHL
We then developed a prediction model using the seven key factors selected by the random forest. In the internal validation on the training dataset, the ROC curve showed that the model had high discriminative ability, with an AUC of 0.975 (Fig. 3). The validation cohort included 363 students with a mean (SD) age of 13.5 (3.9) years, 188 (51.8%) males, and 128 (35.3%) students with EHL. In this independent validation dataset, the model showed satisfactory discrimination with an AUC of 0.738 (Fig. 3).
Construction of the predictive score and web-based calculator
The EHL prediction score was constructed based on random forest. We used the model to build nomograms (Fig. 4). To further facilitate the use of our findings by policymakers and parents, this study presents nomograms in the form of a web page; that is, a web calculator was generated that can automatically calculate the probability of students having EHL according to seven key variables (https://xietao.shinyapps.io/DynNomapp/).
Decision curve analysis
In Fig. 5, the three lines in the training and testing panels represent different strategies. The line labeled Smoothed net benefit Pr (EH) represents the prediction model used in this study. The other two lines represent two extremes. Net benefit: Treat none represents the situation in which no student is assumed to have EHL, and the net benefit is zero. Net benefit: Treat all represents the situation in which all students are assumed to have EHL, and the net benefit is a line with a negative slope. As shown in Fig. 5, the net benefit of the model in this study is higher than that of both extreme curves over a large range of threshold probabilities. Therefore, the model can be applied over a relatively wide range of thresholds and is relatively safe to use. For example, in the training dataset, if a prediction probability of 30% is chosen as the threshold, 15 out of 100 students who use the model will benefit from it without affecting anyone else.
The quality of access to health information is closely related to people’s quality of life. Knowing, processing, and using health information can help people maintain and promote their health. The internet is the main way to obtain health information. An individual’s EHL determines whether they can accurately obtain health information to promote their health. In this study, we developed and validated an EHL nomogram and a web-based calculator to predict EHL among Chinese primary and secondary school students. In the training and validation datasets, the AUC values of the model were 0.975 and 0.738, respectively, which were satisfactory. Policymakers and parents can use our web-based calculator to estimate the probability of a student having EHL.
Mai et al. pointed out that there were statistically significant differences in EHL scores among students of different sexes and household locations and between only children and those with siblings. Their multiple linear regression analysis found that the father’s educational level was the main influencing factor of EHL. Zhong et al. found that sex, grade, and time spent online were the main influencing factors of EHL in junior middle school students. We narrowed the predictors down to seven key factors: age, grade, employment status, father’s education level, gaming time, parental phubbing behavior, and general self-efficacy. These factors are consistent with the results of previous studies.
Among the seven variables used to calculate the probability of EHL, age, grade, employment status, father’s education level, and gaming time can be obtained from basic information. General self-efficacy and parental phubbing behavior can be measured using publicly available, easily obtained scales. Web-based calculators are easy to use, and schools and parents can take appropriate measures if a student’s predicted probability of having EHL is low. We have not graded the predicted probability, so parents of students in different regions can make decisions based on their family situation, and government workers can make decisions based on the development level of their region. For example, in more developed provinces, policymakers can intervene to help students whose predicted probability is below 80%. In provinces with an average level of development, however, this threshold could be lowered to 60%. Different regions can explore the specific cut-off values of the predicted probability themselves.
There are several limitations to this study. First, the sample size for constructing the probability score was moderate. Second, the sample size for validation was relatively small. Third, the sample was concentrated in Shaanxi Province, China. These limitations may restrict the applicability of the model to other regions of China, and data from other provinces must be collected to further verify the model. In addition, as mentioned above, because of practical constraints, this study did not include students’ academic performance and family income in the model, which should be addressed in future research.
Availability of data and materials
All data generated or analyzed during this study are included in this published article.
Sørensen K, Broucke S, Fullam J, Doyle G, Pelikan J, Slonska Z, Brand H. Health literacy and public health: A systematic review and integration of definitions and models. BMC Public Health. 2012;12(1):1–13.
Norman CD, Skinner HA. eHEALS: The eHealth Literacy Scale. J Med Internet Res. 2006;8(4):e27.
Riahi A, Mousavi CA. Survey of E-Health Literacy among Employees of State-Owned Banks in Tehran During 2020. J Health Literacy. 2020;5(3):53–63.
Norman CD, Skinner HA. eHealth Literacy: Essential Skills for Consumer Health in a Networked World. J Med Internet Res. 2006;8(2):e9.
NaseriBooriAbadi T, Sadoughi F, Sheikhtaheri A. The Status of Electronic Health Literacy in people with Hearing Impairment: Content Analysis Approach. J Health Literacy Autumn. 2021;6(3):9–23.
Norman C. eHealth Literacy 2.0: Problems and Opportunities With an Evolving Concept. J Med Internet Res. 2011;13(4):e125.
ZHONG Miao, CAI Ying-ying. Analysis on the status quo of electronic health literacy and health information utilization of junior middle school students and their related effects. Health Educ Health Promotion. 2016;11(6):426-9+43.
Eysenbach G. The Effect of Individual Factors on Health Behaviors Among College Students: The Mediating Effects of eHealth Literacy. J Med Internet Res. 2014;16(12):e28.
Linan CH, Wenxiang CUI. Current status of adolescents’ electronic health literacy in Jilin province. Chin J School Health. 2016;37(04):526–8.
Rong-wei SONG, Rusul PARHATIJIANG, Hua FU, Fan WANG, Sha TAO. Investigation on e-health literacy of middle school students in Shanghai from Xinjiang Uygur Autonomous Region. Chin J Health Educ. 2018;34(01):33–7.
PAN Cheng-hao, ZHU Le-wei, FENG Kai-ying, WANG Hao, FAN Xiao-yan, LI Yan, GU Jing. Status and influencing factors of electronic health literacy among college students in Guangdong during the COVID-19 epidemic. South China Journal of Preventive Medicine. 2021;47(07):852–6.
Jian-rong MAI, Ling ZHOU, Lina LIN. A cross-sectional study of electronic health literacy among college students in Guangzhou. Health Vocational Educ. 2021;39(02):56–7.
Guang-hui CUI, Shao-jie LI, Yong-tian YIN, Ping ZHANG. Research on eHealth literacy of medical students and its influencing factors. Modern Prevent Med. 2020;47(06):1148–52.
Shao-jie LI, Yong-tian YIN, Li CHEN, Ping ZHANG, Guang-hui CUI. Analysis of electronic health literacy level and influencing factors of college students in Jinan city. Chin J School Health. 2019;40(07):1071–4.
Shu-xian MENG, Chong SHEN. Current situation of eHealth literacy and health behaviors of college students in Nanjing. Chin J Health Educ. 2018;34(03):254–7.
Qiu-yu PAN. Factors influencing the health literacy of college students in Nanchong. Chin J Med Manage Sci. 2018;8(01):61–6.
Guang-hui CUI, Yong-tian YIN, Ming-zhou WANG, Ke-xin YANG, Jia-qin LI. The relationship between electronic health literacy and healthy lifestyle of medical students. Chin J School Health. 2020;41(06):936–8.
Jian-chao LIU, Yong-tian YIN, Ying-ying FAN. The relationship between electronic health literacy and disease behavior of college students in Jinan Higher Vocational Colleges. Chin J School Health. 2020;41(10):1502-1505+1510.
Xin LI, Xu-hui LI. Investigation and Countermeasures on Current Situation of College Students’ Online Health Information Search Behavior Based on eHealth Literacy. Library Theory Pract. 2017;04:44–50.
XIE Yu-chang, ZHANG Hua. Analysis on the current situation of electronic health literacy and interactive health literacy of senior high school students. Reference for Middle School Teaching. 2017;(33):60–2.
Holch P, Marwood J. EHealth Literacy in UK Teenagers and Young Adults: Exploration of Predictors and Factor Structure of the eHealth Literacy Scale (eHEALS). JMIR Form Res. 2020;4(9):e14450.
Tariq A, Khan S, Basharat A. Internet Use, eHealth Literacy, and Dietary Supplement Use Among Young Adults in Pakistan: Cross-Sectional Study. J Med Internet Res. 2020;22(6):e17014.
Tümer A, Sümen A. E-health literacy levels of high school students in Turkey: results of a cross-sectional study. Health Promotion International. 2021;daab174. https://doi.org/10.1093/heapro/daab174.
Tsukahara S, Yamaguchi S, Igarashi F, Uruma R, Ikuina N, Iwakura K, Koizumi K, Sato Y. Association of eHealth Literacy With Lifestyle Behaviors in University Students: Questionnaire-Based Cross-Sectional Study. J Med Internet Res. 2020;22(6):e18155.
Moons K, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1–73.
MAO Ying, XIE Tao, ZHANG Ning. Chinese Students' Health Literacy Level and Its Associated Factors: A Meta-Analysis. Int J Environ Res Public Health. 2020;18(1):204.
Qian ZHANG, Liang ZHU, Li-wei JING, Fenglan WANG, Xiaoli ZHANG, Fengmei XING. Effects of self-efficacy and coronary heart disease knowledge on health literacy of young and middle-aged patients with coronary heart disease. Chin J Behav Med Brain Sci. 2018;27(03):252–5.
Xiao-shuang ZHAO, Chun-yu LI, Cai-fu LI. A path analysis of the impact of health literacy and self-efficacy on health status in community-dwelling patients with diabetes. Chin J Nurs. 2013;48(01):63–5.
WANG Cai-kang, HU Zhong-feng, LIU Yong. Evidences for Reliability and Validity of the Chinese Version of General Self-Efficacy Scale. Chinese Journal of Applied Psychology. 2001;(01):37–40.
Qian DING, Zhao-qi WANG, Yong-xin ZHANG. Revision of the Chinese Version of Parents Phubbing Scale in Adolescents. Chin J Clin Psychol. 2020;28(05):942-945+896.
Xue-qiong NIE, Ying-hua LI, Li LI. Statistic analysis of 2012 Chinese residents health literacy monitoring. Chin J Health Educ. 2014;30(02):178–81.
Ye ZHAO, Hui CHEN, Cong ZOU, Hui-ling GONG, Yu-jing WU. Correlation of e-Health literacy and health information seeking behavior among adult internet users. Chin J Health Educ. 2018;34(09):812–6.
Nomura K, Kido M, Tanabe A, Nagashima K, Takenoshita S, Ando K. Investigation of optimal weight gain during pregnancy for Japanese Women. Sci Rep. 2017;7(1):2569.
Jian-rong MAI, Ling ZHOU, Li-na LING. A cross-sectional study of electronic health literacy among college students in Guangzhou. Health Vocational Educ. 2021;39(02):56–7.
This study was supported by the Major Project of the National Social Science Fund of China: Research on big health putting prevention first and construction of healthy China (grant number 17ZDA079). The funding institution had no role in the design, data collection, analysis, interpretation, or writing of this manuscript.
Ethics approval and consent to participate
All methods were performed in accordance with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidelines and regulations. The use of the questionnaire survey, records, and study results were approved by the Biomedical Ethics Committee of the School of Medicine, Xi’an Jiaotong University (No. 2021–1525). After detailing our study, informed consent was obtained from all participants or their legal guardians for those below 16 years old.
Consent for publication
The authors declare that they have no competing interests.
Xie, T., Zhang, N., Mao, Y. et al. How to predict the electronic health literacy of Chinese primary and secondary school students?: establishment of a model and web nomograms. BMC Public Health 22, 1048 (2022). https://doi.org/10.1186/s12889-022-13421-4
- Electronic health literacy
- Chinese students
- Random forest
- Web nomograms