Assessment of the Chinese Resident Health Literacy Scale in a population-based sample in South China

Background A national health literacy scale was developed in China in 2012, though no studies have validated it. In this investigation, we assessed the reliability, construct validity, and measurement invariance of that scale. Methods A population-based sample of 3731 participants in Hunan Province was used to validate the Chinese Resident Health Literacy Scale based on item response theory and classical test theory (including split-half coefficient, Cronbach’s alpha, and confirmatory factor analysis). Measurement invariance was examined by differential item functioning. Results The overall Cronbach’s alpha of the scale was 0.95, and the Spearman-Brown coefficient was 0.94. Confirmatory factor analysis showed that the test measured a unidimensional construct with three highly correlated factors. Discrimination was highest among participants with limited to moderate health literacy. In all, 64 items were selected from the original scale based on factor loading, Pearson’s correlation coefficient, and discrimination and difficulty parameters in item response theory. Differential item functioning was statistically significant but slight. According to the two-level linear model, health literacy was associated with education level, occupation, and income. Conclusions The 2012 national health literacy scale was validated, and 64 items were selected based on classical test theory and item response theory. The revised version of the scale has strong psychometric properties, with only minor departures from measurement invariance.


Background
The concept of health literacy was introduced in China in 2005 by the Chinese government through a manual entitled "Basic Knowledge and Skills of People's Health Literacy" [1,2]. That manual used the definition of health literacy of the World Health Organization: the cognitive and social skills which determine the motivation and ability of individuals to gain access to, understand and use information in ways which promote and maintain good health [3]. Under that definition, health literacy goes beyond the narrow concept of health education and individual behavior-oriented communication: it addresses the environmental, political, and social factors that determine health. The US Institute of Medicine defines health literacy in a similar way: health literacy is a set of skills that enable people to participate more fully in society instead of simply functional capabilities [4]. The ability to read and write is the foundation for health literacy, upon which a range of complementary skills can be built [5].
Based on the existing situation in China, health literacy was defined by its government as a set of capabilities in three domains in 2008: conceptual knowledge and attitudes; behavior and lifestyle; and health-related skills. The first nationwide survey on health literacy in China was conducted in 2008, and it focused on health knowledge [6]. The second national survey was conducted in 2012, with an emphasis on basic reading ability, arithmetic, and understanding medical information [7].
Internationally, the most commonly used measure of health literacy is the Rapid Estimate of Adult Literacy in Medicine and its shortened version; these assess an adult patient's ability to read common medical terms and lay expressions for body parts and illnesses [8][9][10]. The Test of Functional Health Literacy in Adults and its shortened version are timed tests of reading comprehension of medical information [11,12]. Other measures of health literacy in clinical settings include the following: the Medical Achievement Reading Test; the Newest Vital Sign; the Set of Brief Screening Questions; Functional, Communicative and Critical Health Literacy; the eHealth Literacy Scale; the Cancer Health Literacy Test; and the Diabetes Numeracy Test [13][14][15][16][17][18][19][20]. These measures focus on a single dimension of health literacy, rather than identifying its multidimensional nature [21].
By contrast, some measures have expanded the scope of medical care-related literacy; they include the following: the Health Activity Literacy Scale; the Demographic Assessment of Health Literacy; the 2003 National Assessment of Adult Literacy; the Adult Literacy and Life Skills Survey; and the Health Literacy Assessment Using Talking Touchscreen Technology [22][23][24][25][26][27]. These scales and questionnaires are more comprehensive because they involve different health-related competencies. However, they are considered proxy measures owing to the lack of an explicit definition of the concept of health literacy [28].
In China, health literacy in clinical settings has been measured using translated versions of scales used overseas as well as original Chinese scales among certain populations [29][30][31][32][33][34][35][36][37][38]. For example, health literacy among older adults has been measured using the Chinese version of the Rapid Estimate of Adult Literacy in Medicine [33]; the translated version of the Test of Functional Health Literacy in Adults was employed to measure health literacy among adolescents aged 12-16 years in Nanning; the eHealth Literacy Scale was translated and used on a sample of senior high school students [34]; the Chinese version of the Diabetes Numeracy Test was used in a cluster-randomized trial in patients with diabetes [35]; and a three-question measure of health literacy derived from a systematic review was applied among cataract patients [36,39]. However, all these studies investigated cross-sectional health literacy without evaluating the instruments employed [40][41][42]. In the present study, we assessed the reliability and construct validity of the Chinese Resident Health Literacy Scale based on item response theory (IRT) and classical test theory using a population-based sample from Hunan Province in 2012. We also examined the association between health literacy and sociodemographic factors.

Participants
The participants were residents aged 15 to 69 years who had lived in the sampled regions for more than 6 of the previous 12 months. Individuals residing in institutions (hospitals, school dormitories, nursing homes, military bases, and prisons), such as patients, students, military personnel, and prisoners, were excluded from the survey.
We used a population-based stratified sampling frame, as shown in Fig. 1. The sampling strata included 13 cities or counties in Hunan Province, three streets or towns in each city or county, and two communities or villages (where the number of households exceeded 750) in each street or town. If there were fewer than 750 households within a community or village, the neighboring units were combined until that total was met. In each household, information was recorded for all family and non-relative members (e.g., hired nannies) aged 15-69 years who had been living there for more than 6 of the previous 12 months, including gender (male or female) and age (listed from oldest to youngest). One member of each household was selected for the survey by means of a Kish grid [43]. Unselected members were not allowed to complete the survey as substitutes.
The research protocol was reviewed and approved by the Medical Ethic Committee of the National Health and Family Planning Commission of China. All participants who agreed to participate in the study signed an informed consent form at the beginning of the survey.

Study design
The Chinese Resident Health Literacy Scale was developed based on a manual published by the Chinese Ministry of Health in 2008, "Basic Knowledge and Skills of People's Health Literacy" (trial edition) [1]. The scale was designed by experts in public health, health education and promotion, and clinical medicine using the Delphi method. Details of the development procedure have been described in a previous paper [44]. The scale contains 80 items and three dimensions: (1) knowledge and attitudes; (2) behavior and lifestyle; and (3) health-related skills. The questions cover six aspects: scientific views of health; infectious diseases; chronic diseases; safety and first aid; medical care; and health information. As indicated in Table 1, there are four types of questions in the scale: true-or-false; single-answer (only one correct answer in multiple-choice questions); multiple-answer (more than one correct answer in multiple-choice questions); and situation questions. With multiple-answer questions, a correct response had to contain all the correct answers and no wrong ones. Situation questions were given following a paragraph of instruction or medical information.
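The all-or-nothing rule for multiple-answer questions can be illustrated with a minimal sketch. The function name, the option labels, and the answer key below are hypothetical and are not taken from the scale itself; the sketch only shows one way the stated rule could be coded as a dichotomous (0/1) response.

```python
def score_item(selected, correct):
    """Return 1 only if the selected options equal the answer key exactly.

    A response that misses a correct option or includes a wrong one scores 0,
    matching the rule described for multiple-answer questions.
    """
    return 1 if set(selected) == set(correct) else 0


# Hypothetical multiple-answer item whose key is options A, C and D.
key = {"A", "C", "D"}
print(score_item({"A", "C", "D"}, key))       # all correct options, no wrong ones
print(score_item({"A", "C"}, key))            # one correct option missing
print(score_item({"A", "B", "C", "D"}, key))  # a wrong option included
```

Scoring every item as 0 or 1 in this way is what allows the dichotomous models described in the statistical analyses to be applied to all four question types.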
Before the field study, a survey team was established in each of the 13 cities or counties; the team comprised a principal, a coordinator, four to six investigators, a quality controller, and a data manager. All these team members received training for the sampling method, research tools, and quality control. A simulated survey was conducted during the training, and the investigators' eligibility was assessed before performing the field survey.
Written informed consent was obtained from all participants before the survey. The scale was self-administered. However, if a participant was unable to complete the scale owing to impaired vision or other such reasons, an interview was used as an alternative. In that situation, the investigators would complete the questions in a neutral fashion on behalf of the participants.

Statistical analyses
Because repeated measures were not used, test-retest reliability was not determined. The split-half coefficient and Cronbach's alpha were estimated before and after the item-selection procedure.
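For readers unfamiliar with these two reliability coefficients, the following is a minimal sketch in plain Python rather than the SAS procedures used in the study. The toy response matrix is hypothetical, and the odd/even split of items is an assumption, since the way the two halves were formed is not stated here.

```python
from statistics import mean, pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of 0/1 item scores."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown_split_half(responses):
    """Split-half reliability with the Spearman-Brown correction.

    Assumption: halves are formed from odd- and even-numbered items.
    """
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)


# Hypothetical toy data: 6 respondents by 4 dichotomous items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(cronbach_alpha(data))
print(spearman_brown_split_half(data))
```

Both coefficients depend on the number of items, so values for the 80-item, 64-item, and 19-item versions are not directly interchangeable.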
IRT was used to evaluate the precision of the measurements. IRT is a family of associated mathematical models that relate latent traits (ability) to the probability of responses to items in an assessment, and it has been widely used in psychometrics and health assessment [45]. It specifies a nonlinear relationship between binary, ordinal, or categorical responses and the latent trait (health literacy in this case). Compared with classical test theory approaches, the advantages of IRT include the following: near-equal interval measurement; representation of respondents and items on the same scale; and independence of person estimates from the particular set of items used for estimation [46].
We applied a two-parameter logistic IRT model for dichotomous responses. The two-parameter logistic model includes a difficulty parameter and a discrimination parameter for each item. The difficulty parameter is the point on the ability scale that corresponds to a probability of a correct response of 0.5; the discrimination parameter estimates how well an item can differentiate among respondents with different levels of ability. Because the "I don't know" choice was included for all questions, guessing parameters were not considered. Items with a discrimination parameter of 0.5 to 2.0 and a difficulty parameter corresponding to a certain region of the ability scale (−3.0 to 3.0) provide the most information [45,47]. Parameters were estimated using a marginal maximum-likelihood method. The IRT model was recalibrated after the item-selection procedure. Measurement invariance of the scale among the different subgroups (by gender and race) was estimated using differential item functioning in the IRT model. Pearson's correlation coefficient was determined. An eligible item had to be significantly and at least moderately (0.4 to 0.7) correlated with the total score of its domain; hence, the correlation coefficient between them had to be above 0.4 [48]. The construct validity was assessed by confirmatory factor analysis (CFA). An assumed structure of the scale (three dimensions) was tested using a structural equation model. Since the items were binary measures, the unweighted least-squares method was employed for parameter estimation in the structural equation model. The chi-square value, goodness-of-fit index, root mean square residual, and parsimony goodness-of-fit index were used to assess the model fit. Several studies have recommended that the factor loading should be above 0.4 [49][50][51].
Items that met two or more of the following criteria were removed: (1) discrimination parameter <0.5 or >2.0; (2) difficulty parameter < −3.0 or >3.0; (3) factor loading <0.4; and (4) Pearson's correlation coefficient <0.4. In addition, items with strong discrimination (≥1.0) were selected to form a short version of the scale. The demographic variables were described, and raw scores among the different subgroups were compared using analysis of variance. After item selection, the association between health literacy scores and demographic variables was tested by means of a multilevel linear model.
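The removal rule can be expressed as a simple count of failed criteria. The sketch below is an illustration under stated assumptions: the dictionary keys (`a`, `b`, `loading`, `r`) and the example values are hypothetical, and it assumes that the short version is drawn only from items retained by the rule, which the text does not state explicitly.

```python
def failed_criteria(item):
    """Count how many of the four removal criteria an item meets."""
    return sum([
        item["a"] < 0.5 or item["a"] > 2.0,    # (1) discrimination out of range
        item["b"] < -3.0 or item["b"] > 3.0,   # (2) difficulty out of range
        item["loading"] < 0.4,                 # (3) low factor loading
        item["r"] < 0.4,                       # (4) low item-domain correlation
    ])


def is_retained(item):
    """An item is removed when it meets two or more criteria."""
    return failed_criteria(item) < 2


def in_short_version(item):
    """Assumption: short-version items are retained items with a >= 1.0."""
    return is_retained(item) and item["a"] >= 1.0


# Hypothetical items, not rows from Table 3.
good = {"a": 1.3, "b": -0.5, "loading": 0.55, "r": 0.52}
weak = {"a": 0.3, "b": -3.5, "loading": 0.45, "r": 0.41}
borderline = {"a": 0.45, "b": 0.2, "loading": 0.50, "r": 0.45}

for name, item in [("good", good), ("weak", weak), ("borderline", borderline)]:
    print(name, failed_criteria(item), is_retained(item), in_short_version(item))
```

Requiring two failed criteria rather than one makes the rule tolerant of an item that is marginal on a single index but acceptable on the others.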
The IRT calibrations were conducted using PARSCALE 4.1 (Scientific Software International Inc., Lincolnwood, USA). CFA was performed in AMOS 17.0 (Arbuckle JL and SPSS Inc., Chicago, USA). Multilevel model estimation was carried out with MLwiN 2.1 (Rasbash J, Charlton C, Browne WJ, Healy M, and Cameron B, Centre for Multilevel Modelling, University of Bristol, UK). Other analyses were conducted using SAS 9.2 (SAS Institute Inc., Cary, USA). The significance level was 0.05 for all statistical tests.

Results
In all, 3900 participants were sampled, and 3731 (95.7 %) completed the survey without apparent logical errors or missing items. As indicated in Table 2, there were significant differences in the health literacy scores among the subgroups of age, education level, occupation, annual per capita income, and residence (P <0.05), but not among the subgroups of gender and race. The proportion of correct responses to the 80 items varied from 10.8 to 96.7 % (Table 2). The Spearman-Brown split-half coefficient was 0.94. The overall Cronbach's alpha was 0.95; Cronbach's alpha of the three dimensions was as follows: 0.90 (knowledge and attitude, 38 items); 0.83 (behavior and lifestyle, 22 items); and 0.85 (skills, 20 items). The two-parameter logistic model fitted the data well (P >0.05). The difficulty and discrimination parameters from the IRT model appear in Table 3. Most items exhibited good discriminative power and moderate difficulty. As shown in Fig. 2, the test information reached a peak when the participants' ability was between −1 and 0, which indicates that the measurement was most discriminative among participants with limited to medium-level abilities in health literacy.
In the CFA, the three-factor model fitted slightly better than the one-factor model. Correlations among the three factors (knowledge and attitudes; behavior and lifestyle; skills) were 0.96-0.98, which provides good evidence of unidimensionality, i.e., a single dominant dimension of health literacy. Factor loadings and the correlation coefficients between items and dimensional scores are presented in Table 3.
In all, 16 items were removed from the scale according to the criteria of item selection; 10 of them were true-or-false questions, which showed poor discriminative power and small factor loading. Sixty-four items were selected according to classical and modern test theory standards. The Spearman-Brown coefficient was 0.94. The overall Cronbach's alpha was 0.95; Cronbach's alpha of the three dimensions was as follows: 0.90 (knowledge and attitude, 30 items); 0.83 (behavior and lifestyle, 16 items); and 0.86 (skills, 18 items). Goodness-of-fit of the CFA and the IRT models improved slightly compared with the original scale. Factor loading, difficulty parameters, and discrimination parameters of all the items met the criteria.
A shorter version of the scale, comprising 19 items with discrimination parameters ≥1.0, was also created. The shorter version consisted of eight items in the knowledge and attitude dimension, five items in the behavior and lifestyle dimension, and six items in the health-related skill dimension. The overall Cronbach's alpha was 0.88; Cronbach's alpha of the three dimensions was 0.76, 0.64, and 0.77, respectively. The split-half coefficient was 0.87. The correlation coefficients and factor loadings of all the items were above 0.4 (mostly >0.5), and the discrimination parameters of all the items were 0.5-2.0 (mostly 1.0-2.0). Differential item functioning in the IRT model was used to examine measurement invariance. The chi-square tests showed statistically significant differential item functioning for both gender and race (P <0.05); however, the slope and threshold parameters were very close between male and female as well as between urban and rural groups.
The association between health literacy (revised scale) and demographic variables was explored using a two-level model because intracluster correlation was identified at the level of cities. As indicated in Table 4, education level, occupation, and income were associated with health literacy. Participants with higher socioeconomic status (higher education level and greater income) were more likely to have adequate health literacy. The intracluster correlation coefficient at the city level was 34.5 %.
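In a two-level random-intercept model, the intracluster correlation coefficient is the share of total variance that lies between clusters. The sketch below shows that calculation only; the variance components are hypothetical values chosen to give a ratio of 0.345, since the estimated components themselves are not reported in the text.

```python
def intracluster_correlation(between_var, within_var):
    """Proportion of total variance attributable to differences between clusters."""
    return between_var / (between_var + within_var)


# Hypothetical city-level and individual-level variance components.
icc = intracluster_correlation(0.69, 1.31)
print(round(icc, 3))
```

A coefficient of this size means that about one third of the variation in scores lies between cities, which is why a single-level regression would understate the standard errors of the estimates.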

Discussion
To validate the scale used in the 2012 National Health Literacy Survey, we performed this study using a population-based sample in Hunan Province. Classical test theory (Cronbach's alpha, split-half coefficient, and factor analysis) and modern test theory (IRT) were used in validating the scale. We found that the 2012 scale of health literacy meets psychometric standards. The overall Cronbach's alpha was 0.95. The assumption that the scale measures a unidimensional construct was supported by the three-factor model fit being approximately that of the one-factor model fit and the three factors (knowledge and attitudes; behavior and lifestyle; skills) being highly correlated. Among the 80 items tested, 16 performed poorly and were removed. The remaining 64 items yielded a reliable estimate of health literacy, especially among participants with moderate and limited health literacy. The short version of the scale, which comprises 19 items with discrimination parameters ≥1.0, did not meet the standards for individual measurement (reliability ≥0.9). Nevertheless, the short version may still be effective for group comparisons [52].
Fig. 2 Test information and participant ability. Ability signifies health literacy estimated using the maximum-likelihood method. Ability in the item response theory (IRT) model practically (though not exclusively) ranged from −3 to +3. The test information reached a peak when the ability was between −1 and 0; this indicates that the measurement exhibited highest discriminative power among participants with limited and under-average ability with respect to health literacy.
In IRT, an item is useful only when it has good discrimination and its difficulty corresponds to a certain range in the ability scale: questions that are too hard or too easy provide little information [53]. However, if the discrimination is too high (i.e., greater than 2.5, as seen in clinical and psychological studies), the measured construct is often conceptually narrow. We limited the discrimination parameters to 0.5-2.0 because health literacy is a relatively broad concept. In this study, we identified items with inappropriate discrimination and difficulty. Most of them also had low factor loadings and correlation to the dimension score. However, the test used in the present study is time consuming. It usually took 30 min for an adult to complete the test; it took even longer for participants with limited literacy. Thus, in the future, it will be necessary to develop computerized adaptive testing and provide participants with short, tailored tests that have scores comparable to those of fixed-length tests.
Differential item functioning was statistically significant for both gender and race; however, the slope and threshold parameters were extremely close between the male and female as well as between the urban and rural groups. We observed no large differences between gender and race groups. The sample size in our study was large enough for even such slight differences to reach statistical significance. Thus, our results suggest that the Chinese Resident Health Literacy Scale can be applied to Chinese respondents of different genders and races to yield comparable scores.
The demographic factors associated with health literacy included education level, occupation, and annual income. Participants with higher education and better economic status were more likely to have adequate health literacy. Gender, age, race, and type of residence were not statistically significant in the regression. The multilevel model identified a substantial intracluster correlation at the city level (primary unit in the sampling frame), with an intracluster correlation coefficient of 34.5 %. Health literacy is the outcome of health promotion, and both health literacy and socioeconomic factors are determinants of health. However, the potential of health education as a tool for addressing the social determinants of health has been neglected [54]. Health education should focus not only on changing personal lifestyles and improving compliance with disease management, but also on raising awareness of the social determinants of health [5].
Some limitations of this study deserve mention. First, we did not assess the content validity since the scale was initially developed by an expert panel from the Ministry of Health. Second, we did not perform repeated measures during the field study. Thus, the test-retest reliability was not determined. Third, as noted above, the test is time consuming: it usually took 30 min for an adult to complete and even longer for participants with limited literacy or other conditions. Despite these limitations, this study has a number of implications. First, the original scale was found to be appropriate in terms of reliability and validity. We removed 16 items according to factor analysis and IRT, and the scores of the 64-item scale correlated highly with the scores of the original scale. Accordingly, the main conclusions of the 2012 National Health Literacy Survey were unaffected by validation of the scale it employed. Second, a shortened 19-item version was created because applying the original scale was very time consuming. The 19-item version was found to be slightly inferior to the original scale in terms of reliability (Cronbach's alpha decreased from 0.95 to 0.88); however, it would still be effective for group comparisons and population studies. Third, the instruments used in the National Health Literacy Survey in 2008 and 2012 were different. Therefore, a direct comparison based on raw scores would be inappropriate. In the present study, IRT provided an opportunity for longitudinal comparison.