Assessment of psychometric properties of the Korean SF-12 v2 in the general population
BMC Public Health volume 14, Article number: 1086 (2014)
The psychometric properties of the Korean Short Form-12 Health Survey, version 2 (SF-12 v2) have not been assessed in the general population. Therefore, the aim of our study was to evaluate the psychometric properties of the Korean version of the SF-12 v2 in the general population and to provide SF-12 v2 domain scores according to the general characteristics of the study population.
A total of 1,000 participants from the general Korean population were recruited using a multistage quota sampling method. Psychometric properties were evaluated by descriptive statistics, validity, reliability, and exploratory factor analysis.
Item convergent and discriminant validity met the criteria established by the instrument developer. In the known-group comparison, male gender, age <60 years, high educational status, and absence of any comorbidity were significantly associated with high scale scores. Cronbach’s alpha across all SF-12 v2 items was 0.88.
The findings of this study generally support the idea that the Korean SF-12 v2 is a feasible, valid, and reliable instrument for assessing health-related quality of life in the general population. The SF-12 v2 seems to be a viable alternative health-related quality of life instrument for the Korean population.
Interest in health-related quality of life (HRQoL) has increased in recent decades, and the number of citations for “quality of life” in the medical literature has risen substantially. HRQoL instruments are essential for evaluating HRQoL as an outcome measure of community- or hospital-based interventions. The Short Form-36 Health Survey, version 2 (SF-36 v2) is one of the most widely used generic instruments for evaluating HRQoL worldwide. The SF-12 v2 is a shorter version of the SF-36 v2 that uses only 12 questions. Because the SF-12 v2 is brief and measures various aspects of health status, it has become the instrument of choice in population health surveys and in clinical studies that combine it with disease-specific instruments [2, 3]. Several studies have reported the validity and reliability of the SF-12 as a measure of HRQoL in a range of medical conditions, as well as in the general population [4–8]. Although the psychometric properties of the Korean SF-36 v2 have been evaluated in the general population [9, 10], a similar evaluation of the Korean SF-12 v2 has yet to be performed.
In addition, there is some evidence of cultural differences in how the items of HRQoL instruments are interpreted [11, 12]. When instruments developed in other countries are adapted to the Korean population, their feasibility and psychometric properties should therefore be assessed before they are applied in research. Accordingly, the aim of our current study was to evaluate the psychometric properties of the Korean version of the SF-12 v2 in the general population and to provide SF-12 v2 domain scores according to the general characteristics of the study population.
This study was conducted using individual face-to-face interviews. The survey was performed from August 2013 to November 2013 by 27 trained interviewers. Respondents were asked to complete the Korean version of the SF-12 v2 for HRQoL. Data on demographic factors (i.e. age, sex, level of education, and occupation) and health-related factors (i.e. current disease, outpatient visits in the past 2 weeks, and hospitalization in the past year) were also collected.
Setting and samples
Of the 3,206 households contacted for interviews, 1,000 completed an interview (31.2%). The target population included individuals aged 19 years or older living in Korea (except for Jeju Island) who consented to participate in the survey. Sampling was performed using a multistage stratified quota method. Sample quotas were assigned to each of the 15 Korean regions according to the population structure (gender, 10-year age group, and level of education [12 years or less vs. more than 12 years]), as defined by the resident registration data of the Ministry of Administration and Security of South Korea in June 2013.
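As an illustration only, quota assignment in proportion to population structure can be sketched as follows. The stratum names and population counts are hypothetical, and the largest-remainder rounding rule is an assumption; the text does not state how the quotas were rounded or whether allocation was strictly proportional.

```python
def allocate_quota(population_by_stratum, total_sample):
    """Split total_sample across strata in proportion to population size.

    Rounding uses the largest-remainder rule (an assumption, not taken
    from the study) so that the quotas sum exactly to total_sample.
    """
    total_pop = sum(population_by_stratum.values())
    exact = {s: total_sample * p / total_pop
             for s, p in population_by_stratum.items()}
    quota = {s: int(v) for s, v in exact.items()}  # round down first
    leftover = total_sample - sum(quota.values())
    # Hand the remaining interviews to the strata with the largest fractions.
    by_remainder = sorted(exact, key=lambda s: exact[s] - quota[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        quota[stratum] += 1
    return quota


# Hypothetical population counts for three strata (not study data).
hypothetical_population = {"stratum_A": 5000, "stratum_B": 3000, "stratum_C": 2000}
quotas = allocate_quota(hypothetical_population, 1000)
print(quotas)

# Completed interviews as a percentage of contacted households (figures from the text).
response_rate = round(100 * 1000 / 3206, 1)
print(response_rate)
```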
This study was approved by the Institutional Review Board of the National Evidence-based healthcare Collaborating Agency (approval number: NECA IRB13-002), and all of the participants provided written informed consent.
Our present study used the Korean SF-12 v2. The SF-12 v2 is a multipurpose short-form health survey that includes 12 items taken directly from the SF-36 v2. The SF-12 v2 yields eight scale scores (physical functioning [PF], role-physical [RP], bodily pain [BP], general health [GH], vitality [VT], social functioning [SF], role-emotional [RE], and mental health [MH]). Four scale scores (PF, RP, RE, and MH) are calculated using two items each, whereas the remaining scales (BP, GH, VT, and SF) are represented by a single item. Several items were recoded so that higher scores indicate a better condition. Scale scores were transformed to a 0 to 100 range according to the scoring manual. The 12 items are used to derive two summary measures (i.e. physical component summary [PCS] and mental component summary [MCS]).
The SF-12 v2 was assessed according to the data quality indicators recommended by its developer. The assessment included completeness of the data, based on the percentage of the total number of items with a valid item response, as well as on the percentage of responses outside the range. In addition, convergent validity was tested to determine whether each item correlated with the component summary (PCS or MCS) it was expected to represent. When all of the hypothesized item-component correlations were 0.30 or greater, convergent validity was considered to be acceptable. It was hypothesized that the PCS is related to the PF, RP, GH, and BP items, and the MCS is related to the MH, RE, VT, and SF items. Finally, discriminant validity was assessed to determine whether an item correlates more highly with its hypothesized component summary measure score than with the alternative component summary measure score. When all of the hypothesized item-component correlations were significantly higher than the alternative item-component correlations, item discriminant validity was considered to be satisfactory. In addition, the percentages of respondents who achieved either the highest score (ceiling) or the lowest score (floor) were calculated because large ceiling and floor effects may limit the responsiveness of the SF-12 v2 [9, 13].
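The recoding and 0 to 100 transformation described above can be sketched as a linear rescaling. This is a minimal illustration that assumes the usual form of such a transformation, (raw score − lowest possible) / possible range × 100; the function names and the example response levels are assumptions, and the scoring manual remains the authoritative source.

```python
def reverse_code(response, n_levels):
    """Flip a 1..n_levels response so that a higher value means better health."""
    if not 1 <= response <= n_levels:
        raise ValueError("response is outside 1..n_levels")
    return n_levels + 1 - response


def transform_to_0_100(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a raw scale score so the minimum maps to 0 and the maximum to 100."""
    if highest_possible <= lowest_possible:
        raise ValueError("highest_possible must exceed lowest_possible")
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw_score is outside the possible range")
    score_range = highest_possible - lowest_possible
    return (raw_score - lowest_possible) / score_range * 100.0


# Illustrative two-item scale with three response levels per item (raw range 2..6).
raw = 2 + 3
print(transform_to_0_100(raw, 2, 6))

# Illustrative five-level item where 1 was originally the best answer.
print(reverse_code(1, 5))
```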
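The floor and ceiling percentages mentioned above are simple proportions. The sketch below uses made-up responses to a hypothetical five-level item; it is not study data.

```python
def floor_ceiling_percentages(responses, lowest, highest):
    """Return (floor %, ceiling %): share of respondents at the lowest and highest score."""
    n = len(responses)
    if n == 0:
        raise ValueError("responses must not be empty")
    floor_pct = 100.0 * sum(1 for r in responses if r == lowest) / n
    ceiling_pct = 100.0 * sum(1 for r in responses if r == highest) / n
    return floor_pct, ceiling_pct


# Made-up responses to a five-level item (1 = worst, 5 = best).
example_responses = [5, 5, 4, 5, 3, 1, 5, 5, 2, 5]
floor_pct, ceiling_pct = floor_ceiling_percentages(example_responses, 1, 5)
print(floor_pct, ceiling_pct)
```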
To assess construct validity, SF-12 v2 scale scores were calculated in terms of sociodemographic and health-related factors. It was expected that the SF-12 v2 scale scores would be lower in women, older persons, poorly educated persons, the unemployed, those suffering from any disease, and recent health service users [11, 16–19]. Differences in scale scores between groups were compared using Student’s t-test or analysis of variance with post hoc Tukey’s test.
Internal consistency reliability was analyzed with Cronbach’s alpha. When Cronbach’s alpha was ≥0.7, the reliability was considered to be acceptable. To test whether the Korean SF-12 v2 produced the hypothesized structure of the original survey, exploratory item-level factor analysis was performed using principal component analysis with varimax rotation. Factor loadings ≥0.4 were considered to be significant. All statistical analyses were conducted using SAS (version 9.1; SAS Institute Inc., Cary, NC).
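For a two-group comparison such as men versus women, the Student's t statistic with a pooled variance can be sketched as below. The scores are made up, and the sketch stops at the test statistic and degrees of freedom; a p-value would need the t distribution from a statistics package, and the study itself used SAS.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic (equal-variance form) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("both groups are constant; t is undefined")
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Made-up 0-100 scale scores for two groups (not study data).
scores_group_1 = [75, 100, 100, 75]
scores_group_2 = [50, 75, 75, 100]
t_value, degrees_of_freedom = pooled_t_statistic(scores_group_1, scores_group_2)
print(t_value, degrees_of_freedom)
```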
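Cronbach's alpha follows from the item variances and the variance of the summed score: alpha = k/(k − 1) × (1 − sum of item variances / variance of total), where k is the number of items. A minimal sketch with made-up responses, using sample variances:

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list with one score per respondent."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if any(len(item) != n for item in item_scores):
        raise ValueError("all items need the same number of respondents")
    totals = [sum(values) for values in zip(*item_scores)]  # per-respondent sum score
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total score has zero variance; alpha is undefined")
    sum_item_variances = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Made-up responses of five respondents to a two-item scale (not study data).
item_1 = [1, 2, 3, 4, 5]
item_2 = [2, 1, 4, 3, 5]
alpha = cronbach_alpha([item_1, item_2])
print(round(alpha, 3), alpha >= 0.7)  # compared against the 0.7 criterion
```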
The mean age of the participants was 45.0 years (standard deviation [SD], 14.3) and 50.1% of the participants were women. A total of 126 participants (12.6%) reported a current disease, and most of the participants were employed or self-employed (Table 1). The completeness of the data was 100%, and there were no out-of-range values. SF-12 v2 item descriptive statistics are presented in Table 2. The ceiling effect was considerably higher for the PF, RP, BP, SF, and RE items, whereas only 23 participants (2.3%) responded in the upper end of the scale for all items. The floor effect was <2% for the majority of items.
The Spearman correlation coefficients for the SF-12 v2 items and their component summaries are shown in Table 3. All of the items were correlated with their hypothesized measures by ≥0.30. The correlations between each item and its hypothesized component ranged from 0.59 to 0.78. In terms of discriminant validity, all of the items were more highly correlated with their hypothesized components than with the alternative components.
The scale scores of the Korean SF-12 v2 according to the sociodemographic and health-related variables are shown in Table 4. Significant differences in SF-12 v2 scale scores were observed across these groups. As expected, the scale scores of women were significantly lower than those of men in all scales except for the SF and RE scales. The oldest age group (≥70 years) had significantly lower scores than the other age groups on most of the scales except for the MH scale when the post hoc Tukey’s comparison was applied. Highly educated people tended to report higher values than poorly educated people on all scales. People suffering from disease and those who had recently used hospital services had significantly lower scores than the other participants on most of the scales. Scale scores according to gender and age group are presented in Table 5.
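The convergent and discriminant checks can be illustrated with a small Spearman implementation (the Pearson correlation of average ranks). The data are made up, and the discriminant check here is a plain comparison of the two coefficients; it does not test whether the difference is statistically significant, which the stated criterion requires.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up item responses and component summary scores (not study data).
item = [1, 2, 3, 4, 5]
hypothesized_component = [30, 35, 50, 45, 60]
alternative_component = [50, 40, 55, 45, 42]

rho_hyp = spearman(item, hypothesized_component)
rho_alt = spearman(item, alternative_component)
print(round(rho_hyp, 2), round(rho_alt, 2))
print("convergent:", rho_hyp >= 0.30, "discriminant:", rho_hyp > rho_alt)
```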
Internal consistency reliabilities were 0.84, 0.83, and 0.85 in the PF, RP, and RE domains, respectively, whereas the reliability was 0.37 in the MH domain. The reliability of all SF-12 v2 items was 0.88. Cronbach’s alpha for the PF, RP, GH, and BP items was 0.83, and that for the MH, RE, VT, and SF items was 0.79. Item factor analysis demonstrated the presence of three factors that accounted for 65.1% of the variance. The results are presented in Table 6. The PF, BP, and GH items loaded onto the physical health concept (factor 1) and the VT, MH, and GH items separately loaded onto the psychological health concept (factor 3). The SF, RP, and RE items loaded onto factor 2.
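To give a sense of scale for the two-item MH result: if the two items have equal variances (an assumption made here, not a reported fact), Cronbach's alpha reduces to 2r / (1 + r), where r is the correlation between the items. Under that assumption, an alpha of 0.37 corresponds to an inter-item correlation of roughly 0.23.

```python
def alpha_from_correlation(r):
    """Two-item Cronbach's alpha for items with equal variances and correlation r."""
    return 2 * r / (1 + r)


def implied_inter_item_correlation(alpha):
    """Invert alpha = 2r / (1 + r) for a two-item scale with equal item variances."""
    if alpha >= 2:
        raise ValueError("alpha must be below 2 for the inversion to be defined")
    return alpha / (2 - alpha)


# Alpha reported for the MH scale in the text; equal variances are assumed.
r_mh = implied_inter_item_correlation(0.37)
print(round(r_mh, 3))
```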
Quality of life is a critical component of healthcare. Many HRQoL outcome measures have been used in clinical and health economics research. Prior to the application of HRQoL instruments, evidence on the psychometric properties of each instrument should be considered. Our study assessed the data quality and psychometric properties of the Korean version of the SF-12 v2 in a general population sample. The rate of missing data was zero, and the quality criteria recommended by the developer of the instrument were satisfied in our study. All of the correlations between the items and their hypothesized components were >0.3, and all of the items were more highly correlated with their own hypothesized components than with the competing components. Generally, the item scores in our sample were higher than those in other countries. Korean people seem to rate themselves as healthier than people from other countries do. Differences in the SF-12 v2 scale scores in terms of sex, age, educational level, health status, and use of health services provided evidence of construct validity.
The psychometric properties of the SF-12 have been demonstrated in the general populations of various countries, including the USA [4, 22], Israel, Sweden, Greece, and Hong Kong. The psychometric properties of the SF-12 v2 have been reported for Americans and Chinese adolescents [6, 23]. In terms of convergent and discriminant validity, previous publications found that all of the hypothesized item-component correlations were 0.30 or greater and were significantly higher than the alternative item-component correlations; however, the study by Jakobsson et al. showed item-component correlations that argued against the suggested structure in a general elderly population (aged 75+). Scale and component scores were lower in older persons, poorly educated persons, the unemployed, those suffering from any disease, and recent health service users [8, 11, 16–19]. Cheak-Zamora et al. showed high test–retest reliability for the PCS (ICC = 0.78) and moderate reliability for the MCS (ICC = 0.60). In some countries, factor analysis yielded two factors, with the hypothesized items loading on the same factor [4, 8, 17]. However, the study performed in Israel revealed three factors, with role-physical loading as a separate factor, and the results of the study by Jakobsson et al. did not support a two-dimensional item structure among the elderly population.
This study demonstrated the psychometric properties of the Korean version of the SF-12 v2. The vitality (a lot of energy) and MH (calm and peaceful, and downhearted and blue) items scored lower in the Korean population than in Greek and Iranian studies [8, 17]. Our data showed higher ceiling effects than these studies, but our results were similar to those of a previous study in Chinese adolescents. The RE and RP items were changed from two response levels in version 1 to five levels in version 2; nevertheless, the proportion of respondents with the highest score remained high, ranging from 70.1% to 82.3%, although the floor effects were smaller than those in previous studies [5, 8, 17]. Internal consistency reliability was >0.7 for the PF, RP, and RE scales, but the internal reliability of the MH scale was low at 0.37 in our study. Korean respondents may answer the two MH items (calm and peaceful; downhearted and depressed) independently of each other. These two items loaded onto a different factor in a previous study of the Korean SF-36. This reliability is comparable with the value of 0.34 found in a Chinese study. Factor analysis of individual items produced a partial match between items and their hypothesized components: the items separated into three factors, with the SF, RE, and RP items aggregating into one factor. This pattern appears to be unique to the Korean population, as the RE and RP items also loaded onto the same factor in the Korean SF-36 v2. Using item or scale scores rather than the two summary measures of the SF-12 v2 seems to be more appropriate in the Korean population.
There were some limitations to our present study. First, although we recruited respondents nationwide, the external validity of the sample may be limited. The age and sex distributions of our sample were similar to those reported in the 2010 national census, but participants in this study reported lower health care utilization than the participants of the 5th KNHANES, which is a nationwide health survey of more than 30,000 people. Lower health care utilization may indicate that our sample was healthier than the general Korean population, and a healthier sample would tend to produce high item scores and low floor effects. Second, we did not explore face validity, concurrent validity, test-retest reliability, or responsiveness to changes in health state. Therefore, further research on the psychometric properties of the SF-12 v2 is needed.
The Korean SF-12 v2 seems to be a feasible, valid, and reliable instrument for measuring the HRQoL of a general population. The use of scale scores instead of component summaries seems to be more appropriate in Korean people. Further research on other psychometric properties of the Korean SF-12 v2 is desirable.
- HRQoL:
Health-related quality of life
- MCS:
Mental component summary
- PCS:
Physical component summary
- SF-12 v2:
Short form-12 health survey, version 2
Walters SJ: Quality of Life Outcomes in Clinical Trials and Health-Care Evaluation: A Practical Guide to Analysis and Interpretation. 2009, West Sussex: Wiley
Maruish ME, Turner-Bowker DM: A Guide to the Development of Certified Modes of Short Form Survey Administration. 2009, Lincoln, RI: QualityMetric, Incorporated
Ware J, Kosinski M, Keller SD: A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996, 34: 220-233. 10.1097/00005650-199603000-00003.
Cernin PA, Cresci K, Jankowski TB, Lichtenberg PA: Reliability and validity testing of the short-form health survey in a sample of community-dwelling African American older adults. J Nurs Meas. 2010, 18: 49-59. 10.1891/1061-3749.18.1.49.
Bentur N, King Y: The challenge of validating SF-12 for its use with community-dwelling elderly in Israel. Qual Life Res. 2010, 19: 91-95. 10.1007/s11136-009-9562-3.
Cheak-Zamora NC, Wyrwich KW, McBride TD: Reliability and validity of the SF-12v2 in the medical expenditure panel survey. Qual Life Res. 2009, 18: 727-735. 10.1007/s11136-009-9483-1.
Jakobsson U, Westergren A, Lindskov S, Hagell P: Construct validity of the SF-12 in three different samples. J Eval Clin Pract. 2012, 18: 560-566. 10.1111/j.1365-2753.2010.01623.x.
Kontodimopoulos N, Pappa E, Niakas D, Tountas Y: Validity of SF-12 summary scores in a Greek general population. Health Qual Life Outcomes. 2007, 5: 55. 10.1186/1477-7525-5-55.
Kim SH, Jo M-W, Lee S: Psychometric properties of the Korean short form-36 health survey version 2 for assessing the general population. Asian Nurs Res. 2013, 7: 61-66. 10.1016/j.anr.2013.03.001.
Han CW, Lee EJ, Iwaya T, Kataoka H, Kohzuki M: Development of the Korean version of Short-Form 36-Item Health Survey: health related QOL of healthy elderly people and elderly patients in Korea. Tohoku J Exp Med. 2004, 203: 189-194. 10.1620/tjem.203.189.
Thumboo J, Fong KY, Machin D, Chan SP, Leon KH, Feng PH, Thio ST, Boe ML: A community-based study of scaling assumptions and construct validity of the English (UK) and Chinese (HK) SF-36 in Singapore. Qual Life Res. 2001, 10: 175-188. 10.1023/A:1016701514299.
Tseng HM, Lu JF, Gandek B: Cultural issues in using the SF-36 Health Survey in Asia: results from Taiwan. Health Qual Life Outcomes. 2003, 1: 72. 10.1186/1477-7525-1-72.
Ware JE, Kosinski M, Keller SD, Institute NEMCHH: SF-12: How to Score the SF-12 Physical and Mental Health Summary Scales. 1995, Boston, MA: Health Institute, New England Medical Center
Maruish ME, De Rosa MA: A Guide to the Integration of Certified Short Form Survey Scoring and Data Quality Evaluation Capabilities. 2009, Lincoln, RI: Quality Metric Incorporated
Brazier JE, Roberts J: The estimation of a preference-based measure of health from the SF-12. Med Care. 2004, 42 (9): 851-859. 10.1097/01.mlr.0000135827.18610.0d.
Brazier JE, Harper R, Jones NM, O’Cathain A, Thomas KJ, Usherwood T, Westlake L: Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ. 1992, 305: 160-164. 10.1136/bmj.305.6846.160.
Montazeri A, Goshtasebi A, Vahdaninia M, Gandek B: The Short Form Health Survey (SF-36): translation and validation study of the Iranian version. Qual Life Res. 2005, 14: 875-882. 10.1007/s11136-004-1014-5.
Franks P, Gold MR, Fiscella K: Sociodemographics, self-rated health, and mortality in the US. Soc Sci Med. 2003, 56: 2505-2514. 10.1016/S0277-9536(02)00281-2.
Lam CL, Fong DY, Lauder IJ, Lam TP: The effect of health-related quality of life (HRQOL) on health service utilisation of a Chinese population. Soc Sci Med. 2002, 55: 1635-1646. 10.1016/S0277-9536(01)00296-9.
Nunnally JC, Bernstein IH: Psychometric Theory 3E. 1994, New York: McGraw-Hill Education (India) Pvt Limited
Sharma S, Mukherjee S: Applied Multivariate Techniques. 1996, New York: John Wiley & Sons Canada, Limited
Larson CO, Schlundt D, Patel K, Beard K, Hargreaves M: Validity of the SF-12 for use in a low-income African American community-based research initiative (REACH 2010). Prev Chronic Dis. 2008, 5: A44.
Fong DY, Lam CL, Mak KK, Lo WS, Lai YK, Ho SY, Lam TH: The Short Form-12 Health Survey was a valid instrument in Chinese adolescents. J Clin Epidemiol. 2010, 63: 1020-1029. 10.1016/j.jclinepi.2009.11.011.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2458/14/1086/prepub
This study was supported by a grant of the Korean Health Technology R&D Project, Ministry of Health & Welfare, Republic of Korea (number of the study: HI13C0729).
The authors declare that they have no competing interests.
All authors contributed to the conception and design of the study, the acquisition of data, and the interpretation of the results. SHK analyzed the data and was involved in drafting the manuscript; MWJ and JA were involved in revising the manuscript to ensure its critically important content. All authors have read and approved the final manuscript.