We have shown here that the K-mMMSE is a valid, reliable, and stable cognitive screening instrument, and that it is more sensitive than the K-MMSE to all levels of CIND and dementia. The K-mMMSE covers a broader spectrum of cognitive domains, including political figures, word fluency, similarities, and delayed recall. Furthermore, the expanded 100-point scoring allows finer discrimination of cognitive impairment. Thus, the K-mMMSE represents a summary form of administration and scoring.
Internal consistency results of the K-mMMSE and K-MMSE were comparable to those observed in previous community studies. Cronbach's alpha (α) for the 3MS has been reported to be 0.91 in a community study, a value identical to that found here [23]. Another population study has reported alphas for the 3MS and MMSE of 0.87 and 0.78, respectively, which were slightly lower than our findings of 0.91 and 0.84 [6]. Cronbach's alpha has been reported to be influenced by educational status and by variability of response, in that it was higher in groups having fewer years of education [24] and in clinical populations having greater variability [25]. Our population included a high percentage of subjects with no formal education (50.2%), and their scores varied widely (interquartile ranges for the K-mMMSE and the K-MMSE of 48–80 and 14–25, respectively).
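For readers less familiar with the statistic, the computation of Cronbach's alpha can be sketched as follows. This is a minimal Python illustration, assuming a complete respondent-by-item score matrix; the function name `cronbach_alpha` and the toy scores are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: four respondents, three items.
toy_scores = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 4]]
alpha = cronbach_alpha(toy_scores)
```

Because the total-score variance is the denominator, a sample whose scores spread widely tends to yield a higher alpha, which is consistent with the influence of response variability noted above.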
Test-retest reliability results of the K-mMMSE were also comparable to those in previous studies. Correlation coefficients of 0.91 to 0.93 have been reported for small samples of community-dwelling residents and dementia patients, which are slightly higher than our value of 0.89 [26]. The Stirling County Study found a coefficient of 0.78, but items requiring less judgment exhibited lower reliability than items requiring more judgment [23]. In contrast, we found markedly lower reliability in items requiring more judgment, e.g., similarities (r = 0.37), than in simple items, e.g., temporal orientation (r = 0.81). The discrepancy might be due to a difference in time lag, in that the Stirling County Study retest was performed over a 3-year interval, with individual retests ranging from 0.9 to 4.0 years. Furthermore, their retested subjects were not representative of all participants. We observed a correlation coefficient of 0.85 for the K-MMSE over all levels of cognitive status, which is in line with generally accepted findings [2]. Scores on the K-mMMSE and K-MMSE increased at retest, with differences in mean values of 4.4 and 2.6 points, respectively, presumably due to a practice or learning effect after a short interval [1, 27, 28].
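The two quantities discussed above, the test-retest correlation and the mean change at retest, can be sketched in a few lines of Python. This is an illustration only, assuming paired scores for the same subjects at both administrations; the helper names `pearson_r` and `mean_retest_gain` and the scores are hypothetical, not study data.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_retest_gain(first, second):
    """Mean of (retest score - first score); positive values suggest a practice effect."""
    return mean([b - a for a, b in zip(first, second)])


# Hypothetical scores for five subjects at the first test and at retest.
first_test = [60, 72, 48, 80, 55]
retest = [64, 75, 53, 84, 59]

r = pearson_r(first_test, retest)
gain = mean_retest_gain(first_test, retest)
```

A high correlation together with a positive mean gain is the pattern described above: subjects keep their relative ranking while scores shift upward at retest.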
The K-mMMSE was superior to the K-MMSE for diagnosis of all levels of CIND or dementia, and was slightly superior at almost all cut-off points. Since dementia is usually preceded by CIND or mild cognitive impairment (MCI), the definition of both requires explication [29, 30]. Subjects with CIND or MCI have been found to be at increased risk for developing dementia or, more specifically, Alzheimer's disease and some vascular subtypes of dementia [30, 31]. The difference between the K-mMMSE and the K-MMSE in diagnosing this condition might mean that the former is more sensitive than the latter to the mild or pre-dementia stage. In this respect, the K-mMMSE seems to partially overcome a weakness of the K-MMSE, namely its insensitivity to mild dementia [4]. The present findings suggest, however, that the K-MMSE is also a fairly reasonable instrument. Given its faster administration, clinicians may still choose the K-MMSE, recognizing that its test characteristics are slightly inferior.
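To make the comparison at individual cut-off points concrete, the following minimal Python sketch computes sensitivity and specificity for one cut-off. It assumes that a score at or below the cut-off counts as screen-positive, which is an assumption of this example rather than a rule stated here; the function name `sensitivity_specificity` and the data are hypothetical.

```python
def sensitivity_specificity(scores, impaired, cutoff):
    """Return (sensitivity, specificity) for one cut-off.

    scores   : screening test scores, where lower means worse cognition
    impaired : reference diagnosis per subject (True = CIND or dementia)
    cutoff   : a score <= cutoff is treated as screen-positive (assumed rule)
    """
    tp = sum(1 for s, d in zip(scores, impaired) if d and s <= cutoff)
    fn = sum(1 for s, d in zip(scores, impaired) if d and s > cutoff)
    tn = sum(1 for s, d in zip(scores, impaired) if not d and s > cutoff)
    fp = sum(1 for s, d in zip(scores, impaired) if not d and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data for eight subjects on a 100-point scale.
scores = [40, 55, 62, 70, 78, 85, 90, 95]
impaired = [True, True, True, False, True, False, False, False]

sens, spec = sensitivity_specificity(scores, impaired, cutoff=70)
```

Repeating the call over a range of cut-offs for each instrument gives the paired sensitivity and specificity values on which a cut-off-by-cut-off comparison rests.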
The two cognitive screening measures did not differ significantly, however, in the detection of dementia. These results are comparable to the findings of McDowell et al. [6]; however, their validity results differed between the two language groups studied, namely French and English speakers. The 3MS was superior only in the diagnosis of combined CIND or dementia in French-speaking, but not English-speaking, participants. There were fewer French than English participants (434 vs. 1166), and they had fewer years of education (6.8 vs. 9.2 years). These differences were also observed in our study samples, the most important being the smaller sample size, inasmuch as statistical significance is directly influenced by the total number of participants [22].
Even though the Cache County modification of the 3MS (3MS-R) showed good sensitivity and specificity, it also depended on cultural and social factors, which might limit its general use in non-US populations [9]. For this reason, cross-validation is a very important step in the cultural validation of the instrument. The K-mMMSE was shown to be more significantly correlated than the K-MMSE with other tests of cognitive status or functional abilities, such as the CDR, S-SDQ, and K-IADL. The correlation coefficients with the CDR were higher than those with the informant questionnaires, which might be due to the characteristics of the questions. That is, the K-mMMSE and K-MMSE are cognitive screening measures, and the CDR includes cognitive items in its scoring, whereas the informant questionnaires (S-SDQ and K-IADL) consist only of questions related to functional abilities. To the best of our knowledge, this is the first report showing concurrent validity of the modified MMSE series.
There are important limitations to our findings. First, the subjects who participated in this study had very low levels of education, perhaps limiting the general usefulness of our findings, especially regarding the cut-off points for a diagnosis of CIND or dementia. Low educational attainment, however, is one of the important characteristics of our elderly population, because they were largely deprived of education due to the Korean War and Japanese colonial rule over the country [32]. In addition, the study design, which assessed the validity of the two cognitive screening measures and compared them, would be appropriate for selected community samples, because all participants had a two-stage interview and a clinical examination, thus reducing verification bias. Second, although the trapezoidal rule provides a more accurate method of estimating the "true" AUC, an AUC derived from the parameters of a straight-line fit to the ROC plot tends to slightly underestimate the AUC of a Gaussian-based ROC. Finally, although we observed no significant differences between participants and non-participants, the rate of participation in our study was somewhat low. The majority of non-participants were those whom we could not meet on two separate visits, suggesting that individuals who did not participate may have been more intelligent or active than the participants. If this were true, however, our results would not change, and additional statistical power may be added to our analysis.
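As an illustration of the trapezoidal estimate mentioned in the second limitation, the following minimal Python sketch builds an empirical ROC curve and integrates it with the trapezoidal rule. It assumes that a score at or below a cut-off is screen-positive; the function names `roc_points` and `trapezoid_auc` and the data are hypothetical and do not reproduce the study's analysis.

```python
def roc_points(scores, impaired):
    """Empirical ROC points (1 - specificity, sensitivity), one per observed cut-off.

    A score <= cut-off is treated as screen-positive (assumed rule), so the
    curve runs from (0, 0) up to (1, 1) as the cut-off increases.
    """
    n_pos = sum(1 for d in impaired if d)
    n_neg = len(impaired) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, impaired) if d and s <= cutoff)
        fp = sum(1 for s, d in zip(scores, impaired) if not d and s <= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2
    return auc


# Hypothetical data for eight subjects on a 100-point scale.
scores = [40, 55, 62, 70, 78, 85, 90, 95]
impaired = [True, True, True, False, True, False, False, False]

auc = trapezoid_auc(roc_points(scores, impaired))
```

For these toy data the trapezoidal area equals the proportion of impaired/unimpaired pairs in which the impaired subject has the lower score (15 of 16 pairs), which is the usual rank-based interpretation of the AUC.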