Longitudinal measurement invariance in urbanization index of Chinese communities across 2000 and 2015: a Bayesian approximate measurement invariance approach

Background The Urbanicity Scale was developed based on the China Health and Nutrition Survey (CHNS) to measure the urbanization index of communities according to 12 components. The present study was designed to systematically investigate the factorial validity, reliability, and longitudinal measurement invariance (LMI) of the Urbanicity Scale. Methods Six waves of CHNS data from 2000 to 2015 were adopted. The factor structure and reliability of the Urbanicity Scale for 301 communities were examined using Bayesian exploratory factor analysis. Metric and scalar LMIs were evaluated using both the conventional exact and a novel approximate LMI approach via Bayesian structural equation modeling across various timeframes. Results The findings verified the one-factor structure for the Urbanicity Scale, with adequate reliability. LMI was established for the Urbanicity Scale only over a shorter timeframe from 2006 to 2009 but not over a longer timeframe from 2000 to 2015. Partial LMI was found in the factor loadings and item intercepts for the Urbanicity Scale over the 2004 to 2011 period. Conclusion Interpretation of the temporal change in urbanicity was supported only for a shorter (2006 to 2009) but not a longer timeframe (2000 to 2015). Adjustments addressing the partial non-invariance of the measurement parameters are needed for the analysis of temporal changes in urbanicity between 2004 and 2011. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-021-11691-y.


Background
The past two decades have witnessed a remarkable social and economic transformation in mainland China [1]. Chinese societies have undergone a continuous urbanization process in which a growing population shares a modernized and improved environment for health infrastructure, housing, sanitation, communications, markets, and education [2]. Earlier research [3] relied on a crude rural/urban dichotomy and did not explicitly assess the degree of urbanization using a valid instrument. In a systematic review [4] of 11 relevant studies on urbanicity, eight of the studies did not explicitly test the psychometric properties of their urbanicity scales. Astounding inconsistencies in measurement reliability and validity point to the need for a tested, valid, and reliable measure of urbanicity. The China Health and Nutrition Survey (CHNS) [5] is a longitudinal national survey that collected 10 waves of measurements between 1989 and 2015 on societal and economic transformation at the community level and the nutrition and health status of citizens in China.
Using the CHNS data across 1989 and 2006, Jones-Smith and Popkin [6] developed an Urbanicity Scale to measure the urbanization index of communities based on 12 community-level components. Compared with scales developed in other contexts [7,8], the scale developed by these authors was the only one rated as being high quality (rating score = 4 out of 5) in evaluations of its content validity by expert panels and reviewers, internal consistency and test-retest reliability, construct validity through exploratory factor analysis, and criterion validity. Based on the CHNS data, the Urbanicity Scale captured unique changes in community contexts across time and geographic locations and provided useful insights into associated health effects [9]. Despite the unidimensional nature of the Urbanicity Scale, no scale validation studies have been conducted with the three more recent waves of CHNS data from 2009 to 2015. Given the lack of systematic evaluation of the psychometric properties, the first objective of the present study was to examine the factorial validity and reliability of the Urbanicity Scale with CHNS data across 2000 and 2015.
Apart from factorial validity and reliability, measurement invariance across time is another essential measurement property for an assessment scale. Longitudinal measurement invariance (LMI) examines the stability of factor loadings and item intercepts and requires the scale to measure latent factors in the same way over time [10]. If LMI holds, changes in the test scores over time can be attributed to changes in the underlying construct [11]. For the CHNS data, the dynamic nature of the urbanization process often necessitates an examination of the temporal change in urbanicity [12]. To ensure meaningful comparisons of the latent factor means of urbanicity across time, the measurement structures of the Urbanicity Scale should be invariant or stable in terms of factor loadings and item intercepts.
In the case of longitudinal non-invariance of measurement parameters, temporal changes in latent factor means would be conflated with discrepancies in the loadings and/or intercepts across time [13]. This conflation would, in turn, induce measurement bias and obstruct comparisons and inferences regarding latent factor means. Despite the abundance of research on urbanization that has used the CHNS data, the assumption of LMI for the Urbanicity Scale is yet to be tested. The fulfillment of LMI would establish a psychometric basis for applied researchers to analyze temporal changes in urbanicity with other substantive variables in the CHNS data. In light of the research gap, the second objective of the present study was to examine the LMI of the Urbanicity Scale with the CHNS data across 2000 and 2015.
Conventional practice on LMI focuses on exact LMI where factor loadings and item intercepts are expected to be remain unchanged over time [14]. However, this assumption could be overly restrictive and difficult to meet [15,16]. A recent advancement in the methodological literature [17][18][19] advocates the use of approximate measurement invariance via the Bayesian structural equation modeling (BSEM) approach [20]. Approximate LMI replaces the exact zero constraints on the betweentime differences of the measurement parameters with approximate zero informative priors that allow some "wiggle room" [20]. The "wiggle room" permits small differences between the parameters as a compromise between zero and no constraints and facilitates comparison of the latent means. Approximate LMI has been shown to outperform exact or partial LMI in detecting the true latent mean difference [15]. The present study utilized the BSEM approach to examine the LMI of the Urbanicity Scale across 2000 and 2015 via both the exact approach and the approximate approach.

Study design and sample
The present study was based on community-level data originating from six waves of the CHNS in 2000, 2004, 2006, 2009, 2011, and 2015 [5]. The CHNS is a largescale panel survey that began back in 1989 to identify potential linkages between social and economic transformation at the societal level and the nutrition and health status of citizens in China. The CHNS sampled and collected information on health, diet, nutrition, and income from individuals and households. Across the six waves of measurements from 2000 to 2015, the survey recruited a total of 301 communities via a multistage, random cluster sampling process from 12 provinces and municipal cities in China. A total of 205 sampled communities (68.1%) provided complete data across all six measurement waves and over 70% (N = 216, 71.8%) participated in at least four waves. Nearly onefourth of the sampled communities (N = 72, 23.9%) joined the CHNS in 2011 and provided only the last two waves of data. Ethical approval for this secondary analysis was obtained from the human research ethics committee of the authors' university (reference number = EA1809036).

Measures
The urbanization index of the sampled communities was measured by the Urbanicity Scale developed by Jones-Smith and Popkin [6]. Urbanicity was operationalized based on 12 identified components at the community level: communications, population density, diversity, economic activity, health infrastructure, housing, traditional markets, social services, transportation, education, modern markets, and sanitation. Each of the communities was scored on these 12 components through local area administrators or official records on an 11-point anchored format from 0 to 10, with lower scores denoting lesser degrees of urbanization. The results of the original scale development study [6] suggested a unidimensional structure on urbanicity in the CHNS data. The overall urbanization index was calculated as the sum of the 12 component scores with a theoretical range from 0 to 120. The Urbanicity Scale showed good reliability (Cronbach's α = 0.85-0.89) and high temporal stability in the assessments from 1989 to 2006. Urban and rural communities were classified by their baseline urbanization score in 2000, using a score of 60 as a cutoff. Table 1 displays the univariate statistics (means, standard deviations, and skewness) for the 12 components of the Urbanicity Scale. Given the lack of known multidimensionality in the Urbanicity Scale, exploratory factor analysis (EFA) was conducted using the oblique Geomin rotation in Mplus 8.4 [21] to explore one-factor to fourfactor structures. Most of the 12 component scores did not deviate substantially from normal distributions, with skewness ranging from − 1.14 to 1.41 throughout the six waves of measurements. These components were modeled as continuous variables under both the frequentist and Bayesian approaches. Model comparison of the onefactor to four-factor models was performed based on the model fit and eigenvalues using parallel analysis under maximum likelihood estimation. Bayesian estimation was applied to derive the Bayesian information criterion (BIC) and posterior probabilities of the factor models via the Bayes factor. Missing data were handled via full information maximum likelihood under the missing-atrandom assumption [22].

Data analysis
All Bayesian models were estimated with two Markov chain Monte Carlo chains and noninformative priors over a minimum of 10,000 iterations [23]. The latter half of the iterations was used to empirically derive the posterior parameter distribution of the parameters. Model convergence was checked using the trace plots, autocorrelation plots, and potential scale reduction criterion [24], with values of less than 1.05 implying small between-chain variation relative to within-chain variation. Under the Bayesian posterior predictive checking, an exact model fit was achieved in case of large posterior predictive p values (PPP > 0.10) and negative lower 95% posterior predictive limit (PPL). Approximate model fit    Following the establishment of the factorial validity and reliability, the BSEM approach was used to examine the LMI of the Urbanicity Scale from 2000 to 2015. Configural, metric, and scalar invariance models were sequentially specified to evaluate the longitudinal invariance of factor structure, factor loadings, and item intercepts, respectively [26]. First, the configural invariance model was estimated as the baseline model to allow different factor loadings and item intercepts across time. Then, exact metric and scalar invariance models were estimated that constrained the factor loadings and item intercepts to be equal across time, respectively. The comparison of nested (configural versus metric and metric versus scalar) models was made based on differences in approximate fit indices [27]. According to Chen [28], a decrease of at least 0.01 in the CFI supplemented by an increase of at least 0.015 in the RMSEA would indicate non-invariance in metric and scalar invariance models.
In the case of a lack of exact LMI, approximate LMI models were estimated under the BSEM approach [20]. This approach postulates the measurement parameters (factor loadings and item intercepts) to be approximately equal across time via zero-mean, small variance informative priors [29]. Sensitivity analysis was performed by increasing the variance (V) for the informative priors from V = 0.01 to 0.04 and 0.09, which would imply SDs of ±0.1, ±0.2, and ± 0.3 for the differences in the measurement parameters, respectively. The results of the approximate LMI models gave the time-average measurement parameters and time-specific deviations from the average value. Apart from the statistical significance of these deviations, violations in LMI were evaluated in terms of practical significance, with a relative mean change value of larger than 10% considered as indicative of substantial bias [30]. In case of substantive longitudinal non-invariance, follow-up measurement invariance tests would be conducted across different timeframes. The Mplus analysis scripts are available from the corresponding author upon reasonable request.

Descriptive profile of the communities
The number of communities recruited across the six waves was 217, 216, 218, 218, 290, and 288, respectively, from 2000 to 2015. Rural villages accounted for 40.5% of the 301 sampled communities. Approximately onefourth of the communities were cities (N = 80), one-sixth were towns (N = 53), and the remaining communities were suburban neighborhoods (N = 46). More than half (58.1%) of the recruited communities were located at rural sites and less than half (41.9%) at urban sites. The urban communities (cities or suburban neighborhoods) were significantly more urbanized (p < 0.001) than the rural communities (towns or rural villages) in the 2000 wave, with means (SD) of 71.7 (13.5) and 52.8 (17.1), respectively. There were significant and moderate differences (Cohen's d = 0.50-0.64, p < 0.001) in the urbanization index between the urban and rural communities across the 2000-2015 waves. Table 2 shows the fit indices of the Bayesian EFA models of the Urbanicity Scale from 2000 to 2015. Problems occurred in the estimation of EFA models with three and four factors with no model convergence. The two-factor model showed a better model fit than the one-factor model in terms of negative lower 95% PPL and greater PPP. However, the first factor displayed a much higher eigenvalue (5.16-6.27) than the second factor (0.90-1.14) throughout the six waves of assessments. The average eigenvalues derived from the parallel analysis ranged from 1.25 to 1.29, which consistently exceeded the eigenvalues of the second factor across the six waves. The results of the parallel analysis supported retaining only the first extracted factor. The one-factor model reliably showed a lower BIC than the two-factor model from 2000 to 2015.

Factorial validity and reliability
The results of Bayes factor testing highly favored the Urbanicity Scale being unidimensional in nature, with posterior probabilities for only one factor ranging from 0.96 to 1.00 throughout the six measurements.
In the two-factor model, there were significant and strong correlations (r = 0.55-0.78, p < 0.01) between the two factors across the six measurement waves. As shown in the supplemental table, the two-factor solution did not show a clear and consistent pattern of factor loadings on the 12 components from 2004 to 2009. There were cross-loadings or no significant loadings on a number of items, and the factor loading pattern differed substantially across the measurement waves. The unstable results pointed to potential over-extraction of the factors. The one-factor model did not show an exact fit with positive lower 95% PPL and low PPP (≤ 0.05) but displayed an approximate fit to the data from 2000 to 2015 in terms of RMSEA and CFI. All 12 components loaded significantly and substantially (λ = 0.41-0.88, p < 0.01) on the total urbanicity factor and the one-factor solution from 2004 to 2009 (as displayed in the supplemental table). In the one-factor model, the urbanicity factor exhibited good levels of composite reliability (ω = 0.88, 0.91, 0.91, 0.90, 0.90, and 0.85) across the six waves of measurements from 2000 to 2015, respectively. Table 3 shows the fit indices of various one-factor BSEM measurement invariance models for the Urbanicity Scale across different timeframes. Under noninformative priors, the configural model displayed adequate approximate fit (CFI = 1.00 and RMSEA = 0.000) to the 2000-2015 data. Specification of exact metric invariance led to greatly increased PPLs and substantial deteriorations in the fit indices (ΔCFI = − 0.025 and ΔRMSEA = + 0.024) compared with the configural model, implying that the assumption   Table 4, seven of the 12 components (communications, population density, diversity, housing, social services, education, and sanitation) displayed significant and substantial (Δ = 17.5-42.7%) deviations from the average λ across time. The majority (10/13) of the greatest deviations from the average λ were located in either the 2000 or the 2015 wave.
In terms of scalar longitudinal invariance, specification of exact scalar invariance led to greatly increased PPLs and substantial deteriorations in the fit indices (ΔCFI = − 0.107 and ΔRMSEA = + 0.038) relative to the approximate metric invariance model. This rejected the assumption of exact scalar invariance from 2000 to 2015. The alternative specification of approximate scalar invariance with V = 0.01 and V = 0.04 still resulted in substantial decrements in the fit indices (ΔCFI = − 0.016 to − 0.061 and ΔRMSEA = + 0.010 to + 0.027). A further increase of prior variance to V = 0.09 led to a comparable model fit (ΔCFI = − 0.005 and ΔRMSEA = + 0.004). As Table 5 indicates, significant deviations were found across time from the average ν for all 12 components. Ten of the 12 deviations were considered substantial (Δ = 10.4-60.2%); the exceptions were those for population density (3.8%) and modern markets (8.0%). Over half (15/24) of the greatest deviations from the average ν occurred in either the 2000 or the 2015 wave.

Measurement invariance across 2004-2011
The previous results identified the non-invariant factor loadings and item intercepts as occurring mostly in the 2000 or 2015 waves. As a result, follow-up measurement invariance tests were conducted for the Urbanicity Scale across 2004 and 2011. As shown in Table 3, the configural model displayed an adequate approximate fit (CFI = 0.996 and RMSEA = 0.012) to the 2004-2011 data. Specification of exact metric invariance led to greatly increased PPLs and substantial deteriorations in the fit indices (ΔCFI = − 0.017 and ΔRMSEA = + 0.015) relative to the configural model, implying the untenability of exact metric invariance. Specification of approximate metric invariance with V = 0.01 resulted in a fit (ΔCFI = − 0.006 and ΔRMSEA = + 0.006) comparable to that of the configural model. As shown in Table 6, three of the 12 components (diversity, housing, and education) showed significant and substantial (Δ = 12.1-22.2%) deviations from the average λ across time. The remaining nine components did not display significant and substantial (Δ = 2.0-10.3%) deviations from the average λ across time. The majority (4/5) of the greatest deviations from the average λ were found in either the 2004 or the 2011 wave.
In terms of scalar invariance, specification of exact scalar invariance led to sharply increased PPLs and substantial deteriorations in the fit indices (ΔCFI = − 0.070 and ΔRMSEA = + 0.034) relative to the approximate metric invariance model. This rejected the assumption of exact scalar invariance from 2004 to 2011. Although specification of approximate scalar invariance with V = 0.01 resulted in substantial decrements in the fit indices (ΔCFI = − 0.024 and ΔRMSEA = + 0.016), a further increase of prior variance to V = 0.04 led to a model fit (ΔCFI = − 0.005 and ΔRMSEA = + 0.005) comparable to that of the approximate metric invariance model. As can be observed in Table 7, significant deviations were found       Table 8, significant deviations were found across time from the average ν for four of the 12 components (communications, diversity, health infrastructure, and housing). None of these four deviations were considered substantial (Δ = 3.3-5.6%).

Discussion
The present study involved a systematic evaluation of the psychometric properties of the Urbanicity Scale using six waves of CHNS data spanning from 2000 to 2015. In terms of dimensionality, the EFA results obtained from both the frequentist approach (eigenvalues via parallel analysis) and the Bayesian approach (BIC and posterior probabilities) support the one-factor structure as a parsimonious fit of the underlying construct of urbanicity. Despite the better model fit, the two-factor model is less parsimonious, less interpretable, and  subject to potential factor over-extraction. These findings corroborate the unidimensional nature of the Urbanicity Scale. Good omega coefficients were consistently found for the total urbanicity factor throughout all six waves of measurements, suggesting adequate reliability for the Urbanicity Scale. This study is the first to systematically investigate the LMI for the Urbanicity Scale across a 15-year period using exact and approximate LMI approaches via BSEM. Across six waves of measurements from 2000 to 2015, exact LMI was rejected for both metric invariance (factor loadings) and scalar invariance (item intercepts). Approximate LMI models resulted in adequate model fits with prior variances specified for the differences in factor loadings (V = 0.01) and item intercepts (V = 0.09). However, among the 12 items, statistically significant (p < 0.05) and practically substantial (Δ > 10%) deviations were found in seven factor loadings and 10 item intercepts across time. The occurrence of non-invariance in the majority of instances for both measurement parameters essentially implies a lack of LMI for the scale across this timeframe. In particular, the 2000 wave of CHNS data exhibited the greatest degrees of non-invariance. This discrepancy could reflect potential alterations or shifts in the scoring algorithm of the components in the Urbanicity Scale between the 2000 and 2004 waves.
Given the lack of LMI across 2000 and 2015, followup LMI analyses were conducted of the scale over shorter timeframes. Across four waves of measures from 2004 to 2011, exact LMI was not supported in either factor loadings or item intercepts. Approximate LMI models resulted in adequate model fits using zero-mean, small variance informative priors for the differences in factor loadings (V = 0.01) and item intercepts (V = 0.04). For the 12 items, statistically significant and practically substantial deviations were found in three factor loadings and five item intercepts across time, and two components (diversity and housing) displayed substantial non-invariance in both parameters. Further investigation of LMI supported the existence of exact metric invariance across the two waves from 2006 to 2009. Although the Urbanicity Scale displayed approximate but not exact scalar invariance across 2006 and 2009, statistically significant deviations were found in only four item intercepts, with none being practically substantial (Δ < 6%). These results demonstrate LMI for the Urbanicity Scale across the timeframe from 2006 to 2009.

Practical implications
The present study revealed intriguing findings regarding the measurement stability of the Urbanicity Scale, with the degrees of non-invariance increasing in proportion to the length of the timeframe being scrutinized. Our findings did not support any form of (exact or approximate) LMI across the longest timespan from 2000 to 2015. The substantial degrees of non-invariance in both factor loadings and item intercepts (Tables 4 and 5) imply that no partial LMI was feasible under this timeframe. The findings demonstrated longitudinal measurement non-invariance from 2000 to 2015, and comparisons of the latent means of urbanicity would probably be confounded by the existing measurement biases across time. The lack of psychometric support suggests that future longitudinal studies using the CHNS should not analyze temporal changes in urbanicity across such a long time span.
The present study did, however, demonstrate LMI of the scale across adjacent waves between 2006 and 2009. The trivial non-invariance in the item intercepts (Table  8) should not have substantial impacts on inferences of temporal changes in urbanicity. A psychometric basis was established for meaningful comparisons of the latent means of urbanicity across this timeframe. Our findings appear to point toward partial approximate LMI for the Urbanicity Scale across 2004 and 2011. Comparisons of the latent means of urbanicity across 2004 and 2011 are theoretically plausible through specification of partial invariance models. However, the associated temporal changes in urbanicity should be interpreted with caution, and future researchers need to properly adjust for the partial non-invariance of the measurement parameters in the scale.

Methodological implications
Assessing the LMI of a measurement scale is fundamental to establish the temporal stability of the assessed constructs and thus enable meaningful interpretation of longitudinal findings [31]. Nevertheless, examination of the LMI of assessment scales over multiple (six) repeated measurements remains relatively rare. The present study contributes to the literature on urbanicity through its novel application of the approximate measurement invariance approach. The approximate LMI approach allows researchers to make unequivocal tradeoffs between the degrees of model fit and measurement non-invariance across time [15]. This approach is useful in assisting researchers to obtain a balance between achieving a well-fitting model, adhering to the invariance requirements, and making comparisons possible [32]. In addition, the use of informative priors via BSEM helps researchers evaluate the statistical and practical significance of between-time differences among measurement parameters. Given the frequent rejection of classical LMI tests in applied research, the approximate LMI approach could be regarded as a promising and realistic alternative.
Apart from the Bayesian approximate measurement invariance approach, an alignment method was proposed by Muthén and Asparouhov [33] for multiple-group confirmatory factor analysis. Their alignment method has the capability to estimate group-specific factor means and variances in factor models without requiring exact measurement invariance. The ability to compute aligned factor scores for the full sample despite the presence of non-invariance in some groups facilitates comparisons of factor means on the basis of a configural invariance model [34]. This technique is suitable and feasible for assessing measurement invariance in large data sets across numerous groups, such as in comparisons across multiple countries. Munck, Barber, and Torney-Purta [35] recently demonstrated the usefulness of the alignment method for group comparisons of European youth attitudes toward immigrants across a total of 92 groups (country by cohort by gender). Future studies are recommended to evaluate the use of the alignment method in longitudinal measurement invariance and the possibility of integrating model alignment with approximate measurement invariance via the Bayesian approach.

Study limitations
Several limitations of the present study should be pointed out. First, measurement invariance of the Urbanicity Scale was only evaluated across time and not across community context (rural vs. urban sites). The relatively small sample sizes (N < 200) for the rural and urban sites did not provide adequate statistical power for accurate detection of scale measurement invariance. Additional studies are required to elucidate potential measurement biases related to contextual factors such as geographic locations and contexts. Second, the CHNS did not include data on community characteristics such as social networks and culture. The Urbanicity Scale derived from the CHNS data could place a disproportionately large emphasis on economic activities, which would raise doubts regarding the content validity and item coverage of the scale. Third, the present study focused only on the factorial validity and LMI but not the convergent validity and divergent validity of the Urbanicity Scale. Investigation of its convergent and divergent validity with reference to individual-level outcomes such as obesity, physical activity, and lifestyle would require multilevel analyses that were outside the scope of the present study. Future research should attempt to evaluate the associations between urbanicity and these substantive variables as in previous studies [9,12] while taking into account the measurement non-invariance in the parameters across measurement waves.