Data
We used data from the China Health and Retirement Longitudinal Study (CHARLS) conducted in 2015, released by the China Center for Economic Research at Peking University. The sample was drawn in stages using the probability proportional to size (PPS) method: 28 provinces in mainland China were randomly selected in the first stage, 150 county-level units in the second stage, and 450 village-level units in the third stage. Then, 12,400 households were interviewed, covering 23,000 individuals aged 45 and older. These respondents are followed up every two to three years, and a detailed description of the questionnaire has been published [26]. The CHARLS study was approved by the Biomedical Ethics Review Committee of Peking University to interview respondents and collect data, and all respondents were required to sign informed consent.
In our study, older rural-to-urban migrant workers met the following inclusion criteria: (1) aged 50 ~ 65; (2) holding a rural hukou (under the Chinese household registration system); (3) permanent residence in a city or town; (4) employed for more than 6 months. Older rural dwellers met the following inclusion criteria: (1) aged 50 ~ 65; (2) holding a rural hukou; (3) permanent residence in a village; (4) not employed, or employed for less than 6 months. Older urban residents met the following inclusion criteria: (1) aged 50 ~ 65; (2) holding an urban hukou; (3) permanent residence in a city or town. We restricted the age range to 50 ~ 65 [27] to exclude those who had exited the labor market. After data cleaning, 4180 respondents (213 older rural-to-urban migrant workers, 3264 older rural dwellers and 703 older urban residents) were retained in the final sample for further analysis.
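The three sets of inclusion criteria can be read as a single decision rule. The following is a minimal sketch of that rule; the argument names and category labels are hypothetical and are not CHARLS variable names. Because the criteria say "more than" and "less than" 6 months, a respondent with exactly 6 months of employment is left unclassified here.

```python
from typing import Optional


def classify_respondent(age: int, hukou: str, residence: str,
                        months_employed: float) -> Optional[str]:
    """Assign a respondent to one of the three study groups, or None.

    Argument names and labels are illustrative assumptions, not CHARLS fields.
    hukou and residence each take the values "rural" or "urban".
    """
    if not 50 <= age <= 65:
        return None  # outside the age window used in the study
    if hukou == "urban" and residence == "urban":
        return "urban_resident"
    if hukou == "rural" and residence == "urban" and months_employed > 6:
        return "migrant_worker"
    if hukou == "rural" and residence == "rural" and months_employed < 6:
        return "rural_dweller"
    return None  # satisfies none of the three sets of criteria


print(classify_respondent(55, "rural", "urban", 12))
```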
Measurement
The response variable, depressive symptoms, was measured with the Chinese version of the 10-item short form of the Center for Epidemiologic Studies Depression Scale (CES-D 10), which has exhibited good internal consistency reliability and construct validity (Cronbach’s alpha coefficient = 0.813) [28, 29]. Per standard practice, the two positively oriented items, happiness and hope, were reverse-coded to match the direction of the negatively oriented items. Each item used a 4-point rating scale: rarely or none of the time (< 1 day), some or a little of the time (1 ~ 2 days), occasionally or a moderate amount of the time (3 ~ 4 days), and most or all of the time (5 ~ 7 days). The overall score was computed by summing the ten CES-D 10 items and ranged from 0 to 30, with higher scores indicating more severe depressive symptoms. According to previous studies [28, 29], a score of 10 or above on the CES-D 10 indicated the presence of depressive symptoms. Thus, depressive symptoms were coded as a dummy variable equal to 1 if the score was 10 or above, and 0 otherwise.
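The scoring rule described above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes each item is coded 0 to 3 in the order of the four response options, and the positions of the two positively oriented items in the response vector are an assumption about item ordering.

```python
from typing import Sequence

# Zero-based positions of the "hope" and "happiness" items (assumed ordering).
POSITIVE_ITEMS = (4, 7)
CUTOFF = 10  # score of 10 or above indicates depressive symptoms


def cesd10_score(responses: Sequence[int]) -> int:
    """Sum ten items coded 0..3, reverse-coding the positively oriented items."""
    if len(responses) != 10:
        raise ValueError("CES-D 10 requires exactly ten item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be coded 0, 1, 2 or 3")
    total = 0
    for position, value in enumerate(responses):
        total += (3 - value) if position in POSITIVE_ITEMS else value
    return total


def has_depressive_symptoms(responses: Sequence[int]) -> int:
    """Dummy variable: 1 if the total score reaches the cut-off, else 0."""
    return int(cesd10_score(responses) >= CUTOFF)


print(cesd10_score([1] * 10), has_depressive_symptoms([1] * 10))
```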
Social-ecological model
In our study, we followed the construction of some variables in previous research on ecological models of mental health [23, 24, 30,31,32]. According to the Social-Ecological Model, five dimensions and independent variables in our study can be constructed from the following aspects:
First, biological characteristics and perceived susceptibility: age group (50 ~ 60, 61 and above), gender (male, female), illness in the last month (yes, no), self-reported health (completely satisfied, very satisfied, somewhat satisfied, not very satisfied, not at all satisfied) and chronic diseases (none, one, two and above).
Second, life-style and health behavior variables: smoking, drinking, sleeping time at night (<=4 h, 4 h ~ 8 h, > 8 h) and nap after lunch (yes, no).
Third, social interpersonal variables considered in the study were living arrangement (married with spouse present, live without spouse present), education level (below primary school, primary school, middle school and above) and social activity (none, one, two and above).
Fourth, living conditions and economic status: type of in-house shower (hot water provided, water heater installed by the household, none), cleanliness of the house (good, fair, bad), geographic region, reflecting potential regional heterogeneity (east, central, west), and expenditure. Expenditure was measured as yearly personal expenditure and then divided into quintiles, from the poorest to the richest.
Fifth, social system variables: type of health insurance (basic health insurance, including the New Cooperative Medical Scheme (NCMS) [33]; none) and type of pension insurance (basic pension insurance; none).
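As an illustration of the expenditure variable, the sketch below assigns each respondent to an expenditure quintile by rank. It is a simplified assumption about how the grouping could be done: the expenditure values are invented, and ties are broken by input order, so tied respondents may fall on either side of a quintile boundary.

```python
from typing import List, Sequence


def assign_quintiles(expenditure: Sequence[float]) -> List[int]:
    """Return a quintile (1 = poorest, 5 = richest) for each respondent."""
    n = len(expenditure)
    # Indices of respondents ordered from lowest to highest expenditure.
    order = sorted(range(n), key=lambda i: expenditure[i])
    quintiles = [0] * n
    for rank, respondent in enumerate(order):
        quintiles[respondent] = rank * 5 // n + 1
    return quintiles


# Hypothetical yearly personal expenditure for ten respondents.
sample = [100, 900, 300, 700, 500, 200, 1000, 400, 800, 600]
print(assign_quintiles(sample))
```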
Coarsened exact matching method
Using a comparative approach, our study analyzed the differences in depressive symptoms between older rural-to-urban migrant workers and their urban and rural counterparts. As Mark [34] put it, migration for work is not determined by random selection. Rather, it involves several kinds of selection, such as self-selection and financial selection. To reduce the bias caused by these selections and to achieve better balance in the empirical distributions of the covariates between the comparison groups, we applied coarsened exact matching (CEM) [35,36,37,38]. The approach helped to avoid the logic cycle in the matching process and to reduce model dependence between the comparison groups. The basic CEM algorithm consists of three steps. First, each variable is coarsened into groups so that substantively indistinguishable values are assigned the same value. Second, exact matching is applied to the coarsened data. Third, the coarsened data are discarded and the original values of the matched observations are retained [39, 40]. In this study, groups were matched based on employment status so that they were comparable. Migration for work in cities represented a different change in status depending on the comparison group: it meant “moving out” relative to older rural residents, but “moving in” relative to older urban residents. Due to the hukou system in mainland China, there was a large gap between urban and rural residents in both socio-economic characteristics and mental health services. Therefore, we estimated depressive symptoms in the three older groups using CEM. The covariate distributions differed between those who moved out of rural areas and those who did not, and also between those who moved to cities and those already living there. Gender, age group, living arrangement, educational attainment, health insurance, pension insurance and economic status were used as the matching variables.
If we had simply pooled the three categories and used older rural-to-urban migrant workers as the reference group in a single model, the matching variables would have made the three older groups less comparable. Pooling might also have introduced bias arising from the migration status (i.e., moving in or moving out). Therefore, we made two separate comparisons: older rural-to-urban migrant workers versus older rural dwellers, and older rural-to-urban migrant workers versus older urban residents. In addition, CEM can improve estimates of the causal effect with the lowest bias for any sample size [41]. The increased efficiency and lower bias of CEM were attributed to stratification and exact matching of the treated and non-treated groups on variables that explained variance in the outcome of interest, difference-in-difference computations, and strata-based weighting within a nonparametric framework.
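The three CEM steps and the strata-based weighting can be sketched in a few lines. This is a minimal illustration rather than the Stata implementation used in the study: the data, covariate names and coarsening rules are hypothetical, and the control weight follows the usual CEM convention (matched treated units get weight 1, matched controls in a stratum get the ratio of overall matched controls to matched treated, times the ratio of treated to controls in that stratum).

```python
from collections import defaultdict
from typing import Callable, Dict, List


def cem_weights(units: List[dict],
                coarsen: Dict[str, Callable]) -> List[float]:
    """Return one CEM weight per unit (0.0 means the unit is unmatched)."""
    # Steps 1 and 2: coarsen each covariate and group units that share
    # exactly the same coarsened values into a stratum.
    strata = defaultdict(lambda: {"t": [], "c": []})
    for index, unit in enumerate(units):
        key = tuple(coarsen[name](unit[name]) for name in sorted(coarsen))
        strata[key]["t" if unit["treated"] else "c"].append(index)

    # Keep only strata that contain both treated and control units.
    matched = [s for s in strata.values() if s["t"] and s["c"]]
    m_t = sum(len(s["t"]) for s in matched)
    m_c = sum(len(s["c"]) for s in matched)

    # Step 3: the coarsened keys are dropped; weights refer back to the
    # original units, whose uncoarsened values are left untouched.
    weights = [0.0] * len(units)
    for s in matched:
        for i in s["t"]:
            weights[i] = 1.0
        control_weight = (m_c / m_t) * (len(s["t"]) / len(s["c"]))
        for i in s["c"]:
            weights[i] = control_weight
    return weights


# Hypothetical data: treated = migrant worker, control = comparison group.
units = [
    {"treated": True, "age": 52, "gender": "female"},
    {"treated": False, "age": 58, "gender": "female"},
    {"treated": False, "age": 55, "gender": "female"},
    {"treated": True, "age": 63, "gender": "male"},
    {"treated": False, "age": 64, "gender": "male"},
    {"treated": False, "age": 51, "gender": "male"},  # no treated match
]
coarsen = {
    "age": lambda a: "50-60" if a <= 60 else "61+",
    "gender": lambda g: g,
}
print(cem_weights(units, coarsen))
```

In this toy example the last control unit falls in a stratum without any treated unit, so it receives weight 0 and drops out of the weighted analysis.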
The multivariate imbalance measure L1 was used to assess imbalance before and after CEM. L1 ranges from 0 to 1, where 0 indicates perfect balance and 1 indicates maximal imbalance; a higher value means a larger imbalance between the comparison groups, and a lower value means better global balance. A substantial reduction in L1 after matching indicates well-balanced matching [40]. If a sufficient degree of imbalance has been removed, the CEM weights can be used in descriptive statistics and logit models to estimate the treatment effect [40]. Details on how to run CEM in Stata can be found in previous studies [42]. CEM is implemented in Stata as the user-written “cem” command (an ado file by Blackwell), not as an official Stata command; we ran it in Stata 15.0.
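For intuition, L1 is half the sum of absolute differences between the two groups' relative frequencies across the cells of the cross-tabulated (coarsened) covariates. The sketch below computes an unweighted version on hypothetical cell labels; it is an illustration of the definition, not the routine used by the “cem” command, which also accounts for matching weights.

```python
from collections import Counter
from typing import Hashable, Sequence


def l1_imbalance(treated_cells: Sequence[Hashable],
                 control_cells: Sequence[Hashable]) -> float:
    """Unweighted L1 distance between two groups' cell distributions."""
    if not treated_cells or not control_cells:
        raise ValueError("both groups must be non-empty")
    f = Counter(treated_cells)  # cell counts among treated units
    g = Counter(control_cells)  # cell counts among control units
    n_t, n_c = len(treated_cells), len(control_cells)
    cells = set(f) | set(g)
    # Counter returns 0 for cells that a group never occupies.
    return 0.5 * sum(abs(f[c] / n_t - g[c] / n_c) for c in cells)


# Hypothetical cells defined by (age group, gender).
treated = [("50-60", "f"), ("50-60", "f"), ("61+", "m"), ("61+", "m")]
control = [("50-60", "f"), ("61+", "m"), ("61+", "m"), ("61+", "m")]
print(l1_imbalance(treated, control))
```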
Decomposition method
A decomposition method was used to partition the mental health differences into the contributions of various factors. When the mental health outcome is a continuous variable, the Oaxaca-Blinder decomposition method has been widely adopted to analyze the contributions to health differences between groups [43,44,45]. In most cases, however, mental health outcome variables are not continuous. Since our outcome was a dummy variable indicating whether the respondent currently had depressive symptoms, we used the non-linear decomposition method proposed by Fairlie and Bartus [46].
Following Fairlie [47], the decomposition for a nonlinear equation, \( Y=F\left(X\hat{\beta}\right) \), can be written as:
$$ \overline{Y}^{w}-\overline{Y}^{B}=\left[\sum_{i=1}^{N^{w}}\frac{F\left(X_{i}^{w}\hat{\beta}^{w}\right)}{N^{w}}-\sum_{i=1}^{N^{B}}\frac{F\left(X_{i}^{B}\hat{\beta}^{w}\right)}{N^{B}}\right]+\left[\sum_{i=1}^{N^{B}}\frac{F\left(X_{i}^{B}\hat{\beta}^{w}\right)}{N^{B}}-\sum_{i=1}^{N^{B}}\frac{F\left(X_{i}^{B}\hat{\beta}^{B}\right)}{N^{B}}\right] $$
(1)
To calculate the decomposition, we defined \( \overline{Y}^{w} \) and \( \overline{Y}^{B} \) as the average probabilities of the binary mental health outcome in the two groups, F as the cumulative distribution function of the logistic distribution, and \( N^{j} \) as the sample size of group j, where j denoted either of the two groups, w and B. \( \overline{Y}^{w}-\overline{Y}^{B} \) represented the total gap between the groups. This alternative expression for the decomposition was used because \( \overline{Y} \) did not necessarily equal \( F\left(\overline{X}\hat{\beta}\right) \). The equation showed that the difference was made up of two components: an explained component and an unexplained component. In (1), the first term in brackets represented the part of the gap that was due to group differences in observed characteristics (the explained component). The second term represented the part attributable to differences in the estimated coefficients, that is, in the processes determining the levels of Y (the unexplained component). We reported each factor's contribution to the differences in depressive symptoms between the older groups, together with its share of the total difference.
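The two bracketed terms of (1) can be computed directly once the group-specific logit coefficients are available. The sketch below is illustrative only: the covariate rows and coefficient values are invented, the coefficients are taken as given rather than estimated, and Fairlie's further step of attributing the explained part to individual variables (which involves matching and resampling observations across groups) is not shown.

```python
import math
from typing import List, Sequence, Tuple


def logistic_cdf(z: float) -> float:
    """F in equation (1): the logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def mean_prediction(X: Sequence[Sequence[float]],
                    beta: Sequence[float]) -> float:
    """Average of F(X_i * beta) over all rows X_i of one group."""
    total = 0.0
    for row in X:
        total += logistic_cdf(sum(x * b for x, b in zip(row, beta)))
    return total / len(X)


def decompose(X_w: List[List[float]], X_b: List[List[float]],
              beta_w: List[float], beta_b: List[float]) -> Tuple[float, float]:
    """Return (explained, unexplained), the two bracketed terms of (1)."""
    explained = mean_prediction(X_w, beta_w) - mean_prediction(X_b, beta_w)
    unexplained = mean_prediction(X_b, beta_w) - mean_prediction(X_b, beta_b)
    return explained, unexplained


# Hypothetical data: first column is the intercept, second a binary covariate.
X_w = [[1, 0], [1, 0], [1, 1], [1, 1]]
X_b = [[1, 0], [1, 1], [1, 1], [1, 1]]
beta_w = [-1.0, 0.8]   # invented coefficients for group w
beta_b = [-0.6, 0.5]   # invented coefficients for group B
explained, unexplained = decompose(X_w, X_b, beta_w, beta_b)
print(explained, unexplained, explained + unexplained)
```

When each group's logit includes an intercept and is fitted by maximum likelihood, the mean predicted probability equals the observed mean of Y in that group, so the two terms sum to the raw gap.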
Statistical analysis
Descriptive statistics were used to summarize the sample, and the chi-square test was used to examine categorical variables. Logistic regression was applied to the weighted data to estimate the associations between influencing factors and depressive symptoms. Multicollinearity was quantified by variance inflation factors (VIF); values above the commonly cited cut-offs of 5, 10, or sometimes 30 indicate problematic levels of multicollinearity [43]. All results were presented as odds ratios with 95% confidence intervals (CIs). Finally, Fairlie’s decomposition was performed to quantify the contributions to the differences. All procedures were conducted using Stata 15.0 (StataCorp LP., College Station, TX, USA). The statistical significance level was set at 0.05.
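An odds ratio and its 95% CI follow from a logit coefficient and its standard error by exponentiation. The sketch below assumes a Wald-type interval, which the text does not state explicitly, and the coefficient and standard error are invented numbers.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(coef: float, std_err: float) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for a logit coefficient."""
    lower = math.exp(coef - Z_95 * std_err)
    upper = math.exp(coef + Z_95 * std_err)
    return math.exp(coef), lower, upper


# Hypothetical coefficient and standard error from a logit model.
odds_ratio, lower, upper = odds_ratio_ci(0.7, 0.2)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 corresponds to statistical significance at the 0.05 level under this assumption.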