Data from the China Labor-force Dynamics Survey 2012 (CLDS) were used to analyze the association between migration status and health status of internal migrants in China. CLDS is a nationally representative, multidisciplinary household survey conducted by Sun Yat-sen University; its data are open access and publicly available at http://css.sysu.edu.cn/.
The research team applied to the CLDS program and received official permission to use the data. CLDS covers a series of topics, such as demographic characteristics, family, education, employment, work history, income, migration and health. The subjects of CLDS are laborers (all family members aged 15-64) from sample households randomly selected from 29 provinces in China (Hong Kong, Macau, Taiwan, Tibet and Hainan are excluded). The national baseline investigation was conducted in 2012, and three quarters of the sample communities were investigated (Footnote 1). More than 800 investigators collected 303 village questionnaires, 10612 family questionnaires, and 16253 individual questionnaires. All investigators were trained and tested before the investigation and were monitored during it. Computer Assisted Personal Interviewing (CAPI) technology was adopted to control data quality.
Sampling
The survey adopted a multi-stage, multi-stratified, Probability Proportionate to Size (PPS) sampling method. The first step was to establish the provincial sampling frames. Six provincial sampling frames were established by combining geographic region (east, middle and west) with the population size of each province (large and small). The second step was to sample the counties. Within each stratum, all counties were ranked by GDP per capita, and the primary sampling units were chosen by systematic sampling from a random starting point according to the size of the labor force. The third step was to sample the rural and urban communities. All communities administered by the sampled counties were ranked by population size, and PPS sampling was performed according to the size of the labor force in each community. The last step was to select the households from the communities. The mapping-address method was used to establish the last sampling frame. Household addresses were chosen by the circular distance sampling method, beginning from a random starting point. Finally, the laborers of each household (aged from 15 to 64) were included in the individual sample.
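The county and community stages described above can be illustrated with a minimal sketch of systematic PPS selection. It is not the CLDS implementation: the function name, the unit labels and the size figures are hypothetical, and the list is assumed to be already ranked (for example by GDP per capita).

```python
import random

def pps_systematic_sample(units, sizes, n, rng=None):
    """Systematic sampling with probability proportional to size.

    units: labels in their ranked order (hypothetical example data)
    sizes: positive size measure for each unit, e.g. labor force
    n:     number of selections to make
    """
    rng = rng or random.Random()
    total = sum(sizes)
    interval = total / n                  # sampling interval on the cumulative size scale
    start = rng.random() * interval       # random starting point in [0, interval)
    targets = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    idx = 0
    for unit, size in zip(units, sizes):
        cumulative += size
        # A unit is selected once for every target that falls inside its size range;
        # a unit larger than the interval can therefore be selected more than once.
        while idx < n and targets[idx] < cumulative:
            selected.append(unit)
            idx += 1
    return selected

# Hypothetical counties with made-up labor-force sizes.
counties = ["A", "B", "C", "D", "E", "F"]
labor_force = [120, 80, 300, 50, 200, 250]
print(pps_systematic_sample(counties, labor_force, 3, random.Random(0)))
```

Ranking the list before selection spreads the sample across the ranking variable, which is why the counties are ordered by GDP per capita before the systematic draw.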
Sample size
To maximize statistical power within the budget constraint, at least 400 communities were sampled. The optimal number of households in each community was determined by the ratio of community expenditure to household expenditure (C/C1) and the variance (σ²) of the secondary sampling unit (SSU). With C = 3000, C1 = 700 and σ² = 0.05, and following Cochran [21] and Raudenbush [22], we determined the optimal number of households as
$$ n_{\mathrm{opt}} = 2\sqrt{\frac{C}{C_1 \sigma^2}} = 2\sqrt{\frac{3000}{700 \times 0.05}} \approx 19 $$
Given a 10% tracing loss and the requirements of multi-stage, multi-variable analysis (1.7 times those of single-stage, single-variable analysis), the final number of households within one community was n = 19 × 1.7/0.9 ≈ 35. The total number of households should therefore be no less than N = 400 × 35 = 14000. The number of respondents analysed in our research was 15992, after 261 respondents were excluded because of missing values.
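The sample-size arithmetic above can be checked with a few lines of Python. This is only a sketch that reproduces the stated figures; the variable and function names are ours. Note that the unrounded optimum (about 18.5) inflates to about 35.0 households, whereas the rounded value of 19 would give about 35.9; the text reports 35.

```python
import math

def optimal_households(cost_community, cost_household, variance):
    """n_opt = 2 * sqrt(C / (C1 * sigma^2)), as given in the text."""
    return 2 * math.sqrt(cost_community / (cost_household * variance))

n_opt = optimal_households(3000, 700, 0.05)   # about 18.5, reported as 19

analysis_factor = 1.7    # multi-stage, multi-variable analysis requirement
retention = 1 - 0.10     # 10% tracing loss

per_community_unrounded = n_opt * analysis_factor / retention        # about 35.0
per_community_rounded = round(n_opt) * analysis_factor / retention   # about 35.9

total_households = 400 * 35   # minimum total stated in the text

print(round(n_opt, 2), round(per_community_unrounded, 2),
      round(per_community_rounded, 2), total_households)
```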
Variables and coding
The WHO defines health as a state of complete physical, mental and social well-being, and not merely the absence of disease or infirmity [23]. Released by the Canadian government in 1974 and acknowledged worldwide, the Lalonde Report first identified the four “health fields” influencing an individual’s health: human biology (all aspects of human health, physical and mental), environment, lifestyle, and health care organization [24]. Based on the Lalonde Report’s framework, we chose physical health status and psychological health status as the dependent variables, migration status as the independent variable, and demographic characteristics, working environment, socioeconomic status (SES), health behaviors and healthcare service access as control variables. The control variables were designed to measure the four health fields.
Physical and psychological health status were measured separately in the CLDS questionnaire. For physical health status, respondents were asked “In general, how would you rate your overall physical health?” An answer of “excellent/very good/good” was coded 1, while “fair/poor” was coded 0. For psychological health status, respondents were asked “Did emotional problems (for instance, depression or anxiety) affect your daily work or activities?” An answer of “none/few/sometimes” was coded 1, and “frequently/always” was coded 0.
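The binary coding of the two outcome variables can be sketched as follows. The answer labels are taken from the text; the function names and the treatment of unrecognised answers as missing (`None`) are assumptions made for illustration.

```python
# Answer categories as listed in the text; matching is case-insensitive.
PHYSICAL_CODES = {
    "excellent": 1, "very good": 1, "good": 1,
    "fair": 0, "poor": 0,
}
PSYCHOLOGICAL_CODES = {
    "none": 1, "few": 1, "sometimes": 1,
    "frequently": 0, "always": 0,
}

def code_answer(answer, codebook):
    """Return 1 or 0 according to the codebook, or None if the answer is not listed."""
    if answer is None:
        return None
    return codebook.get(answer.strip().lower())

def code_physical(answer):
    return code_answer(answer, PHYSICAL_CODES)

def code_psychological(answer):
    return code_answer(answer, PSYCHOLOGICAL_CODES)

print(code_physical("Very good"), code_psychological("always"), code_physical("refused"))
```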
According to their migration experience, the respondents were categorized as migrants or non-migrants. Based on the Chinese household registration system (hukou), migrants were further divided into the “migrant population” and the “returned population”. In this research, the migrant population is defined as the rural population who had been living or working outside their registered hometown for over 6 months at the time of the survey. The “returned population” refers to those who were living in their hometown but had previously worked outside their registered county for over 6 months. In terms of the hukou system, the non-migrant population was divided into “urban residents” and “rural residents” who had never lived or worked outside their registered hometown. Four population groups were thus identified and analysed in this study.
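The four-group classification can be expressed as a small decision rule. The sketch below is one possible reading of the definitions, not the authors' code: the argument names are hypothetical, and respondents whom the definitions do not cover (for example urban hukou holders currently living away) are returned as `None` by assumption.

```python
def classify_migration_status(hukou, away_now_over_6m, ever_away_over_6m):
    """Assign one of the four population groups described in the text.

    hukou:             "rural" or "urban" registration type (hypothetical field)
    away_now_over_6m:  living or working outside the registered hometown
                       for over 6 months at the time of the survey
    ever_away_over_6m: previously worked outside the registered county
                       for over 6 months
    """
    if away_now_over_6m:
        # The text defines the migrant population for rural hukou holders only.
        return "migrant population" if hukou == "rural" else None
    if ever_away_over_6m:
        return "returned population"
    if hukou == "urban":
        return "urban residents"
    if hukou == "rural":
        return "rural residents"
    return None  # unknown registration type

print(classify_migration_status("rural", True, True))
print(classify_migration_status("rural", False, True))
print(classify_migration_status("urban", False, False))
```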
The demographic characteristics include sex (female = 0, male = 1), age (analysed as a continuous variable), marital status (single or divorced = 0, married = 1) and education (primary school or lower = 0, junior high school = 1, senior high school = 2, college or above = 3).
Occupational hazard exposure was used to measure the working environment (not exposed = 0, exposed = 1).
The socioeconomic status of interviewees was self-evaluated on a 10-level Likert scale. Health behaviors include smoking (do not smoke = 0, smoke = 1) and drinking alcohol (do not drink = 0, drink = 1).
Healthcare access includes financial accessibility (without insurance = 0, with insurance = 1) and geographic accessibility (distance from home to the nearest medical facility, in kilometres).
Models
Multiple logistic regression analysis was performed to explore the association of migration status with physical health and psychological health, controlling for the other factors. A total of five regression models were established for the total population and for subgroups. Model 1 was designed for the total population, model 2 for males, model 3 for females, model 4 for age ≤ 45, and model 5 for age > 45. Considering that physical health and psychological health interact with one another [20, 25, 26], psychological health status was controlled for as an independent variable in the analysis of physical health status, and vice versa.
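The five analysis samples can be sketched as filters over the respondent records. This only illustrates how the subgroups are defined; the record layout (a dictionary with `sex` and `age` keys, with sex coded female = 0, male = 1 as in the variable coding) and the example records are hypothetical, and the regression itself is not shown.

```python
# One filter per model, following the subgroup definitions in the text.
MODEL_FILTERS = {
    "model 1 (total population)": lambda r: True,
    "model 2 (males)": lambda r: r["sex"] == 1,
    "model 3 (females)": lambda r: r["sex"] == 0,
    "model 4 (age <= 45)": lambda r: r["age"] <= 45,
    "model 5 (age > 45)": lambda r: r["age"] > 45,
}

def split_into_model_samples(records):
    """Return the subset of records used by each of the five models."""
    return {
        name: [r for r in records if keep(r)]
        for name, keep in MODEL_FILTERS.items()
    }

# Made-up respondents for illustration only.
respondents = [
    {"sex": 1, "age": 30},
    {"sex": 0, "age": 45},
    {"sex": 0, "age": 60},
]
for name, sample in split_into_model_samples(respondents).items():
    print(name, len(sample))
```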
Stata 11 was used to conduct the statistical analysis. Because the survey used a complex sampling design with a design effect, the “svy” commands were adopted to set out the sampling parameters and to adjust for the systematic error caused by the sampling design. The weighted means, proportions and standard errors of the variables are reported in the descriptive results, as are the weighted parameter estimates in the multiple regression analysis.