Data
Data were obtained from the China Health and Nutrition Survey (CHNS), 1991–2011, which is publicly available (http://www.cpc.unc.edu/projects/china/data/datasets/longitudinal). The researchers at the Carolina Population Center who are responsible for collecting the CHNS data received ethical approval from the University of North Carolina at Chapel Hill. The CHNS is a longitudinal survey mainly covering Guangxi Zhuang and eight other provinces that vary substantially in geography, economic development, public resources, and health indicators; the waves used here were conducted in 1991, 1993, 1997, 2000, 2004, 2006, 2009, and 2011. The nine provinces accounted for approximately 45% of China’s population in 2011. The survey collected detailed information on health behaviors and corresponding outcomes, such as smoking, drinking, height, and weight.
A multistage, random cluster process was used to draw the sample surveyed in each province. Counties in the nine provinces were stratified by income level (low, middle, and high). Four counties were randomly selected from each province by a weighted-sampling scheme (one low-income, two middle-income, and one high-income county). In addition, the provincial capital and the lowest-income city were included. Villages and townships within the counties, and urban and suburban neighborhoods within the cities, were selected randomly. The survey included about 4400 households, covering about 19,000 individuals in the most recent wave. A detailed description of the design of the CHNS is available at http://www.cpc.unc.edu/projects/china.
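The county-selection step can be illustrated with a minimal sketch. It assumes simple random sampling without replacement within each income stratum; the actual CHNS weighting scheme is not described here, and the county labels and the function name `select_counties` are hypothetical.

```python
import random

# Counties drawn per income stratum in each province, as described in the text.
QUOTA = {"low": 1, "middle": 2, "high": 1}

def select_counties(counties_by_income, rng):
    """Pick four counties from one province: 1 low, 2 middle, 1 high income.

    Assumption: equal-probability sampling within each stratum.
    """
    selected = []
    for stratum, count in QUOTA.items():
        selected.extend(rng.sample(counties_by_income[stratum], count))
    return selected

# Hypothetical county labels for a single province.
province = {
    "low": ["L1", "L2", "L3"],
    "middle": ["M1", "M2", "M3", "M4"],
    "high": ["H1", "H2"],
}
chosen = select_counties(province, random.Random(0))
print(chosen)
```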
Variables
Weight status variables
Height and weight for each respondent, included in the CHNS dataset, were measured by a health professional during the interview. Detailed information, including the instruments used to measure height and weight and their calibration status, is available at http://www.cpc.unc.edu/projects/china/data/questionnaires/Training_Manual_2006_short.pdf. These measurements provided the actual height and weight of each individual. Burkhauser and Cawley noted a significant bias between self-reported and actual weight, particularly among overweight and obese subjects, who tend to under-report body weight [20]. BMI was calculated for each respondent from measured height and weight, avoiding the measurement error and reporting bias of self-reports. Since there may be a non-linear relationship between smoking and body weight, four categorical variables were constructed according to the Asian cut-offs provided by the WHO: underweight (BMI < 18.5), healthy weight (18.5 ≤ BMI < 23), overweight (BMI ≥ 23), and, within the overweight group, obese (BMI ≥ 25) [21].
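The construction of the four weight-status indicators can be sketched as follows. The units (kilograms and centimeters) and the function names are assumptions for illustration, and the upper bound of the healthy-weight range is taken as BMI < 23 so that the categories leave no gap.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by squared height in meters."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

def weight_status(bmi_value):
    """Return the four 0/1 indicators based on the WHO Asian cut-offs.

    Obese is a subset of overweight, so both can equal 1 at once.
    Assumption: healthy weight is 18.5 <= BMI < 23.
    """
    return {
        "underweight": int(bmi_value < 18.5),
        "healthy_weight": int(18.5 <= bmi_value < 23.0),
        "overweight": int(bmi_value >= 23.0),
        "obese": int(bmi_value >= 25.0),
    }

# Example: a respondent measured at 70 kg and 175 cm.
example = weight_status(bmi(70, 175))
print(example)
```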
Cigarette smoking variable
In the CHNS questionnaire, respondents were asked whether they were active smokers at the time of the survey. Those who answered “yes” were further asked how many cigarettes they smoked per day. Two variables were constructed to measure smoking behavior. The first was smoking status, classified as smoker or non-smoker. The second was the log of the number of cigarettes consumed daily. Such a continuous measure can capture the intensity of smoking dependence among respondents, which the widely used binary indicator of current smoking cannot.
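A minimal sketch of the two smoking measures is given below. The function name is hypothetical. The text does not say how the log was handled for non-smokers, for whom the log of zero cigarettes is undefined, so returning `None` in that case is purely an assumption of this sketch.

```python
import math

def smoking_measures(is_current_smoker, cigarettes_per_day=None):
    """Return (smoking status, log of cigarettes per day).

    Assumption: the log measure is left missing (None) for non-smokers
    and for smokers without a positive reported count.
    """
    status = 1 if is_current_smoker else 0
    if status == 1 and cigarettes_per_day is not None and cigarettes_per_day > 0:
        log_cigarettes = math.log(cigarettes_per_day)
    else:
        log_cigarettes = None
    return status, log_cigarettes

smoker = smoking_measures(True, 20)
non_smoker = smoking_measures(False)
print(smoker, non_smoker)
```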
Socio-economic status, health behaviors, and demographic characteristics
Variables representing socio-economic status were controlled for. Education was defined at three levels: primary school or below, junior high school, and senior high school or above. Two dummy variables for education were included, with “primary school or below” as the reference group. Job status was also a dummy variable, with “unemployed” as the reference group. Another dummy variable, “living in urban area,” indicated whether the individual lived in an urban area. The log of household per capita income, deflated by the consumer price index, was also controlled for.
Alcohol consumption was included as an indicator of respondents’ current health behaviors. Respondents were asked how frequently they drank alcohol, and the answers were used to classify drinking status. Those who answered “almost every day” or “3–4 times a week” were coded as frequent drinkers; “once or twice a week” as less-frequent drinkers; “once or twice a month” as barely drinking; and “no more than once a month” or “never drink” as non-drinkers (reference group).
Demographic variables such as gender (reference group: female) and age were included. Age was a continuous variable, and age squared was included in the model to capture the non-linear impact of age on the dependent variable. State- and time-fixed effects were also included.
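The dummy coding described above can be sketched as below. The string labels, dictionary keys, and function name are hypothetical and chosen for illustration. Income and the fixed effects are omitted for brevity. Each reference group is represented by all of its dummies being zero.

```python
# Hypothetical answer labels mapped to the four drinking categories in the text.
DRINKING_CATEGORY = {
    "almost every day": "frequent",
    "3-4 times a week": "frequent",
    "once or twice a week": "less_frequent",
    "once or twice a month": "barely",
    "no more than once a month": "non_drinker",  # reference group
    "never": "non_drinker",                      # reference group
}

def encode_covariates(education, drinking_answer, age, male, urban, employed):
    """Build the control variables for one respondent as a dict of regressors."""
    drinking = DRINKING_CATEGORY[drinking_answer]
    return {
        # Reference group: primary school or below.
        "edu_junior_high": int(education == "junior_high"),
        "edu_senior_high_or_above": int(education == "senior_high_or_above"),
        "employed": int(employed),  # reference group: unemployed
        "urban": int(urban),
        "frequent_drinker": int(drinking == "frequent"),
        "less_frequent_drinker": int(drinking == "less_frequent"),
        "barely_drinking": int(drinking == "barely"),
        "male": int(male),          # reference group: female
        "age": age,
        "age_squared": age ** 2,    # captures the non-linear age effect
    }

row = encode_covariates("junior_high", "3-4 times a week", 40, True, False, True)
print(row)
```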
Estimation strategy
Since the body weight measures were categorical variables, a logit model was estimated [22]. The regression equation was
$$ w={\upbeta}_0+s{\upbeta}_1+x{\upbeta}_2+\upvarepsilon $$
where w: the body weight category variable;
s: smoking behavior;
x: a vector of other explanatory variables;
ε: a residual term; and
β0, β1, β2: coefficients to be estimated
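One way to read the equation above as a logit model is sketched here. It assumes that each weight-status category enters as a separate binary indicator, which the text does not state explicitly. Under that assumption, the linear index determines the probability of falling in the category through the logistic function Λ:

$$ \Pr \left(w=1\mid s,x\right)=\Lambda \left({\upbeta}_0+s{\upbeta}_1+x{\upbeta}_2\right),\kern1em \Lambda (z)=\frac{1}{1+{e}^{-z}} $$

Under this reading, a positive β1 corresponds to a higher probability of being in the category among smokers, holding x fixed.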
Body weight may in turn affect smoking behavior if individuals choose to smoke in order to control their weight, or if they quit smoking out of concern about health risks [23]. Two-stage least squares (2SLS) estimation could not be used to address this endogeneity problem because the dependent variables are categorical. Two-stage residual inclusion (2SRI) estimation extends the linear 2SLS framework to second-stage models with non-linear outcomes. 2SRI has been widely applied to address endogeneity in health economics and health services research [24], and it has been shown to yield consistent and efficient results [25]. Following Terza [24], a 2SRI model was estimated. The first-stage equation was as follows:
$$ s={\upalpha}_0+IV{\upalpha}_1+x{\upalpha}_2+u $$
(1)
where
IV: instrumental variable;
u: residual;
α0, α1, α2: coefficients to be estimated
The instrumental variable for smoking behavior was the log of province-level tobacco yield. Province-level tobacco yield ranged from 0.1 to 71.1 million tons, with a mean of 15.71 million tons, and tobacco production levels varied substantially across geographic areas during 1991–2011. This instrumental variable should be correlated with smoking status [22]. Some tobacco control strategies have targeted tobacco companies and farmers in order to restrict tobacco consumption [26]. Crop substitution for tobacco farmers as a way to reduce tobacco yield has met with success, as demonstrated in Brazil [27] and Bolivia [28]. As expected, provincial tobacco yield was positively correlated with smoking behavior (Appendix). The instrumental variable should also not affect body weight directly; it should operate only through its effect on smoking status, an assumption that cannot be tested directly with the data. The instrumental variable was, however, significant at the 1% level in an F-test, which supports its relevance as an instrument.
When smoking status was the dependent variable, a logit model was applied in the first stage; when the number of cigarettes consumed daily was used to measure smoking behavior, an ordinary least squares (OLS) model was applied to estimate the effect of local tobacco yield on smoking behavior.
The second-stage equation was obtained by adding the residual from the first-stage equation (1) to the body weight regression equation as follows:
$$ w={\upbeta}_0+s{\upbeta}_1+x{\upbeta}_2+\widehat{u}{\upbeta}_3+\upvarepsilon $$
(2)
where û: the fitted residual from the first stage
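The two stages can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the authors' Stata code: the data-generating values, the helper names, and the use of a continuous smoking measure with an OLS first stage are all hypothetical choices, and the estimators are written out in plain Python to keep the block self-contained.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip the index to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def gram(X, weights):
    """Return X'WX for design rows X and observation weights."""
    k = len(X[0])
    return [[sum(w * r[i] * r[j] for r, w in zip(X, weights)) for j in range(k)]
            for i in range(k)]

def ols(X, y):
    """Least-squares coefficients from the normal equations."""
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(len(X[0]))]
    return solve(gram(X, [1.0] * len(y)), xty)

def logit(X, y, iterations=25):
    """Logit coefficients by Newton-Raphson, starting from zero."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        p = [sigmoid(dot(beta, r)) for r in X]
        score = [sum(r[i] * (yi - pi) for r, yi, pi in zip(X, y, p))
                 for i in range(len(beta))]
        hessian = gram(X, [pi * (1.0 - pi) for pi in p])
        beta = [b + d for b, d in zip(beta, solve(hessian, score))]
    return beta

# Simulated data (hypothetical values): c is an unobserved confounder that
# makes s endogenous; iv stands in for the log of provincial tobacco yield.
rng = random.Random(1)
rows = []
for _ in range(400):
    iv, x, c = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    s = 0.8 * iv + 0.5 * x + c + rng.gauss(0, 1)
    w = 1 if rng.random() < sigmoid(-0.5 * s + 0.3 * x + c) else 0
    rows.append((iv, x, s, w))

# First stage: regress s on a constant, the instrument, and x; keep residuals.
X1 = [[1.0, iv, x] for iv, x, s, w in rows]
s_values = [s for iv, x, s, w in rows]
alpha = ols(X1, s_values)
u_hat = [s - dot(alpha, r) for r, s in zip(X1, s_values)]

# Second stage: logit of w on a constant, s, x, and the first-stage residual.
X2 = [[1.0, s, x, u] for (iv, x, s, w), u in zip(rows, u_hat)]
beta = logit(X2, [w for iv, x, s, w in rows])
print(beta)  # order: constant, s, x, u_hat
```

The standard errors from this second stage are not valid as printed by a conventional logit routine, because û is itself estimated.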
Because the nonlinear system was estimated in two steps, the conventional second-stage standard errors are incorrect: they fail to account for the stochastic nature of the estimated residual term. We therefore used bootstrapping with 1000 replications to obtain correct standard errors [24]. Marginal effects, evaluated with all other variables held at their means, are reported in the Results section to show the magnitude of the effects of smoking behavior on body weight. Stata 12 was used for all statistical calculations.
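The bootstrap idea can be sketched as follows: individuals are resampled with replacement and every estimation step is re-run on each resample, so the variability of the first stage carries through to the final estimate. To keep the block short, a linear two-step estimator stands in for the 2SRI logit. That substitution, the simulated data, and the function names are assumptions of this sketch.

```python
import random
import statistics

def bootstrap_se(data, estimator, replications=1000, seed=0):
    """Bootstrap standard error of estimator(data).

    Rows are resampled with replacement; the estimator is re-run on each
    resample, so all of its internal steps are re-estimated every time.
    """
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(replications):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(sample))
    return statistics.stdev(draws)

def mean(values):
    return sum(values) / len(values)

def slope_intercept(x, y):
    """Simple regression of y on x: return (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def two_step(sample):
    """Stand-in two-step estimator (linear, not the paper's 2SRI logit)."""
    z, s, y = zip(*sample)
    b1, a1 = slope_intercept(z, s)         # step 1: s on the instrument z
    s_hat = [a1 + b1 * zi for zi in z]
    b2, _ = slope_intercept(s_hat, y)      # step 2: y on fitted s
    return b2

# Simulated rows (z, s, y) with hypothetical parameter values.
rng = random.Random(42)
data = []
for _ in range(200):
    z, c = rng.gauss(0, 1), rng.gauss(0, 1)
    s = z + c + rng.gauss(0, 1)
    y = 0.5 * s + c + rng.gauss(0, 1)
    data.append((z, s, y))

se = bootstrap_se(data, two_step, replications=1000, seed=0)
print(se)
```

Fixing the seed makes the reported standard error reproducible across runs.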