Data source
This study used data from rounds I and II of the India Human Development Survey (IHDS). IHDS round-I, conducted during 2004–05, is a large-scale, nationally representative, multi-topic survey of 41,554 households across 382 districts of India [25]. IHDS round-II, conducted during 2011–12, surveyed 42,152 households across 384 districts [26] and re-interviewed 83% of the round-I households. The National Council of Applied Economic Research (NCAER), India, in collaboration with the University of Maryland, USA, conducted both rounds in all Indian states and union territories except the Andaman & Nicobar Islands and Lakshadweep. IHDS used multi-stage stratified random sampling; further details of the sample selection procedure are available elsewhere [27, 28].
IHDS collected data on combustible and smokeless tobacco consumption from 33,116 persons in round-I and 34,090 persons in round-II. The current study analysed panel data on 16,661 individuals observed in both rounds, nested within 2175 communities across 33 states and union territories. Here, a community is a primary sampling unit (PSU): a village in rural areas or a census enumeration block in urban areas. In our study panel, 61% of the 16,661 persons smoked tobacco in round-I, falling to 41% in round-II, whereas nearly 59% consumed smokeless tobacco in both rounds.
Outcome variables
The two outcome variables of this study are binary indicators of whether an individual consumed combustible tobacco and whether they consumed smokeless tobacco. In both IHDS rounds, interviewers recorded “whether an individual smokes cigarettes, bidis or hookah” (combustible tobacco products) and “whether an individual chews tobacco or gutkha” (smokeless tobacco products). Persons who consumed one or more products of a given type were coded as “Yes” for that outcome and otherwise as “No”. Neither outcome variable had missing records in either round of the panel data.
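A minimal sketch of this coding rule, assuming product-level yes/no answers are available for each person. The product keys below are hypothetical labels, not IHDS variable names.

```python
# Hypothetical product keys (not IHDS variable names) for each outcome.
COMBUSTIBLE = ("cigarettes", "bidis", "hookah")
SMOKELESS = ("chewing_tobacco", "gutkha")


def code_outcome(answers: dict[str, bool], products: tuple[str, ...]) -> str:
    """Return "Yes" if the person consumes at least one listed product, else "No"."""
    # A product with no recorded answer is treated as not consumed.
    return "Yes" if any(answers.get(p, False) for p in products) else "No"


person = {"bidis": True, "gutkha": False}
print(code_outcome(person, COMBUSTIBLE))  # Yes
print(code_outcome(person, SMOKELESS))    # No
```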
Explanatory variables
The explanatory variables in both rounds of IHDS are three binary indicators of mass media exposure: whether anyone in the household “watches television (TV)”, “listens to radio”, and “reads the newspaper”. In both survey rounds, interviewers asked a respondent from each household how often members of the family “listen to radio”, “read the newspaper”, and “watch TV”. Because the responses were skewed, households in the “sometimes” and “daily” categories were coded as “yes” and all others as “no”. None of the three explanatory variables had missing records in either round.
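The recoding can be sketched as follows. The frequency label other than “sometimes” and “daily” (here “never”) and the dictionary keys are assumptions for illustration, not IHDS codebook values.

```python
# Only these two frequency categories count as exposure, as described above.
EXPOSED = {"sometimes", "daily"}


def recode_exposure(frequency: str) -> str:
    """Collapse a reported media-use frequency into a binary yes/no indicator."""
    return "yes" if frequency.strip().lower() in EXPOSED else "no"


# Hypothetical household responses; keys and the "never" label are illustrative.
household = {"watches_tv": "daily", "listens_radio": "never", "reads_newspaper": "Sometimes"}
exposure = {item: recode_exposure(freq) for item, freq in household.items()}
print(exposure)
# {'watches_tv': 'yes', 'listens_radio': 'no', 'reads_newspaper': 'yes'}
```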
Control variables
Based on existing research, the current study identified several potential confounders associated with both tobacco consumption and mass media exposure. The individual-level characteristics were age group (children and youth, adults, elderly), sex (male, female), level of education (no formal schooling, 1–5 years of schooling, 6–10 years of schooling, more than 10 years of schooling), current working status (not working, working) and current marital status (currently married, currently not married). The household-level characteristics were wealth quintile (poorest, poor, middle, rich, richest), caste of the household head (Other Backward Classes (OBC), Scheduled Castes (SC), Scheduled Tribes (ST), others) and religion of the household head (Hinduism, Islam, others). We also included the following contextual variables at the community level: the percentage of individuals in the community with no formal education, the percentage from the poorest or poor wealth quintiles, and the percentage belonging to SC/ST castes, each categorised as 0 to 25%, 25 to 50%, 50 to 75% or 75 to 100%. Place of residence (rural, urban) was also included as a community-level characteristic, and region of the country (central, northern, southern, western, eastern, north-eastern) as a state-level characteristic.
In our study sample, the age distribution is skewed, with few people at the youngest and oldest ages. We therefore grouped age into three categories: persons aged 24 years or younger were coded as “children and youth”, those aged 25 to 64 years as “adults”, and those aged 65 years or older as “elderly”.
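The age cut-offs translate directly into a small mapping function. This is an illustrative sketch; the function name is ours.

```python
def age_group(age: int) -> str:
    """Map age in completed years to the study's three age categories."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 24:
        return "children and youth"
    if age <= 64:
        return "adults"
    return "elderly"


print([age_group(a) for a in (15, 24, 25, 64, 65, 80)])
# ['children and youth', 'children and youth', 'adults', 'adults', 'elderly', 'elderly']
```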
We estimated wealth quintiles for all households in both rounds of IHDS. Wealth scores were generated with the standard principal component analysis procedure, using household data on asset ownership, type of building material, type of water source, type of sanitation facility and number of living rooms. Details of the standard procedure are available elsewhere [29, 30]. Based on the wealth score, households were classified into five categories (poorest, poor, middle, rich, richest): households in the lowest 20% of scores formed the “poorest” category, those in the next 20% formed the “poor” category, and so forth [30, 31].
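A minimal sketch of a PCA-based wealth index with quintile cut-offs, using NumPy and simulated data. It assumes the indicators are already coded numerically (for example, 1 if the household owns an asset) and takes the first principal component of the standardised indicators as the score. The sign-flipping rule and the function name are our own choices, and the indicator set and weighting actually used in [29, 30] may differ.

```python
import numpy as np


def wealth_quintiles(indicators: np.ndarray) -> np.ndarray:
    """Assign each household (row) a wealth quintile from 1 (poorest) to 5 (richest).

    `indicators` is an (n_households, n_items) array of numeric asset and housing
    indicators, coded so that larger values mean more wealth. Every column must vary.
    """
    # Standardise each indicator, i.e. run PCA on the correlation matrix.
    z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
    # The first principal component is the eigenvector with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    weights = eigvecs[:, np.argmax(eigvals)]
    # An eigenvector's sign is arbitrary; flip it so that more assets give a higher
    # score (a heuristic assuming most indicators load in the same direction).
    if weights.sum() < 0:
        weights = -weights
    scores = z @ weights
    # Cut the scores at the 20th, 40th, 60th and 80th percentiles.
    cutoffs = np.percentile(scores, [20, 40, 60, 80])
    return np.searchsorted(cutoffs, scores, side="right") + 1


# Simulated data: three binary asset indicators and one continuous indicator,
# all driven by an unobserved wealth level (purely illustrative).
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
items = np.column_stack(
    [(latent + rng.normal(scale=0.5, size=200) > t).astype(float) for t in (-0.5, 0.0, 0.5)]
    + [latent + rng.normal(size=200)]
)
quintile = wealth_quintiles(items)
print(np.bincount(quintile, minlength=6)[1:])  # about 40 households per quintile
```

Standardising first keeps indicators measured on larger scales (such as room counts) from dominating the component.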
The contextual characteristics of the community to which a person belongs are known to influence their behaviour, and in a multicultural country like India such factors might significantly affect individuals’ tobacco consumption. We therefore controlled for the educational, economic and social composition of the community. Educational composition is measured by the percentage of the community’s population with no formal education, economic composition by the percentage belonging to the poorest and poor wealth quintiles, and social composition by the percentage belonging to Scheduled Tribes (ST) and Scheduled Castes (SC). All three indicators have four categories: 0 to 25%, 25 to 50%, 50 to 75% and 75 to 100%. We divided the 33 Indian states and union territories into six regions based on administrative classification [32].
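The community-level indicators can be sketched as a share computed within each PSU and then cut into the four bands. The text does not say how boundary values such as exactly 25% are assigned, so the rule below (boundaries go to the lower band) is an assumption, as are the record layout and the function names.

```python
import math
from collections import defaultdict

BANDS = ("0 to 25%", "25 to 50%", "50 to 75%", "75 to 100%")


def band(share: float) -> str:
    """Place a percentage (0 to 100) into one of four 25-point bands.

    Assumption: boundary values go to the lower band (exactly 25% -> "0 to 25%").
    """
    if not 0 <= share <= 100:
        raise ValueError("share must lie between 0 and 100")
    return BANDS[max(0, math.ceil(share / 25) - 1)]


def community_bands(people, flag):
    """people: iterable of (community_id, record) pairs; flag: predicate on a record."""
    totals, hits = defaultdict(int), defaultdict(int)
    for community, record in people:
        totals[community] += 1
        hits[community] += bool(flag(record))
    return {c: band(100 * hits[c] / totals[c]) for c in totals}


# Hypothetical records: share of people with no formal schooling in each PSU.
sample = [("psu1", {"schooling": 0}), ("psu1", {"schooling": 8}),
          ("psu1", {"schooling": 0}), ("psu2", {"schooling": 12})]
print(community_bands(sample, lambda r: r["schooling"] == 0))
# {'psu1': '50 to 75%', 'psu2': '0 to 25%'}
```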
Statistical methods
We undertook bivariate and multivariable analyses to fulfil the study objectives. Two parallel sets of analyses examined the association of mass media exposure with combustible and with smokeless tobacco consumption, respectively, in the same panel population in round-I and round-II. The bivariate analysis used the chi-square test of association to examine tobacco consumption against each of its predictors. The multivariable analysis used three-level random intercept logit models to estimate the association between tobacco consumption and its covariates, as well as the extent to which tobacco consumption behaviour clusters within communities and states [33, 34]. A three-level model reflects the hierarchical structure of the data, in which 16,661 persons (level 1) are nested within 2175 communities (level 2), which are in turn nested within 21 state groups (level 3). Owing to the skewed distribution of the population across the 33 states, we merged them into 21 groups, with the five union territories (Delhi, Chandigarh, Daman & Diu, Dadra & Nagar Haveli and Pondicherry), the seven north-eastern states (see section Control variables) and Maharashtra & Goa each combined into a single group. A logit link was used because both outcome variables are binary.
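For reference, the Pearson chi-square statistic used in the bivariate analysis can be computed from a contingency table as below; the counts are hypothetical. This sketch applies no continuity correction, whereas scipy.stats.chi2_contingency applies Yates' correction to 2 × 2 tables by default, so results from that function can differ slightly.

```python
import math


def pearson_chi_square(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = TV exposure (yes, no), columns = smokes (yes, no).
stat, dof = pearson_chi_square([[30, 70], [50, 50]])
# With 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(round(stat, 3), dof)  # 8.333 1
```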
The three-level model adjusts for unexplained between-community and between-state variation (heterogeneity) in the risk of tobacco consumption. The models yield odds ratios: the ratio of the odds of tobacco consumption among persons in a particular category of an explanatory variable to the odds in its reference category, holding all other explanatory variables and the group-level effects constant. The models also yield the intra-class correlation coefficient (ICC), which measures the expected degree of similarity (homogeneity) in tobacco consumption among persons belonging to the same group [34]. The community-level ICC, for persons in the same community (and therefore the same state), is the sum of the state-level and community-level variances divided by the total variance in the model [35]. The state-level ICC, for persons in the same state but not necessarily the same community, is the state-level variance divided by the total variance.
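One standard way to write the three-level random intercept logit model, its odds ratios and the two ICCs described above is sketched below. The notation (i for persons, j for communities, k for states, v_k and u_jk for the state and community random intercepts) is ours. Taking π²/3 as the level-1 variance follows the common latent-variable formulation of the logit model; the text does not state how the total variance was defined, so this is an assumption.

```latex
% i = person, j = community, k = state (notation ours)
\operatorname{logit}\Pr(y_{ijk} = 1) = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ijk} + v_k + u_{jk},
\qquad v_k \sim N(0, \sigma_v^2), \quad u_{jk} \sim N(0, \sigma_u^2)

% Odds ratio for category m of an explanatory variable versus its reference category
\mathrm{OR}_m = \exp(\beta_m)

% Level-1 variance on the latent logistic scale: \pi^2/3 \approx 3.29
\mathrm{ICC}_{\text{community}} = \frac{\sigma_v^2 + \sigma_u^2}{\sigma_v^2 + \sigma_u^2 + \pi^2/3},
\qquad
\mathrm{ICC}_{\text{state}} = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_u^2 + \pi^2/3}
```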
Previous studies have shown that odds ratios from logistic regression models should not be compared across groups, even when the models share the same dependent and independent variables, because logit coefficients are scaled by unobserved heterogeneity that can differ between groups [36, 37]. To overcome this limitation and compare the risk of tobacco consumption across the two IHDS rounds, we estimated marginal predicted probabilities of combustible (or smokeless) tobacco consumption for each category of a given independent variable, holding the other independent variables at their median values [37].
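The idea behind these predicted probabilities can be illustrated with a toy example: fix every other covariate at one profile, vary the category of interest, and convert the linear predictor into a probability with the inverse logit. The coefficients, variable names and profile below are hypothetical, and setting the community and state random intercepts to zero is one common convention that we assume here; the text does not say how the random effects were treated.

```python
import math


def inv_logit(eta: float) -> float:
    """Convert a linear predictor on the logit scale into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probabilities(intercept, coef, fixed_profile, variable, levels):
    """Predicted probability for each level of `variable`, other covariates held fixed.

    coef maps (variable, level) -> log-odds coefficient; reference categories have
    no entry (coefficient 0). Community and state random intercepts are set to 0.
    """
    results = {}
    for level in levels:
        profile = {**fixed_profile, variable: level}
        eta = intercept + sum(coef.get(item, 0.0) for item in profile.items())
        results[level] = inv_logit(eta)
    return results


# Hypothetical fitted values, for illustration only.
intercept = -0.5
coef = {("tv", "yes"): -0.4, ("sex", "male"): 1.2}
fixed_profile = {"tv": "no", "sex": "male"}  # stand-in for the "median" profile

probs = predicted_probabilities(intercept, coef, fixed_profile, "tv", ["no", "yes"])
print({level: round(p, 3) for level, p in probs.items()})  # {'no': 0.668, 'yes': 0.574}
```

Because the probabilities are on a common scale, they can be compared across rounds in a way the odds ratios cannot.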
We checked the multivariable regression models for multicollinearity: the mean variance inflation factor (VIF) was below 2.85 in all models in both rounds, indicating that no adjustment for multicollinearity was needed [38]. All statistical analyses were performed in Stata version 13.0 [39].
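A short NumPy sketch of the VIF check follows. It relies on the identity that the VIFs are the diagonal of the inverse correlation matrix of the predictors; because VIFs depend only on the predictors, the same check applies to the covariates of a logit model. In Stata, `estat vif` provides this diagnostic after `regress`; the text does not say which route the authors used. The simulated data are illustrative only.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF for each column of a predictor matrix X (rows = observations, no intercept).

    Uses the identity VIF_j = [R^{-1}]_{jj}, where R is the correlation matrix of the
    predictors; this equals 1 / (1 - R_j^2), with R_j^2 from regressing column j
    on all other columns.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=(2, 1000))
independent = np.column_stack([x1, x2, rng.normal(size=1000)])
collinear = np.column_stack([x1, x2, x1 + x2 + rng.normal(scale=0.1, size=1000)])

print(variance_inflation_factors(independent).mean())  # close to 1
print(variance_inflation_factors(collinear).mean())    # far above 2.85
```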