### The Belgian Health Interview Survey (BHIS)

The BHIS is a state-funded cross-sectional population survey carried out every several years since 1997. In 2018, the sixth BHIS has been conducted. Stratified multistage clustered sampling is used to compose the sample: a limited number of municipalities are selected in which the survey is conducted. Within each municipality, a sample of households is drawn so that groups of 50 individuals are interviewed in total. Clustering also takes place at the household level since members of the same household are more alike than persons not belonging to the same household. To ensure that the final sample reflects the composition of the population, post stratification weights are calculated, considering population data from the National Register on age, gender, residence and household size.

### Study outcome and independent variables

For this study, data on smoking status (daily smoker, occasional smoker, former smoker and never smoked) and amount of cigarettes per day (categorized as “light smoking” in case of < 10 cigarettes per day, “moderate smoking” in case of 10 to 20 cigarettes per day, and “heavy smoking” for ≥ 20 cigarettes per day), age (in years), gender and educational level (no education/primary education, lower secondary education, higher secondary education, and highest education) of individuals aged 15 + were used. Statistics on age, gender and education level distribution for all Belgian municipalities were obtained from the 2011 census. The all-cause mortality data at municipality level in 2018 by age and gender were requested and obtained from Statbel.

### Statistical analyses

#### Direct and smoothed design based estimators for municipality level smoking prevalence

Our study interest is to estimate smoking prevalence per municipality using the BHIS data. A design-based direct and smoothed estimators are discussed in this section. This study deals with a binary outcome (smoking yes vs. no) denoted as an indicator variable \({\mathrm{y}}_{\mathrm{ik}}\) for the event of interest on the \(\mathrm{k}\)-th individual (\(\mathrm{k}=1,\dots ,{\mathrm{N}}_{\mathrm{i}}\)) in the \(\mathrm{i}\)-th municipality (\(\mathrm{i}=1,\dots ,589\)). The survey is conducted with sampling probabilities for person \(\mathrm{k}\) from municipality \(\mathrm{i}\) being \({\uppi }_{\mathrm{ik}}\). We denote by \({\mathrm{s}}_{\mathrm{i}}\) the set of individuals who are sampled from municipality\(\mathrm{i}\). The design weights are an inverse of the sampling probabilities, i.e.\({\mathrm{w}}_{\mathrm{ik}}= \frac{1}{{\uppi }_{\mathrm{ik}}}\). Horvitz-Thompson proposed the following formula for an estimator of prevalence \({\mathrm{P}}^{\mathrm{HT}}\)[12]:

\(\mathrm P_{\mathrm i}^{\mathrm{HT}}={\sum\limits_{\mathrm{k{\in}s}_{\mathrm{i}}}\frac{{\mathrm y}_{\mathrm{ik}}}{{\mathrm\pi}_{\mathrm{ik}}}}\)

When the number of samples in each area is large and all areas are sampled, the design-based variance is usually small and Horvitz-Thompson estimates work well. However, in the BHIS, area sample sizes are small and samples are not available from each area, therefore smoothing over the areas may be considered. Bayesian hierarchical models offer a flexible approach to fitting such models and are used in this setting [12], in particular, a so-called Besag-York-Mollie (BYM) model that assigns a Conditional Autoregressive (CAR) distribution to the random effect to account for proximity of municipalities that share a common boundary [13,14,15]. A penalizing Complexity (PC) prior, using a scaled spatially structured component and an unstructured component [15] was used for the random effects. Compared to the default Gamma prior which is commonly used in BYM model, PC priors, as a weak information prior, have been suggested as useful, understandable and conservative [15]. A sensitivity analysis with alternative parameters of the PC prior assigning more prior mass on smaller variance of the random effects were conducted to assess the robustness of results to the choice of the prior. Further details can be found in Supplementary file 1 (Technical details on design-based estimators).

To take into account the survey design, the logit transformation of the design-based direct estimates of the prevalence was modelled as \(\widehat{\mathrm{p}}\)_{i} ~ N(\({\uptheta }_{\mathrm{i}}\)_{,}\(\widehat{{\mathrm{V}}_{\mathrm{i}}}\)), where \({\uptheta }_{\mathrm{i}}\) is as specified above and \(\widehat{{\mathrm{V}}_{\mathrm{i}}}\) is the asymptotic variance on the logit scale. A series of models including combinations of area-level covariates (age, gender and education distribution of the population at municipality level) were fitted. We estimated prevalence for two indicators, namely, current smoking vs non-current smoking (former or never), and ever smoking (current and former) vs never smoking. Prevalence and 95% credibility interval of former smokers were obtained by bootstrapping as difference between the prevalence of current and former smokers and prevalence of former smokers. Similarly, estimates of heavy, moderate, light, former and never smoking were obtained. Differences computed during bootstrapping were bounded to be positive (i.e., negative differences were set to zero and respective prevalence estimates were rescaled so that sum of categories remained equal to 100%). For the purpose of these analyses, individuals with missing data (19%, see Supplementary file 1 (Table S1) for details) on smoking status were excluded given the methods have not yet been developed to be applied to multiple imputed data.

BYM belongs to a family of Bayesian hierarchical models. Bayesian inference is useful to model the spatial and spatio-temporal data as it allows for complete flexibility in how estimates borrow strength across space and time and allow to account for similarities based on sharing borders or on the distance [13, 16]. Integrated Nested Laplace Approximation (INLA) was used to obtain estimates of model parameters. INLA has been shown as a more efficient and less time-consuming alternative to Markov chain Monte Carlo (MCMC) methods.

Fitted models were assessed and compared using the deviance information criterion (DIC) and the conditional predictive ordinates (CPO). The DIC is a criterion based on the trade-off between the fit of the data to the model and the corresponding complexity of the model. The CPO is the probability density of an observed response based on the model fit to the rest of the data. Large values indicate a better fit of the model to the data, while small values of the CPO represent unexpected response values. CPOs were plotted and inspected visually.

### Smoking attributable all-cause mortality

#### Relative risks for all-cause mortality from smoking

Ovid Medline was searched (02/02/2021) for published systematic reviews of original studies of relative risk (RR) estimates for all-cause mortality related to smoking using filters for publication type (“systematic review”) and a combination of search terms “smoking” and “all-cause mortality”. The following information was extracted from eligible studies: author and date of publications, region/country of included studies, age and gender characteristics of study population, and RR risks for all relevant exposures.

#### Population attributable fraction and smoking attributable mortality

The population attributable fraction (PAF) is the proportion of cases of an outcome of interest that can be attributed to a given risk factor among the entire population [17]. When several categories of exposure to the risk factor exist, the formula takes the form:

\({\mathrm{PAF}}_{\mathrm i}=\frac{\sum_{\mathrm j=1}^{\mathrm n}{\mathrm P}_{\mathrm i}\left(\mathrm{RR}-1\right)}{\sum_{\mathrm j=1}^{\mathrm n}{\mathrm P}_{\mathrm i}\mathrm{RR}},\)

where \(\mathrm{i}=1,\dots ,\mathrm{k}\) is the number of municipalities, and \(\mathrm{j}=\mathrm{1,2},\dots ,\mathrm{n}\) is the number of exposure categories.

The PAF is usually expressed as the percentage of disease cases/deaths attributable to exposure. Smoking attributable mortality (SAM) can be obtained as.

\({\mathrm{SAM}}_{\mathrm{i}}\) = \({\mathrm{D}}_{\mathrm{i}}\) * \({\mathrm{PAF}}_{\mathrm{i}}\),

where \({\mathrm{D}}_{\mathrm{i}}\) is the number of observed deaths in municipality \(\mathrm{i}\).

Smoking attributable mortality was calculated using two scenarios. In a first scenario, we accounted for the differences in population demographic structure between municipalities by performing calculations using the number of deaths and smoking prevalence standardized to the age and gender distribution of total Belgian population. In a second scenario, given that prevalence of smoking behavior varies between age and gender groups, the calculations were performed in six strata by age and gender (age 15–39, 40–64, 65 + , males and females) and subsequently summed up to a total number of deaths per municipality that can be attributed to smoking.

In each scenario, mean estimates and 95% credibility intervals estimates were obtained by performing 1000 computations based on 1000 samples from the posterior distribution of smoking prevalence and 1000 samples from gamma distribution assumed for the relative risks. Number of computations was determined empirically, i.e. increasing number of computations from 500 to 1000 resulted in minor changes in computed estimates therefore 1000 computations were considered sufficient. Gamma distribution was chosen given its properties are compatible with the expected shape of the underlying distribution of RR: it is continuous, non-negative and puts relatively less weight on the tails of the distribution compared to e.g. log-normal or uniform distribution.

All analyses were conducted in R (R version 4.0.4, R studio version 1.4.1103). Smoothed estimates were obtained with SUMMER package (version 1.1.0) [18, 19].