Problem drinking as a risk factor for tuberculosis: a propensity score matched analysis of a national survey

Background Epidemiological and other evidence strongly supports the hypothesis that problem drinking is causally related to the incidence of active tuberculosis and to the worsening of the disease course. The presence of a large number of potential confounders, however, complicates the assessment of the actual size of this causal effect, leaving room for a substantial amount of bias. This study aims to contribute to the understanding of the role of confounding in the observed association between problem drinking and tuberculosis by assessing the effect of adjustment for a relatively large number of potential confounders on the estimated prevalence odds ratio of tuberculosis among problem drinkers vs. moderate drinkers/abstainers in a cross-sectional, nationally representative sample of the South African adult population.
Methods A propensity score approach was used to match each problem drinker in the sample with a subset of moderate drinkers/abstainers with similar characteristics with respect to a set of potential confounders. The prevalence odds ratio of tuberculosis between the matched groups was then calculated using conditional logistic regression. Sensitivity analyses were conducted to assess the robustness of the results with respect to misspecification of the model.
Results The prevalence odds ratio of tuberculosis between problem drinkers and moderate drinkers/abstainers was 1.97 (95% CI: 1.40 to 2.77), and the result was robust with respect to the matching procedure as well as to incorrect adjustment for potential mediators and to the possible presence of unmeasured confounders. Sub-population analysis did not provide noteworthy evidence for the presence of interaction between problem drinking and the observed confounders.
Conclusion In a cross-sectional national survey of the adult population of a middle-income country with a high tuberculosis burden, problem drinking was associated with a twofold increase in the odds of past TB diagnosis after controlling for a large number of socio-economic and biological confounders. Within the limitations of a cross-sectional study design with self-reported tuberculosis status, these results add to previous evidence of a causal link between problem drinking and tuberculosis, and suggest that the higher prevalence of tuberculosis among problem drinkers commonly found in population studies cannot be attributed to the confounding effect of the uneven distribution of other risk factors.

The model used for the estimation of the PS in Stata® was, therefore (reference category for each variable omitted):
We also considered the inclusion of higher-order and interaction terms, but the satisfactory results obtained with the model above (see below) made these additional terms unnecessary.

Matching procedure and covariate balance assessment
To take advantage of the relatively large number of moderate drinkers/abstainers (unexposed) compared with problem drinkers (> 5 unexposed per problem drinker) and to maximize power, a 1:4 matching ratio was chosen. Nearest neighbour matching with replacement was performed on the odds of the PS (see below) with the user-written Stata® command psmatch2 [4]. A caliper was used to limit the risk of mismatching, and the 24 problem drinkers with a PS higher than the largest PS among the unexposed were excluded from the analyses (restriction to the area of common support). The size of the caliper was chosen according to the prevailing literature and, in any case, the analyses were robust to changes in the value of this parameter.
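The matching step described above can be illustrated with a minimal Python sketch. This is not the psmatch2 implementation: the function name, the toy propensity scores and the caliper value are hypothetical, and the caliper is applied here to the absolute difference in the odds of the PS, which is an assumption.

```python
def match_on_ps_odds(ps_exposed, ps_unexposed, k=4, caliper=0.1):
    """1:k nearest neighbour matching with replacement on the odds of the PS.

    Returns {index of exposed unit: [indices of matched unexposed units]}.
    The caliper value is a placeholder, not the one used in the study.
    """
    def odds(p):
        return p / (1.0 - p)

    max_unexposed = max(ps_unexposed)
    odds_unexposed = [odds(p) for p in ps_unexposed]
    matches = {}
    for i, p in enumerate(ps_exposed):
        # Common support: drop exposed units whose PS exceeds the largest unexposed PS.
        if p > max_unexposed:
            continue
        target = odds(p)
        # Rank all unexposed units by distance; reuse across exposed units is allowed.
        ranked = sorted((abs(target - o), j) for j, o in enumerate(odds_unexposed))
        chosen = [j for dist, j in ranked[:k] if dist <= caliper]
        if chosen:
            matches[i] = chosen
    return matches


# Toy propensity scores (illustrative only).
exposed = [0.30, 0.95]
unexposed = [0.28, 0.31, 0.50, 0.29, 0.33, 0.60]
print(match_on_ps_odds(exposed, unexposed))
```

In this toy example the second exposed unit falls outside the area of common support and is dropped, mirroring the exclusion of problem drinkers with a PS above the largest unexposed PS.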
We used the user-written Stata® command pstest [4] to calculate (1) the pre- and post-matching standardised percentage bias for all observed covariates (Figure 1 in the article); and (2) t-tests for equality of means of covariates across samples (no difference was statistically significant after matching, with all p-values > 0.40).
The results of the procedure were satisfactory according to common criteria reported in the literature [5], supporting the assumption that the proposed model achieved a good balance between problem drinkers and moderate drinkers/abstainers across all measured potential confounders.
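As an illustration of the balance diagnostic, the sketch below computes a standardised percentage bias for a single covariate. It assumes the commonly used definition (difference in means as a percentage of the square root of the average of the two group variances); whether pstest computes the quantity in exactly this way is not confirmed here, and the function name and data are hypothetical.

```python
from statistics import mean, variance


def standardised_bias(x_exposed, x_unexposed):
    """Standardised percentage bias for one covariate (assumed definition)."""
    pooled_sd = ((variance(x_exposed) + variance(x_unexposed)) / 2.0) ** 0.5
    return 100.0 * (mean(x_exposed) - mean(x_unexposed)) / pooled_sd


# Toy covariate values (illustrative only).
before = standardised_bias([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
after = standardised_bias([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(round(before, 1), round(after, 1))
```

Computing the value once on the full sample and once on the matched sample gives the pre- and post-matching figures that are compared in a balance plot.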

Sub-population analysis for heterogeneity of effect sizes
The possibility of interaction between confounders and problem drinking in the association with TB was assessed by repeating the whole procedure (including the estimation of the propensity score) for selected subpopulations. To compare the POR between these subpopulations, we estimated the 95% confidence interval of their ratio using the procedure suggested by Altman and Bland [6]. The results for the selected subpopulations are reported in Additional Figure 1.
Additional Figure 1: Ratio between POR in selected subpopulations and 95% confidence interval
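The comparison of two odds ratios on the log scale can be sketched as follows. The sketch assumes that the two estimates are independent and that their 95% confidence intervals are symmetric on the log scale, so that each standard error can be recovered from the interval width; the function name and the input values are hypothetical and are not results from this study.

```python
import math


def ratio_of_odds_ratios(or1, ci1, or2, ci2, z=1.96):
    """Ratio of two independent odds ratios with an approximate 95% CI.

    ci1 and ci2 are (lower, upper) 95% limits of or1 and or2.
    """
    # Recover each standard error of log(OR) from the width of its CI.
    se1 = (math.log(ci1[1]) - math.log(ci1[0])) / (2 * z)
    se2 = (math.log(ci2[1]) - math.log(ci2[0])) / (2 * z)
    diff = math.log(or1) - math.log(or2)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return (math.exp(diff),
            math.exp(diff - z * se_diff),
            math.exp(diff + z * se_diff))


# Made-up odds ratios for two subpopulations (illustrative only).
ratio, lower, upper = ratio_of_odds_ratios(2.0, (1.0, 4.0), 1.0, (0.5, 2.0))
print(round(ratio, 2), round(lower, 2), round(upper, 2))
```

An interval for the ratio that includes 1 gives no evidence that the two subpopulation estimates differ.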

Sampling weights
Sampling weights were incorporated in the estimation of the POR by applying the sampling weight of each problem drinker also to the corresponding matched unexposed individuals [7].
Following the approach of Zanutto [8], we did not directly include sampling weights in the estimation of the PS but instead performed the matching procedure on the odds of the PS, which offers consistent results in samples drawn with unequal probabilities [9].
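One way to picture the weighting scheme is a long-format matched data set in which every member of a matched set carries the sampling weight of its problem drinker. The sketch below is only an illustration under that reading; the function name, the field names and the weights are hypothetical.

```python
def build_matched_rows(matches, weights_exposed):
    """Expand matched sets into rows that inherit the exposed unit's weight.

    matches: {index of exposed unit: [indices of matched unexposed units]}
    weights_exposed: sampling weight of each exposed unit, by index.
    """
    rows = []
    for i, controls in matches.items():
        w = weights_exposed[i]
        rows.append({"set": i, "exposed": 1, "unit": i, "weight": w})
        for j in controls:
            # With replacement, the same unexposed unit may appear in several
            # sets, each time with the weight of that set's exposed unit.
            rows.append({"set": i, "exposed": 0, "unit": j, "weight": w})
    return rows


# Toy matched sets and weights (illustrative only).
rows = build_matched_rows({0: [3, 1], 2: [3]}, [10.0, 5.0, 2.5])
for row in rows:
    print(row)
```

The "set" field identifies the matched set, which is the grouping that a conditional logistic regression would condition on.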

Confidence intervals
The calculation of valid confidence intervals in the context of PS analysis is a controversial subject in the literature, and no general analytical expression exists for the variance of the effect size estimates. Several approximations are in use in special cases, but none is applicable to our analytical design, in which the estimated parameter is the ratio between two odds (instead of the more commonly considered difference between means or proportions) and in which a complex survey design is also involved [10,11]. We therefore calculated approximate confidence intervals through bootstrapping [12]. This method, despite having been shown to be inconsistent in some cases [13], is widely applied in the context of PS analysis (see e.g. Austin [14], Bryson [15] or Larsen [16]).
In the estimation of the 95% confidence intervals, the whole procedure of POR estimation, including the calculation of the PS, was replicated 750 times, taking into account the sampling scheme through the Stata® command svy bootstrap. The number of replications was chosen both with reference to the mainstream literature on bootstrapping [10] and by observing the empirical distribution of the estimated confidence intervals (Additional Figure 2) for different numbers of replications, which shows reasonable convergence (changes only in the third decimal digit) for values greater than 750.
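The idea of re-running the whole estimation in every bootstrap replicate can be sketched generically. The example below is a simplification: it resamples individuals with equal probability and ignores the survey design, uses a percentile interval (the interval type used in the study is not stated), and replaces the full PS and matching pipeline with a crude odds ratio. All names and data are hypothetical.

```python
import random


def odds_ratio(rows):
    """Crude odds ratio from (exposed, outcome) pairs, 0.5 added to each cell."""
    a = sum(1 for e, y in rows if e == 1 and y == 1) + 0.5
    b = sum(1 for e, y in rows if e == 1 and y == 0) + 0.5
    c = sum(1 for e, y in rows if e == 0 and y == 1) + 0.5
    d = sum(1 for e, y in rows if e == 0 and y == 0) + 0.5
    return (a * d) / (b * c)


def bootstrap_percentile_ci(data, estimator, n_reps=750, alpha=0.05, seed=0):
    """Percentile bootstrap CI; the estimator is re-run on every resample."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_reps)]
    upper = estimates[min(n_reps - 1, int((1 - alpha / 2) * n_reps))]
    return lower, upper


# Toy data: (exposed, outcome) pairs (illustrative only).
toy = [(1, 1)] * 12 + [(1, 0)] * 28 + [(0, 1)] * 20 + [(0, 0)] * 140
point = odds_ratio(toy)
lower, upper = bootstrap_percentile_ci(toy, odds_ratio)
print(round(point, 2), round(lower, 2), round(upper, 2))
```

Passing the estimator as a function makes explicit that every step, here only the odds ratio but in the study also the PS model and the matching, is repeated on each resample.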

Sensitivity analysis in respect to unmeasured confounders
The method we used to assess the sensitivity of the POR estimate to the presence of unobserved covariates associated with both the exposure and the outcome is an adaptation of the procedure proposed by Ichino et al. [17] and implemented in the user-written Stata® command sensatt [18]. We modified the original procedure implemented in sensatt in two ways: (1) we used as a measure of effect not the average treatment effect (ATE) but rather the POR; (2) we represented graphically the results of 1000 replications of the procedure (with different values for the prevalence of U among exposed/unexposed and subjects with/without the outcome) with a contour plot in which the axes were the adjusted odds ratios of the association of U with the exposure and with the outcome (Figure 4 in the article). This allowed us to visually identify lower bounds for the strength of the association that potential additional confounders would need to have with the outcome and the exposure to offset the observed POR or to reduce its value below any specific threshold.
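The core of this kind of simulation-based sensitivity analysis is the generation of a hypothetical binary confounder U whose prevalence depends on exposure and outcome status. The sketch below shows only that step, together with a crude (unadjusted) odds ratio for the association of U with another binary variable; it is not the sensatt code, the study used adjusted odds ratios, and all names and values are hypothetical.

```python
import random


def simulate_u(exposure, outcome, p, seed=0):
    """Draw a binary U with P(U=1) = p[(exposure, outcome)] for each subject."""
    rng = random.Random(seed)
    return [1 if rng.random() < p[(t, y)] else 0
            for t, y in zip(exposure, outcome)]


def crude_or(u, x):
    """Crude odds ratio between two binary variables, 0.5 added to each cell."""
    a = sum(1 for ui, xi in zip(u, x) if ui == 1 and xi == 1) + 0.5
    b = sum(1 for ui, xi in zip(u, x) if ui == 1 and xi == 0) + 0.5
    c = sum(1 for ui, xi in zip(u, x) if ui == 0 and xi == 1) + 0.5
    d = sum(1 for ui, xi in zip(u, x) if ui == 0 and xi == 0) + 0.5
    return (a * d) / (b * c)


# Toy exposure and outcome indicators (illustrative only).
exposure = [1] * 50 + [0] * 150
outcome = ([1] * 10 + [0] * 40) + ([1] * 15 + [0] * 135)
# Assumed prevalence of U in each exposure-by-outcome cell.
p = {(1, 1): 0.6, (1, 0): 0.5, (0, 1): 0.3, (0, 0): 0.2}
u = simulate_u(exposure, outcome, p)
print(round(crude_or(u, exposure), 2), round(crude_or(u, outcome), 2))
```

In a full analysis, the simulated U would be added to the PS model and the POR re-estimated, and the exercise repeated over a grid of prevalence values to trace how strong U must be to change the conclusion.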

Results from logistic regression
By way of comparison with traditional outcome modelling, we calculated the POR of TB disease among problem drinkers vs. moderate drinkers/abstainers with logistic regression, including in the model the same set of covariates used for the estimation of the PS (excluding population strata and sampling weights). The estimated POR in this case was 1.69 (95% CI: 1.07 to 2.67), which is lower than the result of the PS matching analysis.