Study data
The present study used the Bangladesh Demographic and Health Survey (BDHS) from the survey year 2014. A total of 17,863 ever-married women between 15 and 49 years of age were interviewed [31]. Their marital status at the time of the survey can be married, separated, divorced or widowed. The survey was conducted under the authorization of Bangladesh’s National Institute of Population Research and Training (NIPORT) of the Ministry of Health and Family Welfare using a two-stage stratified sampling design. At the administrative level, Bangladesh was divided into 7 divisions and 64 districts (zilas) at the time of the survey. The 64 districts (zilas) are considered to examine the spatial variation in the termination of pregnancies. Data collected include participants’ socio-demographic profile, family structure, and pregnancy-related information [31]. The shape file of Bangladesh administrative level 2 (i.e., district-level) was downloaded from the link: https://gadm.org/download_country_v3.html, which is freely available for academic use and other non-commercial use. R [32] package rgdal[33] was used to read the shape file, and R package ggplot2 [34] was used to generate the maps following regression analysis.
Outcome variable
The dependent variable employed for this study was “pregnancy termination” which was derived from the question “Have you ever had a pregnancy that miscarried, ended using menstrual regulation, was aborted, or ended in a stillbirth?” and responses were coded 0 = “No” and 1 = “Yes” [31].
Explanatory variables
Explanatory variables considered in the analysis were identified from the literature [21–25, 35], which included: age of the women at survey time, age at first cohabitation, education level (no education, primary, secondary, and higher education), occupation (including professional, clerical, sales, agriculture, household services, skilled/unskilled manual jobs), religion (Muslim, Buddhism, Christianity, Hinduism, and others), place of residence (rural vs. urban), marital status (married, separated, divorced, and widowed) of the woman, number of children ever born, number of alive children. The current partner/husband’s level of education and occupation were also considered. We also considered the wealth index as an explanatory variable, which measured household inequalities. In Demographic and Health Survey (DHS), wealth indices are derived through principal component analysis based on household-level ownership of various assets (e.g. bicycles, cars, radios) and on housing characteristics (e.g. flooring material, drinking water source, type of toilet facility). The first principal component is taken as the underlying index of wealth. Each household is assigned a weight or factor score and the resulting asset scores are standardized in relation to a normal distribution with a mean of zero and a standard deviation of one [36]. The wealth quintiles (from lowest to highest) are then obtained by ranking each person in the population by his or her score and then dividing the ranking into five equal categories, each comprising 20% of the population.
Statistical analyses
Multicollinearity among all the covariates was evaluated using generalized variance inflation factor (GVIF) [37], which is a generalization of the variance inflation factor (VIF). GVIF is applicable to measure the collinearity among covariates, such as dummy regressors from a polytomous categorical variable by considering the size of the joint confidence region for the related coefficients. Literature suggests reporting GVIF1/(2·df), where df is the number of dummy variables in a categorical variable, is analogous to reporting the square root of the VIF for a single coefficient [37]. As a rule of thumb, VIF of 2.5 (i.e., \(\sqrt {\text {VIF}}\approx 1.58\)) or greater is a cause for concern for logistic regression.
Previous investigations on pregnancy termination are mostly based on logistic regression without accounting for geographical clustering [18, 38]. If the residuals of the regression model are independent and not spatial correlated, the model can be fitted using the frequentist survey logistic regression or generalized additive mixed effect models. However, in our study, mothers residing in nearby regions were often exposed to unmeasured similar social environment and health services access, so we suspect there might exist spatial correlations across different communities. Bayesian methodology, using Markov chain Monte Carlo (MCMC) methods, has become the predominant choice for spatial analysis to account for spatial correlation. However, it is widely acknowledged that MCMC techniques can be computationally very demanding. Recently, integrated nested Laplace approximation (INLA) [39] has emerged as a computationally efficient alternative to MCMC methods for performing Bayesian analysis.
To evaluate the potential spatial dependence of pregnancy termination, we applied INLA using the 64 districts as the areal unit of the analysis. The so-called convolution random-effects model [40] includes two random-effects terms, an unstructured, independent random effect (hj) assigned for each district j, and a spatially structured random effect (bj). We model the latter as an intrinsic conditional autoregression (CAR) for each district, which borrows information from “neighboring” districts to produce more reliable estimates across all the districts. We considered a neighborhood structure in which two districts are considered neighbors if they share boundaries. The spatial logistic regression model can be expressed as
$$\begin{array}{@{}rcl@{}} \text{logit}\left(p_{ij}\right) = \boldsymbol{Z}^{T}_{ij}\boldsymbol{\beta}+s\left(t_{ij}\right)+\phi\left(t'_{ij}\right)+b_{j} +h_{j}, \end{array} $$
(1)
where pij denotes the probability of pregnancy termination for the ith individual residing in district j, \(\boldsymbol {Z}^{T}_{ij}\) is the design matrix for the fixed effects covariates, β represents a vector the regression coefficients, tij denotes the age of the women at survey time and \(t^{\prime }_{ij}\) denotes the age at first cohabitation for the ith individual residing in district j.
In the following, we describe random walk (RW) priors for the age of the women at survey time s(tij). The RW priors for the age at first cohabitation ϕ(tij′) can be defined analogously. For simplicity of notation, let st denote s(tij). RW of first order (RW1) is modelled as an unknown smooth function of the kth ranked value of t in ascending order, tk, k=1,⋯,K. Then,
$$\begin{array}{@{}rcl@{}} &&p\left(s_{t_{1}}\right)\propto \text{const} \end{array} $$
(2)
$$\begin{array}{@{}rcl@{}} &&s_{t_{k}}|s_{t_{k-1}}\sim \text{Normal}\left(s_{t_{k-1}}, \tau_{s}\right), \, \, k=1, \cdots, K. \end{array} $$
(3)
For random walks of second order (RW2), we assume
$$ p\left(s_{t_{1}}\right)=p\left(s_{t_{2}}\right)\propto \text{const}\\ $$
(4)
$$ {{}\begin{aligned} &&s_{t_{k}}|s_{t_{k-1}}, s_{t_{k-2}}\!\sim\! \text{Normal}\left(2s_{t_{k-1}}\,-\,s_{t_{k-2}}, \tau_{s}\right), \, \, k\!\!=3, \cdots, K, \end{aligned}} $$
(5)
where τs is a precision parameter, which is the inverse of the variance parameter \(\sigma ^{2}_{s}\). The higher the precision, the smoother the estimated parameter vector. By default, a non-informative logGamma prior is assumed on the logarithm of the precision, which is equivalent to assume a Gamma prior on the precision, i.e., Gamma (a1,a2) with mean a1/a2 and variance \(a_{1}/a_{2}^{2}\). The values used for the default prior for INLA are a1=1 and a2=10−5. The choice of a1=1 for the shape parameter of the gamma reduces the prior to a simple exponential distribution. The small value of a2 means that the prior for the precision as both a large mean and variance.
This spatial dependence is formalized by imposing the ICAR prior distribution on the structured spatial random effects,
$$\begin{array}{@{}rcl@{}} b_{j}|b_{j'\neq j}\sim \text{Normal}\, \left(\bar{b}_{j}, \sigma^{2}_{b}/m_{j}\right), \end{array} $$
(6)
where \(\bar {b}_{j}\) denotes the mean of the spatial random effects of the neighborhoods, and mj denotes the number of neighbors of district j. We model the spatially unstructured random effect hj as an independent and identically distributed (IID) normal random variable \(\text {Normal}\left (0, \sigma ^{2}_{h}\right)\). The formulation of the spatial model allows us to simultaneously estimate the residual spatial dependence and investigate the impact of various predictors related to pregnancy termination. The random effect terms can be interpreted as the effect of the district of residence on the pregnancy termination on each individual.
The priors for the precision \(\tau _{b}=1/\sigma ^{2}_{b}\) and \(\tau _{h}=1/\sigma ^{2}_{h}\) assigned with a non-informative Gamma prior Gamma (1,10−5). We assign vague priors for β∼Normal (0,106). We selected the non-informative priors for the parameters and their variance components, which allowed the observational data to have the greatest influence on posterior distributions without being greatly affected by the settings of priors.
Statistical software package R 3.5.0 [32] is used to conduct all the analysis. R-INLA package [39] provides reliable estimates within fast computation time to analyze spatial model in the Bayesian context [41]. Each survey respondent was given a sample weight provided by the BDHS to account for the unequal likelihood of selection and non-response. To ensure that our findings are representative of the Bangladesh population, the sample weights were incorporated in the analysis by specifying the option ‘weight’ as the sampling weight in the inla function, which scales log-likelihood by multiplying weights and individual log-likelihood [39]. Therefore, individual likelihood function increases with increased weight.
To select a parsimonious model, we firstly include all the covariates in the model and then select the best covariance structure among models without district level random effect, denoted as NO, model with only an IID random effect (hi), model with only a CAR (bi), and model with both CAR and IID random effects (bi+hi). After a covariance structure is determined, manual backward model selection was conducted to select the best set of fixed-effects covariates. All the variables were tested for significant interactions. Any individuals having missing data for the selected covariates were removed before regression analysis.
To compare various models, we employ the Watanabe-Akaike Information Criterion (WAIC) [42]. WAIC is invariant to parametrization and also works for singular models, which can be interpreted as a computationally convenient approximation to cross-validation and is defined as WAIC=LPD+pD, where LPD is the expected log pointwise predictive density and pD is the estimated effective number of parameters [42]. Models with smaller values of WAIC are preferred as they achieve a more optimal combination of fit and parsimony. As a rule of thumb, candidate models with a WAIC within 2 units of the ‘best’ model have a similar model fit, while models with a larger WAIC have a worse model fit [42].