Study population
The 2014 Ghana Demographic and Health Survey (GDHS) dataset was used in our study [8]. The data were provided by the Measure DHS Program [12] and are freely available online. Data were collected on a wide range of population, health, and nutrition indicators, including geographical data. These include, but are not limited to, data on childhood mortality, maternal and child health, use of family planning methods, and household socioeconomic variables. A two-stage sample design was used to select respondents for the study. A nationally representative sample of 12,832 households from 427 clusters was selected, and 11,835 eligible households were interviewed. Data were collected on 9396 women of reproductive age (15–49 years) and 4388 men aged 15–59 years. We generated data on 5884 children aged below 5 years from the interviewed women for the present study. Data on the month and year of each biological child’s birth and death were extracted from the complete birth histories collected during the survey and were used to identify the number of children born in the last 5 years and the child’s age at death. Based on data on all births to a woman within the 5 years preceding the main survey, retrospective data were obtained on children who had died in the last 5 years [8].
Outcome variable
Child survival status (dead = 1; alive = 0) was the outcome of interest in this study.
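As an illustration of how birth histories can be turned into the outcome above, the following minimal sketch keeps births in the 60 months before interview and codes survival as dead = 1, alive = 0. It assumes dates are handled as century month codes (months counted from January 1900, the usual DHS date convention); the dictionary keys and the example birth history are hypothetical and are not taken from the GDHS files.

```python
def cmc(year, month):
    """Century month code: January 1900 is month 1 (DHS date convention)."""
    return (year - 1900) * 12 + month


def under_five_records(births, interview_cmc, window_months=60):
    """Keep births in the `window_months` before interview and code survival.

    `births` is a list of dicts with hypothetical keys birth_year,
    birth_month, and death_year/death_month (None if the child is alive).
    """
    records = []
    for b in births:
        birth = cmc(b["birth_year"], b["birth_month"])
        months_before = interview_cmc - birth
        if not 0 <= months_before < window_months:
            continue  # born outside the 5-year window
        if b["death_year"] is None:
            died, age = 0, months_before  # alive: age at interview
        else:
            died = 1  # dead = 1, as in the outcome definition
            age = cmc(b["death_year"], b["death_month"]) - birth
        records.append({"died": died, "age_months": age})
    return records


# Made-up birth history for one woman, interviewed in October 2014.
history = [
    {"birth_year": 2012, "birth_month": 3, "death_year": None, "death_month": None},
    {"birth_year": 2010, "birth_month": 6, "death_year": 2011, "death_month": 1},
    {"birth_year": 2005, "birth_month": 9, "death_year": None, "death_month": None},
]
records = under_five_records(history, interview_cmc=cmc(2014, 10))
```

In this sketch, age is recorded in months: age at interview for surviving children and age at death otherwise.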
Explanatory variables
The variables used in the analysis are as follows.
Child and household specific variables
The following data were obtained from the DHS for all sampled children under 5 years: child’s age, a continuous variable ranging from 0 to < 5 years preceding the survey; maternal age, a continuous variable; mother’s education, a categorical variable for the highest education level attained by the child’s mother, with four categories, namely no education, primary, secondary and tertiary; number of under-5 children in the household, a continuous non-negative variable for the number of children under the age of 5 in each household; wealth quintile, a categorical variable for the wealth index of the family, with five categories, namely poorest, poorer, middle, richer and richest; and whether the child is a twin. Detailed descriptions of the child/household-level potential covariates explored in this study are presented in supplementary material Table S1 online. Wealth indices from DHS data are constructed using principal component analysis on household property ownership. The property considered includes television, radio, watch, vehicles, agricultural land, type and number of livestock, bank account, materials used for house construction, and access to water and sanitation facilities.
Community-wide and environmental variables
For each sampled cluster, we obtained data on altitude (from a digital elevation model, DEM), measured in metres above sea level (masl); proximity to major water bodies such as the ocean, lakes and big rivers, measured in kilometres; and a measure of greenness, the enhanced vegetation index (EVI), which is a proxy for rainfall and for the environmental suitability for disease vectors such as mosquitoes.
Statistical analysis
a. Model formulation
The data were obtained from 5884 children across 423 clusters in Ghana, as shown in Fig. 1. Let i and j index the ith cluster and the jth child within that cluster. At each sampled cluster, the primary interest was the survival of the jth child, with outcome dead (1) or alive (0), resulting in data of the form
$$ Data=\left\{\left({x}_{ij},{y}_{ij},{n}_{ij}\right):{x}_{ij}\in G,j=1,\dots, {n}_i,i=1,\dots, N\right\} $$
(1)
where xij is the location of the jth of ni children, nij is the number of children at location xij and yij is the number of under-five children who died at location xij. To deliver valid inferences on the regression coefficients, we need to account for spatial effects. Model-based geostatistics (MBG), among the many available techniques, provides a mechanism for incorporating both explained and unexplained (residual) spatial variation in the child survival outcome and allows us to predict child mortality throughout the region of interest G.
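The data format in Eq. (1) can be illustrated with a short sketch that collapses child-level records into triples of location, number of deaths y and number of children n. The field names and coordinates below are made up for illustration and do not come from the GDHS data.

```python
from collections import defaultdict


def aggregate_by_location(children):
    """Collapse child-level records to (location, deaths y, children n)."""
    totals = defaultdict(lambda: [0, 0])  # location -> [deaths, children]
    for child in children:
        loc = (child["lon"], child["lat"])
        totals[loc][0] += child["died"]  # died is 1 (dead) or 0 (alive)
        totals[loc][1] += 1
    return [(loc, y, n) for loc, (y, n) in sorted(totals.items())]


# Hypothetical records: two children at one location, one at another.
children = [
    {"lon": -1.62, "lat": 6.69, "died": 0},
    {"lon": -1.62, "lat": 6.69, "died": 1},
    {"lon": -0.19, "lat": 5.60, "died": 0},
]
data = aggregate_by_location(children)
```

Each element of `data` then corresponds to one triple (x, y, n) in the notation of Eq. (1).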
For the jth child in the ith cluster, the response Yij is the binary indicator of survival. The associated covariate vector dij includes whether the child was a twin, the number of other under-five children within the household, mother’s education, mother’s age, child’s age, the family’s wealth index, proximity to water and a measure of greenness, namely the enhanced vegetation index. Note that the child and household covariates are specific to each child observed, whereas the environmental covariates are common to all children within a given cluster. We distinguish between two sources of variation in child survival: between-cluster variation, induced by spatially varying risk factors, and within-cluster variation, induced by child-specific characteristics. Each of these sources of variation depends on both measured and unmeasured risk factors. To account for unexplained non-spatial variation, we define a generalised linear mixed model as follows. Let S(xi) denote a Gaussian process and let Ui denote cluster-specific random effects, which are mutually independent with mean 0 and common variance υ². Conditionally on S(xi) and on the Ui, the Yij are then modelled as independent Bernoulli variates with success probabilities pij given by
$$ \log \left\{\frac{p_{ij}}{1-{p}_{ij}}\right\}={\beta}_0+\beta \, d{\left({x}_{ij}\right)}^T+{U}_i+S\left({x}_i\right) $$
(2)
where d(xij) is the vector of explanatory variables at xij, with associated regression coefficients β. The spatially structured residuals S(x) are modelled as a zero-mean, stationary and isotropic Gaussian process with variance σ² and correlation function
$$ \rho (u)= corr\left\{S(x),S\left({x}^{\prime}\right)\right\} $$
(3)
where u is the Euclidean distance between locations x and x′. We assume that ρ(u) is monotone non-increasing in the distance u, with a scale parameter φ that controls the rate at which the correlation approaches 0 as u increases. Diggle (2007) outlines various parametric families for ρ(u); in the current analysis, we use the Matérn class of correlation functions [13], given by
$$ \rho \left(u;\varphi, \kappa \right)={\left\{{2}^{\kappa -1}\Gamma \left(\kappa \right)\right\}}^{-1}{\left(\frac{u}{\varphi}\right)}^{\kappa }{K}_{\kappa}\left(\frac{u}{\varphi}\right),\quad u>0 $$
(4)
where φ > 0 is the scale parameter and Kκ(·) is the modified Bessel function of the second kind of order κ > 0. The shape parameter κ determines the smoothness of S(x), in the sense that S(x) is ⌈κ⌉ − 1 times mean-square differentiable, where ⌈κ⌉ is the smallest integer greater than or equal to κ.
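To make Eq. (4) concrete, the sketch below evaluates the Matérn correlation using only the Python standard library. It is an illustration, not the implementation used in the analysis: the Bessel function is approximated numerically from its integral representation, and the function names `bessel_k` and `matern` are our own. For κ = 0.5 the formula reduces to the exponential correlation exp(−u/φ), which gives a convenient check.

```python
import math


def bessel_k(nu, z, steps=4000):
    """Modified Bessel function of the second kind, K_nu(z), for z > 0.

    Uses the integral K_nu(z) = int_0^inf exp(-z cosh t) cosh(nu t) dt,
    evaluated with the trapezoid rule on a truncated range.
    """
    t_max = 1.0
    while z * math.cosh(t_max) - nu * t_max < 50.0:
        t_max += 0.5  # extend until the integrand is below about exp(-50)
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def matern(u, phi, kappa):
    """Matern correlation at distance u, with scale phi > 0 and shape kappa > 0."""
    if u == 0:
        return 1.0  # limit of the formula as u -> 0
    z = u / phi
    return z ** kappa * bessel_k(kappa, z) / (2 ** (kappa - 1) * math.gamma(kappa))


# Check against the exponential special case (kappa = 0.5).
gap = abs(matern(1.0, 2.0, 0.5) - math.exp(-0.5))
```

Larger values of φ make the correlation decay more slowly with distance, while larger κ gives a smoother process.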
b. Model validation
The model was validated by checking whether the data show evidence against the adopted spatial correlation structure, using the following variogram-based procedure (Giorgi et al., 2018). We simulate 1000 empirical variograms under the fitted model and use these to compute a 95% tolerance band at each spatial distance of the variogram. If the empirical variogram obtained from the data falls within the 95% tolerance band, we conclude that the adopted spatial correlation function is compatible with the data. If, instead, it falls outside the 95% tolerance band, then the data show evidence against the fitted model.
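The logic of this check can be sketched as follows. This is a simplified illustration under stated assumptions: the cluster locations are a made-up grid, the band uses simple pointwise order statistics, and `simulate` is a hypothetical stand-in that draws independent Gaussian noise, whereas in the procedure described above each draw would come from the fitted geostatistical model.

```python
import math
import random


def empirical_variogram(coords, values, edges):
    """Mean of 0.5 * (v_i - v_j)^2 over pairs, binned by distance.

    Bin k covers edges[k] <= distance < edges[k + 1]; empty bins give None.
    """
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            for k in range(len(edges) - 1):
                if edges[k] <= d < edges[k + 1]:
                    sums[k] += 0.5 * (values[i] - values[j]) ** 2
                    counts[k] += 1
                    break
    return [s / c if c else None for s, c in zip(sums, counts)]


def tolerance_band(sim_variograms, level=0.95):
    """Pointwise order-statistic band across simulated variograms.

    Assumes every bin is non-empty in every simulated variogram.
    """
    n = len(sim_variograms)
    lo = math.floor((1 - level) / 2 * (n - 1))
    hi = math.ceil((1 + level) / 2 * (n - 1))
    lower, upper = [], []
    for k in range(len(sim_variograms[0])):
        column = sorted(v[k] for v in sim_variograms)
        lower.append(column[lo])
        upper.append(column[hi])
    return lower, upper


def within_band(observed, lower, upper):
    """True if the observed variogram lies inside the band in every bin."""
    return all(l <= o <= u for o, l, u in zip(observed, lower, upper))


rng = random.Random(1)
coords = [(i, j) for i in range(5) for j in range(5)]  # made-up locations
edges = [0.0, 1.5, 3.0, 6.0]  # distance bins, all non-empty on this grid


def simulate():
    """Hypothetical stand-in for one draw from the fitted model."""
    return [rng.gauss(0.0, 1.0) for _ in coords]


sims = [empirical_variogram(coords, simulate(), edges) for _ in range(1000)]
lower, upper = tolerance_band(sims)
observed = empirical_variogram(coords, simulate(), edges)  # stand-in for the data
compatible = within_band(observed, lower, upper)
```

If `compatible` is false, the observed variogram leaves the band in at least one distance bin, which in the procedure above is read as evidence against the fitted model.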
All analyses in this study, including the maps produced, were implemented in the free, open-source software R version 3.6.1 [14].