Evaluating the performance of different Bayesian count models in modelling childhood vaccine uptake among children aged 12–23 months in Nigeria
BMC Public Health volume 23, Article number: 1197 (2023)
Abstract
Background
Choosing appropriate models for count health outcomes remains a challenge for public health researchers and affects the validity of their findings. For count data, the mean–variance relationship and the proportion of zeros are major determinants of model choice. This study aims to compare Bayesian count modelling techniques and identify the best one for the number of childhood vaccines received in Nigeria.
Methods
We explored the performance of the Poisson and negative binomial models and their zero-inflated forms in the Bayesian framework using cross-sectional data pooled from the Nigeria Demographic and Health Surveys conducted between 2003 and 2018. In multivariable analysis, these Bayesian models were used to identify factors associated with the number of vaccines received by children. Model selection was based on the -2 Log-Likelihood (-2 Log LL), Leave-One-Out Cross-Validation Information Criterion (LOOIC) and Watanabe-Akaike/Widely Applicable Information Criterion (WAIC).
Results
Exploratory analysis showed the presence of excess zeros and overdispersion, with a mean of 4.36 and a variance of 12.86. There was a significant increase in vaccine uptake over time. Significant factors included the mother’s age, level of education, religion, occupation, desire for the last child, place of delivery, exposure to media, birth order of the child, wealth status, number of antenatal care visits, postnatal attendance, healthcare decision maker, community poverty, community illiteracy, community unemployment, rural proportion and number of health facilities per 100,000. The zero-inflated negative binomial model was the best fit, with a -2 Log LL of 27171.47, LOOIC of 54464.2, and WAIC of 54588.0.
Conclusion
The Bayesian zero-inflated negative binomial model was the most appropriate for identifying factors associated with the number of childhood vaccines received in Nigeria, due to the presence of excess zeros and overdispersion. Improving vaccine uptake by addressing the associated risk factors should be promptly embraced.
Introduction
The Generalized Linear Model (GLM) is a flexible modelling framework that accommodates distributional forms other than the normal distribution through the choice of a link function [1, 2]. When the outcome variable is normally distributed, classical models such as analysis of variance and ordinary least squares regression apply; otherwise, other distributional forms in the exponential family, such as the binomial, Poisson, negative binomial or gamma distribution, can be used [1, 3, 4]. Moreover, the mean of a count outcome cannot be modelled directly as a linear function of the covariates, as this can yield negative predicted values for the outcome variable [3, 4].
Since vaccination uptake is a random count event, and typically independent, the use of traditional statistical techniques, particularly ordinary least squares and logistic regression, would be inappropriate, as their assumptions would be violated, thereby biasing the resulting estimates. The Poisson distribution is therefore a very intuitive means of describing the randomness of count outcomes [2, 4]. Poisson regression, as the default count model, models the noisy output of a counting function, y, as a Poisson random variable with a log-mean parameter that is expressed as a linear function of the covariates [3]. For count data, the mean–variance relationship and the proportion of zeros are major determinants of the choice of model, which ranges from the Poisson and negative binomial to the zero-inflated Poisson and zero-inflated negative binomial to the hurdle models [2, 5].
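As a minimal sketch of why a log link is used for counts, the snippet below compares an identity-scale linear predictor with its exponentiated counterpart. The coefficients `beta0` and `beta1` and the covariate values are hypothetical, chosen only for illustration; they are not estimates from this study.

```python
import math

def linear_predictor(beta0, beta1, x):
    """Identity-scale predictor: nothing stops it from being negative."""
    return beta0 + beta1 * x

def poisson_mean(beta0, beta1, x):
    """Log link: the mean is exp(linear predictor), so it is always positive."""
    return math.exp(linear_predictor(beta0, beta1, x))

# Hypothetical coefficients, for illustration only.
beta0, beta1 = 0.5, -0.4
for x in (0, 2, 5):
    eta = linear_predictor(beta0, beta1, x)
    print(f"x={x}: eta={eta:+.2f}, mean={poisson_mean(beta0, beta1, x):.3f}")
```

The linear predictor becomes negative for larger `x`, while the implied Poisson mean stays strictly positive.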
Poisson regression has the advantage of fitting non-linear models, unlike conventional linear regression models, including in situations that involve the number (count) of occurrences of an event. This regression model has been used and recommended earlier [3, 4, 6]. However, these studies focused on the frequentist approach without any consideration of Bayesian techniques. Moreover, the Poisson model might be inappropriate for overdispersed data, because it implies that the variability of the outcome among observations with the same covariate values is equal to the mean [4, 5]. The Negative Binomial regression model is more flexible and is used to model count data with overdispersion, although it can also be inadequate when the overdispersion is driven by a high number of zeros [1, 2, 4, 7].
To account for the two underlying types of zeros in count data (structural or true zeros, and random or false zeros), there are several approaches that a researcher can employ to model zero inflation [1]. In particular, the Zero-Inflated Poisson and Zero-Inflated Negative Binomial models are applicable when the Poisson and Negative Binomial models are inappropriate for handling overdispersion due to a high number of zeros [4, 7]. The Poisson hurdle model is an alternative that models all zeros in one part and all the non-zero observations in a zero-truncated part [1,2,3]. Unlike the zero-inflated Poisson, the hurdle model assumes that all zero observations come from the same group. An advantage of the Zero-Inflated Negative Binomial model is that it is more robust in the face of overdispersion due to a high number of zeros, and it incorporates an additional dispersion parameter that accounts for overdispersion generated from the positive values [1, 4], which reduces the bias that results from non-normality of the outcome variable.
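The distinction between the zero-inflated and hurdle formulations can be sketched in a few lines. In this illustration, `pi` is assumed to denote the probability of a structural zero and `p0` the probability of any zero in the hurdle model; both names and the numeric values are hypothetical, and only the mean of 4.36 is taken from the exploratory analysis.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of observing the count y given mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def zip_pmf(y, pi, mu):
    """Zero-inflated Poisson: zeros arise from two sources
    (structural zeros with probability pi, plus Poisson zeros)."""
    if y == 0:
        return pi + (1 - pi) * poisson_pmf(0, mu)
    return (1 - pi) * poisson_pmf(y, mu)

def hurdle_pmf(y, p0, mu):
    """Poisson hurdle: every zero comes from one process with probability p0;
    positive counts follow a zero-truncated Poisson."""
    if y == 0:
        return p0
    return (1 - p0) * poisson_pmf(y, mu) / (1 - poisson_pmf(0, mu))

mu = 4.36  # mean reported in the exploratory analysis
print("ZIP    P(Y=0):", round(zip_pmf(0, 0.2, mu), 4))
print("Hurdle P(Y=0):", round(hurdle_pmf(0, 0.2, mu), 4))
```

With the same nominal zero parameter, the zero-inflated model assigns more probability to zero than the hurdle model, because the count process contributes zeros of its own.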
Bayesian inference (which focuses on infusing prior knowledge into models by applying probability theory) can be applied through Markov Chain Monte Carlo (MCMC) simulation, generating random values with the Hamiltonian Monte Carlo (HMC) algorithm. The Bayesian specification is more flexible for parameter estimation than the traditional models [4, 8].
Across the literature, researchers have also identified factors that may affect vaccination uptake among children. Demographic and socioeconomic factors, including child-specific, parental and household characteristics, have been identified as important predictors of child vaccination uptake [9,10,11,12]. In particular, child-specific characteristics such as birth order, place of birth and age of the child have been found to influence vaccination uptake [10, 12].
While previous studies have extensively explored count models for analyzing childhood vaccine uptake, there is a lack of research that systematically compares the performance of different count models, particularly from a Bayesian perspective, and identifies the most suitable model for modeling childhood vaccine uptake in Nigeria. Additionally, there is limited understanding of the factors that may influence childhood vaccine uptake in the Nigerian context. Therefore, there is a need for a comprehensive study that not only compares the performance of count models but also investigates the key factors affecting childhood vaccine uptake in Nigeria. This research gap hinders the ability of public health researchers to make informed decisions regarding the selection of an appropriate count model for analyzing vaccine uptake and understanding the factors that contribute to it. The study, therefore, compared the performance of count models and identified the best Bayesian count model for the number of childhood vaccines received in Nigeria. This paper will be useful for public health researchers in choosing an appropriate model for count health outcomes, and will also help identify factors that may influence such outcomes.
Methods
Study design
This is a methodological paper that details the performance of different Bayesian count models. We started with a review of the models, described the data we used, and applied the models to the multi-year childhood vaccination data in Nigeria.
The models
Four count regression modelling techniques were assessed and implemented in the Bayesian multilevel framework under the generalized linear mixed model (GLMM) with both fixed and random effects. Three levels were defined: children i who received vaccination (level 1), nested in community j (level 2), within state k (level 3). The second and third levels were particularly useful to account for the inherent spatial and temporal autocorrelation in childhood vaccination uptake.
Models fitted to the data
| Distribution | Regression | Notes * |
| --- | --- | --- |
| \(y_{i,j,k} \sim \mathrm{Poisson}\left(\lambda_{i,j,k}\right)\) | \(\log\left(\lambda_{i,j,k}\right)=\beta_{0}+\beta_{i}X_{i}+e_{i,j,k}\) | 1 |
| \(y_{i,j,k} \sim \mathrm{NB}\left(\lambda_{i,j,k}\right)\) | \(\log\left(\lambda_{i,j,k}\right)=\beta_{0}+\beta_{i}X_{i}+e_{i,j,k}\) | 2 |
| \(y_{i,j,k} \sim \mathrm{ZIP}\left(\theta, \lambda_{i,j,k}\right)\) | \(\eta_{i,j,k}=\beta_{0}+\beta_{i}X_{i,j,k}+\left(e_{i,j,k}+r_{i,j,k}\right)\) | 1 2 3 4 |
| \(y_{i,j,k} \sim \mathrm{ZINB}\left(\theta, \lambda_{i,j,k}\right)\) | \(\eta_{i,j,k}=\beta_{0}+\beta_{i}X_{i}+\left(e_{i,j,k}/r_{i,j,k}\right)\) | 1 2 3 4 |
For each of the distributions specified above, their models can be written as:

(i) Poisson:
$$\mathrm{Pr}\left(Y_{i}=y_{i}\right)=\frac{e^{-\mu}\mu^{y_{i}}}{y_{i}!}, \quad y_{i}=0, 1, 2, \dots$$
(ii) Negative Binomial:
$$\mathrm{Pr}\left(Y_{i}=y_{i}\right)=\frac{\Gamma (k+y_{i})}{\Gamma (k)\, y_{i}!}\left(\frac{k}{k+\mu}\right)^{k}\left(\frac{\mu}{k+\mu}\right)^{y_{i}}$$
where μ is the mean, and k represents the dispersion parameter. The variance of the distribution is \(\left(\mu +\frac{\mu^{2}}{k}\right)\), which implies that decreasing values of k lead to increasing levels of dispersion; as k tends to positive infinity, the distribution approaches a Poisson distribution [1]. Therefore, unlike the Poisson model, the negative binomial model can capture the level of overdispersion, but it does not by itself resolve overdispersion that arises from excess zeros [1, 5].
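The mean–variance relationship of the negative binomial can be checked numerically. The sketch below is illustrative only: it uses the mean of 4.36 reported in the exploratory analysis, while the values of `k` are hypothetical and the helper `moments` is introduced here for the demonstration.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and dispersion k
    (variance mu + mu**2 / k), evaluated on the log scale for stability."""
    log_p = (math.lgamma(k + y) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def poisson_pmf(y, mu):
    """Poisson pmf, the limiting case of the negative binomial as k grows."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def moments(pmf, upper=2000):
    """Mean and variance of a count distribution, summed up to `upper`."""
    mean = sum(y * pmf(y) for y in range(upper))
    var = sum((y - mean) ** 2 * pmf(y) for y in range(upper))
    return mean, var

mu = 4.36  # mean reported in the exploratory analysis
for k in (0.5, 2.0, 50.0):  # hypothetical dispersion values
    mean, var = moments(lambda y: nb_pmf(y, mu, k))
    print(f"k={k}: mean={mean:.3f}, variance={var:.3f}, "
          f"mu + mu^2/k={mu + mu ** 2 / k:.3f}")
```

Smaller values of `k` inflate the variance well above the mean, and for very large `k` the probabilities become practically indistinguishable from those of a Poisson distribution with the same mean.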

(iii) Zero-inflated Poisson:
$$\mathrm{Pr}\left(Y_{i}=y_{i}\right)=\left\{\begin{array}{ll}\pi +(1-\pi )e^{-\mu}, & \text{for } y_{i}=0\\ (1-\pi )\dfrac{e^{-\mu}\mu^{y_{i}}}{y_{i}!}, & \text{for } y_{i}=1, 2, \dots \end{array}\right.$$
where π is the probability of a structural zero and μ is the mean of the Poisson count process.
(iv) Zero-inflated Negative Binomial:
$$\mathrm{Pr}\left(Y_{i}=y_{i}\right)=\left\{\begin{array}{ll}\pi_{i}+(1-\pi_{i})(1+k\mu_{i})^{-\frac{1}{k}}, & \text{for } y_{i}=0\\ (1-\pi_{i})\dfrac{\Gamma \left(y_{i}+\frac{1}{k}\right)\left(k\mu_{i}\right)^{y_{i}}}{\Gamma \left(y_{i}+1\right)\Gamma \left(\frac{1}{k}\right)(1+k\mu_{i})^{y_{i}+\frac{1}{k}}}, & \text{for } y_{i}=1, 2, \dots \end{array}\right.$$
In this parameterization the variance of the count component is \(\mu_{i}+k\mu_{i}^{2}\), so k here corresponds to the reciprocal of the dispersion parameter used in the negative binomial formula in (ii).
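As a sanity check on the zero-inflated negative binomial in (iv), the sketch below evaluates the probabilities numerically, reading `pi` as the structural-zero probability and `k` as the dispersion with count variance mu + k*mu². The values of `pi` and `k` are hypothetical; only the mean of 4.36 comes from the exploratory analysis.

```python
import math

def zinb_pmf(y, pi, mu, k):
    """Zero-inflated negative binomial pmf following equation (iv):
    pi is the structural-zero probability and k the dispersion
    (count-component variance mu + k * mu**2)."""
    inv_k = 1.0 / k
    if y == 0:
        return pi + (1 - pi) * (1 + k * mu) ** (-inv_k)
    log_nb = (math.lgamma(y + inv_k) - math.lgamma(y + 1) - math.lgamma(inv_k)
              + y * math.log(k * mu) - (y + inv_k) * math.log(1 + k * mu))
    return (1 - pi) * math.exp(log_nb)

pi, mu, k = 0.2, 4.36, 0.5  # pi and k are hypothetical illustration values
total = sum(zinb_pmf(y, pi, mu, k) for y in range(5000))
mean = sum(y * zinb_pmf(y, pi, mu, k) for y in range(5000))
print(f"sum of probabilities = {total:.6f}")
print(f"P(Y=0) = {zinb_pmf(0, pi, mu, k):.4f}, mean = {mean:.4f}")
```

The probabilities sum to one and the overall mean equals (1 - pi) * mu, which is a quick way to confirm that the two branches of the formula are mutually consistent.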
Model selection
Model selection and comparison were done using the -2 Log-Likelihood (-2 Log LL), the Leave-One-Out Cross-Validation Information Criterion (LOOIC) and the Widely Applicable or Watanabe-Akaike Information Criterion (WAIC). The modelling technique with the lowest values of these information criteria was selected as the most robust for our analysis.
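For readers unfamiliar with WAIC, the following is a minimal sketch of how it can be computed from a matrix of pointwise log-likelihoods (one row per posterior draw, one column per observation). It is not the code used in this study; in R, functions such as `waic()` and `loo()` in brms perform this computation. The counts and posterior draws below are hypothetical.

```python
import math
from statistics import fmean, variance

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(fmean([math.exp(v - m) for v in values]))

def waic(log_lik):
    """WAIC on the deviance scale.
    log_lik[s][i] is the log-likelihood of observation i under posterior draw s."""
    n_obs = len(log_lik[0])
    columns = [[row[i] for row in log_lik] for i in range(n_obs)]
    lppd = sum(log_mean_exp(col) for col in columns)   # log pointwise predictive density
    p_waic = sum(variance(col) for col in columns)     # effective number of parameters
    return -2.0 * (lppd - p_waic)

def poisson_log_pmf(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)

counts = [0, 0, 3, 5, 7]          # hypothetical observed counts
lambda_draws = [2.5, 3.0, 3.5]    # hypothetical posterior draws of the Poisson mean
log_lik = [[poisson_log_pmf(y, lam) for y in counts] for lam in lambda_draws]
print(round(waic(log_lik), 2))
```

Lower values indicate better expected out-of-sample fit, which is why the model with the smallest WAIC (and LOOIC) is preferred.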
Implementation of the models
All models considered in this study were executed using the Bayesian multilevel regression modelling technique in Stan (Sampling Through Adaptive Neighborhoods), a probabilistic programming language, accessed from R. Stan, which was used here through the ‘brms’ package, is a C++ platform designed for performing complex hierarchical Bayesian inference [13, 14].
Chain convergence was achieved using weakly informative priors derived from the data. The total number of iterations (including warmup) was set to 15000, with 4 chains and a thinning interval of 50 (that is, one in every 50 draws was retained).
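The sampler settings determine how many posterior draws are available for inference. The sketch below gives a rough count under two assumptions that the text does not state: that the 15000 iterations apply to each chain, and that the warmup length follows the brms default of half the iterations.

```python
def retained_draws(iterations, warmup, chains, thin):
    """Approximate number of post-warmup draws kept across all chains."""
    per_chain = (iterations - warmup) // thin
    return per_chain * chains

iterations, chains, thin = 15000, 4, 50  # settings reported in the text
warmup = iterations // 2                 # assumption: brms default when warmup is not given
print(retained_draws(iterations, warmup, chains, thin))  # 600
```

Under these assumptions about 600 draws remain after thinning, so heavy thinning trades storage and autocorrelation for a smaller posterior sample.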
Data
The data are from the multi-year, cross-sectional, nationally representative, population-based household Nigeria Demographic and Health Survey (NDHS). Data were pooled from the successive NDHS conducted in 2003, 2008, 2013 and 2018 in Nigeria. The NDHS used a multistage stratified sampling design, from the selection of clusters known as primary sampling units (PSUs) to the selection of households, in all the survey years. We computed survey-women weights (SWW) for the analysis to reflect the differences in the population sizes of women in each survey. The SWW is the product of the DHS-provided weights and survey-specific weights (SSW). The SSW was computed as the number of sampled women aged 15–49 years divided by the population of women aged 15–49 years during each survey. However, since we were interested in individual survey estimates rather than a pooled estimate, we did not need to apply further weighting besides the sampling weights already provided via v005 in the original dataset.
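The weight definitions above can be written out as a short sketch. The function names and the numbers in the example are hypothetical; the only external fact relied on is that DHS stores `v005` as an integer with six implied decimal places, so it is divided by 1,000,000 before use.

```python
def survey_specific_weight(sampled_women, population_women):
    """SSW: sampled women aged 15-49 divided by the population of women aged 15-49."""
    return sampled_women / population_women

def survey_women_weight(v005, ssw):
    """SWW: DHS-provided weight multiplied by the survey-specific weight."""
    dhs_weight = v005 / 1_000_000  # v005 carries six implied decimal places
    return dhs_weight * ssw

# Hypothetical figures, for illustration only.
ssw = survey_specific_weight(sampled_women=10, population_women=1000)
print(survey_women_weight(v005=1_500_000, ssw=ssw))
```

When only survey-specific estimates are required, as in this analysis, the rescaled `v005` weight on its own is sufficient.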
Dependent variable
The outcome variable is the number of vaccines taken by the children. The mothers provided information on whether a child received a specific vaccine or not. The data on each of the 9 vaccines were then merged to obtain the number of vaccines a child took. The vaccines are one dose of Bacillus Calmette-Guérin (BCG); three doses of Diphtheria-Pertussis-Tetanus (DPT); three doses of Polio and one dose of Measles.
Independent variables
Explanatory variables were selected based on findings in the literature [9, 10, 12, 15] and the availability of data at individual, community and state levels. Three levels of explanatory variables were used to reflect the hierarchical nature of the study:
Individual-level variables included the sociodemographic characteristics of the child and the mother: mother’s age in years, mother’s educational level, mother’s religion, mother’s occupation, desire for the child, place of delivery, exposure to media, birth order, wealth status, number of antenatal care visits, postnatal care attendance within 2 months of delivery, sex of the child, and healthcare decision-maker. Exposure to media, in this study, was defined as the mother’s access to information through newspapers/magazines, radio or television (i.e., if the mother reads, listens to or watches any of these at least once a week).
Community-level variables such as place of residence, community poverty rate, illiteracy rate and unemployment, and state-level variables such as the proportion of the rural population and health facilities per 10,000 population, were included in the model.
Results
The distribution of the number of vaccines received by the children is presented in Fig. 1, disaggregated by survey year. A consistently high number of zeros was observed across survey years, which confirms that the data on the number of vaccines received in Nigeria contain many zeros and is suggestive of overdispersion.
Table 1 presents the comparison of the model parameters and estimates. The mean number of vaccines taken was less than the variance; this suggests that the Poisson regression may be unsuitable. Nonetheless, further statistical tests were conducted to determine the optimal model.
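A quick check of the reported summary statistics illustrates the point. The sketch below uses only the mean and variance quoted in the abstract (4.36 and 12.86); the method-of-moments value for the dispersion k is a rough illustration under the variance form mu + mu²/k, not an estimate reported by the study.

```python
import math

mean, variance = 4.36, 12.86  # values reported in the exploratory analysis

dispersion_index = variance / mean           # equals 1 under a Poisson model
poisson_zero_prob = math.exp(-mean)          # P(Y = 0) implied by a Poisson with this mean
k_moments = mean ** 2 / (variance - mean)    # from variance = mu + mu**2 / k

print(f"variance / mean       = {dispersion_index:.2f}")
print(f"Poisson-implied P(0)  = {poisson_zero_prob:.4f}")
print(f"moment estimate of k  = {k_moments:.2f}")
```

The variance is almost three times the mean, and a Poisson model with this mean would expect only about 1.3% of children to have zero vaccines, which can be compared against the share of zeros visible in Fig. 1.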
The -2 Log LL, LOOIC and WAIC were used as further tests of goodness of fit for model comparison and selection. Although the Zero-Inflated Poisson had similar estimates and model fit statistics to the Zero-Inflated Negative Binomial, the Zero-Inflated Negative Binomial had the smallest values of the -2 Log LL, LOOIC and WAIC, which makes it the better model for the data on the number of vaccines received.
Furthermore, Table 2 presents the estimates from each of the count models employed in this study, alongside their standard errors. The estimates from the Poisson and Negative Binomial regressions were similar, with only small differences between them. The same was observed for the Zero-Inflated Poisson and Zero-Inflated Negative Binomial estimates.
Overall, maternal age, education, household wealth status, number of antenatal care visits, employment status, religion, delivery in a hospital/health centre, birth order, whether the mother wanted the child or wanted to delay the pregnancy, postnatal care, and who takes decisions about healthcare utilization were significantly associated with the number of vaccines taken, although these variables were not statistically significant in all the models compared. Similarly, community- and state-level characteristics were statistically significant in the models.
Discussion
The study was conducted using the multilevel Bayesian MCMC approach. It allowed the estimation of, and adjustment for, multiple predictors of the number of vaccines received by children aged 12–23 months. In particular, a Bayesian hierarchical regression model was employed to account for the inherent spatial (at community and state levels) and temporal autocorrelation within the data, which arises from the cluster sampling design, when estimating the parameters. The MCMC techniques, which sample successively from a target distribution, were used to draw the posterior samples; specifically, the Metropolis–Hastings updating steps were used.
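To make the idea of successive sampling from a target distribution concrete, the sketch below implements a random-walk Metropolis sampler for the mean of a simple Poisson model. This is a deliberately simplified stand-in and not the sampler used by Stan, which relies on Hamiltonian Monte Carlo. The counts, the Gamma(1, 1) prior and the step size are all hypothetical choices made for the illustration.

```python
import math
import random

def log_posterior(theta, counts, a=1.0, b=1.0):
    """Log posterior of theta = log(lambda) for Poisson counts with a
    Gamma(a, b) prior on lambda, including the Jacobian of the log transform."""
    return (a + sum(counts)) * theta - (b + len(counts)) * math.exp(theta)

def metropolis(counts, n_iter=20000, step=0.3, seed=1):
    """Random-walk Metropolis on the log scale; returns draws of lambda."""
    rng = random.Random(seed)
    theta = math.log(max(sum(counts) / len(counts), 0.1))
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        log_ratio = log_posterior(proposal, counts) - log_posterior(theta, counts)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            theta = proposal
        draws.append(math.exp(theta))
    return draws

counts = [0, 0, 3, 5, 8, 4, 0, 6]   # hypothetical vaccine counts
draws = metropolis(counts)[2000:]   # discard the first 2000 draws as warmup
exact_mean = (1.0 + sum(counts)) / (1.0 + len(counts))  # conjugate Gamma posterior mean
print(f"MCMC mean = {sum(draws) / len(draws):.3f}, exact mean = {exact_mean:.3f}")
```

Because the Gamma prior is conjugate to the Poisson likelihood, the exact posterior mean is available in closed form, which gives a simple check that the sampler is targeting the right distribution.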
Comparing the -2 Log-Likelihood, LOOIC and WAIC values of the four Bayesian count models, the Zero-Inflated Negative Binomial model had the lowest value for all the model fit comparators used, and we adjudged it to be the best Bayesian count model for the data. We found that the Poisson, Negative Binomial and Zero-Inflated Poisson models were less appropriate for modelling in the presence of extra zero counts. The study also observed a disparity between the mean and variance of the number of vaccines received.
Our findings validate the assumption that the Poisson model may not be appropriate to fit the data, as the Poisson regression model performed the worst among the four methods. The data, in the presence of many zeros, are very heavily skewed, whereas one of the assumptions of Poisson regression is that the mean and variance are equal. The Zero-Inflated Poisson model performed better than the Negative Binomial because of the presence of excess zeros in the data. Notably, the Poisson regression model was the least appropriate in the presence of the overdispersion and “zero inflation” recorded in these data.
Our finding aligns with the existing literature. Studies have reported that the Negative Binomial model performs best among count models for count health outcomes and, by extension, that the Zero-Inflated Negative Binomial model performs best in the presence of overdispersion, excess zeros and high variability, of which vaccine uptake among children in Nigeria is one example [1,2,3,4].
On the factors associated with vaccine uptake, the study corroborates findings from previous literature that identified child-specific, parental and household characteristics as important predictors of child vaccine uptake [9,10,11,12]. The study also established that the factors that explain childhood vaccine uptake may vary by the geographical location of the mothers [10, 16].
Strength and limitations
A major strength of this study was that the study pooled data between 2003 and 2018 in Nigeria to select the best count approach for modelling the number of vaccine uptake among children aged 12–23 months in Nigeria, and applied the Bayesian framework against the traditional frequentist approach. Nevertheless, Bayesian frameworks are more computationally intensive and thereby impacted the speed of the model despite parallelization. A limitation is that there may have been underreporting in the number of vaccine uptake; the data collection protocol assumed no vaccination if there is no record on vaccination by the respondent or if the mother cannot remember if the child took the vaccine.
Conclusion
This study aimed to evaluate the fit and performance of different Bayesian count models. Of the four Bayesian count regression models compared in terms of -2 Log-Likelihood, LOOIC and WAIC for modelling the factors associated with the number of vaccines received by children aged 12–23 months in Nigeria, the Zero-Inflated Negative Binomial model performed best. Data with many zeros are often encountered in public health outcomes. Failure to account for “zero inflation” while analyzing such data may result in invalid inferences.
Vaccine uptake among children aged 12–23 months also varied across individual, community, and state-level characteristics. Specifically, the study observed significant improvements in uptake across the survey years. In summary, this study has enhanced the understanding of the complexities surrounding vaccine uptake among children and provides valuable insights for public health interventions and policies in Nigeria.
Recommendation
This study characterized the robustness of different count models to assess their suitability for modelling vaccination uptake in Nigeria. Future research should consider the zero-inflated negative binomial model for count outcomes with excess zeros and overdispersion, to maximize model robustness and accuracy and ultimately provide better recommendations for policy and decision-making. We also recommend further research to explore extensively the factors associated with vaccine uptake among children in Nigeria.
Availability of data and materials
Data used for this study are available on the DHS website, https://dhsprogram.com
Abbreviations
-2 Log LL: -2 Log-likelihood
LOOIC: Leave-one-out cross-validation information criterion
WAIC: Watanabe-Akaike/widely applicable information criterion
GLM: Generalized linear model
MCMC: Markov chain Monte Carlo
HMC: Hamiltonian Monte Carlo
GLMM: Generalized linear mixed model
BCG: Bacillus Calmette-Guérin
DPT: Diphtheria-pertussis-tetanus
NDHS: Nigeria Demographic and Health Survey
ANC: Antenatal care
References
1. Yang S, Puggioni G, Harlow LL, Redding CA. A comparison of different methods of zero-inflated data analysis and an application in health surveys. J Mod Appl Stat Methods. 2017;16(1):518–43. https://doi.org/10.22237/jmasm/1493598600.
2. Yusuf O, Bello T, Gureje O. Zero Inflated Poisson and Zero Inflated Negative Binomial Models with Application to Number of Falls in the Elderly. Biostat Biom Open Access J. 2017;1(4). https://doi.org/10.19080/bboaj.2017.01.555566.
3. Chan AB, Vasconcelos N. Bayesian Poisson regression for crowd counting. In: Proceedings of the IEEE International Conference on Computer Vision. 2009. pp. 545–551. https://doi.org/10.1109/ICCV.2009.5459191.
4. Workie MS, Azene AG. Bayesian zero-inflated regression model with application to under-five child mortality. J Big Data. 2021;8(1):1–23. https://doi.org/10.1186/s40537-020-00389-4.
5. Shaaban AN, Peleteiro B, Martins MRO. Statistical models for analyzing count data: predictors of length of stay among HIV patients in Portugal using a multilevel model. BMC Health Serv Res. 2021;21(1):1–17. https://doi.org/10.1186/s12913-021-06389-1.
6. Fagbamigbe A, Adebowale A. Predicting Fertility using Poisson Regression. Afr J Reprod Health. 2014;18(1):71–83. https://doi.org/10.4314/ajrh.v18i1.
7. Van de Schoot R, Kaplan D, Denissen J, Asendorpf JB, Neyer FJ, van Aken MAG. A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research. Child Dev. 2014;85(3):842–60. https://doi.org/10.1111/cdev.12169.
8. O’Hagan A. 3 Bayesian statistics: principles and benefits. 2004.
9. Asuman D, Ackah CG, Enemark U. Inequalities in child immunization coverage in Ghana: evidence from a decomposition analysis. Health Econ Rev. 2018. https://doi.org/10.1186/s13561-018-0193-7.
10. Adedokun ST, Uthman OA, Adekanmbi VT, Wiysonge CS. Incomplete childhood immunization in Nigeria: A multilevel analysis of individual and contextual factors. BMC Public Health. 2017. https://doi.org/10.1186/s12889-017-4137-7.
11. Landoh DE, et al. Predictors of incomplete immunization coverage among one to five years old children in Togo. BMC Public Health. 2016. https://doi.org/10.1186/s12889-016-3625-5.
12. Ataguba JE, Ojo KO, Ichoku HE. Explaining socioeconomic inequalities in immunization coverage in Nigeria. Health Policy Plan. 2016. https://doi.org/10.1093/heapol/czw053.
13. Kurz AS. Doing Bayesian data analysis in brms and the tidyverse. 2021.
14. Bürkner P. Package ‘brms’: Bayesian Regression Models using ‘Stan’. 2021. https://doi.org/10.18637/jss.v080.i01.
15. Aalemi AK, Shahpar K, Mubarak MY. Factors influencing vaccination coverage among children age 12–23 months in Afghanistan: Analysis of the 2015 Demographic and Health Survey. PLoS One. 2020. https://doi.org/10.1371/journal.pone.0236955.
16. Wiysonge CS, Uthman OA, Ndumbe PM, Hussey GD. Individual and contextual factors associated with low childhood immunisation coverage in Sub-Saharan Africa: a multilevel analysis. PLoS One. 2012. https://doi.org/10.1371/journal.pone.0037905.
Acknowledgements
The authors are grateful to ICF Macro, USA, for granting the authors the request to use the Demographic and Health Survey data.
Funding
None declared.
Contributions
AFF conceptualised and designed the study. AFF, TVL, and KAA, analysed and interpreted the data. AFF, TVL, and KAA drafted, reviewed and edited the manuscript. All authors read and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
The study's examination of publicly accessible secondary data was supported by ethical approvals from the National Ethics Committees in Nigeria and the Ethics Committee of the ICF Macro in Fairfax, Virginia, United States. Each parent and/or legal guardian of the children who participated in the study was asked for written and signed informed consent after the low risk and potential advantages of the interviews had been explained. All data were gathered anonymously and kept private, and all methods were performed in accordance with the relevant guidelines and regulations.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Fagbamigbe, A.F., Lawal, T.V. & Atoloye, K.A. Evaluating the performance of different Bayesian count models in modelling childhood vaccine uptake among children aged 12–23 months in Nigeria. BMC Public Health 23, 1197 (2023). https://doi.org/10.1186/s12889-023-16155-z