Discrete-time survival analysis with survey weights: a case study of age at child death in Sierra Leone

Background Child death rates are often regarded as reliable indicators of the overall welfare of a country, since they give insight into health accessibility and development. For planning and control purposes, it is important to understand which ages carry a higher risk of child death, as well as the determinants thereof. Methods We used the Sierra Leone DHS 2019 data, which were collected using two-stage sampling. Data collection involved interviewing women aged 15–49 to obtain information about the children they had had up to 2019. Age at death of the child was modelled using discrete-time survival analysis with a logit link while applying survey weights. The analysis also sought to estimate the determinants of child death (under-five mortality). The baseline hazard was modelled with a polynomial function. Results Children from rural areas had significantly lower odds of dying than those from urban areas (odds ratio (OR) = 0.861, p-value = 0.0003). Children of mothers who were currently using contraceptives, and those whose mothers had been using them since their last birth, were at higher odds of child death than children whose mothers had never used contraceptives (currently using: OR = 1.118, p-value < 0.0001; used since last birth: OR = 1.372, p-value < 0.0001). Children with no health insurance had significantly higher odds of death than those with health insurance (OR = 1.036, p-value < 0.0001). Children of women who were married, and of women who were formerly married, were at significantly higher odds of experiencing child death than children of women who had never been in union (married: OR = 1.207, p-value = 0.0003; formerly married: OR = 1.308, p-value = 0.0009). 
Increasing maternal age group increases the odds of child death relative to mothers in their teenage years (20–29: OR = 1.943, p-value < 0.0001; 30–39: OR = 2.397, p-value < 0.0001; ≥ 40: OR = 2.895, p-value < 0.0001, compared with mothers aged 15–19). Conclusion The study provides evidence that residing in urban areas, marital union of the mother, children having no health insurance, use of contraceptives by the mother and older maternal age significantly increase the odds of child death. This points to a possible need for improved health infrastructure to be made available to citizens in all places of delivery and for more awareness of pregnancy-related complications.


Introduction
The child mortality rate (under-five mortality), as a significant national indicator of health development, has become a popular area of study, with researchers aiming to establish possible measures for keeping child mortality at an absolute minimum [1]. The United Nations Millennium Development Goal 4 (MDG4), which targeted a two-thirds reduction in under-five mortality between 1990 and 2015, helped to re-focus attention on the child mortality rate [2]. By 2015, this MDG4 global target had still not been attained, and child mortality remains acute in developing and low-middle income countries [2]. At the end of 2015, a study in The Lancet Global Health indicated that in low-middle income countries 1 child in 12 dies before the age of five years, whereas in high income countries 1 child in 147 dies [3]. These figures substantiate that developing countries still require more interventions to further reduce child mortality rates. Most Sub-Saharan African countries have high child death rates due to low levels of public health [4]. In this study we focus our analysis on child death in Sierra Leone, a Sub-Saharan African country with one of the highest child mortality rates [4]. Sierra Leone is one of the world's poorest countries and is experiencing numerous challenges. It has one of the highest youth unemployment rates in West Africa (about 60%), and much of its drinking water comes from unclean sources. It also has one of Africa's lowest literacy rates and is vulnerable to health issues such as HIV/AIDS, malaria and yellow fever. Approximately 81% of its population lives in poverty [5].
Even though Sierra Leone has shown improvement in reducing child mortality over the years, it is classified as one of the countries with high child and infant mortality [6,7]. The under-five mortality rate decreased from 156 deaths per 1000 live births in 2013 to 122 deaths per 1000 live births in 2019. Child mortality went from 95 to 75 deaths per 1000 live births and neonatal mortality went from 39 to 31 deaths per 1000 live births [8]. These figures show that child and infant mortality rates in Sierra Leone declined from 2013 to 2019, but further deaths can be prevented provided the necessary precautions and health measures are taken [6]. For that reason, it is important to establish the determinants of child death. This would assist in making informed decisions on which areas to improve in order to lower child death further.
Recent studies around the world [9][10][11] have examined child mortality mostly using continuous-time survival models. However, in cases where data are collected in a truly discrete manner, using continuous-time models gives biased estimates due to ties [12]. Methods of handling ties would then have to be adopted; [13,14] explain extensively how ties are handled. Discrete-time survival models are therefore preferred over continuous-time models because they easily accommodate time-varying covariates and naturally address the issue of ties. Furthermore, discrete-time survival models do not require the hazard-related proportionality assumption that is common in continuous-time survival analysis [12]. These methods have commonly been applied to health data [15][16][17] and will be considered in this study.
In addition, the standard discrete-time survival model is appropriate when a simple random sample of participants was used. When the data analysed were sampled using multi-stage probability sampling, using a discrete-time survival model with survey weights produces more consistent estimates [18].
The objective of this study was to determine time-to-child mortality, as well as factors that affect child mortality in Sierra Leone, using discrete-time survival analysis with survey weights.

Discrete survival and hazard function
For a discrete-time survival analysis, the time scale is subdivided into intervals. Censoring and the occurrence of an event are assumed to happen at the end of the intervals, and the intervals do not necessarily have the same length. Let $T$ be a discrete-time random variable that takes on values $t_i$ $(i = 1, \dots, n)$, where $T = t_i$ denotes an event occurring in interval $I_t$ [16]. The discrete hazard function for the $j$th ($j = 1, 2, \dots, M$) individual, with event time $T_j$, is defined as the conditional probability of failure in time interval $t_i$ given that the individual has survived up to that interval, and is given by

$$h_j(t_i) = P(T_j = t_i \mid T_j \ge t_i). \tag{1}$$

Because it is a conditional probability, the discrete hazard always lies between 0 and 1. The survival function, which is the probability that the event of interest has not occurred by time $t_i$, is given by

$$S_j(t_i) = P(T_j > t_i) = 1 - F_j(t_i) = \prod_{k=1}^{i} \bigl(1 - h_j(t_k)\bigr), \tag{2}$$

where $F_j(t_i) = P(T_j \le t_i)$ is the cumulative distribution function, which gives the probability that the time-to-event is at most some given value $t_i$ [19,20].
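As a small worked illustration of how survival follows from the hazard, the sketch below uses the standard discrete-time relation in which survival past interval $t_i$ is the running product of $(1 - h(t_k))$ over $k \le i$. The function name and the three hazard values are hypothetical and are not taken from the SLDHS data.

```python
from itertools import accumulate
from operator import mul


def survival_from_hazard(hazards):
    """Return [S(t_1), ..., S(t_n)] with S(t_i) = prod_{k<=i} (1 - h(t_k))."""
    return list(accumulate((1.0 - h for h in hazards), mul))


# Hypothetical hazards for three consecutive intervals (illustration only).
hazards = [0.03, 0.01, 0.005]
survival = survival_from_hazard(hazards)
print(survival)
```

Each entry of `survival` is the probability of remaining event-free through the corresponding interval, so the sequence can only stay level or decrease.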
The methods of estimating the survival and hazard functions in survival analysis are broadly classified into non-parametric, semi-parametric, parametric, and discrete-time methods [20]. This paper focuses on the discrete-time model, which is treated here as a parametric method. To model discrete-time survival, a logistic model may be used to link the hazard function to the linear predictor, $\gamma(t_i) + \phi X_t$, as follows:

$$\operatorname{logit}\, h_j(t_i \mid X_t) = \log\left[\frac{h_j(t_i \mid X_t)}{1 - h_j(t_i \mid X_t)}\right] = \gamma(t_i) + \phi X_t, \tag{3}$$

so that the hazard may be obtained by solving Eq. 3 for $h_j(t_i \mid X_t)$, which may be written as

$$h_j(t_i \mid X_t) = \frac{\exp\{\gamma(t_i) + \phi X_t\}}{1 + \exp\{\gamma(t_i) + \phi X_t\}}, \tag{4}$$

where $\gamma(t)$ is a function of time that models the baseline hazard in each time period and captures the log odds that an individual will experience the event in each time period, and $\phi$ represents the regression coefficients. Different specifications of the baseline hazard were proposed by [21]. Depending on the assumptions of the model, other link functions may be used, such as the complementary log-log function, where the underlying data were continuous but are now grouped [16], or the probit link function, where normality assumptions are made [22].
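The logit link can be made concrete with a short sketch: the hazard is the inverse logit of the linear predictor $\gamma(t) + \phi X_t$. The function names, the baseline value and the coefficient values below are hypothetical and chosen only to show the mechanics, not estimates from this study.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def hazard(gamma_t, phi, x):
    """Inverse logit of the linear predictor gamma(t) + phi . x."""
    eta = gamma_t + sum(c * v for c, v in zip(phi, x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical baseline log odds and coefficients (illustration only).
gamma_t = -4.0
phi = [0.3, -0.15]
x = [1.0, 1.0]

h = hazard(gamma_t, phi, x)
print(h, logit(h))  # logit(h) recovers the linear predictor
```

Applying `logit` to the returned hazard recovers the linear predictor, which is the sense in which the two forms of the model are equivalent.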

The likelihood function
To estimate the parameters γ and φ from Eq. 4, the method of maximum likelihood can be used.
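Let $\delta_j$ be an event indicator, where $\delta_j = 1$ if an event occurs and $\delta_j = 0$ if an event does not occur, and let $t_{i_j}$ denote the last interval in which individual $j$ is observed. Then the contribution to the likelihood function by individual $j$ is taken as the product of the probability of experiencing the event and the probability of surviving beyond time $t_{i_j}$, such that

$$L_j = \bigl[P(T_j = t_{i_j})\bigr]^{\delta_j} \bigl[P(T_j > t_{i_j})\bigr]^{1 - \delta_j} = \left[\frac{h_j(t_{i_j})}{1 - h_j(t_{i_j})}\right]^{\delta_j} \prod_{k=1}^{i_j} \bigl(1 - h_j(t_k)\bigr).$$

The overall likelihood for all individuals is then given by

$$L = \prod_{j=1}^{M} \left[\frac{h_j(t_{i_j})}{1 - h_j(t_{i_j})}\right]^{\delta_j} \prod_{k=1}^{i_j} \bigl(1 - h_j(t_k)\bigr). \tag{7}$$

Taking the log transformation of Eq. 7 results in

$$\ell = \sum_{j=1}^{M} \left\{ \delta_j \log\left[\frac{h_j(t_{i_j})}{1 - h_j(t_{i_j})}\right] + \sum_{k=1}^{i_j} \log\bigl(1 - h_j(t_k)\bigr) \right\}.$$

Maximizing this log-likelihood with respect to the parameters gives the maximum likelihood estimates of the parameters [23,24]. This likelihood is equivalent to that of a binary response model as used in Generalized Linear Models. It is therefore easily implemented using readily available standard software designed for binary regression models [12,25].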
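The equivalence with a binary response model can be checked numerically. The sketch below expands one individual into person-period records (one binary outcome per interval at risk) and compares the log-likelihood written in survival form with the Bernoulli log-likelihood over those records. All function names and the hazard values are illustrative assumptions.

```python
import math


def person_period(duration, event):
    """Binary outcomes y_1..y_duration: 1 only in the last period if the event occurred."""
    return [1 if (event and k == duration) else 0 for k in range(1, duration + 1)]


def loglik_survival_form(hazards, duration, event):
    """delta * log(h/(1-h)) at the last period plus the sum of log(1-h_k) over periods at risk."""
    ll = sum(math.log(1.0 - h) for h in hazards[:duration])
    if event:
        h_last = hazards[duration - 1]
        ll += math.log(h_last / (1.0 - h_last))
    return ll


def loglik_binary_form(hazards, duration, event):
    """Bernoulli log-likelihood over the person-period records."""
    ys = person_period(duration, event)
    return sum(y * math.log(h) + (1 - y) * math.log(1.0 - h)
               for y, h in zip(ys, hazards))


# Hypothetical per-period hazards for one individual (illustration only).
hazards = [0.05, 0.02, 0.01, 0.01]
died = (loglik_survival_form(hazards, 3, True), loglik_binary_form(hazards, 3, True))
censored = (loglik_survival_form(hazards, 4, False), loglik_binary_form(hazards, 4, False))
print(died, censored)
```

The two forms agree for both an individual who experiences the event and one who is censored, which is why software for binary regression can be applied to the expanded person-period data.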

Survey discrete-time survival model
Apart from the assumption of a binary response variable, discrete-time logistic regression further assumes that the observations are independent and have equal weights [26]. In other words, it assumes that the sampling units were drawn by simple random sampling. This, however, is not the case for the DHS data at hand. The ordinary discrete-time logistic model does not consider the complex nature of the survey design, which may lead to incorrect statistical inference [27]. The use of a discrete-time complex survey logistic model mitigates these shortcomings of the ordinary discrete-time logistic survival model. In complex survey sample designs, the design includes stratification, clustering, and unequal sampling weights. A function that approximates the likelihood function in the finite sampled population with known sampling weights is constructed [28,29]. Let us assume that the finite population is partitioned into $h = 1, 2, \dots, H$ strata, and each stratum is further divided into $l = 1, 2, \dots, n_h$ primary sampling units (PSUs), each of which is made up of $j = 1, 2, \dots, n_{hl}$ secondary sampling units (SSUs), which consist of $n_{hlj}$ elements. Each sampling unit is associated with a sampling weight, given as the inverse of the probability of being included in the sample and denoted as

$$w_{hlj} = \frac{1}{\pi_{hlj}},$$

where $\pi_{hlj}$ is the probability for a sampling unit to be included in the sample.
Under a complex sampling design, the regression coefficients $\phi$ from Eq. 3 are estimated by the maximum pseudo-likelihood method, which incorporates the sampling design and the different sampling weights [28].
To model age at child death, a discrete-time survival model was used with survey weights applied to account for the complex nature of the data.
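A minimal sketch of a weighted (pseudo) log-likelihood follows, assuming the common form in which each person-period record's Bernoulli log-likelihood term is multiplied by its sampling weight. The exact expression used in the paper is not shown in the text, so this form, the function name and the toy records are assumptions; design-based variance estimation, which also uses strata and clusters, is not covered here.

```python
import math


def pseudo_loglik(records):
    """Weighted Bernoulli log-likelihood.

    records: iterable of (weight, y, hazard) with y in {0, 1} and 0 < hazard < 1.
    """
    return sum(w * (y * math.log(h) + (1 - y) * math.log(1.0 - h))
               for w, y, h in records)


# Hypothetical records (illustration only): a weight of 2 counts a record twice.
weighted = [(2.0, 1, 0.1), (1.0, 0, 0.2)]
duplicated = [(1.0, 1, 0.1), (1.0, 1, 0.1), (1.0, 0, 0.2)]
print(pseudo_loglik(weighted), pseudo_loglik(duplicated))
```

With all weights equal to 1 this reduces to the ordinary log-likelihood, which is the simple-random-sampling case described above.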

The baseline hazard
In a discrete-time survival model, time (age) is the fundamental covariate of the model. The effect of time (age) is captured in the baseline hazard. The baseline hazard, $\gamma(t)$, can be specified using distinct functions such as polynomial or piecewise constant functions, and [30] highlights several ways of specifying the baseline hazard in discrete-time survival models. The general specification of the baseline hazard involves treating the discretized time variable as a categorical variable [$\gamma(t) = \gamma_1(t) + \gamma_2(t) + \gamma_3(t) + \dots$], depending on the length of the time interval. However, the general specification requires the interval to be short.
Another approach that can be used when the baseline interval is large is to model the baseline hazard with non-parametric smoothing techniques. These methods, and how non-parametric smoothing of the baseline hazard works, are discussed extensively in [30,25]. The effect of time is captured in the baseline hazard, and omitting the time effect from the model implies that the hazard is constant over time. Using the Akaike Information Criterion (AIC) for model selection, the baseline hazard specification with the smallest AIC statistic was preferred. Table 1 shows how the baseline hazards can be specified.
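The selection step can be sketched as follows, assuming a polynomial baseline of the form $\gamma(t) = c_0 + c_1 t + c_2 t^2 + \dots$ and the usual definition AIC $= 2k - 2\log L$. The candidate labels, log-likelihood values and parameter counts are made up for illustration and are not the values reported in Table 3.

```python
def polynomial_baseline(t, coefs):
    """gamma(t) = c0 + c1*t + c2*t**2 + ... for coefficients coefs = [c0, c1, ...]."""
    return sum(c * t ** p for p, c in enumerate(coefs))


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L (smaller is preferred)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) pairs, illustration only.
candidates = {
    "order_1": (-1000.0, 2),
    "order_2": (-990.0, 3),
    "order_3": (-989.5, 4),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this toy comparison the extra term of the highest-order candidate improves the log-likelihood by less than the penalty it incurs, so the middle candidate has the smallest AIC.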

The data
This study uses data obtained from the Demographic and Health Surveys, which consist of data collected using different questionnaires. The data files, called recode files, are stored according to the nature of the analysis that one intends to perform (i.e., household recode, persons' recode, women's recode, children's recode, etc.). Since the main objective of this study was to discover variables that affect child mortality rates in Sierra Leone, we used the children's (kids) recode file (KR) [8].
The cross-sectional data set used in this study was obtained from the Sierra Leone Demographic and Health Survey (SLDHS) 2019. The dataset is publicly available on the DHS website. Sierra Leone has five provinces, and each province is subdivided into districts. The sample for the 2019 SLDHS is a stratified sample selected using two-stage sampling. Stratification was achieved by separating each district into urban and rural areas. In the first stage, 578 enumeration areas (EAs), or clusters, were selected with probability proportional to population size. A list of households in each of the 578 EAs was then compiled, which formed the sampling frame for the selection of households in the second stage. A fixed number of twenty-four households per EA was selected in the second stage through equal-probability systematic sampling, which resulted in approximately 13,872 selected households [8].
The overall response rate for the 2019 SLDHS data collection was 99%, with a 97% response rate among women aged 15–49 who were eligible for individual interviews. The interview included maternal history questions that helped to obtain information regarding birth dates, survival status, and age at death for children that the respondents (women) had given birth to [8].
To select variables for this study, the analytical framework for the study of determinants in developing countries by [31] was used. [31] proposed the incorporation of social and biological variables to study the impact that these factors have on mortality rates. As part of the conceptual framework, studies by [32,10,11] indicated that current place of residence, number of children ever born, duration of breastfeeding, maternal and paternal education, household income, mother's anemia condition, birth spacing between children and marital status are associated with child mortality. Table 2 shows the distribution of the variables used in this study, which include some of the variables that were significant in previous studies as well as other variables selected using the theoretical framework.
Figure 1 shows the distribution of death by age. It is evident that the majority of child deaths occur between month 0 and month 5. The distribution of the number of deaths over time is positively skewed.
The proportion of children who died before the age of 60 months in Sierra Leone was 8.4%, and 91.6% were reported to be alive. Descriptive statistics in Table 2 indicate that children residing in rural areas who died before turning 60 months accounted for 6.1% of children, while children in urban areas who died before the age of 60 months accounted for 2.3%. The majority of children who died had mothers with no education (4.9%), and 0.1% of children who died had mothers with higher education. Children with no health insurance who died accounted for 8.1%, while children with health insurance who died accounted for 0.4%. Children who died and did not receive prenatal and postnatal care accounted for 5.8% and 4.1%, respectively.
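The household count follows directly from the design figures above, and a design weight for a two-stage sample can be sketched as the inverse of the product of the two selection probabilities. The selection probabilities below are hypothetical, and the function name is an assumption; published DHS weights typically involve further adjustments (for example for non-response) that are not reproduced here.

```python
# Figures stated in the text: 578 enumeration areas, 24 households per area.
n_enumeration_areas = 578
households_per_area = 24
total_households = n_enumeration_areas * households_per_area
print(total_households)


def design_weight(p_area, p_household_given_area):
    """Inverse of the overall inclusion probability in a two-stage design."""
    return 1.0 / (p_area * p_household_given_area)


# Hypothetical probabilities (illustration only): an area selected with
# probability 0.05, and 24 of 120 listed households selected within it.
weight = design_weight(0.05, 24 / 120)
print(weight)
```

A household with this weight would stand in for roughly one hundred households in the population under these assumed probabilities.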

Statistical software
The statistical analysis in this paper was done using the statistical software SAS Enterprise Guide 7.1 and SAS OnDemand (SAS Studio). To fit the discrete-time model, we used the PROC LOGISTIC procedure in SAS. However, this fits the ordinary discrete-time logistic model and computes statistics under the assumption that the sample was chosen using simple random sampling [33]. Since the SLDHS is a survey data set with survey weights, PROC SURVEYLOGISTIC is more appropriate, as it allows for the specification of the survey weights, stratification and clusters that are included in the sample design [33].

Results and discussion
Since age in the study spans sixty monthly intervals, the parametric method of fitting the baseline hazard in a discrete-time survival model involved fitting polynomial functions of the intercept $\gamma(t)$ from the model. In this study we fitted four distinct functions of the baseline hazard; the results are shown in Table 3. Based on the smallest AIC, it can be concluded that modelling the baseline hazard with a cubic polynomial gave the best fit. Figure 2 shows the baseline hazard: the hazard of child death was roughly constant at about zero from month 0 to month 28 and began to rise exponentially after month 28.

Model results
Table 4 displays the estimation results from two models, namely the basic discrete-time model and the survey discrete-time model. Table 4 illustrates how the fitted discrete-time models produce partially consistent results in terms of the statistical significance of the explanatory variables. The significance of the explanatory variables was tested at the 5% level of significance. Results from the survey discrete-time logistic model indicate that children whose mothers were from the Northern and Southern regions of Sierra Leone were at significantly higher odds of experiencing child death than children whose mothers were from the Western region.

The output further highlights that children whose mothers were from rural areas were at significantly lower odds of child mortality than children whose mothers were from urban areas (OR = 0.861, p-value = 0.0008). Use of contraceptives was also a significant predictor of child death. Children whose mothers were currently using contraceptives, and those whose mothers had been using them since their previous pregnancy (OR = 1.118, p-value = 0.0003; OR = 1.372, p-value < 0.0001, respectively), were more likely to experience child death than those whose mothers had never used contraceptives.
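To help read the reported odds ratios, the sketch below converts an odds ratio to the corresponding logit-scale coefficient (its natural logarithm) and to a percentage change in the odds. The two odds ratios are the ones quoted above; the function names are illustrative assumptions.

```python
import math


def log_odds_coefficient(odds_ratio):
    """Logit-scale regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds relative to the reference category."""
    return (odds_ratio - 1.0) * 100.0


rural = percent_change_in_odds(0.861)            # rural vs urban
used_since_birth = percent_change_in_odds(1.372)  # used since previous pregnancy vs never used
print(round(rural, 1), round(used_since_birth, 1))
print(log_odds_coefficient(0.861))
```

An odds ratio below 1 gives a negative coefficient and a reduction in the odds, while an odds ratio above 1 gives a positive coefficient and an increase.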
Health insurance was significantly associated with child death in Sierra Leone. Children whose mothers had no health insurance had significantly higher odds of dying (OR = 1.337; p-value < 0.0001) than children whose mothers had health insurance. Certain levels of marital status were also statistically significant predictors of child death.

Discussion
The aim of this study was to model age at child death and determine factors that are associated with child death in Sierra Leone using the 2019 DHS data. The analysis was done using discrete-time survival models and discrete-time survival models with survey weights. Of the fitted models, the discrete-time survival model with survey weights has slightly higher predictive power than the basic discrete-time survival model. Therefore, the analysis and interpreted results are based on this model. It was found that the median age of experiencing child death is 16 months from birth and that the hazard followed a cubic polynomial, as indicated by the fitted baseline hazard.
Place of residence was a significant predictor of age at child death: children residing in rural areas were at lower risk of experiencing death than those residing in urban areas. Initiatives to improve health services and affordability, such as the Free Health Care Initiative (FHCI) and the Rural Health Care Initiative (RHCI), may have had a significant impact on lowering child death rates, especially in rural areas. Most of the studies conducted prior to 2019 [10,34] contradict the findings of our study with regard to place of residence, as they found that children in rural areas were at higher risk of death.
According to our study, the use of contraceptives was significantly associated with age at child death, both for mothers who had used them historically and for those who were currently using them. However, other existing literature on the long-term effects of contraceptives [35][36][37] revealed that contraceptives do not have adverse effects on children. A study by [38], however, shows results that are consistent with what was found in this study. The cross-sectional nature of the SLDHS makes it difficult to discover the actual causes of child death [39]. Birth spacing has remained a significant predictor of child death over the years in this research area. Our study showed that birth spacing of less than two years significantly increased the odds of child death. In addition, women who had never used contraceptives in Sierra Leone were married. This may result in short birth intervals, as the majority of married women may not be practicing family planning. Increased birth spacing reduces the risk of child death, as children are not compelled to stop breastfeeding too soon, which has been said to affect the development and growth of the child [40].
The study by [10], where time to child death was modelled using continuous-time models (the Cox proportional hazards model), showed that postnatal visits reduce the hazard of child death; in this study, however, child postnatal checkups were not a significant predictor of child death. Gender, religion, and wealth were used as covariates in this study, and the results show that they are not significant predictors of age at child death. These results are consistent with the findings of [10]. In the study by [11] there is evidence that marital status and educational level are statistically significant covariates in modelling child mortality. Similarly, in this study, some levels of marital status (married and formerly married) are significant predictors of child death. Many authors have produced research on child mortality in various parts of the world; however, very little of the available literature focuses on the time to child mortality. This study therefore contributes to the literature by extending the study of child mortality to how age plays a role in child death.

Conclusion
In this study we modelled age at child death in Sierra Leone using discrete-time survival analysis with survey weights. Results show that the baseline hazard of child death follows a cubic polynomial shape. Furthermore, having health insurance reduced the odds of experiencing child death. Mothers having children at an older age also increased the odds of child death. Another variable that gave significant findings is the place of delivery: children delivered at home and in public sectors had greater odds of dying before the age of 5 years than those delivered in private health sectors. This may be due to a lack of proper infrastructure, insufficient health services or high volumes of people dependent on limited available health care.
The results obtained from this study are important and may assist the government of Sierra Leone in policy making decisions, particularly with regards to how the country can achieve Sustainable Development Goals and maintain low child mortality rates.Along with other variables that may be significant predictors of child death, the ones identified in this study can formulate guidance on areas that require stringent monitoring to ensure that Sierra Leone reaches the sustainable development goals.

Fig. 1 Distribution of death by age

Children of married and formerly married mothers were at significantly higher odds (OR = 1.207, p-value = 0.0003; OR = 1.308, p-value = 0.0009, respectively) of experiencing child death than children whose mothers had never been in union. The place of delivery variable indicates that children delivered at home and in public health sectors were at significantly higher odds of dying (OR = 1.298, p-value = 0.0395; OR = 1.262, p-value = 0.040, respectively) than children delivered in private health sectors. Increasing maternal age was associated with increasing odds of child death (20–29: OR = 1.943, p-value < 0.0001; 30–39: OR = 2.397, p-value < 0.0001; ≥ 40: OR = 2.895, p-value < 0.0001). The preceding birth interval was also a statistically significant predictor of child death: children whose mothers had short birth intervals were at significantly higher odds of experiencing child death than children whose mothers had longer birth intervals (no previous birth: OR = 1.424, p-value < 0.0001; 1–12 months: OR = 1.039, p-value = 0.7433; 13–24 months: OR = 1.295, p-value < 0.0001; 25–48 months: OR = 1.125, p-value = 0.0040; 49–60 months: OR = 0.961, p-value = 0.4710).

Table 2
Descriptive statistics

Table 3
Baseline Hazard AIC Results

Table 4
Results from the two Fitted Discrete time Models