Setting and data
This study analyzed data from three waves of the Nigeria Demographic and Health Survey (DHS): 2008, 2013, and 2018, collected from June to October 2008, February to July 2013, and August to December 2018, respectively. Nigeria is located in West Africa, and the survey was conducted by the National Population Commission (NPC) in collaboration with ICF International. The study protocols followed the Helsinki guidelines and were approved by the National Health Research Ethics Committee of Nigeria. The sample for each survey was selected using a stratified two-stage cluster design, in which primary sampling units (PSUs) were selected across both rural and urban strata. Nationally representative samples of 36,269; 40,680; and 41,668 households were selected across the PSUs in 2008, 2013 and 2018, respectively. All women aged 15 to 49 years in the selected households were eligible for interview. Of the 34,596; 39,902; and 42,121 eligible women in 2008, 2013 and 2018, respectively, between 97% and 99% were successfully interviewed in each year. The questions covered household socio-demography, reproductive health, and fertility, among other topics. Respondents received no financial compensation for participating. More information about the survey setting and data collection is available in the final reports [31,32,33].
Sample selection
Sample selection focused on children who had complete information about their survival status and the educational attainment of both parents. Children of mothers who were separated, widowed, not cohabiting, or not in a marital union were excluded from the analysis, because the father's direct or indirect input into childcare could be limited or non-existent (as in the case of widowed mothers or single mothers not in a marital union). In addition, children of mothers who had been married more than once were excluded, because characteristics of the previous union(s) may carry over into the current union in ways that the analysis could not account for. This reduced the sample to a total of 72,527 out of 94,053 births across the three surveys. Analysis of infant mortality included all 72,527 children (2008 = 21,716; 2013 = 23,981; 2018 = 26,930), and analysis of child mortality included 53,595 children (2008 = 15,809; 2013 = 17,758; 2018 = 20,028). However, the analytic sample varied slightly across models, depending on the number of missing values among covariates, which ranged up to 5.3%.
Variables and measures
Outcome variables
The outcome variables in this study were the risk of infant mortality and the risk of child mortality. During the survey, the women were asked about the births they had in the 5 years preceding the date of the survey, and about the survival status of each child. Where a child was reported dead, the age at death was recorded. We combined each child's survival status with the current age, or the age at death, to generate the outcome variables for the survival analysis. Children who had died (i.e. non-censored) were regarded as cases, whereas children who were still alive at the time of the survey were treated as right-censored.
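The construction of the survival outcome described above can be sketched as follows. This is a minimal illustration in Python rather than the Stata code used in the study; the function name, its arguments, and the optional `horizon_months` cut-off (for example, 12 months for infant mortality) are assumptions introduced here and are not taken from the source.

```python
from typing import Optional, Tuple


def survival_record(
    alive: bool,
    current_age_months: int,
    age_at_death_months: Optional[int] = None,
    horizon_months: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (time_in_months, event) for one child.

    event = 1 means the child died (a case); event = 0 means the
    observation is right-censored. All names here are hypothetical.
    """
    if alive:
        # Still alive at interview: censored at the current age.
        time, event = current_age_months, 0
    else:
        if age_at_death_months is None:
            raise ValueError("age at death is required for a deceased child")
        time, event = age_at_death_months, 1

    # Assumed rule: follow-up beyond the analysis horizon is censored at
    # the horizon, so deaths at or after it do not count as events.
    if horizon_months is not None and time >= horizon_months:
        time, event = horizon_months, 0
    return time, event


# Hypothetical children, for illustration only.
print(survival_record(True, 30))                        # alive at 30 months
print(survival_record(False, 40, 7))                    # died at 7 months
print(survival_record(False, 40, 20, horizon_months=12))  # death after infancy
```

Returning a (time, event) pair keeps the censoring rule in one place, so the same records can feed either the infant or the child mortality analysis by changing only the horizon.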
Independent variable
The independent variable in this study was EAM, generated from information on the level of education of both the mother and the father. During the survey, the mother responded to questions about her highest level of educational attainment, as well as that of her partner. For both the mother and the partner, the responses fell into four categories: (1) no education, (2) primary education, (3) secondary education, and (4) tertiary education.
Most studies on EAM have used a single homogamy category. However, given that child health outcomes are strongly associated with parents' level of educational attainment, we sub-divided homogamy into low and high education. Relative to the mother, we generated four categories: (1) low-education homogamy: both parents have at most primary school education; (2) high-education homogamy: both parents have at least secondary education; (3) hypergamy: the father has at least secondary and the mother has at most primary education; and (4) hypogamy: the father has at most primary and the mother has at least secondary education. This grouping follows the view that parents with no education or primary education tend to have similar child health outcomes, as do parents with secondary education or above [5, 34]. For a robust comparison, low-education homogamy was taken as the reference for all other categories, and hypergamy was additionally taken as the reference for hypogamy.
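The four-category coding can be expressed compactly in code. The sketch below is illustrative only: the level labels, the numeric codes, and the function name `classify_eam` are assumptions introduced here, not variable names from the DHS dataset.

```python
# Hypothetical ordinal coding of the four response categories.
LEVELS = {"no education": 0, "primary": 1, "secondary": 2, "tertiary": 3}
SECONDARY = LEVELS["secondary"]


def classify_eam(mother: str, father: str) -> str:
    """Classify a couple into one of the four EAM categories,
    defined relative to the mother."""
    mother_high = LEVELS[mother] >= SECONDARY  # at least secondary
    father_high = LEVELS[father] >= SECONDARY

    if mother_high and father_high:
        return "high-education homogamy"
    if not mother_high and not father_high:
        return "low-education homogamy"
    if father_high:
        # Father at least secondary, mother at most primary.
        return "hypergamy"
    # Mother at least secondary, father at most primary.
    return "hypogamy"


for couple in [("primary", "no education"), ("secondary", "tertiary"),
               ("primary", "secondary"), ("tertiary", "primary")]:
    print(couple, "->", classify_eam(*couple))
```

Note that under this dichotomized coding a couple with, say, a secondary-educated mother and a tertiary-educated father is still classified as high-education homogamy, because both fall on the same side of the primary/secondary divide.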
Covariates
Several variables at the child, maternal, household, and community levels which have been described as possible determinants of infant and child mortality were included as covariates [26,27,28,29, 35]. Child-level variables include gender, birth order, birth interval, perceived birth weight, type of birth (singleton vs. multiple births), and the place of delivery (medical institution vs. non-medical institution).
Maternal variables include chronological age, the number of surviving children, and employment status. Household variables include the source of drinking water, distance to a medical facility (a problem vs. not a problem), religion, and family wealth. The family wealth index was derived through principal component analysis based on the number and kinds of assets each household owns, ranging from a television to a bicycle or car, and on housing characteristics such as toilet facilities and flooring materials. The index was already available as part of the Nigeria DHS dataset. Community-level covariates include the geopolitical zone of residence and rural/urban residence.
Analysis
For the multivariate analysis, we used Cox proportional hazards regression with a shared frailty term. This was done for two reasons. First, Cox regression accommodates censored observations and models censored time-to-event data as the dependent variable. Second, the frailty model accounts for the hierarchical structure of the data: children are nested within households and therefore share a household-level frailty, which may induce correlation among children in the same household [36].
Conditional on the frailty αi, the hazard function hij(t) for the failure time of the jth child in the ith household (j = 1, 2, 3, …, J; i = 1, 2, 3, …, n) is expressed as
$$ h_{ij}(t) = h_0(t)\,\alpha_i \exp\left(x_{ij}\beta\right) $$
(1)
where h0(t) is an unspecified baseline hazard function, αi is the group-level (household) frailty, xij is the vector of covariates for the jth child in the ith household, and β is the corresponding vector of regression coefficients [37, 38].
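To make Eq. (1) concrete, the following sketch evaluates the conditional hazard for two children in the same household. It is a numerical illustration only, not the estimation procedure: the baseline hazard, frailty, covariate values, and coefficients are all hypothetical numbers chosen here.

```python
import math
from typing import Sequence


def conditional_hazard(
    baseline: float,
    frailty: float,
    x: Sequence[float],
    beta: Sequence[float],
) -> float:
    """Evaluate h_ij(t) = h_0(t) * alpha_i * exp(x_ij . beta) at one time point."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return baseline * frailty * math.exp(linear_predictor)


# Hypothetical values: two covariates, one shared household frailty.
beta = [0.2, -0.3]
h0_t, alpha_i = 0.01, 1.5
child_a = conditional_hazard(h0_t, alpha_i, [1.0, 0.0], beta)
child_b = conditional_hazard(h0_t, alpha_i, [0.0, 0.0], beta)

# Within a household the frailty and baseline cancel, so the hazard ratio
# depends only on the covariate difference: exp((x_a - x_b) . beta).
print(child_a / child_b, math.exp(0.2))
```

The cancellation in the last step is why coefficients from the shared frailty model are interpreted as hazard ratios conditional on the household.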
The analyses include two models. Model 1 included the dependent variable and covariates, and Model 2 included an interaction between EAM and household wealth. This is based on the Andersen behavioral model of health service utilization [21], as family wealth may affect factors such as access to quality health and nutrition, which are among the determinants of childhood mortality [26,27,28,29].
Sensitivity analysis
We conducted three sensitivity analyses to check the robustness of our results. In the first, we restructured the EAM variable by re-segmenting education into at most secondary vs. tertiary. In the second, we excluded couples who had the same level of educational attainment and then defined hypogamy as a mother having any level of education higher than her husband's, and hypergamy as a father having any level of education higher than the mother's. In the third, we excluded those with no years of formal education and then created an indicator for whether the number of years of education completed by the mother was lower than, equal to, or higher than the number of years completed by the father.
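The indicator used in the third sensitivity analysis can be sketched as below. The function name and labels are hypothetical, and the exclusion rule is an assumption: the text does not say whether a couple was dropped when either parent or only when both parents had no formal education, so this sketch drops the couple if either parent has zero years.

```python
from typing import Optional


def years_indicator(mother_years: int, father_years: int) -> Optional[str]:
    """Compare the mother's completed years of education with the father's.

    Returns None for couples excluded from this sensitivity analysis.
    Assumption: a couple is excluded if either parent has zero years.
    """
    if mother_years <= 0 or father_years <= 0:
        return None  # excluded: no years of formal education
    if mother_years < father_years:
        return "mother lower"
    if mother_years > father_years:
        return "mother higher"
    return "equal"


# Hypothetical couples (mother's years, father's years).
for couple in [(6, 12), (12, 12), (16, 9), (0, 6)]:
    print(couple, "->", years_indicator(*couple))
```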
The synthetic cohort technique, as modified by Rutstein (1983) [39], was employed to estimate infant and child mortality rates across the spectrum of EAM. The probability of dying is built up from probabilities calculated for subintervals: 0, 1–2, 3–5, 6–11 months (1q0), and 12–23, 24–35, 36–47, 48–59 months (4q1). Death probabilities are defined by a time-period and an age interval, and within these two parameters, three birth cohorts of children are included, as illustrated in Fig. 1.
Only cohort B is completely included, while cohorts A and C are only partially exposed to mortality in the time-period [t1, t2]. The subinterval probabilities of death are calculated by dividing the number of deaths occurring in the relevant age interval among children who were exposed to death in a specific calendar period by the number of children exposed in that calendar period. See the DHS statistics guide for a more detailed explanation of this methodology [40, 41]. The conventional mortality rate is then given as:
$$ {}_{n}q_{x} = \left(1 - \prod_{i=x}^{x+n} \left(1 - q[i]\right)\right) \times 1000 $$
(2)
where nqx is the probability of dying between ages x and x + n, expressed per 1000, and q[i] is the probability of dying in subinterval i.
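As a worked illustration of Eq. (2), the sketch below chains subinterval probabilities into a rate per 1000. It is not the `syncmrate` implementation; the function name and the four subinterval probabilities are hypothetical values chosen here to show the arithmetic.

```python
from typing import Iterable


def mortality_rate_per_1000(subinterval_probs: Iterable[float]) -> float:
    """Combine subinterval death probabilities q[i] into nqx per 1000.

    Survival over the whole age range is the product of the subinterval
    survival probabilities (1 - q[i]); the rate is its complement.
    """
    survival = 1.0
    for q in subinterval_probs:
        if not 0.0 <= q <= 1.0:
            raise ValueError("each q[i] must lie in [0, 1]")
        survival *= 1.0 - q
    return (1.0 - survival) * 1000.0


# Hypothetical probabilities for the subintervals 0, 1-2, 3-5, 6-11 months.
infant_q = [0.03, 0.01, 0.01, 0.02]
print(round(mortality_rate_per_1000(infant_q), 2))  # about 68.32 per 1000
```

Because survival probabilities multiply, the combined rate (about 68.3 here) is slightly lower than the simple sum of the subinterval probabilities (70 per 1000).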
The strength of this approach is that it makes full use of the most recent data.
The procedure was implemented using the command “syncmrate” in STATA. All analyses were done using STATA software version 13.0 (StataCorp, College Station, TX, USA), and statistical significance was assessed at the 5% level (95% confidence intervals).